Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H
2017-03-01
For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, with a support vector machine classifier. The method was applied to four microarray datasets, and the performance was assessed by leave-one-out cross-validation. Comparative performance assessment against other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type. Copyright © 2017. Published by Elsevier Inc.
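The first (filter) stage described in the abstract above can be illustrated with a minimal greedy MRMR sketch. It assumes discretized expression values and uses the common "relevance minus mean redundancy" criterion with mutual information; the function names and the data layout are hypothetical, and the COA-HS wrapper stage is not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(genes, labels, k):
    """Greedily pick k gene indices maximizing relevance minus mean redundancy.

    genes  -- list of discretized expression profiles, one list per gene (assumed layout)
    labels -- class label per sample
    """
    relevance = [mutual_information(g, labels) for g in genes]
    selected = []
    while len(selected) < min(k, len(genes)):
        best, best_score = None, -math.inf
        for i, g in enumerate(genes):
            if i in selected:
                continue
            if selected:
                redundancy = sum(
                    mutual_information(g, genes[j]) for j in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[i] - redundancy
            if score > best_score:  # strict '>' keeps the lowest index on ties
                best, best_score = i, score
        selected.append(best)
    return selected
```

With a toy input in which one gene is an exact copy of another, a pure relevance ranking would keep both copies, whereas this criterion skips the copy in favor of a less redundant gene.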
Recursive feature selection with significant variables of support vectors.
Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh
2012-01-01
The development of DNA microarrays enables researchers to screen thousands of genes simultaneously and helps identify genes with high and low expression levels in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and choose genes with an arbitrary threshold. However, the parameter setting may not be compatible with the selected classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of the two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
2013-01-01
Background Gene expression data could be of great help in developing effective cancer diagnosis and classification platforms. Recently, many researchers have analyzed gene expression data using diverse computational intelligence methods to select a small subset of informative genes for cancer classification. Many computational methods have difficulty selecting small subsets because of the small number of samples compared to the huge number of genes (high dimension), irrelevant genes, and noisy genes. Methods We propose an enhanced binary particle swarm optimization to select small subsets of informative genes, which is significant for cancer classification. Particle speed, a rule, and a modified sigmoid function are introduced in the proposed method to increase the probability that the bits in a particle's position are zero. The method was empirically applied to a suite of ten well-known benchmark gene expression data sets. Results The proposed method proved superior to previous related works, including the conventional version of binary particle swarm optimization (BPSO), in terms of classification accuracy and the number of selected genes. The proposed method also requires less computational time than BPSO. PMID:23617960
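For orientation, the conventional BPSO update that the abstract above takes as its baseline can be sketched as follows. This is the standard textbook form (a logistic sigmoid turns each velocity into the probability that a bit is 1, i.e. that the gene is selected); the paper's modified sigmoid and its rule are not specified in the abstract and are not reproduced. The parameter names and default values are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(v):
    """Standard logistic function mapping a velocity to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def update_velocity(velocity, position, personal_best, global_best, rng,
                    inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Conventional PSO velocity update, clamped to [-v_max, v_max]."""
    new_velocity = []
    for v, x, p, g in zip(velocity, position, personal_best, global_best):
        v_new = (inertia * v
                 + c1 * rng.random() * (p - x)
                 + c2 * rng.random() * (g - x))
        new_velocity.append(max(-v_max, min(v_max, v_new)))
    return new_velocity


def update_position(velocity, rng):
    """Sample a bit string: bit i is 1 (gene i selected) with probability sigmoid(velocity[i])."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]
```

Any change that lowers the sigmoid output for a given velocity makes zero bits more likely and therefore shrinks the selected subset, which is the effect the abstract attributes to its modification.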
2012-01-01
Background Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. PMID:22830977
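The two ensemble ideas in the abstract above, ranking genes by how often they occur in the selected subsets and combining individual classifiers by majority vote, can be sketched in a few lines. The function names and input layout are hypothetical, and the HBSA search that produces the subsets is not shown.

```python
from collections import Counter


def rank_genes_by_frequency(gene_subsets):
    """Rank genes by the number of selected subsets in which they occur.

    Returns (gene, count) pairs, most frequent first; ties are broken by gene name.
    """
    counts = Counter(g for subset in gene_subsets for g in set(subset))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def majority_vote(predictions):
    """Combine per-classifier prediction lists into one prediction per sample.

    predictions -- list of lists, one inner list of predicted labels per classifier.
    On a tie, the label seen first among the votes wins (Counter ordering).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

For example, three subsets that all contain the same gene put that gene at the top of the ranking, and three classifiers voting `(0, 0, 1)` on a sample yield the label `0`.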
2012-01-01
Background Discovering new biomarkers plays an important role in improving the early diagnosis of hepatocellular carcinoma (HCC). The experimental determination of biomarkers requires a lot of time and money. This motivates the use of in-silico prediction of biomarkers to reduce the number of experiments required for detecting new ones. This is achieved by extracting the most representative genes in microarrays of HCC. Results In this work, we provide a method for extracting the differentially expressed genes, specifically the up-regulated ones, that can be considered candidate biomarkers in high-throughput microarrays of HCC. We examine the power of several gene selection methods (such as Pearson's correlation coefficient, Cosine coefficient, Euclidean distance, Mutual information and Entropy with different estimators) in selecting informative genes. A biological interpretation of the highly ranked genes is done using the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, ENTREZ and DAVID (Database for Annotation, Visualization, and Integrated Discovery) databases. The top ten genes selected using Pearson's correlation coefficient and Cosine coefficient contained six genes that have been implicated in the genesis of cancer (often multiple cancers) in previous studies. Fewer such genes were obtained by the other methods (4 genes using Mutual information, 3 genes using Euclidean distance and only one gene using Entropy). A better result was obtained with a hybrid approach based on intersecting the highly ranked genes in the output of all investigated methods. This hybrid combination yielded seven genes (2 genes for HCC and 5 genes in different types of cancer) in the top ten genes of the list of intersected genes. Conclusions To strengthen the effectiveness of the univariate selection methods, we propose a hybrid approach that intersects several of these methods in a cascaded manner. According to biological interpretation and the examination of gene expression signal profiles, this approach surpasses each of the univariate selection methods used individually. PMID:22867264
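One way to read the hybrid approach in the abstract above is to rank genes with each univariate score, keep the top k from each ranking, and intersect the lists one after another. The sketch below does this for two of the named scores (Pearson's correlation and the cosine coefficient). How the paper actually cascades the methods is not stated in the abstract, so the sequential intersection, the use of absolute scores, and all function names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(xs, ys):
    """Cosine coefficient; returns 0.0 if either input is the zero vector."""
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    return dot / (nx * ny) if nx and ny else 0.0


def top_k(expression, labels, score, k):
    """Indices of the k genes with the largest absolute score against the labels."""
    order = sorted(range(len(expression)),
                   key=lambda i: -abs(score(expression[i], labels)))
    return order[:k]


def cascaded_intersection(expression, labels, scorers, k):
    """Keep only genes that appear in the top-k list of every scorer, applied in turn."""
    keep = top_k(expression, labels, scorers[0], k)
    for score in scorers[1:]:
        allowed = set(top_k(expression, labels, score, k))
        keep = [g for g in keep if g in allowed]
    return keep
```

The intersection can only shrink as more scorers are added, so k controls a trade-off between the length of the final list and agreement among methods.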
An Ensemble Framework Coping with Instability in the Gene Selection Process.
Castellanos-Garzón, José A; Ramos, Juan; López-Sánchez, Daniel; de Paz, Juan F; Corchado, Juan M
2018-03-01
This paper proposes an ensemble framework for gene selection, which is aimed at addressing instability problems presented in the gene filtering task. The complex process of gene selection from gene expression data faces different instability problems from the informative gene subsets found by different filter methods. This makes the identification of significant genes by the experts difficult. The instability of results can come from filter methods, gene classifier methods, different datasets of the same disease and multiple valid groups of biomarkers. Even though there is a wide number of proposals, the complexity imposed by this problem remains a challenge today. This work proposes a framework involving five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. This framework performs a process of stable feature selection, facing the problems above and, thus, providing a more suitable and reliable solution for clinical and research purposes. Our proposal involves a process of multistage gene filtering, in which several ensemble strategies for gene selection were added in such a way that different classifiers simultaneously assess gene subsets to face instability. Firstly, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability according to filter methods). Next, we apply an ensemble of known classifiers to filter genes relevant to all classifiers at a time (stability according to classification methods). The achieved results were evaluated in two different datasets of the same disease (pancreatic ductal adenocarcinoma), in search of stability according to the disease, for which promising results were achieved.
The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection.
Sun, Yingqiang; Lu, Chengbo; Li, Xiaobo
2018-05-17
Gene expression profiles are high-dimensional, small-sample, and continuous-valued, which makes it a great challenge to use gene expression profile data for the classification of tumor samples. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. Firstly, multiple filters are applied to the microarray data in order to obtain several pre-selected feature subsets with different classification abilities. The N top-ranked genes of each subset are integrated to form a new data set. Secondly, the cross-entropy algorithm is used to remove the redundant data in the data set. Finally, a wrapper method based on forward feature selection is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and that it can achieve a higher classification accuracy with fewer genes.
2010-01-01
Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index.
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
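The evaluation measure named in the abstract above, the adjusted Rand index, compares a clustering with known classes while correcting for chance agreement. A minimal sketch of the usual pair-counting formula follows; the function name and the handling of degenerate inputs (returning 1.0) are choices made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no sample pairs to compare
    # Pairs of samples that fall together in both partitions, in the known
    # classes only, and in the predicted clusters only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (pairs_both - expected) / (maximum - expected)
```

The index is 1 for identical partitions regardless of how the clusters are named, and close to 0 for a random assignment.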
Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C
2005-10-01
In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables, the small number of samples, and the non-linearity of the problem. It is difficult to get satisfactory results with conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm for gene selection and cancer classification, which it integrates into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm when it is implemented with Gaussian kernel SVMs: a genetic algorithm searches for a pair of optimal parameters, as a better alternative to the common practice of selecting the apparently best parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, hereditary breast cancer and acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
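As background for the abstract above, the basic recursive feature elimination loop can be sketched with a linear SVM: train, drop the feature with the smallest squared weight, and repeat. This is the plain linear baseline, not the paper's Gaussian-kernel variant, and the genetic-algorithm parameter search is not shown. The tiny subgradient trainer, its hyperparameters, and the omission of a bias term are simplifications assumed for this example.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss (no bias term).

    X -- list of samples, each a list of feature values; y -- labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y):
    """Return feature indices ranked from most to least important.

    Each round retrains on the remaining features and eliminates the one
    with the smallest squared weight.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]
```

On microarray-sized data one would normally remove features in batches and use an established SVM library; eliminating a single feature per round is used here only to keep the loop easy to follow.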
Advances in metaheuristics for gene selection and classification of microarray data.
Duval, Béatrice; Hao, Jin-Kao
2010-01-01
Gene selection aims at identifying a (small) subset of informative genes from the initial data in order to obtain high predictive accuracy for classification. Gene selection can be considered as a combinatorial search problem and thus be conveniently handled with optimization methods. In this article, we summarize some recent developments of using metaheuristic-based methods within an embedded approach for gene selection. In particular, we put forward the importance and usefulness of integrating problem-specific knowledge into the search operators of such a method. To illustrate the point, we explain how ranking coefficients of a linear classifier such as support vector machine (SVM) can be profitably used to reinforce the search efficiency of Local Search and Evolutionary Search metaheuristic algorithms for gene selection and classification.
Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin
2016-01-01
Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes. PMID:27579323
Gene selection for tumor classification using neighborhood rough sets and entropy measures.
Chen, Yumin; Zhang, Zunjun; Zheng, Jianzhong; Ma, Ying; Xue, Yu
2017-03-01
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection from gene expression data becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to gene selection, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data are therefore usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data whilst maintaining the original gene classification information. Moreover, this paper introduces an entropy measure within the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. The use of this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression datasets show that the proposed gene selection is an effective method for improving the accuracy of tumor classification. Copyright © 2017 Elsevier Inc. All rights reserved.
Gene selection for microarray data classification via subspace learning and manifold regularization.
Tang, Chang; Cao, Lijuan; Zheng, Xiao; Wang, Minhui
2017-12-19
With the rapid development of DNA microarray technology, a large amount of genomic data has been generated. Classification of these microarray data is a challenging task since gene expression data often contain thousands of genes but only a small number of samples. In this paper, an effective gene selection method is proposed to select the best subset of genes for microarray data with the irrelevant and redundant genes removed. Compared with the original data, the selected gene subset can benefit the classification task. We formulate the gene selection task as a manifold regularized subspace learning problem. In detail, a projection matrix is used to project the original high dimensional microarray data into a lower dimensional subspace, with the constraint that the original genes can be well represented by the selected genes. Meanwhile, the local manifold structure of the original data is preserved by a Laplacian graph regularization term on the low-dimensional data space. The projection matrix can serve as an importance indicator of different genes. An iterative update algorithm is developed for solving the problem. Experimental results on six publicly available microarray datasets and one clinical dataset demonstrate that the proposed method performs better than other state-of-the-art methods in terms of microarray data classification.
Takahashi, Hiro; Nemoto, Takeshi; Yoshida, Teruhiko; Honda, Hiroyuki; Hasegawa, Tadashi
2006-01-01
Background Recent advances in genome technologies have provided an excellent opportunity to determine the complete biological characteristics of neoplastic tissues, resulting in improved diagnosis and selection of treatment. To accomplish this objective, it is important to establish a sophisticated algorithm that can deal with large quantities of data such as gene expression profiles obtained by DNA microarray analysis. Results Previously, we developed the projective adaptive resonance theory (PART) filtering method as a gene filtering method. This is one of the clustering methods that can select specific genes for each subtype. In this study, we applied the PART filtering method to analyze microarray data that were obtained from soft tissue sarcoma (STS) patients for the extraction of subtype-specific genes. The performance of the filtering method was evaluated by comparison with other widely used methods, such as signal-to-noise, significance analysis of microarrays, and nearest shrunken centroids. In addition, various combinations of filtering and modeling methods were used to extract essential subtype-specific genes. The combination of the PART filtering method and boosting – the PART-BFCS method – showed the highest accuracy. Seven genes among the 15 genes that are frequently selected by this method – MIF, CYFIP2, HSPCB, TIMP3, LDHA, ABR, and RGS3 – are known prognostic marker genes for other tumors. These genes are candidate marker genes for the diagnosis of STS. Correlation analysis was performed to extract marker genes that were not selected by PART-BFCS. Sixteen genes among those extracted are also known prognostic marker genes for other tumors, and they could be candidate marker genes for the diagnosis of STS. Conclusion The procedure that consisted of two steps, such as the PART-BFCS and the correlation analysis, was proposed. 
The results suggest that novel diagnostic and therapeutic targets for STS can be extracted by a procedure that includes the PART filtering method. PMID:16948864
A multi-strategy approach to informative gene identification from gene expression data.
Liu, Ziying; Phan, Sieu; Famili, Fazel; Pan, Youlian; Lenferink, Anne E G; Cantin, Christiane; Collins, Catherine; O'Connor-McCourt, Maureen D
2010-02-01
An unsupervised multi-strategy approach has been developed to identify informative genes from high throughput genomic data. Several statistical methods have been used in the field to identify differentially expressed genes. Since different methods generate different lists of genes, it is very challenging to determine the most reliable gene list and the appropriate method. This paper presents a multi-strategy method, in which a combination of several data analysis techniques is applied to a given dataset and a confidence measure is established to select genes from the gene lists generated by these techniques to form the core of our final selection. The remainder of the genes, which form the peripheral region, are subject to exclusion or inclusion into the final selection. This paper demonstrates this methodology through its application to an in-house cancer genomics dataset and a public dataset. The results indicate that our method provides a more reliable list of genes, which are validated using biological knowledge, biological experiments, and literature search. We further evaluated our multi-strategy method by consolidating two pairs of independent datasets; each pair is for the same disease but was generated by different labs using different platforms. The results showed that our method has produced far better results.
[Selection of reference genes of Siraitia grosvenorii by real-time PCR].
Tu, Dong-ping; Mo, Chang-ming; Ma, Xiao-jun; Zhao, Huan; Tang, Qi; Huang, Jie; Pan, Li-mei; Wei, Rong-chang
2015-01-01
Siraitia grosvenorii is used both as a traditional Chinese medicine and as a food. In this study, six candidate reference genes were evaluated by real-time quantitative PCR, and their expression stability in different samples was analyzed using geNorm, NormFinder, BestKeeper, the Delta CT method and RefFinder; this is the first time that reference genes have been selected for S. grosvenorii. The results showed that 18S rRNA was the most stably expressed gene in all samples and was the best reference gene in the genetic analysis. The study provides guidance for gene expression analysis by qRT-PCR and supplies a suitable reference gene to ensure reliable results in studies of differentially expressed genes in synthesis and biological pathways, as well as other genes of S. grosvenorii.
Yang, Mingxing; Li, Xiumin; Li, Zhibin; Ou, Zhimin; Liu, Ming; Liu, Suhuan; Li, Xuejun; Yang, Shuyu
2013-01-01
DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes. Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub's leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a multiclass classification problem. One approach to solving this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning to a microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to solve multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the features selected from each subset are passed to separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
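The transformation of a multiclass problem into several binary problems, mentioned at the start of the abstract above, is often done with a one-vs-rest scheme. The abstract does not say which decomposition the authors use, so the sketch below is only one common possibility; the class names in the usage note are placeholders and the function names are hypothetical.

```python
def one_vs_rest_labels(labels):
    """Build one binary labeling per class: +1 for that class, -1 for all others."""
    classes = sorted(set(labels))
    return {c: [1 if label == c else -1 for label in labels] for c in classes}


def predict_from_scores(scores_by_class):
    """Pick, for each sample, the class whose binary classifier gives the highest score.

    scores_by_class -- {class: [decision score per sample]} from the binary classifiers.
    """
    classes = list(scores_by_class)
    n_samples = len(next(iter(scores_by_class.values())))
    return [max(classes, key=lambda c: scores_by_class[c][i]) for i in range(n_samples)]
```

For labels `["A", "B", "C", "A"]`, the binary problem for class `"A"` has targets `[1, -1, -1, 1]`; each binary problem can then be handed to any two-class learner.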
Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.
Tong, Dong Ling; Schierz, Amanda C
2011-09-01
Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed to a specific type of cancer. Most of the machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data. The GANN is a hybrid model where the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated by using two benchmark microarray datasets with different array platforms and differing number of classes (a 2-class oligonucleotide microarray data for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for SRBCTs (small round blue cell tumours)). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time. The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models and for both datasets, the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method not only can detect genes that are exclusively associated with a single cancer type but can also explore the genes that are differentially expressed in multiple cancer types. 
The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data as well as extracting known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading and though the GANN's set of extra genes prove to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality. Copyright © 2011 Elsevier B.V. All rights reserved.
Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification
Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.
2013-01-01
Background: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results: The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions: High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The source MATLAB code is available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761
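The abstract above refers to soft-thresholding type penalties without giving their exact form. As a rough illustration only, the generic soft-thresholding operator can be sketched as follows; the function names, the example weights and the value of `lam` are assumptions made for this sketch and are not taken from the paper.

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def shrink_weights(weights, lam):
    """Apply soft-thresholding to every coefficient of a weight vector."""
    return [soft_threshold(w, lam) for w in weights]


# Hypothetical per-gene weights of a linear classifier.
weights = [0.9, -0.05, 0.2, -0.7, 0.01]
shrunk = shrink_weights(weights, lam=0.1)

# Genes whose weight survives the shrinkage count as selected.
selected = [i for i, w in enumerate(shrunk) if w != 0.0]
print(selected)
```

Small weights are set exactly to zero, which is why penalties of this type perform variable selection and classification in one step.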
LS Bound based gene selection for DNA microarray data.
Zhou, Xin; Mao, K Z
2005-04-15
One problem with discriminant analysis of DNA microarray data is that each sample is represented by quite a large number of genes, and many of them are irrelevant, insignificant or redundant to the discriminant problem at hand. Methods for selecting important genes are, therefore, of much significance in microarray data analysis. In the present study, a new criterion, called LS Bound measure, is proposed to address the gene selection problem. The LS Bound measure is derived from leave-one-out procedure of LS-SVMs (least squares support vector machines), and as the upper bound for leave-one-out classification results it reflects to some extent the generalization performance of gene subsets. We applied this LS Bound measure for gene selection on two benchmark microarray datasets: colon cancer and leukemia. We also compared the LS Bound measure with other evaluation criteria, including the well-known Fisher's ratio and Mahalanobis class separability measure, and other published gene selection algorithms, including Weighting factor and SVM Recursive Feature Elimination. The strength of the LS Bound measure is that it provides gene subsets leading to more accurate classification results than the filter method while its computational complexity is at the level of the filter method. A companion website can be accessed at http://www.ntu.edu.sg/home5/pg02776030/lsbound/. The website contains: (1) the source code of the gene selection algorithm; (2) the complete set of tables and figures regarding the experimental study; (3) proof of the inequality (9). ekzmao@ntu.edu.sg.
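Fisher's ratio, one of the filter criteria the LS Bound measure is compared against, is commonly written for a two-class problem as the squared difference of class means divided by the sum of class variances. The sketch below uses that common form; the gene names and expression values are invented for illustration, and the LS Bound measure itself is not implemented here.

```python
from statistics import mean, variance


def fisher_ratio(class_a, class_b):
    """Two-class Fisher's ratio: (mean_a - mean_b)^2 / (var_a + var_b)."""
    gap = mean(class_a) - mean(class_b)
    spread = variance(class_a) + variance(class_b)
    if spread == 0:
        # Constant expression in both classes: perfectly separable or useless.
        return float("inf") if gap != 0 else 0.0
    return gap ** 2 / spread


# Hypothetical expression values: (samples of class A, samples of class B).
genes = {
    "g1": ([1.0, 1.2, 0.8], [3.0, 3.2, 2.8]),
    "g2": ([1.0, 2.0, 3.0], [2.0, 3.0, 1.0]),
    "g3": ([0.0, 2.0, 4.0], [3.0, 5.0, 7.0]),
}

scores = {name: fisher_ratio(a, b) for name, (a, b) in genes.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Each gene is scored independently, which keeps the cost low but ignores redundancy between genes; that is the usual limitation of filter criteria.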
Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification
Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid
2015-01-01
This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows a great accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in the real medical practice. PMID:25823003
Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe
2003-11-06
We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
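The abstract says that E-RFE eliminates chunks of genes using an entropy measure of the SVM weights distribution, but it does not spell out the rule. The following is a minimal sketch of the idea under stated assumptions: the histogram binning, the `entropy_threshold` value and the "drop the lowest bin" rule are hypothetical choices for illustration, not the published procedure.

```python
import math


def weight_entropy(weights, n_bins):
    """Shannon entropy (in bits) of the histogram of absolute weights."""
    mags = [abs(w) for w in weights]
    lo, hi = min(mags), max(mags)
    if hi == lo:
        return 0.0
    counts = [0] * n_bins
    for m in mags:
        idx = min(int((m - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(mags)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def genes_to_drop(weights, n_bins=4, entropy_threshold=1.0):
    """Hypothetical rule: drop one gene when weights are spread out (high
    entropy), or the whole lowest bin when they are concentrated (low entropy)."""
    mags = [abs(w) for w in weights]
    order = sorted(range(len(mags)), key=lambda i: mags[i])
    if weight_entropy(weights, n_bins) >= entropy_threshold:
        return order[:1]
    lo, hi = min(mags), max(mags)
    cutoff = lo + (hi - lo) / n_bins
    chunk = [i for i in order if mags[i] < cutoff]
    return chunk or order[:1]


# Many near-zero weights and one large weight: low entropy, so a chunk goes.
concentrated = [0.01, -0.02, 0.015, 0.9, 0.03, -0.02, 0.01, 0.02]
print(genes_to_drop(concentrated))

# Evenly spread weights: high entropy, so only the weakest gene goes.
spread_out = [0.1, 0.4, 0.6, 0.9]
print(genes_to_drop(spread_out))
```

Removing many uninformative genes per iteration is what would give the speed-up over one-at-a-time RFE; each round would retrain the SVM on the remaining genes.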
Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou
2018-01-01
The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible. PMID:29666661
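The use of Spearman's rank correlation to remove co-expressed genes can be illustrated with a small sketch. It assumes the genes are already ordered by relevance and applies a greedy filter with a hypothetical threshold of 0.9; the embedding step and the exact removal rule of the paper are not reproduced.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_coexpressed(genes, threshold=0.9):
    """Keep a gene only if |rho| with every already-kept gene is below threshold."""
    kept = []
    for name, profile in genes:
        if all(abs(spearman(profile, other)) < threshold for _, other in kept):
            kept.append((name, profile))
    return [name for name, _ in kept]


# Hypothetical expression profiles, listed from most to least relevant.
genes = [
    ("g1", [1, 2, 3, 4, 5]),
    ("g2", [2, 4, 6, 8, 11]),   # rises with g1, so it is redundant
    ("g3", [5, 1, 4, 2, 3]),
]
kept_names = drop_coexpressed(genes)
print(kept_names)
```

Because the correlation is computed on ranks, any monotone relationship between two genes counts as co-expression, not just a linear one.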
Yu, Fang; Chen, Ming-Hui; Kuo, Lynn; Talbott, Heather; Davis, John S
2015-08-07
Recently, the Bayesian method has become more popular for analyzing high-dimensional gene expression data, as it allows us to borrow information across different genes and provides powerful estimators for evaluating gene expression levels. It is crucial to develop a simple but efficient gene selection algorithm for detecting differentially expressed (DE) genes based on the Bayesian estimators. In this paper, by extending the two-criterion idea of Chen et al. (Chen M-H, Ibrahim JG, Chi Y-Y. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Inference. 2008;138:387-404), we propose two new gene selection algorithms for general Bayesian models and name them the confident difference criterion methods. One is based on the standardized differences between two mean expression values among genes; the other adds the differences between two variances to it. The proposed confident difference criterion methods first evaluate the posterior probability of a gene having different gene expressions between competitive samples and then declare a gene to be DE if the posterior probability is large. The theoretical connection between the proposed first method based on the means and the Bayes factor approach proposed by Yu et al. (Yu F, Chen M-H, Kuo L. Detecting differentially expressed genes using calibrated Bayes factors. Statistica Sinica. 2008;18:783-802) is established under the normal-normal model with equal variances between two samples. The empirical performance of the proposed methods is examined and compared to those of several existing methods via several simulations. The results from these simulation studies show that the proposed confident difference criterion methods outperform the existing methods when comparing gene expressions across different conditions for both microarray studies and sequence-based high-throughput studies. A real dataset is used to further demonstrate the proposed methodology. 
In the real data application, the confident difference criterion methods successfully identified more clinically important DE genes than the other methods. The confident difference criterion method proposed in this paper provides a new efficient approach for both microarray studies and sequence-based high-throughput studies to identify differentially expressed genes.
A Granular Self-Organizing Map for Clustering and Gene Selection in Microarray Data.
Ray, Shubhra Sankar; Ganivada, Avatharam; Pal, Sankar K
2016-09-01
A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of a winning neuron and the neighborhood neurons are updated through a modified learning procedure. The neighborhood is newly defined using the fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes. Based on the decision table, a method of gene selection is developed. The effectiveness of the GSOM is shown in both clustering samples and developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. While the superior results of the GSOM, as compared with the related clustering methods, are provided in terms of β-index, DB-index, Dunn-index, and fuzzy rough entropy, the genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than the related unsupervised methods. The C-codes of the GSOM and UFRFS are available online at http://avatharamg.webs.com/software-code.
Computerized system for recognition of autism on the basis of gene expression microarray data.
Latkowski, Tomasz; Osowski, Stanislaw
2015-01-01
The aim of this paper is to provide a means to recognize a case of autism using gene expression microarrays. The crucial task is to discover the most important genes which are strictly associated with autism. The paper presents an application of different methods of gene selection, to select the most representative input attributes for an ensemble of classifiers. The set of classifiers is responsible for distinguishing autism data from the reference class. Simultaneous application of a few gene selection methods enables analysis of the ill-conditioned gene expression matrix from different points of view. The results of selection combined with a genetic algorithm and SVM classifier have shown increased accuracy of autism recognition. Early recognition of autism is extremely important for treatment of children and increases the probability of their recovery and return to normal social communication. The results of this research can find practical application in early recognition of autism on the basis of gene expression microarray analysis. Copyright © 2014 Elsevier Ltd. All rights reserved.
Zhou, Zhan; Zou, Yangyun; Liu, Gangbiao; Zhou, Jingqi; Wu, Jingcheng; Zhao, Shimin; Su, Zhixi; Gu, Xun
2017-08-29
Human genes exhibit different effects on fitness in cancer and normal cells. Here, we present an evolutionary approach to measure the selection pressure on human genes, using the well-known ratio of the nonsynonymous to synonymous substitution rate in both cancer genomes (CN/CS) and normal populations (pN/pS). A new mutation-profile-based method that adopts sample-specific mutation rate profiles instead of conventional substitution models was developed. We found that cancer-specific selection pressure is quite different from the selection pressure at the species and population levels. Both the relaxation of purifying selection on passenger mutations and the positive selection of driver mutations may contribute to the increased CN/CS values of human genes in cancer genomes compared with the pN/pS values in human populations. The CN/CS values also contribute to the improved classification of cancer genes and a better understanding of the onco-functionalization of cancer genes during oncogenesis. The use of our computational pipeline to identify cancer-specific positively and negatively selected genes may provide useful information for understanding the evolution of cancers and identifying possible targets for therapeutic intervention.
Li, Juntao; Wang, Yanyan; Jiang, Tao; Xiao, Huimin; Song, Xuekun
2018-05-09
Diagnosing acute leukemia is a necessary prerequisite to treating it. Multi-class classification of acute leukemia gene expression data, which cover B-cell acute lymphoblastic leukemia (BALL), T-cell acute lymphoblastic leukemia (TALL) and acute myeloid leukemia (AML), helps with this diagnosis. However, selecting cancer-causing genes is a challenging problem in performing multi-class classification. In this paper, weighted gene co-expression networks are employed to divide the genes into groups. Based on these groups, a new regularized multinomial regression with overlapping group lasso penalty (MROGL) is presented to simultaneously perform multi-class classification and select gene groups. By implementing this method on three-class acute leukemia data, the grouped genes which work synergistically are identified, and the overlapped genes shared by different groups are also highlighted. Moreover, MROGL outperforms five other methods in multi-class classification accuracy. Copyright © 2017. Published by Elsevier B.V.
Identifying Loci Under Selection Against Gene Flow in Isolation-with-Migration Models
Sousa, Vitor C.; Carneiro, Miguel; Ferrand, Nuno; Hey, Jody
2013-01-01
When divergence occurs in the presence of gene flow, there can arise an interesting dynamic in which selection against gene flow, at sites associated with population-specific adaptations or genetic incompatibilities, can cause net gene flow to vary across the genome. Loci linked to sites under selection may experience reduced gene flow and may experience genetic bottlenecks by the action of nearby selective sweeps. Data from histories such as these may be poorly fitted by conventional neutral model approaches to demographic inference, which treat all loci as equally subject to forces of genetic drift and gene flow. To allow for demographic inference in the face of such histories, as well as the identification of loci affected by selection, we developed an isolation-with-migration model that explicitly provides for variation among genomic regions in migration rates and/or rates of genetic drift. The method allows for loci to fall into any of multiple groups, each characterized by a different set of parameters, thus relaxing the assumption that all loci share the same demography. By grouping loci, the method can be applied to data with multiple loci and still have tractable dimensionality and statistical power. We studied the performance of the method using simulated data, and we applied the method to study the divergence of two subspecies of European rabbits (Oryctolagus cuniculus). PMID:23457232
Liu, Liang; Cooper, Tamara; Eldi, Preethi; Garcia-Valtanen, Pablo; Diener, Kerrilyn R; Howley, Paul M; Hayball, John D
2017-04-01
Recombinant vaccinia viruses (rVACVs) are promising antigen-delivery systems for vaccine development that are also useful as research tools. Two common methods for selection during construction of rVACV clones are (i) co-insertion of drug resistance or reporter protein genes, which requires the use of additional selection drugs or detection methods, and (ii) dominant host-range selection. The latter uses VACV variants rendered replication-incompetent in host cell lines by the deletion of host-range genes. Replicative ability is restored by co-insertion of the host-range genes, providing for dominant selection of the recombinant viruses. Here, we describe a new method for the construction of rVACVs using the cowpox CP77 protein and unmodified VACV as the starting material. Our selection system will expand the range of tools available for positive selection of rVACV during vector construction, and it is substantially more high-fidelity than approaches based on selection for drug resistance.
Tsai, Yu-Shuen; Aguan, Kripamoy; Pal, Nikhil R.; Chung, I-Fang
2011-01-01
Informative genes from microarray data can be used to construct prediction models and investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low computational complexity, and efficient in identifying both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot identify multiple-class specific signature genes easily. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and makes use of a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small as well as when the class sizes are significantly different. To compare the effectiveness and robustness we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distribution. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate the pathway information with the multiple-class specific signature genes and cross-reference to published studies. We find that the identified genes participate in the pathways directly involved in cancer development in leukemia data. 
Our method gives a promising way to find genes that may be involved in pathways of multiple diseases and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases. PMID:21909426
Li, Jiangeng; Su, Lei; Pang, Zenan
2015-12-01
Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
ROKU: a novel method for identification of tissue-specific genes.
Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro
2006-06-12
One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene if any exist using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression pattern are specific only to objective tissues. ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes.
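The entropy-based ranking mentioned above can be sketched with plain Shannon entropy over a gene's relative expression across tissues. This corresponds to the conventional entropy approach rather than to ROKU itself: ROKU's data processing and its outlier-detection step are not reproduced, and the gene names and values are invented for illustration.

```python
import math


def expression_entropy(profile):
    """Shannon entropy (bits) of a non-negative expression profile across
    tissues. Low entropy means expression is concentrated in few tissues."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive expression")
    return -sum((x / total) * math.log2(x / total) for x in profile if x > 0)


# Hypothetical expression of three genes across four tissues.
profiles = {
    "housekeeping": [10.0, 10.0, 10.0, 10.0],
    "liver_specific": [0.0, 0.0, 40.0, 0.0],
    "brain_biased": [30.0, 5.0, 5.0, 0.0],
}

entropies = {name: expression_entropy(p) for name, p in profiles.items()}

# Most tissue-specific genes first (lowest entropy).
by_specificity = sorted(entropies, key=entropies.get)
print(by_specificity)
```

A uniformly expressed gene reaches the maximum entropy, log2 of the number of tissues, while a gene expressed in a single tissue has entropy zero.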
Multi-level gene/MiRNA feature selection using deep belief nets and active learning.
Ibrahim, Rania; Yousri, Noha A; Ismail, Mohamed A; El-Makky, Nagwa M
2014-01-01
Selecting the most discriminative genes/miRNAs has been raised as an important task in bioinformatics to enhance disease classifiers and to mitigate the dimensionality curse problem. Original feature selection methods choose genes/miRNAs based on their individual features regardless of how they perform together. Considering group features instead of individual ones provides a better view for selecting the most informative genes/miRNAs. Recently, deep learning has proven its ability in representing the data in multiple levels of abstraction, allowing for better discrimination between different classes. However, the idea of using deep learning for feature selection is not widely used in the bioinformatics field yet. In this paper, a novel multi-level feature selection approach named MLFS is proposed for selecting genes/miRNAs based on expression profiles. The approach is based on both deep and active learning. Moreover, an extension to use the technique for miRNAs is presented by considering the biological relation between miRNAs and genes. Experimental results show that the approach was able to outperform classical feature selection methods in hepatocellular carcinoma (HCC) by 9%, lung cancer by 6% and breast cancer by around 10% in F1-measure. Results also show the enhancement in F1-measure of our approach over recently related work in [1] and [2].
Takahashi, Hiro; Honda, Hiroyuki
2006-07-01
Considering the recent advances in and the benefits of DNA microarray technologies, many gene filtering approaches have been employed for the diagnosis and prognosis of diseases. In our previous study, we developed a new filtering method, namely, the projective adaptive resonance theory (PART) filtering method. This method was effective in subclass discrimination. In the PART algorithm, the genes with a low variance in gene expression in either class, not both classes, were selected as important genes for modeling. Based on this concept, we developed novel simple filtering methods such as modified signal-to-noise (S2N') in the present study. The discrimination model constructed using these methods showed higher accuracy with higher reproducibility as compared with many conventional filtering methods, including the t-test, S2N, NSC and SAM. The reproducibility of prediction was evaluated based on the correlation between the sets of U-test p-values on randomly divided datasets. With respect to leukemia, lymphoma and breast cancer, the correlation was high; a difference of >0.13 was obtained by the constructed model by using <50 genes selected by S2N'. Improvement was higher in the smaller genes and such higher correlation was observed when t-test, NSC and SAM were used. These results suggest that these modified methods, such as S2N', have high potential to function as new methods for marker gene selection in cancer diagnosis using DNA microarray data. Software is available upon request.
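For orientation, the conventional signal-to-noise (S2N) statistic that the abstract lists among the baseline filters is usually defined as the difference of class means divided by the sum of class standard deviations. The sketch below implements only that conventional form; the modified S2N' is not defined in the abstract and is therefore not attempted, and the gene names and values are invented.

```python
import math
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Conventional S2N: (mean_a - mean_b) / (sd_a + sd_b)."""
    diff = mean(class_a) - mean(class_b)
    noise = stdev(class_a) + stdev(class_b)
    if noise == 0:
        # No within-class variation: separable only if the means differ.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / noise


# Hypothetical expression values: (samples of class A, samples of class B).
genes = {
    "gA": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "gB": ([2.0, 4.0, 6.0], [3.0, 4.0, 5.0]),
    "gC": ([1.0, 2.0, 3.0], [4.0, 6.0, 8.0]),
}

s2n = {name: signal_to_noise(a, b) for name, (a, b) in genes.items()}

# Rank by magnitude so that up- and down-regulated genes both count.
ranked = sorted(s2n, key=lambda name: abs(s2n[name]), reverse=True)
print(ranked)
```

The sign of the statistic indicates in which class the gene is more highly expressed.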
Dashtban, M; Balafar, Mohammadali
2017-03-01
Gene selection is a demanding task for microarray data analysis. The diverse complexity of different cancers makes this issue still challenging. In this study, a novel evolutionary method based on genetic algorithms and artificial intelligence is proposed to identify predictive genes for cancer classification. A filter method was first applied to reduce the dimensionality of feature space followed by employing an integer-coded genetic algorithm with dynamic-length genotype, intelligent parameter settings, and modified operators. The algorithmic behaviors including convergence trends, mutation and crossover rate changes, and running time were studied, conceptually discussed, and shown to be coherent with literature findings. Two well-known filter methods, Laplacian and Fisher score, were examined considering similarities, the quality of selected genes, and their influences on the evolutionary approach. Several statistical tests concerning choice of classifier, choice of dataset, and choice of filter method were performed, and they revealed some significant differences between the performance of different classifiers and filter methods over datasets. The proposed method was benchmarked upon five popular high-dimensional cancer datasets; for each, top explored genes were reported. Comparing the experimental results with several state-of-the-art methods revealed that the proposed method outperforms previous methods in DLBCL dataset. Copyright © 2017 Elsevier Inc. All rights reserved.
An efficient ensemble learning method for gene microarray classification.
Osareh, Alireza; Shadgar, Bita
2013-01-01
Gene microarray analysis and classification have proven effective for the diagnosis of diseases and cancers. However, it has also been shown that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
Simple Method for Markerless Gene Deletion in Multidrug-Resistant Acinetobacter baumannii
Oh, Man Hwan; Lee, Je Chul; Kim, Jungmin
2015-01-01
The traditional markerless gene deletion technique based on overlap extension PCR has been used for generating gene deletions in multidrug-resistant Acinetobacter baumannii. However, the method is time-consuming because it requires restriction digestion of the PCR products in DNA cloning and the construction of new vectors containing a suitable antibiotic resistance cassette for the selection of A. baumannii merodiploids. Moreover, the availability of restriction sites and the selection of recombinant bacteria harboring the desired chimeric plasmid are limited, making the construction of a chimeric plasmid more difficult. We describe a rapid and easy cloning method for markerless gene deletion in A. baumannii, which has no limitation in the availability of restriction sites and allows for easy selection of the clones carrying the desired chimeric plasmid. Notably, it is not necessary to construct new vectors in our method. This method utilizes direct cloning of blunt-end DNA fragments, in which upstream and downstream regions of the target gene are fused with an antibiotic resistance cassette via overlap extension PCR and are inserted into a blunt-end suicide vector developed for blunt-end cloning. Importantly, the antibiotic resistance cassette is placed outside the downstream region in order to enable easy selection of the recombinants carrying the desired plasmid, to eliminate the antibiotic resistance cassette via homologous recombination, and to avoid the necessity of constructing new vectors. This strategy was successfully applied to functional analysis of the genes associated with iron acquisition by A. baumannii ATCC 19606 and to ompA gene deletion in other A. baumannii strains. Consequently, the proposed method is invaluable for markerless gene deletion in multidrug-resistant A. baumannii. PMID:25746991
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network.
Jiang, Xue; Zhang, Han; Quan, Xiongwen
2016-01-01
Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods always focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes of interactions between genes at different cell states and the dynamic processes of gene expression levels during the disease progression. However, in order to understand the mechanism of disease, it is important to explore the dynamic changes of interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure the value of gene differential coexpression according to the change of local topological structures between different phase-specific networks. Finally, we conducted meta-analysis of gene differential coexpression based on the rank-product method. Experimental results demonstrated the feasibility and effectiveness of DCGN and the superior performance of DCGN over other popular disease-related gene selection methods through real-world gene expression data sets.
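The rank-product step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each metric or phase comparison yields a ranking of genes (1 = most differentially coexpressed) and combines them by the geometric mean of ranks. The function name, gene names, and ranks are hypothetical, and this is not the DCGN implementation itself.

```python
from math import prod

def rank_product(rank_lists):
    """Geometric mean of each gene's rank across several rankings.

    rank_lists: list of dicts mapping gene -> rank (1 = top).
    Returns dict gene -> rank product; smaller means more
    consistently top-ranked across the rankings.
    """
    genes = set(rank_lists[0])
    k = len(rank_lists)
    return {g: prod(r[g] for r in rank_lists) ** (1.0 / k) for g in genes}

# Hypothetical ranks from three rankings (illustrative values only).
rankings = [
    {"geneA": 1, "geneB": 2, "geneC": 3},
    {"geneA": 2, "geneB": 1, "geneC": 3},
    {"geneA": 1, "geneB": 3, "geneC": 2},
]
scores = rank_product(rankings)
ordered = sorted(scores, key=scores.get)  # most consistent gene first
print(ordered)
```

In practice the significance of a rank product is usually assessed by permutation; that step is omitted here.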
A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.
Baur, Brittany; Bozdag, Serdar
2016-01-01
DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
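Sequential forward selection, the search strategy named in the abstract above, can be sketched generically as follows. The scoring function here is a toy stand-in (hypothetical per-probe weights minus a size penalty); in the study the score would come from a classifier such as K-Nearest Neighbors predicting expression, which is not reproduced here. Probe names and weights are invented for illustration.

```python
def sequential_forward_selection(features, score, max_features):
    """Greedy SFS: repeatedly add the feature that most improves score(subset)."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # Score every one-feature extension of the current subset.
        candidate_scores = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidate_scores)
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score

# Toy score (assumption): sum of made-up probe weights, penalised per probe.
weights = {"probe1": 0.5, "probe2": 0.3, "probe3": 0.02, "probe4": 0.0}

def toy_score(subset):
    return sum(weights[p] for p in subset) - 0.05 * len(subset)

chosen, final_score = sequential_forward_selection(weights, toy_score, max_features=4)
print(chosen, final_score)
```

The search stops as soon as no single added probe raises the score, so the result is a greedy choice and not necessarily the globally best subset.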
Zhang, Ao; Tian, Suyan
2018-05-01
Pathway-based feature selection algorithms, which utilize biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway-based feature selection algorithms into three major categories: penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different genes' connectivity information-based weights for each gene and then conducted feature selection upon the resulting weighted gene expression profiles. Using both simulations and a real-world application, we have demonstrated that when the data-driven connectivity information constructed from the data of specific disease under study is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge-based weights more accurately and precisely. Only once this issue is resolved will wide utilization of the weighting methods be possible. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sun, Hokeun; Wang, Shuang
2014-08-15
Existing association methods for rare variants from sequencing data have focused on aggregating variants in a gene or a genetic region because analysing individual rare variants is underpowered. However, these existing rare variant detection methods are not able to identify which rare variants in a gene or a genetic region of all variants are associated with the complex diseases or traits. Once phenotypic associations of a gene or a genetic region are identified, the natural next step in the association study with sequencing data is to locate the susceptible rare variants within the gene or the genetic region. In this article, we propose a power set-based statistical selection procedure that is able to identify the locations of the potentially susceptible rare variants within a disease-related gene or a genetic region. The selection performance of the proposed selection procedure was evaluated through simulation studies, where we demonstrated the feasibility and superior power over several comparable existing methods. In particular, the proposed method is able to handle the mixed effects when both risk and protective variants are present in a gene or a genetic region. The proposed selection procedure was also applied to the sequence data on the ANGPTL gene family from the Dallas Heart Study to identify potentially susceptible rare variants within the trait-related genes. An R package 'rvsel' can be downloaded from http://www.columbia.edu/~sw2206/ and http://statsun.pusan.ac.kr. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Inference on the Strength of Balancing Selection for Epistatically Interacting Loci
Buzbas, Erkan Ozge; Joyce, Paul; Rosenberg, Noah A.
2011-01-01
Existing inference methods for estimating the strength of balancing selection in multi-locus genotypes rely on the assumption that there are no epistatic interactions between loci. Complex systems in which balancing selection is prevalent, such as sets of human immune system genes, are known to contain components that interact epistatically. Therefore, current methods may not produce reliable inference on the strength of selection at these loci. In this paper, we address this problem by presenting statistical methods that can account for epistatic interactions in making inference about balancing selection. A theoretical result due to Fearnhead (2006) is used to build a multi-locus Wright-Fisher model of balancing selection, allowing for epistatic interactions among loci. Antagonistic and synergistic types of interactions are examined. The joint posterior distribution of the selection and mutation parameters is sampled by Markov chain Monte Carlo methods, and the plausibility of models is assessed via Bayes factors. As a component of the inference process, an algorithm to generate multi-locus allele frequencies under balancing selection models with epistasis is also presented. Recent evidence on interactions among a set of human immune system genes is introduced as a motivating biological system for the epistatic model, and data on these genes are used to demonstrate the methods. PMID:21277883
Pavlidis, Paul; Qin, Jie; Arango, Victoria; Mann, John J; Sibille, Etienne
2004-06-01
One of the challenges in the analysis of gene expression data is placing the results in the context of other data available about genes and their relationships to each other. Here, we approach this problem in the study of gene expression changes associated with age in two areas of the human prefrontal cortex, comparing two computational methods. The first method, "overrepresentation analysis" (ORA), is based on statistically evaluating the fraction of genes in a particular gene ontology class found among the set of genes showing age-related changes in expression. The second method, "functional class scoring" (FCS), examines the statistical distribution of individual gene scores among all genes in the gene ontology class and does not involve an initial gene selection step. We find that FCS yields more consistent results than ORA, and the results of ORA depended strongly on the gene selection threshold. Our findings highlight the utility of functional class scoring for the analysis of complex expression data sets and emphasize the advantage of considering all available genomic information rather than sets of genes that pass a predetermined "threshold of significance."
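To make the overrepresentation analysis (ORA) idea above concrete, here is a minimal sketch that scores one gene ontology class with a one-sided hypergeometric test. The abstract does not say which test the authors used, so the hypergeometric choice is an assumption, and all counts below are invented for illustration.

```python
from math import comb

def ora_p_value(n_total, n_in_class, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_total genes, of which n_in_class belong to the ontology class."""
    upper = min(n_in_class, n_selected)
    tail = sum(
        comb(n_in_class, k) * comb(n_total - n_in_class, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_selected)

# Hypothetical counts: 1000 genes on the array, 50 in the class,
# 100 genes pass the selection threshold, 15 of those are in the class.
p = ora_p_value(n_total=1000, n_in_class=50, n_selected=100, n_overlap=15)
print(p)
```

Because `n_selected` and `n_overlap` both depend on where the selection threshold is placed, the p-value can shift when the threshold moves, which is the sensitivity the abstract reports for ORA. Functional class scoring avoids that step by working with the scores of all genes in the class.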
Cui, Bintao; Smooker, Peter M; Rouch, Duncan A; Deighton, Margaret A
2016-08-01
Accurate and reproducible measurement of gene transcription requires appropriate reference genes, which are stably expressed under different experimental conditions to provide normalization. Staphylococcus capitis is a human pathogen that produces biofilm under stress, such as imposed by antimicrobial agents. In this study, a set of five commonly used staphylococcal reference genes (gyrB, sodA, recA, tuf and rpoB) were systematically evaluated in two clinical isolates of Staphylococcus capitis (S. capitis subspecies urealyticus and capitis, respectively) under erythromycin stress in mid-log and stationary phases. Two public software programs (geNorm and NormFinder) and two manual calculation methods, reference residue normalization (RRN) and relative quantitative (RQ), were applied. The potential reference genes selected by the four algorithms were further validated by comparing the expression of a well-studied biofilm gene (icaA) with phenotypic biofilm formation in S. capitis under four different experimental conditions. The four methods differed considerably in their ability to predict the most suitable reference gene or gene combination for comparing icaA expression under different conditions. Under the conditions used here, the RQ method provided better selection of reference genes than the other three algorithms; however, this finding needs to be confirmed with a larger number of isolates. This study reinforces the need to assess the stability of reference genes for analysis of target gene expression under different conditions and the use of more than one algorithm in such studies. Although this work was conducted using a specific human pathogen, it emphasizes the importance of selecting suitable reference genes for accurate normalization of gene expression more generally.
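As an illustration of how a tool such as geNorm ranks candidate reference genes, the sketch below computes a stability value M for each gene as the average, over all other candidates, of the standard deviation of the log2 expression ratios across samples (lower M = more stable). This follows the commonly described definition of the geNorm M value and is not the geNorm software itself. The gene names are taken from the abstract, but the quantities are made up.

```python
from math import log2
from statistics import mean, stdev

def genorm_m_values(expression):
    """expression: dict gene -> list of relative quantities, one per sample.
    Returns dict gene -> M (mean pairwise SD of log2 ratios)."""
    genes = list(expression)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [log2(a / b) for a, b in zip(expression[j], expression[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values

# Made-up relative quantities for three samples (illustrative only).
toy = {
    "gyrB": [1.0, 2.0, 4.0],
    "tuf": [2.0, 4.0, 8.0],   # constant ratio to gyrB across samples
    "recA": [1.0, 8.0, 1.0],  # varies independently of the other two
}
m = genorm_m_values(toy)
least_stable = max(m, key=m.get)
print(m, least_stable)
```

In this toy data gyrB and tuf track each other exactly, so recA receives the highest M and would be the first candidate to drop.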
Khang, Chang Hyun; Park, Sook-Young; Lee, Yong-Hwan; Kang, Seogchan
2005-06-01
Rapid progress in fungal genome sequencing presents many new opportunities for functional genomic analysis of fungal biology through the systematic mutagenesis of the genes identified through sequencing. However, the lack of efficient tools for targeted gene replacement is a limiting factor for fungal functional genomics, as it often necessitates the screening of a large number of transformants to identify the desired mutant. We developed an efficient method of gene replacement and evaluated factors affecting the efficiency of this method using two plant pathogenic fungi, Magnaporthe grisea and Fusarium oxysporum. This method is based on Agrobacterium tumefaciens-mediated transformation with a mutant allele of the target gene flanked by the herpes simplex virus thymidine kinase (HSVtk) gene as a conditional negative selection marker against ectopic transformants. The HSVtk gene product converts 5-fluoro-2'-deoxyuridine to a compound toxic to diverse fungi. Because ectopic transformants express HSVtk, while gene replacement mutants lack HSVtk, growing transformants on a medium amended with 5-fluoro-2'-deoxyuridine facilitates the identification of targeted mutants by counter-selecting against ectopic transformants. In addition to M. grisea and F. oxysporum, the method and associated vectors are likely to be applicable to manipulating genes in a broad spectrum of fungi, thus potentially serving as an efficient, universal functional genomic tool for harnessing the growing body of fungal genome sequence data to study fungal biology.
Tabu search and binary particle swarm optimization for feature selection using microarray data.
Chuang, Li-Yeh; Yang, Cheng-Huei; Yang, Cheng-Hong
2009-12-01
Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In the classification of cancer type research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good selection method for genes relevant for sample classification is needed to improve the predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and support vector machine with one-versus-rest serve as evaluators of the TS and BPSO. The proposed method is applied and compared to the 11 classification problems taken from the literature. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features compared to other feature selection methods.
ROKU: a novel method for identification of tissue-specific genes
Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro
2006-01-01
Background One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. Results We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene if any exist using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression pattern are specific only to objective tissues. Conclusion ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes. PMID:16764735
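The entropy-based ranking described above can be sketched as follows: a gene expressed evenly across tissues has maximal Shannon entropy, while a gene expressed in a single tissue has entropy zero. This sketch covers only the entropy ranking; it omits any data preprocessing ROKU applies before computing entropy as well as the outlier-detection step, and the expression values are invented.

```python
from math import log2

def shannon_entropy(expression):
    """Entropy (in bits) of a non-negative expression vector across tissues."""
    total = sum(expression)
    probs = [x / total for x in expression if x > 0]
    return -sum(p * log2(p) for p in probs)

# Hypothetical expression of three genes in four tissues.
profiles = {
    "housekeeping": [5.0, 5.0, 5.0, 5.0],   # uniform: entropy = log2(4) = 2 bits
    "tissue_specific": [0.0, 0.0, 8.0, 0.0],  # single tissue: entropy = 0
    "intermediate": [1.0, 1.0, 6.0, 0.0],
}
entropies = {g: shannon_entropy(x) for g, x in profiles.items()}
ranked = sorted(entropies, key=entropies.get)  # most tissue-specific first
print(ranked)
```

Low entropy alone only says that a gene is specific overall; identifying which tissues it is specific to requires the separate outlier-detection step mentioned in the abstract.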
Eveno, Emmanuelle; Collada, Carmen; Guevara, M Angeles; Léger, Valérie; Soto, Alvaro; Díaz, Luis; Léger, Patrick; González-Martínez, Santiago C; Cervera, M Teresa; Plomion, Christophe; Garnier-Géré, Pauline H
2008-02-01
The importance of natural selection for shaping adaptive trait differentiation among natural populations of allogamous tree species has long been recognized. Determining the molecular basis of local adaptation remains largely unresolved, and the respective roles of selection and demography in shaping population structure are actively debated. Using a multilocus scan that aims to detect outliers from simulated neutral expectations, we analyzed patterns of nucleotide diversity and genetic differentiation at 11 polymorphic candidate genes for drought stress tolerance in phenotypically contrasted Pinus pinaster Ait. populations across its geographical range. We compared 3 coalescent-based methods: 2 frequentist-like, including 1 approach specifically developed for biallelic single nucleotide polymorphisms (SNPs) here and 1 Bayesian. Five genes showed outlier patterns that were robust across methods at the haplotype level for 2 of them. Two genes presented higher F(ST) values than expected (PR-AGP4 and erd3), suggesting that they could have been affected by the action of diversifying selection among populations. In contrast, 3 genes presented lower F(ST) values than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations. A smaller proportion of outliers were detected at the SNP level suggesting the potential functional significance of particular combinations of sites in drought-response candidate genes. The Bayesian method appeared robust to low sample sizes, flexible to assumptions regarding migration rates, and powerful for detecting selection at the haplotype level, but the frequentist-like method adapted to SNPs was more efficient for the identification of outlier SNPs showing low differentiation. Population-specific effects estimated in the Bayesian method also revealed populations with lower immigration rates, which could have led to favorable situations for local adaptation. 
Outlier patterns are discussed in relation to the different genes' putative involvement in drought tolerance responses, from published results in transcriptomics and association mapping in P. pinaster and other related species. These genes clearly constitute relevant candidates for future association studies in P. pinaster.
Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro
2007-05-01
Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent's non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent's method is not suitable for ROKU.
Novel harmonic regularization approach for variable selection in Cox's proportional hazards model.
Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan
2014-01-01
Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in the Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on artificial datasets and four real microarray gene expression datasets, such as the diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods.
Derivation of an artificial gene to improve classification accuracy upon gene selection.
Seo, Minseok; Oh, Sejong
2012-02-01
Classification analysis has been developed continuously since 1936. This research field has advanced as a result of development of classifiers such as KNN, ANN, and SVM, as well as through data preprocessing areas. Feature (gene) selection is required for very high dimensional data such as microarray before classification work. The goal of feature selection is to choose a subset of informative features that reduces processing time and provides higher classification accuracy. In this study, we devised a method of artificial gene making (AGM) for microarray data to improve classification accuracy. Our artificial gene was derived from a whole microarray dataset, and combined with a result of gene selection for classification analysis. We experimentally confirmed a clear improvement of classification accuracy after inserting the artificial gene. Our artificial gene worked well for popular feature (gene) selection algorithms and classifiers. The proposed approach can be applied to any type of high dimensional dataset. Copyright © 2011 Elsevier Ltd. All rights reserved.
Missing-value estimation using linear and non-linear regression with Bayesian gene selection.
Zhou, Xiaobo; Wang, Xiaodong; Dougherty, Edward R
2003-11-22
Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule. We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. The results compare very favorably with currently used methods based on the normalized root-mean-square error. The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).
mazF, a novel counter-selectable marker for unmarked chromosomal manipulation in Bacillus subtilis.
Zhang, Xiao-Zhou; Yan, Xin; Cui, Zhong-Li; Hong, Qing; Li, Shun-Peng
2006-05-19
Here, we present a novel method for the directed genetic manipulation of the Bacillus subtilis chromosome free of any selection marker. Our new approach employed the Escherichia coli toxin gene mazF as a counter-selectable marker. The mazF gene was placed under the control of an isopropyl-beta-D-thiogalactopyranoside (IPTG)-inducible expression system and associated with a spectinomycin-resistance gene to form the MazF cassette, which was flanked by two directly-repeated (DR) sequences. A double-crossover event between the linearized delivery vector and the chromosome integrated the MazF cassette into a target locus and yielded an IPTG-sensitive strain with spectinomycin resistance, in which the wild-type chromosome copy had been replaced by the modified copy at the targeted locus. Another single-crossover event between the two DR sequences led to the excision of the MazF cassette and generated a strain with IPTG resistance, thereby realizing the desired alteration to the chromosome without introducing any unwanted selection markers. We used this method repeatedly and successfully to inactivate a specific gene, to introduce a gene of interest and to realize the in-frame deletion of a target gene in the same strain. As there is no prerequisite strain for this method, it will be a powerful and universal tool.
Mansourian, Robert; Mutch, David M; Antille, Nicolas; Aubert, Jerome; Fogel, Paul; Le Goff, Jean-Marc; Moulin, Julie; Petrov, Anton; Rytz, Andreas; Voegel, Johannes J; Roberts, Matthew-Alan
2004-11-01
Microarray technology has become a powerful research tool in many fields of study; however, the cost of microarrays often results in the use of a low number of replicates (k). Under circumstances where k is low, it becomes difficult to perform standard statistical tests to extract the most biologically significant experimental results. Other more advanced statistical tests have been developed; however, their use and interpretation often remain difficult to implement in routine biological research. The present work outlines a method that achieves sufficient statistical power for selecting differentially expressed genes under conditions of low k, while remaining as an intuitive and computationally efficient procedure. The present study describes a Global Error Assessment (GEA) methodology to select differentially expressed genes in microarray datasets, and was developed using an in vitro experiment that compared control and interferon-gamma treated skin cells. In this experiment, up to nine replicates were used to confidently estimate error, thereby enabling methods of different statistical power to be compared. Gene expression results of a similar absolute expression are binned, so as to enable a highly accurate local estimate of the mean squared error within conditions. The model then relates variability of gene expression in each bin to absolute expression levels and uses this in a test derived from the classical ANOVA. The GEA selection method is compared with both the classical and permutational ANOVA tests, and demonstrates an increased stability, robustness and confidence in gene selection. A subset of the selected genes were validated by real-time reverse transcription-polymerase chain reaction (RT-PCR). All these results suggest that GEA methodology is (i) suitable for selection of differentially expressed genes in microarray data, (ii) intuitive and computationally efficient and (iii) especially advantageous under conditions of low k. 
The GEA code for R software is freely available upon request to authors.
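The binning step described in the GEA abstract above (grouping genes of similar absolute expression to obtain a local error estimate) might look roughly like the sketch below. It covers only the binning and pooled-variance estimate within one condition; the model relating variability to expression level and the ANOVA-derived test are not reproduced. The function name, bin size, and replicate values are assumptions for illustration.

```python
from statistics import mean, variance

def binned_mse(genes, bin_size):
    """genes: list of (name, replicate values) for one condition.
    Sorts genes by mean expression, groups them into bins of bin_size,
    and returns (mean expression level, mean within-gene variance) per bin."""
    ordered = sorted(genes, key=lambda g: mean(g[1]))
    bins = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        level = mean(mean(reps) for _, reps in chunk)
        mse = mean(variance(reps) for _, reps in chunk)
        bins.append((level, mse))
    return bins

# Made-up duplicate measurements for four genes (low replicate number k = 2).
toy_genes = [
    ("g1", [1.0, 3.0]),
    ("g2", [2.0, 4.0]),
    ("g3", [10.0, 14.0]),
    ("g4", [11.0, 13.0]),
]
bins = binned_mse(toy_genes, bin_size=2)
print(bins)
```

Pooling the variance over many genes in a bin is what makes the error estimate usable when each individual gene has only a few replicates.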
[Production of marker-free plants expressing the gene of the hepatitis B virus surface antigen].
Rukavtsova, E B; Gaiazova, A R; Chebotareva, E N; Bur'ianova, Ia I
2009-08-01
The pBM plasmid, carrying the gene of hepatitis B virus surface antigen (HBsAg) and free of any selection markers of antibiotic or herbicide resistance, was constructed for genetic transformation of plants. A method for screening transformed plant seedlings on nonselective media was developed. Enzyme immunoassay was used for selecting transgenic plants with HBsAg gene among the produced regenerants; this method provides for a high sensitivity detection of HBsAg in plant extracts. Tobacco and tomato transgenic lines synthesizing this antigen at a level of 0.01-0.05% of the total soluble protein were obtained. The achieved level of HBsAg synthesis is sufficient for preclinical trials of the produced plants as a new generation safe edible vaccine. The developed method for selecting transformants can be used for producing safe plants free of selection markers.
Wang, Xianghong; Jiang, Daiming; Yang, Daichang
2015-01-01
The selection of homozygous lines is a crucial step in the characterization of newly generated transgenic plants. This is particularly time- and labor-consuming when transgenic stacking is required. Here, we report a fast and accurate method based on quantitative real-time PCR with a rice gene RBE4 as a reference gene for selection of homozygous lines when using multiple transgenic stacking in rice. This method can be used to determine the stacking of up to three transgenes within four generations. Selection accuracy reached 100 % for a single locus and 92.3 % for two loci. This method confers distinct advantages over current transgenic research methodologies, as it is more accurate, rapid, and reliable. Therefore, this protocol could be used to efficiently select homozygous plants and to expedite time- and labor-consuming processes normally required for multiple transgene stacking. This protocol was standardized for determination of multiple gene stacking in molecular breeding via marker-assisted selection.
Method for determining gene knockouts
Maranas, Costas D [Port Matilda, PA; Burgard, Anthony R [State College, PA; Pharkya, Priti [State College, PA
2011-09-27
A method for determining candidates for gene deletions and additions using a model of a metabolic network associated with an organism, the model includes a plurality of metabolic reactions defining metabolite relationships, the method includes selecting a bioengineering objective for the organism, selecting at least one cellular objective, forming an optimization problem that couples the at least one cellular objective with the bioengineering objective, and solving the optimization problem to yield at least one candidate.
Method for determining gene knockouts
Maranas, Costa D; Burgard, Anthony R; Pharkya, Priti
2013-06-04
A method for determining candidates for gene deletions and additions using a model of a metabolic network associated with an organism, the model includes a plurality of metabolic reactions defining metabolite relationships, the method includes selecting a bioengineering objective for the organism, selecting at least one cellular objective, forming an optimization problem that couples the at least one cellular objective with the bioengineering objective, and solving the optimization problem to yield at least one candidate.
2015-01-01
Background Over the past 50,000 years, shifts in human-environmental or human-human interactions shaped genetic differences within and among human populations, including variants under positive selection. Shaped by environmental factors, such variants influence the genetics of modern health, disease, and treatment outcome. Because evolutionary processes tend to act on gene regulation, we test whether regulatory variants are under positive selection. We introduce a new approach to enhance detection of genetic markers undergoing positive selection, using conditional entropy to capture recent local selection signals. Results We use conditional logistic regression to compare our Adjusted Haplotype Conditional Entropy (H|H) measure of positive selection to existing positive selection measures. H|H and existing measures were applied to published regulatory variants acting in cis (cis-eQTLs), with conditional logistic regression testing whether regulatory variants undergo stronger positive selection than the surrounding gene. These cis-eQTLs were drawn from six independent studies of genotype and RNA expression. The conditional logistic regression shows that, overall, H|H is substantially more powerful than existing positive-selection methods in identifying cis-eQTLs against other Single Nucleotide Polymorphisms (SNPs) in the same genes. When broken down by Gene Ontology, H|H predictions are particularly strong in some biological process categories, where regulatory variants are under strong positive selection compared to the bulk of the gene, distinct from those GO categories under overall positive selection. However, cis-eQTLs in a second group of genes lack positive selection signatures detectable by H|H, consistent with ancient short haplotypes compared to the surrounding gene (for example, in innate immunity GO:0042742); under such other modes of selection, H|H would not be expected to be a strong predictor.
These conditional logistic regression models are adjusted for minor allele frequency (MAF); otherwise, ascertainment bias is a huge factor in all eQTL data sets. Relationships between Gene Ontology categories, positive selection and eQTL specificity were replicated with H|H in a single larger data set. Our measure, Adjusted Haplotype Conditional Entropy (H|H), was essential in generating all of the results above because it: 1) is a stronger overall predictor for eQTLs than comparable existing approaches, and 2) shows low sequential auto-correlation, overcoming problems with convergence of these conditional regression statistical models. Conclusions Our new method, H|H, provides a consistently more robust signal associated with cis-eQTLs compared to existing methods. We interpret this to indicate that some cis-eQTLs are under positive selection compared to their surrounding genes. Conditional entropy indicative of a selective sweep is an especially strong predictor of eQTLs for genes in several biological processes of medical interest. Where conditional entropy is a weak or negative predictor of eQTLs, such as innate immune genes, this would be consistent with balancing selection acting on such eQTLs over long time periods. Different measures of selection may be needed for variant prioritization under other modes of evolutionary selection. PMID:26111110
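The abstract above does not define how H|H is adjusted, so the following is only a minimal sketch of the plain Shannon conditional entropy H(Y|X) that such a measure presumably builds on. The pairing of a core allele with a flanking haplotype, the function name, and the toy data are all illustrative assumptions, not the authors' statistic.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Return H(Y | X) in bits for a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)                      # counts of each (x, y) pair
    marginal_x = Counter(x for x, _ in pairs)   # counts of each x
    h = 0.0
    for (x, _y), count in joint.items():
        p_xy = count / n                        # joint probability p(x, y)
        p_y_given_x = count / marginal_x[x]     # conditional probability p(y | x)
        h -= p_xy * log2(p_y_given_x)
    return h


# Hypothetical toy sample of (core allele, flanking haplotype) pairs.
# Allele "A" always sits on haplotype h1 (sweep-like, low entropy);
# allele "G" sits on four different haplotypes (high entropy).
swept = [("A", "h1")] * 8 + [("G", "h1"), ("G", "h2"), ("G", "h3"), ("G", "h4")]
print(round(conditional_entropy(swept), 4))
```

In this toy sample all of the uncertainty comes from the "G" background, which is the intuition behind reading low conditional entropy around an allele as a possible signal of a recent sweep.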
Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A
2015-06-01
Nature-inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
Ensemble Feature Learning of Genomic Data Using Support Vector Machine
Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.
2016-01-01
The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data. PMID:27304923
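The aggregation step described above (several bootstrap models each produce a feature ranking, and elimination is decided on the combined ranking) can be sketched as follows. The abstract does not say how rankings are combined, so summing rank positions is an assumption, and the per-model rankings are supplied as hypothetical inputs rather than produced by actual SVM training.

```python
def aggregate_rankings(rankings):
    """Combine per-model rankings (best feature first) by summed rank position."""
    features = rankings[0]
    total = {f: 0 for f in features}
    for ranking in rankings:
        for position, f in enumerate(ranking):
            total[f] += position
    # A lower summed position means the feature is consistently ranked near the top.
    # The feature name breaks ties so the result is deterministic.
    return sorted(features, key=lambda f: (total[f], f))


def eliminate_worst(rankings, n_drop=1):
    """One backward-elimination step: drop the n_drop lowest consensus features."""
    consensus = aggregate_rankings(rankings)
    return consensus[: len(consensus) - n_drop]


# Hypothetical rankings from three bootstrap models over four genes.
bootstrap_rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
print(aggregate_rankings(bootstrap_rankings))
print(eliminate_worst(bootstrap_rankings))
```

In a full pipeline each ranking would come from an SVM fitted on a (nearly balanced) bootstrap sample, and the elimination step would repeat on the surviving genes.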
Robust gene selection methods using weighting schemes for microarray data analysis.
Kang, Suyeon; Song, Jongwoo
2017-09-02
A common task in microarray data analysis is to identify informative genes that are differentially expressed between two different states. Owing to the high-dimensional nature of microarray data, identification of significant genes has been essential in analyzing the data. However, the performances of many gene selection techniques are highly dependent on the experimental conditions, such as the presence of measurement error or a limited number of sample replicates. We have proposed new filter-based gene selection techniques, by applying a simple modification to significance analysis of microarrays (SAM). To prove the effectiveness of the proposed method, we considered a series of synthetic datasets with different noise levels and sample sizes along with two real datasets. The following findings were made. First, our proposed methods outperform conventional methods for all simulation set-ups. In particular, our methods are much better when the given data are noisy and sample size is small. They showed relatively robust performance regardless of noise level and sample size, whereas the performance of SAM became significantly worse as the noise level became high or sample size decreased. When sufficient sample replicates were available, SAM and our methods showed similar performance. Finally, our proposed methods are competitive with traditional methods in classification tasks for microarrays. The results of simulation study and real data analysis have demonstrated that our proposed methods are effective for detecting significant genes and classification tasks, especially when the given data are noisy or have few sample replicates. By employing weighting schemes, we can obtain robust and reliable results for microarray data analysis.
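For orientation, the baseline SAM statistic that the proposed methods modify can be sketched as below. This shows only the standard relative-difference score d = (mean_a - mean_b) / (s + s0); the weighting schemes themselves are not specified in the abstract and are not reproduced. The function name, the default s0, and the toy values are assumptions.

```python
from math import sqrt
from statistics import mean


def sam_statistic(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_a - mean_b) / (s + s0) for one gene."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled within-group sum of squares.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    # Gene-specific standard error of the difference in means.
    s = sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


# Hypothetical expression values for one gene in two states.
state_a = [5.0, 6.0, 7.0]
state_b = [1.0, 2.0, 3.0]
print(sam_statistic(state_a, state_b))
print(sam_statistic(state_a, state_b, s0=1.0))
```

The constant s0 damps the score of genes whose estimated variance is very small, which matters most in the noisy, few-replicate settings the abstract highlights; with s0 = 0 a gene with zero within-group variance would cause a division by zero.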
Chen, Minhui; Wang, Jiying; Wang, Yanping; Wu, Ying; Fu, Jinluan; Liu, Jian-Feng
2018-05-18
Genome-wide scans for positive selection signatures have so far been carried out mainly in commercial breeds, and few studies have focused on selection footprints of indigenous breeds. The Laiwu pig is an invaluable Chinese indigenous breed with an extremely high proportion of intramuscular fat (IMF), and an excellent model for detecting footprints of natural and artificial selection for fat deposition in muscle. In this study, based on GeneSeek Genomic Profiler Porcine HD data, three complementary methods, FST, iHS (integrated haplotype homozygosity score) and CLR (composite likelihood ratio), were implemented to detect selection signatures across the whole genome of Laiwu pigs. In total, 175 candidate selected regions were identified by at least two of the three methods, covering 43.75 Mb and corresponding to 1.79% of the genome sequence. Gene annotation of the selected regions revealed a list of functionally important genes for feed intake and fat deposition, reproduction, and immune response. In particular, in accordance with the phenotypic features of Laiwu pigs, we identified several candidate genes, NPY1R, NPY5R, PIK3R1 and JAKMIP1, involved in the actions of two sets of neurons that are central regulators in maintaining the balance between food intake and energy expenditure. Our results identified a number of regions showing signatures of selection, as well as a list of functional candidate genes with potential effects on phenotypic traits, especially fat deposition in muscle. Our findings provide insights into the mechanisms of artificial selection of fat deposition and further facilitate follow-up functional studies.
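The "at least two of the three methods" rule used to call candidate regions can be sketched as a simple vote over per-method hit lists. The window identifiers and hit lists below are hypothetical; how the study defined and merged genomic windows is not stated in the abstract.

```python
from collections import Counter


def regions_supported(method_hits, min_methods=2):
    """Return regions flagged by at least min_methods of the detection methods."""
    votes = Counter()
    for hits in method_hits.values():
        votes.update(set(hits))  # each method votes at most once per region
    return sorted(region for region, n in votes.items() if n >= min_methods)


# Hypothetical outlier windows reported by each method.
hits = {
    "FST": ["chr1:10", "chr2:4", "chr8:21"],
    "iHS": ["chr1:10", "chr8:21", "chr9:3"],
    "CLR": ["chr2:4", "chr8:21", "chr12:7"],
}
print(regions_supported(hits))
```

Requiring agreement between methods trades sensitivity for fewer method-specific false positives; raising min_methods to 3 keeps only the windows that all methods flag.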
Gunavathi, Chellamuthu; Premalatha, Kandasamy
2014-01-01
Feature selection in cancer classification is a central area of research in bioinformatics and is used to select the informative genes from the thousands of genes on a microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of the shuffled frog leaping (SFL) algorithm. The SI techniques particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection, which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied to 10 different benchmark datasets and examined with SI techniques. The experimental results show that the k-NN classifier with SFLLF feature selection outperforms PSO, CS, and SFL.
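The filter stage described above (rank genes by a univariate score, then pass the top-m genes to the swarm search) can be sketched with the signal-to-noise ratio. The formula used here, (mean_a - mean_b) / (sd_a + sd_b), is the commonly used two-class definition and is an assumption about the exact variant the authors applied; the gene names and values are hypothetical.

```python
from statistics import mean, stdev


def snr(class_a, class_b):
    """Two-class signal-to-noise ratio of one gene's expression values."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def top_m_genes(expression, labels, m):
    """Rank genes by |SNR| and return the names of the m best."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(snr(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:m]


# Hypothetical expression matrix: three genes, six samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "g1": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],   # well separated between classes
    "g2": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0],   # same mean in both classes
    "g3": [4.0, 5.0, 6.0, 5.0, 6.0, 7.0],   # weakly separated
}
print(top_m_genes(expression, labels, 2))
```

Only the genes surviving this filter would be handed to the wrapper search (PSO, CS, SFL, or SFLLF), which keeps the combinatorial search space manageable.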
Cerón, J; Ortíz, A; Quintero, R; Güereca, L; Bravo, A
1995-01-01
In this paper we describe a PCR strategy that can be used to rapidly identify Bacillus thuringiensis strains that harbor any of the known cryI or cryIII genes. Four general PCR primers which amplify DNA fragments from the known cryI or cryIII genes were selected from conserved regions. Once a strain was identified as an organism that contains a particular type of cry gene, it could be easily characterized by performing additional PCR with specific cryI and cryIII primers selected from variable regions. The method described in this paper can be used to identify the 10 different cryI genes and the five different cryIII genes. One feature of this screening method is that each cry gene is expected to produce a PCR product having a precise molecular weight. The genes which produce PCR products having different sizes probably represent strains that harbor a potentially novel cry gene. Finally, we present evidence that novel crystal genes can be identified by the method described in this paper. PMID:8526493
Blueberry (Vaccinium corymbosum L.).
Song, Guo-Qing
2015-01-01
Vaccinium consists of approximately 450 species, of which highbush blueberry (Vaccinium corymbosum) is one of the three major Vaccinium fruit crops (i.e., blueberry, cranberry, and lingonberry) domesticated in the twentieth century. In blueberry the adventitious shoot regeneration using leaf explants has been the most desirable regeneration system to date; Agrobacterium tumefaciens-mediated transformation is the major gene delivery method and effective selection has been reported using either the neomycin phosphotransferase II gene (nptII) or the bialaphos resistance (bar) gene as selectable markers. The A. tumefaciens-mediated transformation protocol described in this chapter is based on combining the optimal conditions for efficient plant regeneration, reliable gene delivery, and effective selection. The protocol has led to successful regeneration of transgenic plants from leaf explants of four commercially important highbush blueberry cultivars for multiple purposes, providing a powerful approach to supplement conventional breeding methods for blueberry by introducing genes of interest.
Novel Harmonic Regularization Approach for Variable Selection in Cox's Proportional Hazards Model
Chu, Ge-Jin; Liang, Yong; Wang, Jia-Xuan
2014-01-01
Variable selection is an important issue in regression and a number of variable selection methods have been proposed involving nonconvex penalty functions. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be efficiently solved using our proposed direct path seeking approach, which can produce solutions that closely approximate those for the convex loss function and the nonconvex regularization. Simulation results based on the artificial datasets and four real microarray gene expression datasets, such as real diffuse large B-cell lymphoma (DLBCL), the lung cancer, and the AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso series methods. PMID:25506389
Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal
2017-01-01
Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. Firstly, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes that have both expression and methylation values but do not follow a normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply a t-test on both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize the Limma package for performing a non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markers with biological validation. Moreover, our framework improves positive predictive rate and reduces false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as other methods based on classification performances obtained using several well-known classifiers.
Velasco, Valeria; Sherwood, Julie S.; Rojas-García, Pedro P.; Logue, Catherine M.
2014-01-01
The aim of this study was to compare a real-time PCR assay, with a conventional culture/PCR method, to detect S. aureus, mecA and Panton-Valentine Leukocidin (PVL) genes in animals and retail meat, using a two-step selective enrichment protocol. A total of 234 samples were examined (77 animal nasal swabs, 112 retail raw meat, and 45 deli meat). The multiplex real-time PCR targeted the genes: nuc (identification of S. aureus), mecA (associated with methicillin resistance) and PVL (virulence factor), and the primary and secondary enrichment samples were assessed. The conventional culture/PCR method included the two-step selective enrichment, selective plating, biochemical testing, and multiplex PCR for confirmation. The conventional culture/PCR method recovered 95/234 positive S. aureus samples. Application of real-time PCR on samples following primary and secondary enrichment detected S. aureus in 111/234 and 120/234 samples respectively. For detection of S. aureus, the kappa statistic was 0.68–0.88 (from substantial to almost perfect agreement) and 0.29–0.77 (from fair to substantial agreement) for primary and secondary enrichments, using real-time PCR. For detection of mecA gene, the kappa statistic was 0–0.49 (from no agreement beyond that expected by chance to moderate agreement) for primary and secondary enrichment samples. Two pork samples were mecA gene positive by all methods. The real-time PCR assay detected the mecA gene in samples that were negative for S. aureus, but positive for Staphylococcus spp. The PVL gene was not detected in any sample by the conventional culture/PCR method or the real-time PCR assay. Among S. aureus isolated by conventional culture/PCR method, the sequence type ST398, and multi-drug resistant strains were found in animals and raw meat samples. The real-time PCR assay may be recommended as a rapid method for detection of S. aureus and the mecA gene, with further confirmation of methicillin-resistant S. aureus (MRSA) using the standard culture method. PMID:24849624
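The agreement figures above are Cohen's kappa values. As a minimal sketch of how such a value is obtained from a 2x2 table of paired results, the function below computes kappa from four cell counts. The counts in the example are hypothetical and chosen only for easy arithmetic; they are not the study's data, which the abstract does not break down by cell.

```python
def cohen_kappa(both_pos, only_first, only_second, both_neg):
    """Cohen's kappa for two binary methods applied to the same samples."""
    n = both_pos + only_first + only_second + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance from each method's marginal positive/negative rates.
    first_pos = both_pos + only_first
    second_pos = both_pos + only_second
    expected = (first_pos * second_pos + (n - first_pos) * (n - second_pos)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: real-time PCR (first method) vs. culture/PCR (second method).
kappa = cohen_kappa(both_pos=40, only_first=10, only_second=5, both_neg=45)
print(round(kappa, 2))
```

A value near 0 means agreement no better than chance and a value near 1 means almost perfect agreement, which is how the verbal labels attached to the ranges in the abstract should be read.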
Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy
Liu, Wei; Zhu, Wen; Liao, Bo; Chen, Xiangtao
2016-01-01
Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We performed our method on five different datasets and compared our method to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method. PMID:27829000
Tian, Xin; Xin, Mingyuan; Luo, Jian; Liu, Mingyao; Jiang, Zhenran
2017-02-01
The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to the gene selection procedures by use of different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related with breast cancer metastasis, which is based on the important scores of two variable selection algorithms, including the mean decrease Gini (MDG) criteria of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm can be called PPIRF. The improved prediction accuracy fully illustrated the reliability and high interpretability of gene list selected by the PPIRF approach.
Takahashi, Hiro; Aoyagi, Kazuhiko; Nakanishi, Yukihiro; Sasaki, Hiroki; Yoshida, Teruhiko; Honda, Hiroyuki
2006-07-01
Esophageal cancer is a well-known cancer with poorer prognosis than other cancers. An optimal and individualized treatment protocol based on accurate diagnosis is urgently needed to improve the treatment of cancer patients. For this purpose, it is important to develop a sophisticated algorithm that can manage a large amount of data, such as gene expression data from DNA microarrays, for optimal and individualized diagnosis. Marker gene selection is essential in the analysis of gene expression data. We have already developed a method, denoted PART-BFCS, that combines projective adaptive resonance theory with a boosted fuzzy classifier with the SWEEP operator. This method is superior to other methods, and has four features, namely fast calculation, accurate prediction, reliable prediction, and rule extraction. In this study, we applied this method to analyze microarray data obtained from esophageal cancer patients. A combination of PART-BFCS and the U-test was also investigated. It was necessary to use a specific type of BFCS, namely BFCS-1,2, because the esophageal cancer data were very complex. The PART-BFCS and PART-BFCS with the U-test models showed higher performance than two conventional methods, namely k-nearest neighbor (kNN) and weighted voting (WV). Genes including CDK6 could be found by our methods, and excellent IF-THEN rules could be extracted. The genes selected in this study have a high potential as new diagnosis markers for esophageal cancer. These results indicate that the new methods can be used in marker gene selection for the diagnosis of cancer patients.
Detecting gene subnetworks under selection in biological pathways.
Gouy, Alexandre; Daub, Joséphine T; Excoffier, Laurent
2017-09-19
Advances in high throughput sequencing technologies have created a gap between data production and functional data analysis. Indeed, phenotypes result from interactions between numerous genes, but traditional methods treat loci independently, missing important knowledge brought by network-level emerging properties. Therefore, detecting selection acting on multiple genes affecting the evolution of complex traits remains challenging. In this context, gene network analysis provides a powerful framework to study the evolution of adaptive traits and facilitates the interpretation of genome-wide data. We developed a method to analyse gene networks that is suitable to evidence polygenic selection. The general idea is to search biological pathways for subnetworks of genes that directly interact with each other and that present unusual evolutionary features. Subnetwork search is a typical combinatorial optimization problem that we solve using a simulated annealing approach. We have applied our methodology to find signals of adaptation to high-altitude in human populations. We show that this adaptation has a clear polygenic basis and is influenced by many genetic components. Our approach, implemented in the R package signet, improves on gene-level classical tests for selection by identifying both new candidate genes and new biological processes involved in adaptation to altitude. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
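The subnetwork search described above can be illustrated with a small simulated annealing loop. This is a sketch under stated assumptions, not the signet implementation: the fixed subnetwork size, the swap move, the sum-of-scores objective, the geometric cooling schedule, and the toy pathway are all hypothetical choices made for illustration.

```python
import math
import random


def is_connected(nodes, graph):
    """Check that the chosen genes form one connected piece of the pathway graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in graph[stack.pop()]:
            if nb in nodes and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes


def subnetwork_score(nodes, scores):
    """Objective: sum of per-gene selection scores in the subnetwork."""
    return sum(scores[g] for g in nodes)


def anneal_subnetwork(graph, scores, start, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a high-scoring connected subnetwork of the same size as start."""
    rng = random.Random(seed)
    current, best = set(start), set(start)
    temp = t0
    for _ in range(steps):
        # Propose swapping one member for a gene adjacent to the current subnetwork.
        boundary = sorted({nb for g in current for nb in graph[g]} - current)
        if not boundary:
            break
        candidate = (current - {rng.choice(sorted(current))}) | {rng.choice(boundary)}
        if is_connected(candidate, graph):
            delta = subnetwork_score(candidate, scores) - subnetwork_score(current, scores)
            # Metropolis rule: always accept improvements, sometimes accept losses.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current = candidate
                if subnetwork_score(current, scores) > subnetwork_score(best, scores):
                    best = set(current)
        temp *= cooling  # geometric cooling
    return best


# Hypothetical linear pathway a-b-c-d-e with per-gene selection scores.
pathway = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
gene_scores = {"a": 0.1, "b": 0.2, "c": 0.5, "d": 2.0, "e": 3.0}
found = anneal_subnetwork(pathway, gene_scores, start={"a", "b"})
print(sorted(found), subnetwork_score(found, gene_scores))
```

Accepting occasional score-decreasing swaps early on, while the temperature is high, is what lets the search leave a locally good but globally poor region of the pathway.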
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.
2013-01-01
Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. 
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943
Bao, Le; Gu, Hong; Dunn, Katherine A; Bielawski, Joseph P
2007-02-08
Models of codon evolution have proven useful for investigating the strength and direction of natural selection. In some cases, a priori biological knowledge has been used successfully to model heterogeneous evolutionary dynamics among codon sites. These are called fixed-effect models, and they require that all codon sites are assigned to one of several partitions which are permitted to have independent parameters for selection pressure, evolutionary rate, transition to transversion ratio or codon frequencies. For single gene analysis, partitions might be defined according to protein tertiary structure, and for multiple gene analysis partitions might be defined according to a gene's functional category. Given a set of related fixed-effect models, the task of selecting the model that best fits the data is not trivial. In this study, we implement a set of fixed-effect codon models which allow for different levels of heterogeneity among partitions in the substitution process. We describe strategies for selecting among these models by a backward elimination procedure, Akaike information criterion (AIC) or a corrected Akaike information criterion (AICc). We evaluate the performance of these model selection methods via a simulation study, and make several recommendations for real data analysis. Our simulation study indicates that the backward elimination procedure can provide a reliable method for model selection in this setting. We also demonstrate the utility of these models by application to a single-gene dataset partitioned according to tertiary structure (abalone sperm lysin), and a multi-gene dataset partitioned according to the functional category of the gene (flagellar-related proteins of Listeria). Fixed-effect models have advantages and disadvantages. Fixed-effect models are desirable when data partitions are known to exhibit significant heterogeneity or when a statistical test of such heterogeneity is desired. 
They have the disadvantage of requiring a priori knowledge for partitioning sites. We recommend: (i) selection of models by using backward elimination rather than AIC or AICc, (ii) use a stringent cut-off, e.g., p = 0.0001, and (iii) conduct sensitivity analysis of results. With thoughtful application, fixed-effect codon models should provide a useful tool for large scale multi-gene analyses.
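For reference, the two information criteria compared in this abstract have the standard forms AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1) / (n - k - 1). The sketch below applies them to three hypothetical fitted models; the log-likelihoods, parameter counts, and the choice of n as the number of codon sites are illustrative assumptions, not values from the study.

```python
def aic(log_likelihood, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC; n is the sample size (assumed here: codon sites)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fixed-effect models: name -> (maximised log-likelihood, parameters).
models = {"M0": (-1050.0, 3), "M1": (-1040.0, 6), "M2": (-1039.5, 12)}
n_sites = 200

best_by_aic = min(models, key=lambda m: aic(*models[m]))
best_by_aicc = min(models, key=lambda m: aicc(*models[m], n_sites))
print(best_by_aic, best_by_aicc)
```

Both criteria penalise extra partition-specific parameters; the backward elimination procedure favoured in the abstract instead compares nested models with likelihood-based tests at a stringent cut-off.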
Takahashi, Hiro; Kobayashi, Takeshi; Honda, Hiroyuki
2005-01-15
To establish prognostic predictors of various diseases using DNA microarray analysis technology, it is desirable to select the significant genes for constructing the prognostic model, and it is also necessary to eliminate non-specific genes or genes with error before constructing the model. We applied projective adaptive resonance theory (PART) to gene screening for DNA microarray data. Genes selected by PART were subjected to our FNN-SWEEP modeling method for the construction of a cancer class prediction model. The model performance was evaluated through comparison with a conventional screening signal-to-noise (S2N) method or nearest shrunken centroids (NSC) method. The FNN-SWEEP predictor with PART screening could discriminate classes of acute leukemia in blinded data with 97.1% accuracy and classes of lung cancer with 90.0% accuracy, whereas the predictor with S2N reached only 85.3% and 70.0%, and the predictor with NSC reached 88.2% and 90.0%, respectively. The results have proven that PART was superior for gene screening. The software is available upon request from the authors. Contact: honda@nubio.nagoya-u.ac.jp
A Model-Based Approach for Identifying Signatures of Ancient Balancing Selection in Genetic Data
DeGiorgio, Michael; Lohmueller, Kirk E.; Nielsen, Rasmus
2014-01-01
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates. PMID:25144706
Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean
2016-08-10
Inferring gene regulatory networks is one of the most interesting research areas in systems biology. Many inference methods have been developed using a variety of computational models and approaches. However, two issues remain. First, depending on the structural or computational model of the inference method, the results tend to be inconsistent because of the innately different advantages and limitations of the methods. Therefore, combining dissimilar approaches is needed as an alternative way to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression penalized by a regularization parameter (lasso) and bootstrapping-based sparse linear regression methods have been suggested in state-of-the-art methods for network inference, but they are not effective for data with a small sample size, and a true regulator could be missed if the target gene is strongly affected by an indirect regulator with high correlation or by another true regulator. We present two novel network inference methods based on the integration of three different criteria: (i) z-score to measure the variation of gene expression from knockout data, (ii) mutual information for the dependency between two genes, and (iii) linear regression-based feature selection. Based on these criteria, we propose a lasso-based random feature selection algorithm (LARF) to achieve better performance by overcoming the limitations of bootstrapping mentioned above. In this work, there are three main contributions. First, our z-score-based method to measure gene expression variations from knockout data is more effective than similar criteria in related works. Second, we confirmed that true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively combined. In the experiments, our methods were validated by outperforming the state-of-the-art methods on DREAM challenge data, and then LARF was applied to the inference of gene regulatory networks associated with psychiatric disorders.
Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro
2007-01-01
Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent’s non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent’s method is not suitable for ROKU. PMID:19936074
Bakshi, Souvika; Saha, Bedabrata; Roy, Nand Kishor; Mishra, Sagarika; Panda, Sanjib Kumar; Sahoo, Lingaraj
2012-06-01
A new method for obtaining transgenic cowpea was developed using positive selection based on the Escherichia coli 6-phosphomannose isomerase gene as the selectable marker and mannose as the selective agent. Only transformed cells were capable of utilizing mannose as a carbon source. Cotyledonary node explants from 4-day-old in vitro-germinated seedlings of cultivar Pusa Komal were inoculated with Agrobacterium tumefaciens strain EHA105 carrying the vector pNOV2819. Regenerating transformed shoots were selected on medium supplemented with a combination of 20 g/l mannose and 5 g/l sucrose as carbon source. The transformed shoots were rooted on medium devoid of mannose. Transformation efficiency based on PCR analysis of individual putative transformed shoots was 3.6%. Southern blot analysis on five randomly chosen PCR-positive plants confirmed the integration of the pmi transgene. Qualitative reverse transcription (qRT-PCR) analysis demonstrated the expression of pmi in T₀ transgenic plants. Chlorophenol red (CPR) assays confirmed the activity of PMI in transgenic plants, and the gene was transmitted to progeny in a Mendelian fashion. The transformation method presented here for cowpea using mannose selection is efficient and reproducible, and could be used to introduce a desirable gene(s) into cowpea for biotic and abiotic stress tolerance.
A simple and reliable multi-gene transformation method for switchgrass.
Ogawa, Yoichi; Shirakawa, Makoto; Koumoto, Yasuko; Honda, Masaho; Asami, Yuki; Kondo, Yasuhiro; Hara-Nishimura, Ikuko
2014-07-01
A simple and reliable Agrobacterium-mediated transformation method was developed for switchgrass. Using this method, many transgenic plants carrying multiple genes-of-interest could be produced without untransformed escapes. Switchgrass (Panicum virgatum L.) is a promising biomass crop for bioenergy. To obtain transgenic switchgrass plants carrying a multi-gene trait in a simple manner, an Agrobacterium-mediated transformation method was established by constructing a Gateway-based binary vector, optimizing transformation conditions and developing a novel selection method. A MultiRound Gateway-compatible destination binary vector carrying the bar selectable marker gene, pHKGB110, was constructed to introduce multiple genes of interest in a single transformation. Two reporter gene expression cassettes, GUSPlus and gfp, were constructed independently on two entry vectors and then introduced into a single T-DNA region of pHKGB110 via sequential LR reactions. Agrobacterium tumefaciens EHA101 carrying the resultant binary vector pHKGB112 and caryopsis-derived compact embryogenic calli were used for transformation experiments. Prolonged cocultivation for 7 days followed by cultivation on media containing meropenem improved transformation efficiency without overgrowth of Agrobacterium, which was, however, not inhibited by cefotaxime or Timentin. In addition, untransformed escape shoots were completely eliminated during the rooting stage by directly dipping the putatively transformed shoots into a solution of the herbicide Basta for a few seconds, designated as the 'herbicide dipping method'. It was also demonstrated that more than 90% of the bar-positive transformants carried both reporters delivered from pHKGB112. This simple and reliable transformation method, which incorporates a new selection technique and the use of a MultiRound Gateway-based binary vector, would be suitable for producing a large number of transgenic lines carrying multiple genes.
SFM: A novel sequence-based fusion method for disease genes identification and prioritization.
Yousef, Abdulaziz; Moghadam Charkari, Nasrollah
2015-10-21
The identification of disease genes from the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes), for which there is no investigational approach, and the classification methods used to make the final decision. In this work, a novel sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, instead of using noisy and incomplete prior knowledge, the method uses the amino acid sequence of the proteins, which is universal data, to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection of four negative sets generated using a distance approach is considered. Then, a decision tree (C4.5) is applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm and to make the final decision. The experimental results of the proposed method have been evaluated by standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method. Copyright © 2015 Elsevier Ltd. All rights reserved.
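The three evaluation figures reported above are related: assuming the F-measure is the usual balanced F1 score (the harmonic mean of precision and recall, which the abstract does not state explicitly), it can be recomputed from the other two. A short sketch of that arithmetic:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (as fractions)
f1 = f_measure(0.826, 0.856)
print(f"{f1:.3f}")
```

The result rounds to about 84%, which is consistent with the reported F-measure.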
Feature weight estimation for gene selection: a local hyperlinear learning approach
2014-01-01
Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms. PMID:24625071
Reranking candidate gene models with cross-species comparison for improved gene prediction
Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S
2008-01-01
Background Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models. PMID:18854050
Klangnurak, Wanlada; Fukuyo, Taketo; Rezanujjaman, M D; Seki, Masahide; Sugano, Sumio; Suzuki, Yutaka; Tokumoto, Toshinobu
2018-01-01
We previously reported the microarray-based selection of three ovulation-related genes in zebrafish. We used a different selection method in this study, RNA sequencing analysis. An additional eight up-regulated candidates were found as specifically up-regulated genes in ovulation-induced samples. Changes in gene expression were confirmed by qPCR analysis. Furthermore, up-regulation prior to ovulation during natural spawning was verified in samples from natural pairing. Gene knock-out zebrafish strains of one of the candidates, the starmaker gene (stm), were established by CRISPR genome editing techniques. Unexpectedly, homozygous mutants were fertile and could spawn eggs. However, a high percentage of unfertilized eggs and abnormal embryos were produced from these homozygous females. The results suggest that the stm gene is necessary for fertilization. In this study, we selected additional ovulation-inducing candidate genes, and a novel function of the stm gene was investigated.
Zhao, Shanrong; Zhang, Ying; Gamini, Ramya; Zhang, Baohong; von Schack, David
2018-03-19
To allow efficient transcript/gene detection, highly abundant ribosomal RNAs (rRNA) are generally removed from total RNA either by positive polyA+ selection or by rRNA depletion (negative selection) before sequencing. Comparisons between the two methods have been carried out by various groups, but the assessments have relied largely on non-clinical samples. In this study, we evaluated these two RNA sequencing approaches using human blood and colon tissue samples. Our analyses showed that rRNA depletion captured more unique transcriptome features, whereas polyA+ selection outperformed rRNA depletion with higher exonic coverage and better accuracy of gene quantification. For blood- and colon-derived RNAs, we found that 220% and 50% more reads, respectively, would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection method. Therefore, in most cases we strongly recommend polyA+ selection over rRNA depletion for gene quantification in clinical RNA sequencing. Our evaluation revealed that a small number of lncRNAs and small RNAs made up a large fraction of the reads in the rRNA depletion RNA sequencing data. Thus, we recommend that these RNAs are specifically depleted to improve the sequencing depth of the remaining RNAs.
Saito, Shinta; Ura, Kiyoe; Kodama, Miho; Adachi, Noritaka
2015-06-30
Targeted gene modification by homologous recombination provides a powerful tool for studying gene function in cells and animals. In higher eukaryotes, non-homologous integration of targeting vectors occurs several orders of magnitude more frequently than does targeted integration, making the gene-targeting technology highly inefficient. For this reason, negative-selection strategies have been employed to reduce the number of drug-resistant clones associated with non-homologous vector integration, particularly when artificial nucleases to introduce a DNA break at the target site are unavailable or undesirable. As such, an exon-trap strategy using a promoterless drug-resistance marker gene provides an effective way to counterselect non-homologous integrants. However, constructing exon-trapping targeting vectors has been a time-consuming and complicated process. By virtue of highly efficient att-mediated recombination, we successfully developed a simple and rapid method to construct plasmid-based vectors that allow for exon-trapping gene targeting. These exon-trap vectors were useful in obtaining correctly targeted clones in mouse embryonic stem cells and human HT1080 cells. Most importantly, with the use of a conditionally cytotoxic gene, we further developed a novel strategy for negative selection, thereby enhancing the efficiency of counterselection for non-homologous integration of exon-trap vectors. Our methods will greatly facilitate exon-trapping gene-targeting technologies in mammalian cells, particularly when combined with the novel negative selection strategy.
Recursive regularization for inferring gene networks from time-course gene expression profiles
Shimamura, Teppei; Imoto, Seiya; Yamaguchi, Rui; Fujita, André; Nagasaki, Masao; Miyano, Satoru
2009-01-01
Background Inferring gene networks from time-course microarray experiments with vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in Statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, VAR modeling with the elastic net succeeds in increasing the number of true positives while it also results in increasing the number of false positives. Results By incorporating relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method succeeds in reducing the number of false positives drastically while keeping the high number of true positives in the network inference and achieves two or more times higher true discovery rate (the proportion of true positives among the selected edges) than the competing methods even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG. Conclusion The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles. PMID:19386091
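The abstract above defines the true discovery rate as the proportion of true positives among the selected edges. A minimal sketch of that metric for directed network edges follows; the edge lists are hypothetical toy data, not results from the paper.

```python
def true_discovery_rate(selected_edges, true_edges):
    """Proportion of selected edges that are in the true network.
    Edges are (regulator, target) pairs; returns 0.0 if nothing is selected."""
    selected = set(selected_edges)
    if not selected:
        return 0.0
    true_positives = selected & set(true_edges)
    return len(true_positives) / len(selected)


# Hypothetical example: four inferred edges, three of them correct
true_net = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")]
inferred = [("g1", "g2"), ("g2", "g3"), ("g1", "g4"), ("g4", "g1")]
tdr = true_discovery_rate(inferred, true_net)
print(tdr)
```

Because edges are directed, the reversed pair ("g4", "g1") counts as a false positive here.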
Methods for simultaneous control of lignin content and composition, and cellulose content in plants
Chiang, Vincent Lee C.; Li, Laigeng
2005-02-15
The present invention relates to a method of concurrently introducing multiple genes into plants and trees. The method includes simultaneous transformation of plants with multiple genes from the phenylpropanoid pathways including 4CL, CAld5H, AldOMT, SAD and CAD genes and combinations thereof to produce various lines of transgenic plants displaying altered agronomic traits. The agronomic traits of the plants are regulated by the orientation of the specific genes and the selected gene combinations, which are incorporated into the plant genome.
Safo, Sandra E; Li, Shuzhao; Long, Qi
2018-03-01
Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach. © 2017, The International Biometric Society.
Sexual selection drives evolution and rapid turnover of male gene expression.
Harrison, Peter W; Wright, Alison E; Zimmer, Fabian; Dean, Rebecca; Montgomery, Stephen H; Pointer, Marie A; Mank, Judith E
2015-04-07
The profound and pervasive differences in gene expression observed between males and females, and the unique evolutionary properties of these genes in many species, have led to the widespread assumption that they are the product of sexual selection and sexual conflict. However, we still lack a clear understanding of the connection between sexual selection and transcriptional dimorphism, often termed sex-biased gene expression. Moreover, the relative contribution of sexual selection vs. drift in shaping broad patterns of expression, divergence, and polymorphism remains unknown. To assess the role of sexual selection in shaping these patterns, we assembled transcriptomes from an avian clade representing the full range of sexual dimorphism and sexual selection. We use these species to test the links between sexual selection and sex-biased gene expression evolution in a comparative framework. Through ancestral reconstruction of sex bias, we demonstrate a rapid turnover of sex bias across this clade driven by sexual selection and show it to be primarily the result of expression changes in males. We use phylogenetically controlled comparative methods to demonstrate that phenotypic measures of sexual selection predict the proportion of male-biased but not female-biased gene expression. Although male-biased genes show elevated rates of coding sequence evolution, consistent with previous reports in a range of taxa, there is no association between sexual selection and rates of coding sequence evolution, suggesting that expression changes may be more important than coding sequence in sexual selection. Taken together, our results highlight the power of sexual selection to act on gene expression differences and shape genome evolution.
Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops.
Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi
2016-01-01
Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce a few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it requires inaccurate single-plant evaluation for selection. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an "island model" inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross in whole cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. 
The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement. PMID:27115872
Gene selection heuristic algorithm for nutrigenomics studies.
Valour, D; Hue, I; Grimard, B; Valour, B
2013-07-15
Large datasets from -omics studies need to be deeply investigated. The aim of this paper is to provide a new method (LEM method) for the search of transcriptome and metabolome connections. The heuristic algorithm described here extends classical canonical correlation analysis (CCA) to a high number of variables (without regularization) and combines well-conditioning and fast computing in R. Reduced CCA models are summarized in PageRank matrices, the product of which gives a stochastic matrix that summarizes the self-avoiding walk covered by the algorithm. Then, a homogeneous Markov process applied to this stochastic matrix yields converged probabilities of interconnection between genes, providing a selection of disjoint subsets of genes. This is an alternative to regularized generalized CCA for the determination of blocks within the structure matrix. Each gene subset is thus linked to the whole metabolic or clinical dataset that represents the biological phenotype of interest. Moreover, this selection process meets the needs of biologists, who often need small sets of genes for further validation or extended phenotyping. The algorithm is shown to work efficiently on three published datasets, resulting in meaningfully broadened gene networks.
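The abstract above relies on a homogeneous Markov process converging when applied to a stochastic matrix. The LEM algorithm itself is not specified here, so the following is only a generic sketch of that convergence idea: repeated multiplication of a probability vector by a row-stochastic matrix until it stops changing. The 2x2 matrix is a hypothetical example, and the abstract's own implementation is in R, not Python.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10000):
    """Iterate pi <- pi * P for a row-stochastic matrix P (list of rows)
    until successive vectors differ by less than tol in every entry."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi  # not converged within max_iter; returned as-is


# Hypothetical two-state chain; each row sums to 1
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
pi = stationary_distribution(P)
print(pi)
```

For this chain the limit can be checked by hand: the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives the distribution (5/6, 1/6).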
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao
2015-04-01
Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnosis and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to the classification. In the proposed algorithm, firstly, the Fisher-Markov selector is used to choose a fixed number of genes. Secondly, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance the exploration and exploitation abilities. Then, discrete biogeography based optimization, which we call DBBO, is proposed by integrating the discrete migration model and the discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are evaluated with the 10-fold cross-validation method. In order to show the effectiveness and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer benchmark datasets. In comparison with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better than or at least comparable with previous methods from the literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Selection signature in domesticated animals.
Pan, Zhang-yuan; He, Xiao-yun; Wang, Xiang-yu; Guo, Xiao-fei; Cao, Xiao-han; Hu, Wen-ping; Di, Ran; Liu, Qiu-yue; Chu, Ming-xing
2016-12-20
Domesticated animals play an important role in human life. All domesticated animals have undergone the same process: they were first domesticated from wild animals and then, through long periods of natural and artificial selection, formed various breeds adapted to local environments and human needs. During this process, domestication and natural and artificial selection leave selection signals in the genome. Research on these selection signals can identify functional genes directly and is one of the most important strategies for screening functional genes. Studies of selection signals have been performed in pigs, chickens, cattle, sheep, goats, dogs and other domestic animals, and have found a great many functional genes. This paper provides an overview of the types of selection signals and the methods used to detect them, outlines research on selection signals in domestic animals, and discusses the key issues in selection signal analysis and its prospects.
Novel method to load multiple genes onto a mammalian artificial chromosome.
Tóth, Anna; Fodor, Katalin; Praznovszky, Tünde; Tubak, Vilmos; Udvardy, Andor; Hadlaczky, Gyula; Katona, Robert L
2014-01-01
Mammalian artificial chromosomes are natural chromosome-based vectors that may carry a vast amount of genetic material in terms of both size and number. They are reasonably stable and segregate well in both mitosis and meiosis. A platform artificial chromosome expression system (ACEs) was earlier described with multiple loading sites for a modified lambda-integrase enzyme. It has been shown that this ACEs is suitable for high-level industrial protein production and the treatment of a mouse model for a devastating human disorder, Krabbe's disease. ACEs-treated mutant mice carrying a therapeutic gene lived more than four times longer than untreated counterparts. This novel gene therapy method is called combined mammalian artificial chromosome-stem cell therapy. At present, this method suffers from the limitation that a new selection marker gene should be present for each therapeutic gene loaded onto the ACEs. Complex diseases require the cooperative action of several genes for treatment, but only a limited number of selection marker genes are available and there is also a risk of serious side-effects caused by the unwanted expression of these marker genes in mammalian cells, organs and organisms. We describe here a novel method to load multiple genes onto the ACEs by using only two selectable marker genes. These markers may be removed from the ACEs before therapeutic application. This novel technology could revolutionize gene therapeutic applications targeting the treatment of complex disorders and cancers. It could also speed up cell therapy by allowing researchers to engineer a chromosome with a predetermined set of genetic factors to differentiate adult stem cells, embryonic stem cells and induced pluripotent stem (iPS) cells into cell types of therapeutic value. It is also a suitable tool for the investigation of complex biochemical pathways in basic science by producing an ACEs with several genes from a signal transduction pathway of interest.
Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H
2007-02-01
Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes.
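For reference, the likelihood ratio test mentioned above has the standard general form below. The symbols are assumptions introduced here: $L_0$ and $L_1$ are the maximized likelihoods under the null model (no positively selected site class) and the alternative model, and $k$ is the difference in the number of free parameters between the two models.

```latex
\[
  \Lambda = 2\,\bigl(\ln L_1 - \ln L_0\bigr),
  \qquad
  \Lambda \;\overset{\text{approx.}}{\sim}\; \chi^2_{k}
  \quad \text{under the null model.}
\]
```

A gene is flagged as showing a signature of selection when $\Lambda$ exceeds the chosen critical value of the $\chi^2_k$ distribution.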
Modelling gene expression profiles related to prostate tumor progression using binary states
2013-01-01
Background: Cancer is a complex disease commonly characterized by the disrupted activity of several cancer-related genes such as oncogenes and tumor-suppressor genes. Previous studies suggest that the process of tumor progression to malignancy is dynamic and can be traced by changes in gene expression. Despite the enormous efforts made for differential expression detection and biomarker discovery, few methods have been designed to model the relationship between gene expression level and tumor stage during malignancy progression. Such models could help us understand the dynamics and simplify or reveal the complexity of tumor progression. Methods: We have modeled an on-off state of gene activation per sample and then per stage to select gene expression profiles associated with tumor progression. The selection is guided by the statistical significance of profiles based on randomly permuted datasets. Results: We show that our method identifies expected profiles corresponding to oncogenes and tumor suppressor genes in a prostate tumor progression dataset. Comparisons with other methods support our findings and indicate that a considerable proportion of significant profiles is found neither by other statistical tests commonly used to detect differential expression between tumor stages nor by other tailored methods. Ontology and pathway analysis concurred with these findings. Conclusions: Results suggest that our methodology may be a valuable tool to study tumor malignancy progression, which might reveal novel cancer therapies. PMID:23721350
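The on-off idea can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: the fixed threshold, the majority rule per stage, the label-shuffling null, and all names (`stage_profile`, `permutation_p_value`, the example values) are assumptions made for the example.

```python
import random


def stage_profile(values, stages, threshold, stage_order):
    """Call each sample 'on' if value > threshold, then call a stage 'on'
    when more than half of its samples are on."""
    profile = []
    for stage in stage_order:
        calls = [v > threshold for v, s in zip(values, stages) if s == stage]
        profile.append(1 if 2 * sum(calls) > len(calls) else 0)
    return tuple(profile)


def permutation_p_value(values, stages, threshold, stage_order,
                        n_perm=1000, seed=0):
    """Estimate how often shuffled stage labels reproduce the observed profile."""
    rng = random.Random(seed)
    observed = stage_profile(values, stages, threshold, stage_order)
    shuffled = list(stages)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stage_profile(values, shuffled, threshold, stage_order) == observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical expression values for one gene across three tumor stages.
stage_order = ["normal", "primary", "metastatic"]
stages = ["normal"] * 3 + ["primary"] * 3 + ["metastatic"] * 3
values = [0.1, 0.2, 0.1, 0.9, 1.1, 0.2, 1.5, 1.4, 1.6]
profile, p_value = permutation_p_value(values, stages, 0.5, stage_order)
```

Here the gene is off in normal tissue and on in both tumor stages, the kind of profile one would expect for an oncogene.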
Method for metabolizing carbazole in petroleum
Kayser, Kevin J.; Kilbane, II, John J.
2005-09-13
A method for selective cleavage of C--N bonds in which genes that encode at least one enzyme suitable for conversion of carbazole to 2-aminobiphenyl-2,3-diol are combined with a gene encoding an amidase suitable for selectively cleaving a C--N bond in 2-aminobiphenyl-2,3-diol, forming an operon that encodes cleavage of both C--N bonds of said carbazole. The operon is inserted into a host culture which, in turn, is contacted with the carbazole, resulting in selective cleavage of both C--N bonds of the carbazole. Also disclosed are a new microorganism that expresses a carbazole degradation trait constitutively and a method for degrading carbazole employing this microorganism.
Construction of human antibody gene libraries and selection of antibodies by phage display.
Frenzel, André; Kügler, Jonas; Wilke, Sonja; Schirrmann, Thomas; Hust, Michael
2014-01-01
Antibody phage display is the most commonly used in vitro selection technology and has yielded thousands of useful antibodies for research, diagnostics, and therapy. The prerequisite for successful generation and development of human recombinant antibodies using phage display is the construction of a high-quality antibody gene library. Here, we describe the methods for the construction of human immune and naive scFv gene libraries. The success also depends on the panning strategy for the selection of binders from these libraries. In this article, we describe a panning strategy that is high-throughput compatible and allows parallel selection in microtiter plates.
Niranjana, M; Vinod; Sharma, J B; Mallick, Niharika; Tomar, S M S; Jha, S K
2017-12-01
Leaf rust (Puccinia triticina) is a major biotic stress affecting wheat yields worldwide. Host-plant resistance is the best method for controlling leaf rust. Aegilops speltoides is a good source of resistance against wheat rusts. To date, five Lr genes, Lr28, Lr35, Lr36, Lr47, and Lr51, have been transferred from Ae. speltoides to bread wheat. In Selection2427, a bread wheat introgressed line with Ae. speltoides as the donor parent, a dominant gene for leaf rust resistance (LrS2427) was mapped to the long arm of chromosome 3B. None of the Lr genes introgressed from Ae. speltoides have been mapped to chromosome 3B. Since none of the designated seedling leaf rust resistance genes have been located on chromosome 3B, LrS2427 seems to be a novel gene. Selection2427 showed a unique property typical of gametocidal genes: when it was crossed to other bread wheat cultivars, the F1 showed partial pollen sterility and poor seed setting, whilst Selection2427 itself showed reasonable male and female fertility. Accidental co-transfer of gametocidal genes with LrS2427 may have occurred in Selection2427. Though LrS2427 did not show any segregation distortion and assorted independently of the putative gametocidal gene(s), its utilization will be difficult due to the selfish behavior of gametocidal genes.
Degrees of separation as a statistical tool for evaluating candidate genes.
Nelson, Ronald M; Pettersson, Mats E
2014-12-01
Selection of candidate genes is an important step in the exploration of complex genetic architecture. The number of gene networks available is increasing and these can provide information to help with candidate gene selection. It is currently common to use the degree of connectedness in gene networks as validation in Genome Wide Association (GWA) and Quantitative Trait Locus (QTL) mapping studies. However, it can cause misleading results if not validated properly. Here we present a method and tool for validating the gene pairs from GWA studies given the context of the network they co-occur in. It ensures that proposed interactions and gene associations are not statistical artefacts inherent to the specific gene network architecture. The CandidateBacon package provides an easy and efficient method to calculate the average degree of separation (DoS) between pairs of genes to currently available gene networks. We show how these empirical estimates of average connectedness are used to validate candidate gene pairs. Validation of interacting genes by comparing their connectedness with the average connectedness in the gene network will provide support for said interactions by utilising the growing amount of gene network information available. Copyright © 2014 Elsevier Ltd. All rights reserved.
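The degree of separation between two genes is the length of the shortest path between them in the network, which a breadth-first search finds directly. The sketch below is a minimal stand-alone illustration and does not use or reproduce the CandidateBacon package; the function names and the toy network are assumptions.

```python
from collections import deque
from itertools import combinations


def degrees_of_separation(graph, source, target):
    """Shortest path length between two genes via BFS, or None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_dos(graph, genes):
    """Mean degree of separation over all connected pairs among `genes`."""
    dists = [degrees_of_separation(graph, a, b) for a, b in combinations(genes, 2)]
    reachable = [d for d in dists if d is not None]
    return sum(reachable) / len(reachable) if reachable else None


# Hypothetical undirected gene network stored as adjacency lists.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
candidate_dos = degrees_of_separation(network, "A", "D")
background_dos = average_dos(network, ["A", "B", "C", "D"])
```

A candidate pair is then judged against the background: a pair is only noteworthy if it is closer than pairs in that network tend to be on average.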
Gene selection with multiple ordering criteria.
Chen, James J; Tsai, Chen-An; Tzeng, Shengli; Chen, Chun-Houh
2007-03-05
A microarray study may select different differentially expressed gene sets because of different selection criteria. For example, the fold-change and p-value are two commonly known criteria to select differentially expressed genes under two experimental conditions. These two selection criteria often result in incompatible selected gene sets. Also, in a two-factor, say, treatment by time experiment, the investigator may be interested in one gene list that responds to both treatment and time effects. We propose three layer ranking algorithms, point-admissible, line-admissible (convex), and Pareto, to provide a preference gene list from multiple gene lists generated by different ranking criteria. Using the public colon data as an example, the layer ranking algorithms are applied to the three univariate ranking criteria, fold-change, p-value, and frequency of selections by the SVM-RFE classifier. A simulation experiment shows that for experiments with small or moderate sample sizes (less than 20 per group) and detecting a 4-fold change or less, the two-dimensional (p-value and fold-change) convex layer ranking selects differentially expressed genes with generally lower FDR and higher power than the standard p-value ranking. Three applications are presented. The first application illustrates a use of the layer rankings to potentially improve predictive accuracy. The second application illustrates an application to a two-factor experiment involving two dose levels and two time points. The layer rankings are applied to selecting differentially expressed genes relating to the dose and time effects. In the third application, the layer rankings are applied to a benchmark data set consisting of three dilution concentrations to provide a ranking system from a long list of differentially expressed genes generated from the three dilution concentrations. 
The layer ranking algorithms are useful to help investigators in selecting the most promising genes from multiple gene lists generated by different filter, normalization, or analysis methods for various objectives.
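Pareto layer ranking can be sketched as repeated peeling of the non-dominated set. The code below is an illustrative sketch, not the authors' implementation; it assumes each gene has two scores for which larger is better (for example absolute log fold-change and -log10 p-value), and the gene names and values are made up.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(scores):
    """Peel off successive non-dominated fronts; layer 1 holds the top-ranked genes."""
    remaining = dict(scores)
    layers = []
    while remaining:
        front = [
            gene for gene, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != gene)
        ]
        layers.append(sorted(front))
        for gene in front:
            del remaining[gene]
    return layers


# Hypothetical (|log2 fold-change|, -log10 p-value) pairs.
scores = {
    "g1": (3.0, 1.0),
    "g2": (1.0, 3.0),
    "g3": (2.0, 2.0),
    "g4": (1.0, 1.0),
    "g5": (0.5, 0.5),
}
layers = pareto_layers(scores)
```

Genes g1, g2 and g3 trade off the two criteria without any one beating another on both, so they share the first layer, whereas a single-criterion ranking would have to order them arbitrarily.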
Towards β-globin gene-targeting with integrase-defective lentiviral vectors.
Inanlou, Davoud Nouri; Yakhchali, Bagher; Khanahmad, Hossein; Gardaneh, Mossa; Movassagh, Hesam; Cohan, Reza Ahangari; Ardestani, Mehdi Shafiee; Mahdian, Reza; Zeinali, Sirous
2010-11-01
We have developed an integrase-defective lentiviral (LV) vector in combination with a gene-targeting approach for gene therapy of β-thalassemia. The β-globin gene-targeting construct has two homologous stems including sequence upstream and downstream of the β-globin gene, a β-globin gene positioned between hygromycin and neomycin resistant genes and a herpes simplex virus type 1 thymidine kinase (HSVtk) suicide gene. Utilization of integrase-defective LV as a vector for the β-globin gene increased the number of selected clones relative to non-viral methods. This method represents an important step toward the ultimate goal of a clinical gene therapy for β-thalassemia.
Ultsch, Alfred; Kringel, Dario; Kalso, Eija; Mogil, Jeffrey S; Lötsch, Jörn
2016-12-01
The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among important functional genomics features of pain, a subset of n = 34 pain genes were found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.
A general method for identifying major hybrid male sterility genes in Drosophila.
Zeng, L W; Singh, R S
1995-10-01
The genes responsible for hybrid male sterility in species crosses are usually identified by introgressing chromosome segments, monitored by visible markers, between closely related species by continuous backcrosses. This commonly used method, however, suffers from two problems. First, it relies on the availability of markers to monitor the introgressed regions and so the portion of the genome examined is limited to the marked regions. Secondly, the introgressed regions are usually large and it is impossible to tell if the effects of the introgressed regions are the result of single (or few) major genes or many minor genes (polygenes). Here we introduce a simple and general method for identifying putative major hybrid male sterility genes which is free of these problems. In this method, the actual hybrid male sterility genes (rather than markers), or tightly linked gene complexes with large effects, are selectively introgressed from one species into the background of another species by repeated backcrosses. This is performed by selectively backcrossing heterozygous (for hybrid male sterility gene or genes) females producing fertile and sterile sons in roughly equal proportions to males of either parental species. As no marker gene is required for this procedure, this method can be used with any species pairs that produce unisexual sterility. With the application of this method, a small X chromosome region of Drosophila mauritiana which produces complete hybrid male sterility (aspermic testes) in the background of D. simulans was identified. Recombination analysis reveals that this region contains a second major hybrid male sterility gene linked to the forked locus located at either 62.7 +/- 0.66 map units or at the centromere region of the X chromosome of D. mauritiana.
Niklitschek, Mauricio; Baeza, Marcelo; Fernández-Lobato, María; Cifuentes, Víctor
2012-01-01
Generally, two selection markers are required to obtain homozygous mutations in a diploid background, one for each gene copy that is interrupted. This chapter describes a method that allows deletion of both copies of a gene in a diploid organism, a wild-type strain of the yeast Xanthophyllomyces dendrorhous, using hygromycin B resistance as the only selection marker. To accomplish this, in a first step, a heterozygous hygromycin B-resistant strain (carrying the inserted hph gene) is obtained by a single transformation. Next, the heterozygous mutant is grown in media with increasing concentrations of the antibiotic. In this way, strains that have become homozygous for the antibiotic marker (by mitotic recombination) are able to grow at higher concentrations of the antibiotic than the heterozygous strain. The method can potentially be applied to obtain double mutants of other diploid organisms.
Kim, Hyunjin; Choi, Sang-Min; Park, Sanghyun
2018-01-01
When a gene shows varying levels of expression among normal people but similar levels in disease patients or shows similar levels of expression among normal people but different levels in disease patients, we can assume that the gene is associated with the disease. By utilizing this gene expression heterogeneity, we can obtain additional information that abets discovery of disease-associated genes. In this study, we used collaborative filtering to calculate the degree of gene expression heterogeneity between classes and then scored the genes on the basis of the degree of gene expression heterogeneity to find "differentially predicted" genes. Through the proposed method, we discovered more prostate cancer-associated genes than 10 comparable methods. The genes prioritized by the proposed method are potentially significant to biological processes of a disease and can provide insight into them.
Random mutagenesis by error-prone pol plasmid replication in Escherichia coli.
Alexander, David L; Lilly, Joshua; Hernandez, Jaime; Romsdahl, Jillian; Troll, Christopher J; Camps, Manel
2014-01-01
Directed evolution is an approach that mimics natural evolution in the laboratory with the goal of modifying existing enzymatic activities or of generating new ones. The identification of mutants with desired properties involves the generation of genetic diversity coupled with a functional selection or screen. Genetic diversity can be generated using PCR or using in vivo methods such as chemical mutagenesis or error-prone replication of the desired sequence in a mutator strain. In vivo mutagenesis methods facilitate iterative selection because they do not require cloning, but generally produce a low mutation density with mutations not restricted to specific genes or areas within a gene. For this reason, this approach is typically used to generate new biochemical properties when large numbers of mutants can be screened or selected. Here we describe protocols for an advanced in vivo mutagenesis method that is based on error-prone replication of a ColE1 plasmid bearing the gene of interest. Compared to other in vivo mutagenesis methods, this plasmid-targeted approach allows increased mutation loads and facilitates iterative selection approaches. We also describe the mutation spectrum for this mutagenesis methodology in detail, and, using cycle 3 GFP as a target for mutagenesis, we illustrate the phenotypic diversity that can be generated using our method. In sum, error-prone Pol I replication is a mutagenesis method that is ideally suited for the evolution of new biochemical activities when a functional selection is available.
Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation
Peter, Benjamin M.; Huerta-Sanchez, Emilia; Nielsen, Rasmus
2012-01-01
An outstanding question in human genetics has been the degree to which adaptation occurs from standing genetic variation or from de novo mutations. Here, we combine several common statistics used to detect selection in an Approximate Bayesian Computation (ABC) framework, with the goal of discriminating between models of selection and providing estimates of the age of selected alleles and the selection coefficients acting on them. We use simulations to assess the power and accuracy of our method and apply it to seven of the strongest sweeps currently known in humans. We identify two genes, ASPM and PSCA, that are most likely affected by selection on standing variation; and we find three genes, ADH1B, LCT, and EDAR, in which the adaptive alleles seem to have swept from a new mutation. We also confirm evidence of selection for one further gene, TRPV6. In one gene, G6PD, neither neutral models nor models of selective sweeps fit the data, presumably because this locus has been subject to balancing selection. PMID:23071458
A comparative study of covariance selection models for the inference of gene regulatory networks.
Stifanelli, Patrizia F; Creanza, Teresa M; Anglani, Roberto; Liuzzi, Vania C; Mukherjee, Sayan; Schena, Francesco P; Ancona, Nicola
2013-10-01
The inference, or 'reverse-engineering', of gene regulatory networks from expression data and the description of the complex dependency structures among genes are open issues in modern molecular biology. In this paper we compared three regularized methods of covariance selection for the inference of gene regulatory networks, developed to circumvent the problems arising when the number of observations n is smaller than the number of genes p. The examined approaches provided three alternative estimates of the inverse covariance matrix: (a) the 'PINV' method is based on the Moore-Penrose pseudoinverse, (b) the 'RCM' method computes correlations between regression residuals and (c) the 'ℓ(2C)' method maximizes a properly regularized log-likelihood function. Our extensive simulation studies showed that ℓ(2C) outperformed the other two methods, having the most predictive partial correlation estimates and the highest sensitivity in inferring conditional dependencies between genes, even when only a few observations were available. Applying this method to infer gene networks of the isoprenoid biosynthesis pathways in Arabidopsis thaliana revealed a negative partial correlation coefficient between the two hubs in the two isoprenoid pathways and, more importantly, provided evidence of cross-talk between genes in the plastidial and the cytosolic pathways. When applied to gene expression data relative to a signature of the HRAS oncogene in human cell cultures, the method revealed 9 genes (p-value<0.0005) directly interacting with HRAS, sharing the same Ras-responsive binding site for the transcription factor RREB1. This result suggests that the transcriptional activation of these genes is mediated by a common transcription factor downstream of Ras signaling. Software implementing the methods in the form of Matlab scripts is available at: http://users.ba.cnr.it/issia/iesina18/CovSelModelsCodes.zip. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
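For context, covariance selection methods rest on the standard relation between the inverse covariance matrix and partial correlations. The notation is an assumption introduced here: $\Sigma$ is the covariance matrix of the $p$ genes, $\Omega = (\omega_{ij})$ its inverse, and $\rho_{ij \mid \text{rest}}$ the partial correlation of genes $i$ and $j$ given all others.

```latex
\[
  \Omega = \Sigma^{-1},
  \qquad
  \rho_{ij \mid \text{rest}} \;=\; -\,\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}},
  \qquad i \neq j .
\]
```

When $n < p$ the sample covariance matrix is singular, so $\Omega$ cannot be obtained by direct inversion; each of the three compared methods supplies a substitute estimate, for example the Moore-Penrose pseudoinverse in the case of PINV.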
Rapid Hypothesis Testing with Candida albicans through Gene Disruption with Short Homology Regions
Wilson, R. Bryce; Davis, Dana; Mitchell, Aaron P.
1999-01-01
Disruption of newly identified genes in the pathogen Candida albicans is a vital step in determination of gene function. Several gene disruption methods described previously employ long regions of homology flanking a selectable marker. Here, we describe disruption of C. albicans genes with PCR products that have 50 to 60 bp of homology to a genomic sequence on each end of a selectable marker. We used the method to disrupt two known genes, ARG5 and ADE2, and two sequences newly identified through the Candida genome project, HRM101 and ENX3. HRM101 and ENX3 are homologous to genes in the conserved RIM101 (previously called RIM1) and PacC pathways of Saccharomyces cerevisiae and Aspergillus nidulans. We show that three independent hrm101/hrm101 mutants and two independent enx3/enx3 mutants are defective in filamentation on Spider medium. These observations argue that HRM101 and ENX3 sequences are indeed portions of genes and that the respective gene products have related functions. PMID:10074081
Radiogenetic therapy: strategies to overcome tumor resistance.
Marples, B; Greco, O; Joiner, M C; Scott, S D
2003-01-01
The aim of cancer gene therapy is to selectively kill malignant cells at the tumor site, by exploiting traits specific to cancer cells and/or solid tumors. Strategies that take advantage of biological features common to different tumor types are particularly promising, since they have wide clinical applicability. Much attention has focused on genetic methods that complement radiotherapy, the principal treatment modality, or that exploit hypoxia, the most ubiquitous characteristic of most solid cancers. The goal of this review is to highlight two promising gene therapy methods developed specifically to target the tumor volume that can be readily used in combination with radiotherapy. The first approach uses radiation-responsive gene promoters to restrict expression of a suicide gene (e.g., herpes simplex virus thymidine kinase) to irradiated tissue only, leading to targeted cell killing in the presence of a prodrug (e.g., ganciclovir). The second method utilizes oxygen-dependent promoters to produce selective therapeutic gene expression and prodrug activation in hypoxic cells, which are refractory to conventional radiotherapy. Further refinement of tumor targeting can be achieved by combining radiation- and hypoxia-responsive elements in chimeric promoters activated by either or both stimuli. The in vitro and in vivo studies described in this review suggest that the combination of gene therapy and radiotherapy protocols has potential for use in cancer care, particularly in cases currently refractory to treatment as a result of inherent or hypoxia-mediated radioresistance.
Genome-Wide Specific Selection in Three Domestic Sheep Breeds.
Wang, Huihua; Zhang, Li; Cao, Jiaxve; Wu, Mingming; Ma, Xiaomeng; Liu, Zhen; Liu, Ruizao; Zhao, Fuping; Wei, Caihong; Du, Lixin
2015-01-01
Commercial sheep raised for mutton grow faster than traditional Chinese sheep breeds. Here, we aimed to evaluate genetic selection among three different types of sheep breed: two well-known commercial mutton breeds and one indigenous Chinese breed. We first combined the locus-specific branch length and di statistics to detect candidate regions targeted by selection in the three populations. The results showed that the genetic distances reached at least medium divergence for each pairwise combination. We found that the two methods were highly correlated, and we identified many growth-related candidate genes undergoing artificial selection. For production traits, APOBR and FTO are associated with body mass index. For meat traits, ALDOA, STK32B and FAM190A are related to marbling. For reproduction traits, CCNB2 and SLC8A3 affect oocyte development. We also found that two well-known genes, GHR (which affects meat production and quality) and EDAR (associated with hair thickness), were associated with German mutton merino sheep. Furthermore, four genes (POL, RPL7, MSL1 and SHISA9) were associated with pre-weaning gain in our previous genome-wide association study. Our results indicate that combining the locus-specific branch length and di statistical approaches can narrow the search range for breed-specific selection. We obtained many credible candidate genes, which not only confirm previous reports but also provide a suite of novel candidate genes in defined breeds to guide hybridization breeding.
Antibiotic Combinations That Enable One-Step, Targeted Mutagenesis of Chromosomal Genes.
Lee, Wonsik; Do, Truc; Zhang, Ge; Kahne, Daniel; Meredith, Timothy C; Walker, Suzanne
2018-06-08
Targeted modification of bacterial chromosomes is necessary to understand new drug targets, investigate virulence factors, elucidate cell physiology, and validate results of -omics-based approaches. For some bacteria, reverse genetics remains a major bottleneck to progress in research. Here, we describe a compound-centric strategy that combines new negative selection markers with known positive selection markers to achieve simple, efficient one-step genome engineering of bacterial chromosomes. The method was inspired by the observation that certain nonessential metabolic pathways contain essential late steps, suggesting that antibiotics targeting a late step can be used to select for the absence of genes that control flux into the pathway. Guided by this hypothesis, we have identified antibiotic/counterselectable markers to accelerate reverse engineering of two increasingly antibiotic-resistant pathogens, Staphylococcus aureus and Acinetobacter baumannii. For S. aureus, we used wall teichoic acid biosynthesis inhibitors to select for the absence of tarO and for A. baumannii, we used colistin to select for the absence of lpxC. We have obtained desired gene deletions, gene fusions, and promoter swaps in a single plating step with perfect efficiency. Our method can also be adapted to generate markerless deletions of genes using FLP recombinase. The tools described here will accelerate research on two important pathogens, and the concept we outline can be readily adapted to any organism for which a suitable target pathway can be identified.
2014-01-01
Background Discerning the traits evolving under neutral conditions from those traits evolving rapidly because of various selection pressures is a great challenge. We propose a new method, composite selection signals (CSS), which unifies the multiple pieces of selection evidence from the rank distribution of its diverse constituent tests. The extreme CSS scores capture highly differentiated loci and underlying common variants hauling excess haplotype homozygosity in the samples of a target population. Results The data on high-density genotypes were analyzed for evidence of an association with either polledness or double muscling in various cohorts of cattle and sheep. In cattle, extreme CSS scores were found in the candidate regions on autosome BTA-1 and BTA-2, flanking the POLL locus and MSTN gene, for polledness and double muscling, respectively. In sheep, the regions with extreme scores were localized on autosome OAR-2 harbouring the MSTN gene for double muscling and on OAR-10 harbouring the RXFP2 gene for polledness. In comparison to the constituent tests, there was a partial agreement between the signals at the four candidate loci; however, they consistently identified additional genomic regions harbouring no known genes. Persuasively, our list of all the additional significant CSS regions contains genes that have been successfully implicated to secondary phenotypic diversity among several subpopulations in our data. For example, the method identified a strong selection signature for stature in cattle capturing selective sweeps harbouring UQCC-GDF5 and PLAG1-CHCHD7 gene regions on BTA-13 and BTA-14, respectively. Both gene pairs have been previously associated with height in humans, while PLAG1-CHCHD7 has also been reported for stature in cattle. 
In the additional analysis, CSS identified significant regions harbouring multiple genes for various traits under selection in European cattle including polledness, adaptation, metabolism, growth rate, stature, immunity, reproduction traits and some other candidate genes for dairy and beef production. Conclusions CSS successfully localized the candidate regions in validation datasets as well as identified previously known and novel regions for various traits experiencing selection pressure. Together, the results demonstrate the utility of CSS by its improved power, reduced false positives and high-resolution of selection signals as compared to individual constituent tests. PMID:24636660
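The rank-based combination of constituent tests described in this abstract can be illustrated with a short sketch. It assumes that every test yields one score per locus and that larger scores mean stronger evidence of selection. The function name `composite_scores`, the toy data, and the specific steps (fractional ranks, normal quantiles, averaging, then -log10 of an upper-tail p-value) are illustrative assumptions, not necessarily the authors' exact procedure.

```python
import math
from statistics import NormalDist


def composite_scores(tests):
    """Combine per-locus scores from several tests into one composite score.

    tests: list of equal-length lists; tests[t][i] is the score of test t at
    locus i (assumed: larger = stronger evidence of selection).
    """
    n_tests = len(tests)
    n_loci = len(tests[0])
    std_normal = NormalDist()
    z_sum = [0.0] * n_loci
    for scores in tests:
        # Rank loci from 1 (weakest) to n_loci (strongest); ties keep input order.
        order = sorted(range(n_loci), key=lambda i: scores[i])
        for rank, locus in enumerate(order, start=1):
            frac = rank / (n_loci + 1)  # fractional rank, strictly inside (0, 1)
            z_sum[locus] += std_normal.inv_cdf(frac)
    # Assumption: tests are independent, so the mean z has variance 1 / n_tests.
    mean_dist = NormalDist(0.0, 1.0 / math.sqrt(n_tests))
    result = []
    for z in z_sum:
        p_upper = mean_dist.cdf(-z / n_tests)  # upper tail, by symmetry
        result.append(-math.log10(p_upper))
    return result


# Hypothetical scores for four loci from three tests.
toy = [[0.1, 0.5, 0.9, 0.2], [1, 2, 8, 3], [0.0, 0.3, 0.7, 0.1]]
scores = composite_scores(toy)
```

In this toy input the third locus ranks highest in every test, so it receives the largest composite score. Real constituent tests are correlated, so the independence assumption here is a simplification.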
Method for nonlinear optimization for gas tagging and other systems
Chen, Ting; Gross, Kenny C.; Wegerich, Stephan
1998-01-01
A method and system for providing nuclear fuel rods with a configuration of isotopic gas tags. The method includes selecting a true location of a first gas tag node, selecting initial locations for the remaining n-1 nodes using target gas tag compositions, generating a set of random gene pools with L nodes, applying a Hopfield network for computing an energy, or cost, for each of the L gene pools and using selected constraints to establish minimum energy states to identify optimal gas tag nodes with each energy compared to a convergence threshold and then upon identifying the gas tag node continuing this procedure until establishing the next gas tag node until all remaining n nodes have been established.
Method for nonlinear optimization for gas tagging and other systems
Chen, T.; Gross, K.C.; Wegerich, S.
1998-01-06
A method and system are disclosed for providing nuclear fuel rods with a configuration of isotopic gas tags. The method includes selecting a true location of a first gas tag node, selecting initial locations for the remaining n-1 nodes using target gas tag compositions, generating a set of random gene pools with L nodes, applying a Hopfield network for computing an energy, or cost, for each of the L gene pools and using selected constraints to establish minimum energy states to identify optimal gas tag nodes with each energy compared to a convergence threshold and then upon identifying the gas tag node continuing this procedure until establishing the next gas tag node until all remaining n nodes have been established. 6 figs.
Bigda, Jacek J; Koszałka, Patrycja
2013-08-10
In this report we describe Wacław Szybalski's fundamental contribution to gene therapy and immunotherapy. His 1962 PNAS paper (Szybalska and Szybalski, 1962) documented the first successful gene repair in mammalian cells. Furthermore, this was also the first report on the HAT selection method used later in many applications. Most importantly, somatic cell fusion and HAT selection were subsequently used to develop monoclonal antibody technology, which contributed significantly to the progress of today's medicine. Copyright © 2013 Elsevier B.V. All rights reserved.
Forward and reverse mutagenesis in C. elegans
Kutscher, Lena M.; Shaham, Shai
2014-01-01
Mutagenesis drives natural selection. In the lab, mutations allow gene function to be deciphered. C. elegans is highly amenable to functional genetics because of its short generation time, ease of use, and wealth of available gene-alteration techniques. Here we provide an overview of historical and contemporary methods for mutagenesis in C. elegans, and discuss principles and strategies for forward (genome-wide mutagenesis) and reverse (target-selected and gene-specific mutagenesis) genetic studies in this animal. PMID:24449699
Brock, Guy N; Shaffer, John R; Blakesley, Richard E; Lotz, Meredith J; Tseng, George C
2008-01-10
Gene expression data frequently contain missing values; however, most downstream analyses for microarray experiments require complete data. In the literature many methods have been proposed to estimate missing values via information of the correlation patterns within the gene expression matrix. Each method has its own advantages, but the specific conditions for which each method is preferred remain largely unclear. In this report we describe an extensive evaluation of eight current imputation methods on multiple types of microarray experiments, including time series, multiple exposures, and multiple exposures x time series data. We then introduce two complementary selection schemes for determining the most appropriate imputation method for any given data set. We found that the optimal imputation algorithms (LSA, LLS, and BPCA) are all highly competitive with each other, and that no method is uniformly superior in all the data sets we examined. The success of each method can also depend on the underlying "complexity" of the expression data, where we take complexity to indicate the difficulty in mapping the gene expression matrix to a lower-dimensional subspace. We developed an entropy measure to quantify the complexity of expression matrices and found that, by incorporating this information, the entropy-based selection (EBS) scheme is useful for selecting an appropriate imputation algorithm. We further propose a simulation-based self-training selection (STS) scheme. This technique has been used previously for microarray data imputation, but for different purposes. The scheme selects the optimal or near-optimal method with high accuracy but at an increased computational cost. Our findings provide insight into the problem of which imputation method is optimal for a given data set. Three top-performing methods (LSA, LLS and BPCA) are competitive with each other. 
Global-based imputation methods (PLS, SVD, BPCA) performed better on microarray data with lower complexity, while neighbour-based methods (KNN, OLS, LSA, LLS) performed better in data with higher complexity. We also found that the EBS and STS schemes serve as complementary and effective tools for selecting the optimal imputation algorithm.
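As an illustration of the neighbour-based family mentioned above, the following is a minimal sketch of KNN imputation on a genes x arrays matrix. It is not the implementation evaluated in the study. The function name `knn_impute`, the use of `None` for missing values, the root-mean-square distance over shared columns, and the unweighted average of neighbours are all assumptions made for the example.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a genes x arrays matrix from the k most similar genes."""
    filled = [row[:] for row in matrix]  # copy so the input is left unchanged
    for g, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                # A neighbour must be a different gene with column j observed.
                if h == g or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Root-mean-square distance over columns observed in both genes.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[g][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical expression values; gene 1 is missing its third measurement.
data = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0], [1.0, 2.0, 3.2]]
completed = knn_impute(data, k=2)
```

Here the missing value is filled from the two genes whose observed profiles are closest to gene 1, and the distant third gene is ignored. An entry with no usable neighbour is left as `None`.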
Traditional and modern plant breeding methods with examples in rice (Oryza sativa L.).
Breseghello, Flavio; Coelho, Alexandre Siqueira Guedes
2013-09-04
Plant breeding can be broadly defined as alterations caused in plants as a result of their use by humans, ranging from unintentional changes resulting from the advent of agriculture to the application of molecular tools for precision breeding. The vast diversity of breeding methods can be simplified into three categories: (i) plant breeding based on observed variation by selection of plants based on natural variants appearing in nature or within traditional varieties; (ii) plant breeding based on controlled mating by selection of plants presenting recombination of desirable genes from different parents; and (iii) plant breeding based on monitored recombination by selection of specific genes or marker profiles, using molecular tools for tracking within-genome variation. The continuous application of traditional breeding methods in a given species could lead to the narrowing of the gene pool from which cultivars are drawn, rendering crops vulnerable to biotic and abiotic stresses and hampering future progress. Several methods have been devised for introducing exotic variation into elite germplasm without undesirable effects. Cases in rice are given to illustrate the potential and limitations of different breeding approaches.
Mutual information estimation reveals global associations between stimuli and biological processes
Suzuki, Taiji; Sugiyama, Masashi; Kanamori, Takafumi; Sese, Jun
2009-01-01
Background Although microarray gene expression analysis has become popular, it remains difficult to interpret the biological changes caused by stimuli or variation of conditions. Clustering of genes and associating each group with biological functions are often used methods. However, such methods only detect partial changes within cell processes. Herein, we propose a method for discovering global changes within a cell by associating observed conditions of gene expression with gene functions. Results To elucidate the association, we introduce a novel feature selection method called Least-Squares Mutual Information (LSMI), which computes mutual information without density estimation, and therefore LSMI can detect nonlinear associations within a cell. We demonstrate the effectiveness of LSMI through comparison with existing methods. The results of the application to yeast microarray datasets reveal that non-natural stimuli affect various biological processes, whereas others have no significant relation to specific cell processes. Furthermore, we discover that biological processes can be categorized into four types according to the responses to various stimuli: DNA/RNA metabolism, gene expression, protein metabolism, and protein localization. Conclusion We proposed a novel feature selection method called LSMI, and applied LSMI to mining the association between conditions of yeast and biological processes through microarray datasets. In fact, LSMI allows us to elucidate the global organization of cellular process control. PMID:19208155
2011-01-01
Background Several computational candidate gene selection and prioritization methods have recently been developed. These in silico selection and prioritization techniques are usually based on two central approaches - the examination of similarities to known disease genes and/or the evaluation of functional annotation of genes. Each of these approaches has its own caveats. Here we employ a previously described method of candidate gene prioritization based mainly on gene annotation, in accompaniment with a technique based on the evaluation of pertinent sequence motifs or signatures, in an attempt to refine the gene prioritization approach. We apply this approach to X-linked mental retardation (XLMR), a group of heterogeneous disorders for which some of the underlying genetics is known. Results The gene annotation-based binary filtering method yielded a ranked list of putative XLMR candidate genes with good plausibility of being associated with the development of mental retardation. In parallel, a motif finding approach based on linear discriminatory analysis (LDA) was employed to identify short sequence patterns that may discriminate XLMR from non-XLMR genes. High rates (>80%) of correct classification were achieved, suggesting that the identification of these motifs effectively captures genomic signals associated with XLMR vs. non-XLMR genes. The computational tools developed for the motif-based LDA are integrated into the freely available genomic analysis portal Galaxy (http://main.g2.bx.psu.edu/). Nine genes (APLN, ZC4H2, MAGED4, MAGED4B, RAP2C, FAM156A, FAM156B, TBL1X, and UXT) were highlighted as highly ranked XLMR candidate genes. Conclusions The combination of gene annotation information and sequence motif-orientated computational candidate gene prediction methods highlights an added benefit in generating a list of plausible candidate genes, as has been demonstrated for XLMR. 
Reviewers: This article was reviewed by Dr Barbara Bardoni (nominated by Prof Juergen Brosius); Prof Neil Smalheiser and Dr Dustin Holloway (nominated by Prof Charles DeLisi). PMID:21668950
Huang, Xuena; Gao, Yangchun; Jiang, Bei; Zhou, Zunchun; Zhan, Aibin
2016-01-15
As invasive species have successfully colonized a wide range of dramatically different local environments, they offer a good opportunity to study interactions between species and rapidly changing environments. Gene expression represents one of the primary and crucial mechanisms for rapid adaptation to local environments. Here, we aim to select reference genes for quantitative gene expression analysis based on quantitative Real-Time PCR (qRT-PCR) for a model invasive ascidian, Ciona savignyi. We analyzed the stability of ten candidate reference genes in three tissues (siphon, pharynx and intestine) under two key environmental stresses (temperature and salinity) in the marine realm based on three programs (geNorm, NormFinder and delta Ct method). Our results demonstrated only minor differences in stability rankings among the three methods. The use of different single reference genes might influence the data interpretation, while multiple reference genes could minimize possible errors. Therefore, reference gene combinations were recommended for different tissues - the optimal reference gene combination for siphon was RPS15 and RPL17 under temperature stress, and RPL17, UBQ and TubA under salinity treatment; for pharynx, TubB, TubA and RPL17 were the most stable genes under temperature stress, while TubB, TubA and UBQ were the best under salinity stress; for intestine, UBQ, RPS15 and RPL17 were the most reliable reference genes under both treatments. Our results suggest the necessity of selecting and testing reference genes for different tissues under varying environmental stresses. The results obtained here are expected to reveal mechanisms of gene expression-mediated invasion success using C. savignyi as a model species. Copyright © 2015 Elsevier B.V. All rights reserved.
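The delta Ct method named above can be sketched as a pairwise comparison: for every pair of candidate genes, take the per-sample Ct differences and compute their standard deviation, then average those standard deviations per gene, with lower values meaning more stable expression. The snippet below is a minimal sketch under that reading. The function name `delta_ct_stability` and the Ct values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Rank candidate reference genes by mean SD of pairwise Ct differences.

    ct: dict mapping gene name -> list of Ct values, one per sample, with the
    same sample order for every gene. Returns (gene, score) pairs, most stable
    (lowest score) first.
    """
    genes = list(ct)
    stability = {}
    for g in genes:
        sds = []
        for h in genes:
            if h == g:
                continue
            diffs = [a - b for a, b in zip(ct[g], ct[h])]
            sds.append(stdev(diffs))  # spread of delta Ct across samples
        stability[g] = mean(sds)
    return sorted(stability.items(), key=lambda kv: kv[1])


# Hypothetical Ct values for three genes across four samples.
ct_values = {
    "RPL17": [20.0, 20.5, 21.0, 20.2],
    "RPS15": [18.1, 18.6, 19.1, 18.3],
    "UBQ": [22.0, 25.0, 21.0, 26.0],
}
ranking = delta_ct_stability(ct_values)
```

In this toy input the first two genes move together across samples while the third fluctuates independently, so the third gene is ranked least stable.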
An 'instant gene bank' method for gene cloning by mutant complementation.
Gems, D; Aleksenko, A; Belenky, L; Robertson, S; Ramsden, M; Vinetski, Y; Clutterbuck, A J
1994-02-01
We describe a new method of gene cloning by complementation of mutant alleles which obviates the need for construction of a gene library in a plasmid vector in vitro and its amplification in Escherichia coli. The method involves simultaneous transformation of mutant strains of the fungus Aspergillus nidulans with (i) fragmented chromosomal DNA from a donor species and (ii) DNA of a plasmid without a selectable marker gene, but with a fungal origin of DNA replication ('helper plasmid'). Transformant colonies appear as the result of the joining of chromosomal DNA fragments carrying the wild-type copies of the mutant allele with the helper plasmid. Joining may occur either by ligation (if the helper plasmid is in linear form) or recombination (if it is cccDNA). This event occurs with high efficiency in vivo, and generates an autonomously replicating plasmid cointegrate. Transformants containing Penicillium chrysogenum genomic DNA complementing A. nidulans niaD, nirA and argB mutations have been obtained. While some of these cointegrates were evidently rearranged or consisted only of unaltered replicating plasmid, in other cases plasmids could be recovered into E. coli and were subsequently shown to contain the selected gene. The utility of this "instant gene bank" technique is demonstrated here by the molecular cloning of the P. canescens trpC gene.
Data-adaptive test statistics for microarray data.
Mukherjee, Sach; Roberts, Stephen J; van der Laan, Mark J
2005-09-01
An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. By request to the corresponding author.
A Hybrid Computational Method for the Discovery of Novel Reproduction-Related Genes
Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Guohua; Huang, Tao; Cai, Yu-Dong
2015-01-01
Uncovering the molecular mechanisms underlying reproduction is of great importance to infertility treatment and to the generation of healthy offspring. In this study, we discovered novel reproduction-related genes with a hybrid computational method, integrating three different types of method, which offered new clues for further reproduction research. This method was first executed on a weighted graph, constructed based on known protein-protein interactions, to search the shortest paths connecting any two known reproduction-related genes. Genes occurring in these paths were deemed to have a special relationship with reproduction. These newly discovered genes were filtered with a randomization test. Then, the remaining genes were further selected according to their associations with known reproduction-related genes measured by protein-protein interaction score and alignment score obtained by BLAST. The in-depth analysis of the high confidence novel reproduction genes revealed hidden mechanisms of reproduction and provided guidelines for further experimental validations. PMID:25768094
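The first stage described above, searching shortest paths between known genes on a weighted interaction graph, can be sketched with Dijkstra's algorithm. This is a minimal illustration, not the authors' code. The graph, the gene names, the assumption that a lower edge weight means a stronger interaction, and the helper names `shortest_path` and `candidate_genes` are all hypothetical. The later filtering stages (randomization test, interaction and BLAST scores) are not shown.

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list from source to target, or []."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, known):
    """Count how often each non-known gene lies inside a shortest path."""
    counts = {}
    for a, b in combinations(sorted(known), 2):
        for gene in shortest_path(graph, a, b)[1:-1]:
            if gene not in known:
                counts[gene] = counts.get(gene, 0) + 1
    return counts


# Hypothetical undirected interaction graph (assumed: lower weight = stronger link).
ppi = {
    "A": {"X": 1, "Y": 5},
    "X": {"A": 1, "B": 1},
    "Y": {"A": 5, "B": 5},
    "B": {"X": 1, "Y": 5, "Z": 1},
    "Z": {"B": 1, "C": 1},
    "C": {"Z": 1},
}
counts = candidate_genes(ppi, {"A", "B", "C"})
```

Genes that appear in many shortest paths between known genes become candidates, while a gene that sits only on expensive detours (Y in this toy graph) is never selected.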
Logsdon, Benjamin A.; Mezey, Jason
2010-01-01
Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data. PMID:21152011
Detection of genomic signatures of recent selection in commercial broiler chickens.
Fu, Weixuan; Lee, William R; Abasht, Behnam
2016-08-26
Identification of the genomic signatures of recent selection may help uncover causal polymorphisms controlling traits relevant to recent decades of selective breeding in livestock. In this study, we aimed at detecting signatures of recent selection in commercial broiler chickens using genotype information from single nucleotide polymorphisms (SNPs). A total of 565 chickens from five commercial purebred lines, including three broiler sire (male) lines and two broiler dam (female) lines, were genotyped using the 60K SNP Illumina iSelect chicken array. To detect genomic signatures of recent selection, we applied two methods based on population comparison, cross-population extended haplotype homozygosity (XP-EHH) and cross-population composite likelihood ratio (XP-CLR), and further analyzed the results to find genomic regions under recent selection in multiple purebred lines. A total of 321 candidate selection regions spanning approximately 1.45 % of the chicken genome in each line were detected by consensus of results of both XP-EHH and XP-CLR methods. To minimize false discovery due to genetic drift, only 42 of the candidate selection regions that were shared by 2 or more purebred lines were considered as high-confidence selection regions in the study. Of these 42 regions, 20 were 50 kb or less while 4 regions were larger than 0.5 Mb. In total, 91 genes could be found in the 42 regions, among which 19 regions contained only 1 or 2 genes, and 9 regions were located at gene deserts. Our results provide a genome-wide scan of recent selection signatures in five purebred lines of commercial broiler chickens. We found several candidate genes for recent selection in multiple lines, such as SOX6 (Sex Determining Region Y-Box 6) and cTR (Thyroid hormone receptor beta). These genes may have been under recent selection due to their essential roles in growth, development and reproduction in chickens. 
Furthermore, our results suggest that in some candidate regions, the same or opposite alleles have been under recent selection in multiple lines. Most of the candidate genes in the selection regions are novel, and as such they should be of great interest for future research into the genetic architecture of traits relevant to modern broiler breeding.
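The filtering step described in this abstract, keeping only candidate regions shared by two or more purebred lines, amounts to an interval-overlap check. The sketch below shows one simple way to do it. The line names, coordinates, and the helper names `overlaps` and `shared_regions` are hypothetical, and the study may have defined sharing differently.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) regions lie on the same chromosome and intersect."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def shared_regions(regions_by_line, min_lines=2):
    """Keep regions that overlap a candidate region in at least min_lines lines.

    regions_by_line: dict mapping line name -> list of (chrom, start, end).
    Returns (line, region, number_of_supporting_lines) tuples.
    """
    shared = []
    for line, regions in regions_by_line.items():
        for region in regions:
            # The region's own line counts as one supporting line.
            support = 1 + sum(
                any(overlaps(region, other) for other in others)
                for other_line, others in regions_by_line.items()
                if other_line != line
            )
            if support >= min_lines:
                shared.append((line, region, support))
    return shared


# Hypothetical candidate regions for three lines.
candidates = {
    "sire1": [("chr1", 100, 200), ("chr2", 50, 80)],
    "sire2": [("chr1", 150, 250)],
    "dam1": [("chr3", 10, 20)],
}
high_confidence = shared_regions(candidates)
```

Only the two overlapping chr1 regions survive in this toy input. Regions seen in a single line are dropped, which mirrors the stated aim of reducing false discoveries caused by genetic drift.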
The prospect of gene therapy for prostate cancer: update on theory and status.
Koeneman, K S; Hsieh, J T
2001-09-01
Molecularly based novel therapeutic agents are needed to address the problem of locally recurrent, or metastatic, advanced hormone-refractory prostate cancer. Recent basic science advances in mechanisms of gene expression, vector delivery, and targeting have rendered clinically relevant gene therapy to the prostatic fossa and distant sites feasible in the near future. Current research and clinical investigative efforts involving methods for more effective vector delivery and targeting, with enhanced gene expression to selected (specific) sites, are reviewed. These areas of research involve tissue-specific promoters, transgene exploration, vector design and delivery, and selective vector targeting. The 'vectorology' involved mainly addresses selective tissue homing with ligands, mechanisms of innate immune system evasion for durable transgene expression, and the possibility of repeat administration.
Csilléry, Katalin; Lalagüe, Hadrien; Vendramin, Giovanni G; González-Martínez, Santiago C; Fady, Bruno; Oddou-Muratorio, Sylvie
2014-10-01
Detecting signatures of selection in tree populations threatened by climate change is currently a major research priority. Here, we investigated the signature of local adaptation over a short spatial scale using 96 European beech (Fagus sylvatica L.) individuals originating from two pairs of populations on the northern and southern slopes of Mont Ventoux (south-eastern France). We performed both single and multilocus analysis of selection based on 53 climate-related candidate genes containing 546 SNPs. FST outlier methods at the SNP level revealed a weak signal of selection, with three marginally significant outliers in the northern populations. At the gene level, considering haplotypes as alleles, two additional marginally significant outliers were detected, one on each slope. To account for the uncertainty of haplotype inference, we averaged the Bayes factors over many possible phase reconstructions. Epistatic selection offers a realistic multilocus model of selection in natural populations. Here, we used a test suggested by Ohta based on the decomposition of the variance of linkage disequilibrium. Over all populations, 0.23% of the SNP pairs (haplotypes) showed evidence of epistatic selection, with nearly 80% of them being within genes. One of the between-gene epistatic selection signals arose between an FST outlier and a nonsynonymous mutation in a drought response gene. Additionally, we identified haplotypes containing selectively advantageous allele combinations which were unique to high or low elevations and northern or southern populations. Several haplotypes contained nonsynonymous mutations situated in genes with known functional importance for adaptation to climatic factors. © 2014 John Wiley & Sons Ltd.
Dutta, B; Pusztai, L; Qi, Y; André, F; Lazar, V; Bianchini, G; Ueno, N; Agarwal, R; Wang, B; Shiang, C Y; Hortobagyi, G N; Mills, G B; Symmans, W F; Balázsi, G
2012-01-01
Background: The rapid collection of diverse genome-scale data raises the urgent need to integrate and utilise these resources for biological discovery or biomedical applications. For example, diverse transcriptomic and gene copy number variation data are currently collected for various cancers, but relatively few current methods are capable of utilising the emerging information. Methods: We developed and tested a data-integration method to identify gene networks that drive the biology of breast cancer clinical subtypes. The method simultaneously overlays gene expression and gene copy number data on protein–protein interaction, transcriptional-regulatory and signalling networks by identifying coincident genomic and transcriptional disturbances in local network neighborhoods. Results: We identified distinct driver-networks for each of the three common clinical breast cancer subtypes: oestrogen receptor (ER)+, human epidermal growth factor receptor 2 (HER2)+, and triple receptor-negative breast cancers (TNBC) from patient and cell line data sets. Driver-networks inferred from independent datasets were significantly reproducible. We also confirmed the functional relevance of a subset of randomly selected driver-network members for TNBC in gene knockdown experiments in vitro. We found that TNBC driver-network member genes have increased functional specificity to TNBC cell lines and higher functional sensitivity compared with genes selected by differential expression alone. Conclusion: Clinical subtype-specific driver-networks identified through data integration are reproducible and functionally important. PMID:22343619
Genetic subdivision and candidate genes under selection in North American grey wolves.
Schweizer, Rena M; vonHoldt, Bridgett M; Harrigan, Ryan; Knowles, James C; Musiani, Marco; Coltman, David; Novembre, John; Wayne, Robert K
2016-01-01
Previous genetic studies of the highly mobile grey wolf (Canis lupus) found population structure that coincides with habitat and phenotype differences. We hypothesized that these ecologically distinct populations (ecotypes) should exhibit signatures of selection in genes related to morphology, coat colour and metabolism. To test these predictions, we quantified population structure related to habitat using a genotyping array to assess variation in 42 036 single-nucleotide polymorphisms (SNPs) in 111 North American grey wolves. Using these SNP data and individual-level measurements of 12 environmental variables, we identified six ecotypes: West Forest, Boreal Forest, Arctic, High Arctic, British Columbia and Atlantic Forest. Next, we explored signals of selection across these wolf ecotypes through the use of three complementary methods to detect selection: FST/haplotype homozygosity bivariate percentile, bayescan, and environmentally correlated directional selection with bayenv. Across all methods, we found consistent signals of selection on genes related to morphology, coat coloration and metabolism, as predicted, as well as vision and hearing. In several high-ranking candidate genes, including LEPR, TYR and SLC14A2, we found variation in allele frequencies that follows environmental changes in temperature and precipitation, a result that is consistent with local adaptation rather than genetic drift. Our findings show that local adaptation can occur despite gene flow in a highly mobile species and can be detected through a moderately dense genomic scan. These patterns of local adaptation revealed by SNP genotyping likely reflect high fidelity to natal habitats of dispersing wolves, strong ecological divergence among habitats, and moderate levels of linkage in the wolf genome. © 2015 John Wiley & Sons Ltd.
Sexual selection and sex linkage.
Kirkpatrick, Mark; Hall, David W
2004-04-01
Some animal groups, such as birds, seem prone to extreme forms of sexual selection. One contributing factor may be sex linkage of genes affecting male displays and female preferences. Here we show that sex linkage can have substantial effects on the genetic correlation between these traits and consequently for Fisher's runaway and the good-genes mechanisms of sexual selection. Under some kinds of sex linkage (e.g. Z-linked preferences), a runaway is more likely than under autosomal inheritance, while under others (e.g., X-linked preferences and autosomal displays), the good-genes mechanism is particularly powerful. These theoretical results suggest empirical tests based on the comparative method.
confFuse: High-Confidence Fusion Gene Detection across Tumor Entities.
Huang, Zhiqin; Jones, David T W; Wu, Yonghe; Lichter, Peter; Zapatka, Marc
2017-01-01
Background: Fusion genes play an important role in the tumorigenesis of many cancers. Next-generation sequencing (NGS) technologies have been successfully applied in fusion gene detection for the last several years, and a number of NGS-based tools have been developed for identifying fusion genes during this period. Most fusion gene detection tools based on RNA-seq data report a large number of candidates (mostly false positives), making it hard to prioritize candidates for experimental validation and further analysis. Selection of reliable fusion genes for downstream analysis becomes very important in cancer research. We therefore developed confFuse, a scoring algorithm to reliably select high-confidence fusion genes which are likely to be biologically relevant. Results: confFuse takes multiple parameters into account in order to assign each fusion candidate a confidence score, of which score ≥8 indicates high-confidence fusion gene predictions. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. Compared with alternative tools, based on 96 published RNA-seq samples from different tumor entities, our method can significantly reduce the number of fusion candidates (301 high-confidence from 8,083 total predicted fusion genes) and keep high detection accuracy (recovery rate 85.7%). Validation of 18 novel, high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate. Conclusions: confFuse is a novel downstream filtering method that allows selection of highly reliable fusion gene candidates for further downstream analysis and experimental validations. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.
NASA Astrophysics Data System (ADS)
Yu, Yuan; Tong, Qi; Li, Zhongxia; Tian, Jinhai; Wang, Yizhi; Su, Feng; Wang, Yongsheng; Liu, Jun; Zhang, Yong
2014-02-01
PhiC31 integrase-mediated gene delivery has been extensively used in gene therapy and animal transgenesis. However, random integration events are observed in phiC31-mediated integration in different types of mammalian cells; as a result, the efficiencies of pseudo attP site integration and evaluation of site-specific integration are compromised. To improve this system, we used an attB-TK fusion gene as a negative selection marker, thereby eliminating random integration during phiC31-mediated transfection. We also excised the selection system and plasmid bacterial backbone by using two other site-specific recombinases, Cre and Dre. Thus, we generated clean transgenic bovine fetal fibroblast cells free of selectable marker and plasmid bacterial backbone. These clean cells were used as donor nuclei for somatic cell nuclear transfer (SCNT), indicating a similar developmental competence of SCNT embryos to that of non-transgenic cells. Therefore, the present gene delivery system facilitated the development of gene therapy and agricultural biotechnology.
Automatic design of synthetic gene circuits through mixed integer non-linear programming.
Huynh, Linh; Kececioglu, John; Köppe, Matthias; Tagkopoulos, Ilias
2012-01-01
Automatic design of synthetic gene circuits poses a significant challenge to synthetic biology, primarily due to the complexity of biological systems, and the lack of rigorous optimization methods that can cope with the combinatorial explosion as the number of biological parts increases. Current optimization methods for synthetic gene design rely on heuristic algorithms that are usually not deterministic, deliver sub-optimal solutions, and provide no guarantees on convergence or error bounds. Here, we introduce an optimization framework for the problem of part selection in synthetic gene circuits that is based on mixed integer non-linear programming (MINLP), which is a deterministic method that finds the globally optimal solution and guarantees convergence in finite time. Given a synthetic gene circuit, a library of characterized parts, and user-defined constraints, our method can find the optimal selection of parts that satisfies the constraints and best approximates the objective function given by the user. We evaluated the proposed method in the design of three synthetic circuits (a toggle switch, a transcriptional cascade, and a band detector), with both experimentally constructed and synthetic promoter libraries. Scalability and robustness analysis shows that the proposed framework scales well with the library size and the solution space. The work described here is a step towards a unifying, realistic framework for the automated design of biological circuits.
Sajid, Mohammed; Chevalley-Maurel, Séverine; Ramesar, Jai; Klop, Onny; Franke-Fayard, Blandine M. D.; Janse, Chris J.; Khan, Shahid M.
2011-01-01
Research on the biology of malaria parasites has greatly benefited from the application of reverse genetic technologies, in particular through the analysis of gene deletion mutants and studies on transgenic parasites that express heterologous or mutated proteins. However, transfection in Plasmodium is limited by the paucity of drug-selectable markers that hampers subsequent genetic modification of the same mutant. We report the development of a novel ‘gene insertion/marker out’ (GIMO) method for two rodent malaria parasites, which uses negative selection to rapidly generate transgenic mutants ready for subsequent modifications. We have created reference mother lines for both P. berghei ANKA and P. yoelii 17XNL that serve as recipient parasites for GIMO-transfection. Compared to existing protocols GIMO-transfection greatly simplifies and speeds up the generation of mutants expressing heterologous proteins, free of drug-resistance genes, and requires far fewer laboratory animals. In addition we demonstrate that GIMO-transfection is also a simple and fast method for genetic complementation of mutants with a gene deletion or mutation. The implementation of GIMO-transfection procedures should greatly enhance Plasmodium reverse-genetic research. PMID:22216235
Construction of human antibody gene libraries and selection of antibodies by phage display.
Schirrmann, Thomas; Hust, Michael
2010-01-01
Recombinant antibodies as therapeutics offer new opportunities for the treatment of many tumor diseases. To date, 18 antibody-based drugs are approved for cancer treatment and hundreds of anti-tumor antibodies are under development. The first clinically approved antibodies were of murine origin or human-mouse chimeric. However, since murine antibody domains are immunogenic in human patients and could result in human anti-mouse antibody (HAMA) responses, currently mainly humanized and fully human antibodies are developed for therapeutic applications. Here, in vitro antibody selection technologies directly allow the selection of human antibodies and the corresponding genes from human antibody gene libraries. Antibody phage display is the most common way to generate human antibodies and has already yielded thousands of recombinant antibodies for research, diagnostics and therapy. Here, we describe methods for the construction of human scFv gene libraries and the antibody selection.
English, Sangeeta B.; Shih, Shou-Ching; Ramoni, Marco F.; Smith, Lois E.; Butte, Atul J.
2014-01-01
Though genome-wide technologies, such as microarrays, are widely used, data from these methods are considered noisy; there is still varied success in downstream biological validation. We report a method that increases the likelihood of successfully validating microarray findings using real time RT-PCR, including genes at low expression levels and with small differences. We use a Bayesian network to identify the most relevant sources of noise based on the successes and failures in validation for an initial set of selected genes, and then improve our subsequent selection of genes for validation based on eliminating these sources of noise. The network displays the significant sources of noise in an experiment, and scores the likelihood of validation for every gene. We show how the method can significantly increase validation success rates. In conclusion, in this study, we have successfully added a new automated step to determine the contributory sources of noise that determine successful or unsuccessful downstream biological validation. PMID:18790084
Schaschl, Helmut; Huber, Susanne; Schaefer, Katrin; Windhager, Sonja; Wallner, Bernard; Fieder, Martin
2015-05-13
The evolutionarily highly conserved neurohypophyseal hormones oxytocin and arginine vasopressin play key roles in regulating social cognition and behaviours. The effects of these two peptides are mediated by their specific receptors, which are encoded by the oxytocin receptor (OXTR) and arginine vasopressin receptor 1a genes (AVPR1A), respectively. In several species, polymorphisms in these genes have been linked to various behavioural traits. Little, however, is known about whether positive selection acts on sequence variants in genes influencing variation in human behaviours. We identified, in both neuroreceptor genes, signatures of balancing selection in the cis-regulative acting sequences such as transcription factor binding and enhancer sequences, as well as in a transcriptional repressor sequence motif. Additionally, in the intron 3 of the OXTR gene, the SNP rs59190448 appears to be under positive directional selection. For rs59190448, only one phenotypical association is known so far, but it is in high LD' (>0.8) with loci of known association; i.e., variants associated with key pro-social behaviours and mental disorders in humans. Only for one SNP on the OXTR gene (rs59190448) was a sign of positive directional selection detected with all three methods of selection detection. For rs59190448, however, only one phenotypical association is known, but rs59190448 is in high LD' (>0.8), with variants associated with important pro-social behaviours and mental disorders in humans. We also detected various signatures of balancing selection on both neuroreceptor genes.
Selective Gene Transfection of Individual Cells In Vitro with Plasmonic Nanobubbles
Lukianova-Hleb, Ekaterina; Samaniego, Adam P.; Wen, Jianguo; Metelitsa, Leonid; Chang, Chung-Che; Lapotko, Dmitri
2011-01-01
Gene delivery and transfection of eukaryotic cells is widely used for research and for developing gene cell therapy. However, the existing methods lack selectivity, efficacy and safety when heterogeneous cell systems must be treated. We report a new method that employs plasmonic nanobubbles (PNBs) for delivery and transfection. A PNB is a novel, tunable cellular agent with a dual mechanical and optical action due to the formation of the vapor nanobubble around a transiently heated gold nanoparticle upon its exposure to a laser pulse. PNBs enabled the mechanical injection of the extracellular cDNA plasmid into the cytoplasm of individual target living cells, cultured leukemia cells and human CD34+CD117+ stem cells and expression of a green fluorescent protein (GFP) in those cells. PNB generation and lifetime correlated with the expression of green fluorescent protein in PNB-treated cells. Optical scattering by PNBs additionally provided the detection of the target cells and the guidance of cDNA injection at single cell level. In both cell models PNBs demonstrated a gene transfection effect in a single pulse treatment with high selectivity, efficacy and safety. Thus, PNBs provided targeted gene delivery at the single cell level in a single pulse procedure that can be used for safe and effective gene therapy. PMID:21315120
Selective gene transfection of individual cells in vitro with plasmonic nanobubbles.
Lukianova-Hleb, Ekaterina Y; Samaniego, Adam P; Wen, Jianguo; Metelitsa, Leonid S; Chang, Chung-Che; Lapotko, Dmitri O
2011-06-10
Gene delivery and transfection of eukaryotic cells are widely used for research and for developing gene cell therapy. However, the existing methods lack selectivity, efficacy and safety when heterogeneous cell systems must be treated. We report a new method that employs plasmonic nanobubbles (PNBs) for delivery and transfection. A PNB is a novel, tunable cellular agent with a dual mechanical and optical action due to the formation of the vapor nanobubble around a transiently heated gold nanoparticle upon its exposure to a laser pulse. PNBs enabled the mechanical injection of the extracellular cDNA plasmid into the cytoplasm of individual target living cells, cultured leukemia cells and human CD34+ CD117+ stem cells and expression of a green fluorescent protein (GFP) in those cells. PNB generation and lifetime correlated with the expression of green fluorescent protein in PNB-treated cells. Optical scattering by PNBs additionally provided the detection of the target cells and the guidance of cDNA injection at single cell level. In both cell models PNBs demonstrated a gene transfection effect in a single pulse treatment with high selectivity, efficacy and safety. Thus, PNBs provided targeted gene delivery at the single cell level in a single pulse procedure that can be used for safe and effective gene therapy. Copyright © 2011 Elsevier B.V. All rights reserved.
Lex-SVM: exploring the potential of exon expression profiling for disease classification.
Yuan, Xiongying; Zhao, Yi; Liu, Changning; Bu, Dongbo
2011-04-01
Exon expression profiling technologies, including exon arrays and RNA-Seq, measure the abundance of every exon in a gene. Compared with gene expression profiling technologies like 3' array, exon expression profiling technologies could detect alterations in both transcription and alternative splicing, therefore they are expected to be more sensitive in diagnosis. However, exon expression profiling also brings higher dimension, more redundancy, and significant correlation among features. Ignoring the correlation structure among exons of a gene, a popular classification method like L1-SVM selects exons individually from each gene and thus is vulnerable to noise. To overcome this limitation, we present in this paper a new variant of SVM named Lex-SVM to incorporate correlation structure among exons and known splicing patterns to promote classification performance. Specifically, we construct a new norm, ex-norm, including our prior knowledge on exon correlation structure to regularize the coefficients of a linear SVM. Lex-SVM can be solved efficiently using standard linear programming techniques. The advantage of Lex-SVM is that it can select features group-wise, force features in a subgroup to take equal weights and exclude the features that contradict the majority in the subgroup. Experimental results suggest that on exon expression profile, Lex-SVM is more accurate than existing methods. Lex-SVM also generates a more compact model and selects genes more consistently in cross-validation. Unlike L1-SVM selecting only one exon in a gene, Lex-SVM assigns equal weights to as many exons in a gene as possible, lending itself more readily to further interpretation.
Positive selection on MHC class II DRB and DQB genes in the bank vole (Myodes glareolus).
Scherman, Kristin; Råberg, Lars; Westerdahl, Helena
2014-05-01
The major histocompatibility complex (MHC) class IIB genes show considerable sequence similarity between loci. The MHC class II DQB and DRB genes are known to exhibit a high level of polymorphism, most likely maintained by parasite-mediated selection. Studies of the MHC in wild rodents have focused on DRB, whilst DQB has been given much less attention. Here, we characterised DQB genes in Swedish bank voles Myodes glareolus, using full-length transcripts. We then designed primers that specifically amplify exon 2 from DRB (202 bp) and DQB (205 bp) and investigated molecular signatures of natural selection on DRB and DQB alleles. The presence of two separate gene clusters was confirmed using BLASTN and phylogenetic analysis, where our seven transcripts clustered according to either DQB or DRB homologues. These gene clusters were again confirmed on exon 2 data from 454-amplicon sequencing. Our DRB primers amplify a similar number of alleles per individual as previously published DRB primers, though our reads are longer. Traditional dN/dS analyses of DRB sequences in the bank vole have not found a conclusive signal of positive selection. Using a more advanced substitution model (the Kumar method) we found positive selection in the peptide binding region (PBR) of both DRB and DQB genes. Maximum likelihood models of codon substitutions detected positively selected sites located in the PBR of both DQB and DRB. Interestingly, these analyses detected at least twice as many positively selected sites in DQB as in DRB, suggesting that DQB has been under stronger positive selection than DRB over evolutionary time.
Alshamlan, Hala; Badr, Ghada; Alohali, Yousef
2015-01-01
An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028
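The mRMR filtering stage mentioned in this abstract can be sketched as a greedy selection. The abstract does not state which mRMR variant or discretization was used, so the sketch below assumes discretized expression values and the common "relevance minus mean redundancy" criterion based on mutual information. The function names and the toy gene names in the usage note are hypothetical, and the ABC search and SVM wrapper are not shown.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedily pick up to k genes maximizing relevance minus mean redundancy.

    features: dict mapping gene name -> list of discretized expression values
    labels:   list of class labels, one per sample
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected = []
    candidates = sorted(features)  # fixed order makes tie-breaking deterministic

    def score(gene):
        if not selected:
            return relevance[gene]
        redundancy = sum(
            mutual_information(features[gene], features[s]) for s in selected
        ) / len(selected)
        return relevance[gene] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

As a usage illustration, if `geneA` and `geneB` carry identical values while `geneC` is equally relevant but independent of `geneA`, then `mrmr(..., k=2)` picks `geneA` and `geneC`, skipping the redundant copy.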
Gene selection for cancer classification with the help of bees.
Moosa, Johra Muhammad; Shakur, Rameen; Kaykobad, Mohammad; Rahman, Mohammad Sohel
2016-08-10
Development of biologically relevant models from gene expression data, notably microarray data, has become a topic of great interest in the field of bioinformatics and clinical genetics and oncology. Only a small number of gene expression data compared to the total number of genes explored possess a significant correlation with a certain phenotype. Gene selection enables researchers to obtain substantial insight into the genetic nature of the disease and the mechanisms responsible for it. Besides improvement of the performance of cancer classification, it can also cut down the time and cost of medical diagnoses. This study presents a modified Artificial Bee Colony Algorithm (ABC) to select a minimum number of genes that are deemed to be significant for cancer along with improvement of predictive accuracy. The search equation of ABC is believed to be good at exploration but poor at exploitation. To overcome this limitation we have modified the ABC algorithm by incorporating the concept of pheromones, which is one of the major components of the Ant Colony Optimization (ACO) algorithm, and a new operation in which successive bees communicate to share their findings. The proposed algorithm is evaluated using a suite of ten publicly available datasets after the parameters are tuned scientifically with one of the datasets. Obtained results are compared to other works that used the same datasets. The performance of the proposed method is proved to be superior. The method presented in this paper can provide a subset of genes leading to more accurate classification results while the number of selected genes is smaller. Additionally, the proposed modified Artificial Bee Colony Algorithm could conceivably be applied to problems in other areas as well.
Cisgenic apple trees; development, characterization, and performance
Krens, Frans A.; Schaart, Jan G.; van der Burgh, Aranka M.; Tinnenbroek-Capel, Iris E. M.; Groenwold, Remmelt; Kodde, Linda P.; Broggini, Giovanni A. L.; Gessler, Cesare; Schouten, Henk J.
2015-01-01
Two methods were developed for the generation of cisgenic apples. Both have been successfully applied producing trees. The first method avoids the use of any foreign selectable marker genes; only the gene-of-interest is integrated between the T-DNA border sequences. The second method makes use of recombinase-based marker excision. For the first method we used the MdMYB10 gene from a red-fleshed apple coding for a transcription factor involved in regulating anthocyanin biosynthesis. Red plantlets were obtained and presence of the cisgene was confirmed. Plantlets were grafted and grown in a greenhouse. After 3 years, the first flowers appeared, showing red petals. Pollination led to production of red-fleshed cisgenic apples. The second method used the pM(arker)F(ree) vector system, introducing the scab resistance gene Rvi6, derived from apple. Agrobacterium-mediated transformation, followed by selection on kanamycin, produced genetically modified apple lines. Next, leaves from in vitro material were treated to activate the recombinase leading to excision of selection genes. Subsequently, the leaf explants were subjected to negative selection for marker-free plantlets by inducing regeneration on medium containing 5-fluorocytosine. After verification of the marker-free nature, the obtained plants were grafted onto rootstocks. Young trees from four cisgenic lines and one intragenic line, all containing Rvi6, were planted in an orchard. Appropriate controls were incorporated in this trial. We scored scab incidence for three consecutive years on leaves after inoculations with Rvi6-avirulent strains. One cisgenic line and the intragenic line performed as well as the resistant control. In 2014 trees started to overcome their juvenile character and formed flowers and fruits. The first results of scoring scab symptoms on apple fruits were obtained. Apple fruits from susceptible controls showed scab symptoms, while fruits from cisgenic and intragenic lines were free of scab. PMID:25964793
Pashaei, Elnaz; Pashaei, Elham; Aydin, Nizamettin
2018-04-14
In cancer classification, gene selection is an important data preprocessing technique, but it is a difficult task due to the large search space. Accordingly, the objective of this study is to develop a hybrid meta-heuristic Binary Black Hole Algorithm (BBHA) and Binary Particle Swarm Optimization (BPSO) (4-2) model that emphasizes gene selection. In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make the BPSO (4-2) more effective and to facilitate the exploration and exploitation of the BPSO (4-2) algorithm to further improve the performance. This model has been associated with Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers which are evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA); k-nearest neighbor and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarrays. The experimental results and statistical analysis confirm the better performance of the BPSO (4-2)-BBHA compared with the BBHA, the BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets. Copyright © 2018 Elsevier Inc. All rights reserved.
Gene disruption in Trichoderma atroviride via Agrobacterium-mediated transformation.
Zeilinger, Susanne
2004-02-01
A modified Agrobacterium-mediated transformation method for the efficient disruption of two genes encoding signaling compounds of the mycoparasite Trichoderma atroviride is described, using the hph gene of Escherichia coli as selection marker. The transformation vectors contained about 1 kb of 5' and 3' non-coding regions from the tmk1 (encoding a MAP kinase) or tga3 (encoding an alpha-subunit of a heterotrimeric G protein) target loci flanking a selection marker. Transformation of fungal conidia and selection on hygromycin-containing media applying an overlay-based procedure, which overcomes the lack of formation of distinct single colonies by the fungus, led to stable clones for both disruption constructs. Southern and PCR analyses proved gene disruption by single-copy homologous integration with a frequency of approximately 60% for both genes; and the loss of tmk1 and tga3 transcript formation in the disruptants was demonstrated by RT-PCR.
Detecting and Characterizing Genomic Signatures of Positive Selection in Global Populations
Liu, Xuanyao; Ong, Rick Twee-Hee; Pillai, Esakimuthu Nisha; Elzein, Abier M.; Small, Kerrin S.; Clark, Taane G.; Kwiatkowski, Dominic P.; Teo, Yik-Ying
2013-01-01
Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains unanswered except for the fact that there exist a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome whether a selection signal shared across different populations is the consequence of a single mutation process followed subsequently by gene flow between populations or of convergent evolution due to the occurrence of multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event. PMID:23731540
Application of nanomaterials in the bioanalytical detection of disease-related genes.
Zhu, Xiaoqian; Li, Jiao; He, Hanping; Huang, Min; Zhang, Xiuhua; Wang, Shengfu
2015-12-15
In the diagnosis of genetic diseases and disorders, nanomaterials-based gene detection systems have significant advantages over conventional diagnostic systems in terms of simplicity, sensitivity, specificity, and portability. In this review, we describe the application of nanomaterials for disease-related genes detection in different methods excluding PCR-related method, such as colorimetry, fluorescence-based methods, electrochemistry, microarray methods, surface-enhanced Raman spectroscopy (SERS), quartz crystal microbalance (QCM) methods, and dynamic light scattering (DLS). The most commonly used nanomaterials are gold, silver, carbon and semiconducting nanoparticles. Various nanomaterials-based gene detection methods are introduced, their respective advantages are discussed, and selected examples are provided to illustrate the properties of these nanomaterials and their emerging applications for the detection of specific nucleic acid sequences. Copyright © 2015. Published by Elsevier B.V.
A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds.
Yang, Songbai; Li, Xiuling; Li, Kui; Fan, Bin; Tang, Zhonglin
2014-01-15
Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implying for economic traits between Chinese indigenous and commercial pigs have been poorly understood. We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show high evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. In this study, we first identified the genomic regions that showed evidences of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs.
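The High-Fst outlier idea in this abstract can be sketched as follows. The abstract does not specify the Fst estimator or the cutoff, so this sketch assumes Wright's heterozygosity-based Fst for a biallelic SNP with equal weights per breed, and a hypothetical `top_fraction` parameter for the empirical tail. The function names and SNP identifiers are illustrative only.

```python
def fst_per_snp(freqs):
    """Wright's Fst for one biallelic SNP from per-breed allele frequencies.

    Assumes equal weight per breed: Fst = (H_T - H_S) / H_T.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is monomorphic across all breeds
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def high_fst_outliers(snp_freqs, top_fraction=0.01):
    """Return (outlier SNP ids, all Fst values) using an empirical top-tail cutoff.

    snp_freqs: dict mapping SNP id -> list of allele frequencies, one per breed
    top_fraction: assumed share of SNPs treated as outliers (at least one is kept)
    """
    fst = {snp: fst_per_snp(f) for snp, f in snp_freqs.items()}
    ranked = sorted(fst, key=fst.get, reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_keep], fst
```

Ranking against the empirical genome-wide distribution avoids assuming a theoretical null for Fst, but the choice of tail fraction is then a judgment call that should be reported alongside the candidate list.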
A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds
2014-01-01
Background Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implying for economic traits between Chinese indigenous and commercial pigs have been poorly understood. Results We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show high evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. Conclusions In this study, we first identified the genomic regions that showed evidences of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs. PMID:24422716
Sorhannus, Ulf
2011-01-01
Hypotheses about horizontal transfer of antifreeze protein genes to ice-living diatoms were addressed using two different statistical methods available in the program Prunier. The role of diversifying selection in driving the differentiation of a set of antifreeze protein genes in the diatom genus Fragilariopsis was also investigated. Four horizontal gene transfer events were identified. Two of these took place between two major eukaryote lineages, that is from the diatom Chaetoceros neogracile to the copepod Stephos longipes and from a basidiomycete clade to a monophyletic group, consisting of the diatom species Fragilariopsis curta and Fragilariopsis cylindrus. The remaining two events included transfers from an ascomycete lineage to the proteobacterium Stigmatella aurantiaca and from the proteobacterium Polaribacter irgensii to a group composed of 4 proteobacterium species. After the Fragilariopsis lineage acquired the antifreeze protein gene from the basidiomycetes, it duplicated and went through episodic evolution, characterized by strong positive selection acting on short segments of the branches in the tree. This selection pattern suggests that the paralogs differentiated functionally over relatively short time periods. Taken together, the results obtained here indicate that the group of antifreeze protein genes considered here have a complex evolutionary history. PMID:22253534
Positive selection and functional divergence of farnesyl pyrophosphate synthase genes in plants.
Qian, Jieying; Liu, Yong; Chao, Naixia; Ma, Chengtong; Chen, Qicong; Sun, Jian; Wu, Yaosheng
2017-02-04
Farnesyl pyrophosphate synthase (FPS) belongs to the short-chain prenyltransferase family, and it performs a conserved and essential role in the terpenoid biosynthesis pathway. However, its classification, evolutionary history, and the forces driving the evolution of FPS genes in plants remain poorly understood. Phylogeny and positive selection analysis was used to identify the evolutionary forces that led to the functional divergence of FPS in plants, and recombinant detection was undertaken using the Genetic Algorithm for Recombination Detection (GARD) method. The dataset included 68 FPS variation pattern sequences (2 gymnosperms, 10 monocotyledons, 54 dicotyledons, and 2 outgroups). This study revealed that the FPS gene was under positive selection in plants. No recombinant within the FPS gene was found. Therefore, it was inferred that the positive selection of FPS had not been influenced by a recombinant episode. The positively selected sites were mainly located in the catalytic center and functional areas, which indicated that the 98S and 234D were important positively selected sites for plant FPS in the terpenoid biosynthesis pathway. They were located in the FPS conserved domain of the catalytic site. We inferred that the diversification of FPS genes was associated with functional divergence and could be driven by positive selection. It was clear that protein sequence evolution via positive selection was able to drive adaptive diversification in plant FPS proteins. This study provides information on the classification and positive selection of plant FPS genes, and the results could be useful for further research on the regulation of triterpenoid biosynthesis.
High throughput selection of antibiotic-resistant transgenic Arabidopsis plants.
Nagashima, Yukihiro; Koiwa, Hisashi
2017-05-15
Kanamycin resistance is the most frequently used antibiotic-resistance marker for Arabidopsis transformations; however, this method frequently allows untransformed plants to escape selection, particularly at high seedling density. Here we developed a robust high-density selection method using top agar for Arabidopsis thaliana. Top agar effectively suppressed growth of untransformed wild-type plants on selection media at high density. Survival of the transformed plants during selection was confirmed by production of green true leaves and expression of a firefly luciferase reporter gene. The top agar method allowed selection using a large amount of seeds in Arabidopsis transformation. Copyright © 2017 Elsevier Inc. All rights reserved.
USDA-ARS's Scientific Manuscript database
Reverse Transcription quantitative Polymerase Chain Reaction (qRT-PCR) is a popular method for measuring transcript abundance. The most commonly used method of interpretation is relative quantification and thus necessitates the use of normalization controls (i.e. reference genes) to standardize tran...
Regional gene mapping using mixed radiation hybrids and reverse chromosome painting.
Lin, J Y; Bedford, J S
1997-11-01
We describe a new approach for low-resolution physical mapping using pooled DNA probe from mixed (non-clonal) populations of human-CHO cell hybrids and reverse chromosome painting. This mapping method is based on a process in which the human chromosome fragments bearing a complementing gene were selectively retained in a large non-clonal population of CHO-human hybrid cells during a series of 12- to 15-Gy gamma irradiations each followed by continuous growth selection. The location of the gene could then be identified by reverse chromosome painting on normal human metaphase spreads using biotinylated DNA from this population of "enriched" hybrid cells. We tested the validity of this method by correctly mapping the complementing human HPRT gene, whose location is well established. We then demonstrated the method's usefulness by mapping the chromosome location of a human gene which complemented the defect responsible for the hypersensitivity to ionizing radiation in CHO irs-20 cells. This method represents an efficient alternative to conventional concordance analysis in somatic cell hybrids where detailed chromosome analysis of numerous hybrid clones is necessary. Using this approach, it is possible to localize a gene for which there is no prior sequence or linkage information to a subchromosomal region, thus facilitating association with known mapping landmarks (e.g. RFLP, YAC or STS contigs) for higher-resolution mapping.
Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
Rahmatallah, Yasir; Emmert-Streib, Frank
2016-01-01
Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128
Ravanfar, Seyed Ali; Orbovic, Vladimir; Moradpour, Mahdi; Abdul Aziz, Maheran; Karan, Ratna; Wallace, Simon; Parajuli, Saroj
2017-04-01
Development of in vitro plant regeneration method from Brassica explants via organogenesis and somatic embryogenesis is influenced by many factors such as culture environment, culture medium composition, explant sources, and genotypes which are reviewed in this study. An efficient in vitro regeneration system to allow genetic transformation of Brassica is a crucial tool for improving its economical value. Methods to optimize transformation protocols for the efficient introduction of desirable traits, and a comparative analysis of these methods are also reviewed. Hence, binary vectors, selectable marker genes, minimum inhibitory concentration of selection agents, reporter marker genes, preculture media, Agrobacterium concentration and regeneration ability of putative transformants for improvement of Agrobacterium-mediated transformation of Brassica are discussed.
Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes
Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin
2013-01-01
One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of the state-of-the-art learning machines including the Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method namely the Kernel Matrix Gene Selection (KMGS) is proposed. The rationale of the method, which roots in the fundamental ideas of the SVM algorithm, is described. The notion of "the separability of a sample", which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS) which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110
Measured Gene-by-Environment Interaction in Relation to Attention-Deficit/Hyperactivity Disorder
ERIC Educational Resources Information Center
Nigg, Joel; Nikolas, Molly; Burt, S. Alexandra
2010-01-01
Objective: To summarize and evaluate the state of knowledge regarding the role of measured gene-by-environment interactions in relation to attention-deficit/hyperactivity disorder. Method: A selective review of methodologic issues was followed by a systematic search for relevant articles on measured gene-by-environment interactions; the search…
Selection of reference genes for miRNA qRT-PCR under abiotic stress in grapevine.
Luo, Meng; Gao, Zhen; Li, Hui; Li, Qin; Zhang, Caixi; Xu, Wenping; Song, Shiren; Ma, Chao; Wang, Shiping
2018-03-13
Grapevine is among the fruit crops with high economic value, and because of the economic losses caused by abiotic stresses, the stress resistance of Vitis vinifera has become an increasingly important research area. Among the mechanisms responding to environmental stresses, the role of miRNA has received much attention recently. qRT-PCR is a powerful method for miRNA quantitation, but the accuracy of the method strongly depends on the appropriate reference genes. To determine the most suitable reference genes for grapevine miRNA qRT-PCR, 15 genes were chosen as candidate reference genes. After eliminating 6 candidate reference genes with unsatisfactory amplification efficiency, the expression stability of the remaining candidate reference genes under salinity, cold and drought was analysed using four algorithms, geNorm, NormFinder, deltaCt and Bestkeeper. The results indicated that U6 snRNA was the most suitable reference gene under salinity and cold stresses; whereas miR168 was the best for drought stress. The best reference gene sets for salinity, cold and drought stresses were miR160e + miR164a, miR160e + miR168 and ACT + UBQ + GAPDH, respectively. The selected reference genes or gene sets were verified using miR319 or miR408 as the target gene.
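Of the four algorithms named in the abstract above, the comparative deltaCt approach is the simplest to sketch: for every pair of candidate genes, take the standard deviation of their Ct difference across samples, then average those standard deviations per gene, with a lower score meaning more stable expression. The function name, gene labels and Ct values below are invented for illustration and are not taken from the study.

```python
from itertools import combinations
from statistics import mean, stdev
from typing import Dict, List


def delta_ct_stability(ct: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean SD of pairwise Ct differences per gene (lower = more stable).

    `ct` maps a candidate gene to its Ct values, one per sample, with the
    samples in the same order for every gene. Needs at least two genes and
    two samples.
    """
    pairwise_sd: Dict[str, List[float]] = {gene: [] for gene in ct}
    for a, b in combinations(ct, 2):
        diffs = [x - y for x, y in zip(ct[a], ct[b])]
        sd = stdev(diffs)  # variability of the Ct gap between genes a and b
        pairwise_sd[a].append(sd)
        pairwise_sd[b].append(sd)
    return {gene: mean(sds) for gene, sds in pairwise_sd.items()}


# Hypothetical Ct values for three candidate genes over three samples.
example = {
    "ref_a": [20.0, 21.0, 22.0],
    "ref_b": [25.0, 26.0, 27.0],
    "ref_c": [18.0, 22.0, 17.0],
}
scores = delta_ct_stability(example)
most_stable_first = sorted(scores, key=scores.get)
```

In this toy input `ref_a` and `ref_b` move in lockstep, so they score better than `ref_c`; geNorm, NormFinder and Bestkeeper use different stability measures and are not reproduced here.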
Aghdam, Rosa; Baghfalaki, Taban; Khosravi, Pegah; Saberi Ansari, Elnaz
2017-12-01
Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% genes are considered in random for two types of ignorable and non-ignorable missingness mechanisms with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes are tested for their enrichment in important pathways, using the ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of various imputation methods tested. The source code and selected datasets are available on http://profiles.bs.ipm.ir/softwares/imputation_methods/. Copyright © 2017. Production and hosting by Elsevier B.V.
Evaluation of the effect and profitability of gene-assisted selection in pig breeding system*
Li, Ya-lan; Zhang, Qin; Chen, Yao-sheng
2007-01-01
Objective: To evaluate the effect and profitability of using the quantitative trait loci (QTL)-linked direct marker (DR marker) in gene-assisted selection (GAS). Methods: Three populations (100, 200, or 300 sows plus 10 boars within each group) with segregating QTL were simulated stochastically. Five economic traits were investigated, including number of born alive (NBA), average daily gain to 100 kg body weight (ADG), feed conversion ratio (FCR), back fat at 100 kg body weight (BF) and intramuscular fat (IMF). Selection was based on the estimated breeding value (EBV) of each trait. The starting frequencies of the QTL’s favorable allele were 0.1, 0.3 and 0.5, respectively. The economic return was calculated by gene flow method. Results: The selection efficiency was higher than 100% when DR markers were used in GAS for 5 traits. The selection efficiency for NBA was the highest, and the lowest was for ADG whose QTL had the lowest variance. The mixed model applied DR markers and obtained higher extra genetic gain and extra economic returns. We also found that the lower the frequency of the favorable allele of the QTL, the higher the extra return obtained. Conclusion: GAS is an effective selection scheme to increase the genetic gain and the economic returns in pig breeding. PMID:17973344
Frikha-Gargouri, Olfa; Ben Abdallah, Dorra; Bhar, Ilhem; Tounsi, Slim
2017-01-01
This study aimed to improve the screening method for the selection of Bacillus biocontrol agents against crown gall disease. The relationship between the strain biocontrol ability and their in vitro studied traits was investigated to identify the most important factors to be considered for the selection of effective biocontrol agents. In fact, previous selection procedure relying only on in vitro antibacterial activity was shown to be not suitable in some cases. A direct plant-protection strategy was performed to screen the 32 Bacillus biocontrol agent candidates. Moreover, potential in vitro biocontrol traits were investigated including biofilm formation, motility, hemolytic activity, detection of lipopeptide biosynthetic genes (sfp, ituC and bmyB) and production of antibacterial compounds. The obtained results indicated high correlations of the efficiency of the biocontrol with the reduction of gall weight (p = 0.000) and the antibacterial activity in vitro (p = 0.000). Moreover, there was strong correlations of the efficiency of the biocontrol (p = 0.004) and the reduction in gall weight (p = 0.000) with the presence of the bmyB gene. This gene directs the synthesis of the lipopeptide bacillomycin belonging to the iturinic family of lipopeptides. These results were also confirmed by the two-way hierarchical cluster analysis and the correspondence analysis showing the relatedness of these four variables. According to the obtained results a new screening procedure of Bacillus biocontrol agents against crown gall disease could be advanced consisting on two step selection procedure. The first consists on selecting strains with high antibacterial activity in vitro or those harbouring the bmyB gene. Further selection has to be performed on tomato plants in vivo. Moreover, based on the results of the biocontrol assay, five potent strains exhibiting high biocontrol abilities were selected. They were identified as Bacillus subtilis or Bacillus amyloliquefaciens. 
These strains were found to produce either surfactin or surfactin and iturin lipopeptides. In conclusion, our study presented a new and effective method to evaluate the biocontrol ability of antagonistic Bacillus strains against crown gall disease that could increase the efficiency of screening method of biocontrol agents. Besides, the selected strains could be used as novel biocontrol agents against pathogenic Agrobacterium tumefaciens strains. PMID:28855909
2013-01-01
Background The cloning of gene sequences forms the basis for many molecular biological studies. One important step in the cloning process is the isolation of bacterial transformants carrying vector DNA. This involves a vector-encoded selectable marker gene, which in most cases, confers resistance to an antibiotic. However, there are a number of circumstances in which a different selectable marker is required or may be preferable. Such situations can include restrictions to host strain choice, two phase cloning experiments and mutagenesis experiments, issues that result in additional unnecessary cloning steps, in which the DNA needs to be subcloned into a vector with a suitable selectable marker. Results We have used restriction enzyme mediated gene disruption to modify the selectable marker gene of a given vector by cloning a different selectable marker gene into the original marker present in that vector. Cloning a new selectable marker into a pre-existing marker was found to change the selection phenotype conferred by that vector, which we were able to demonstrate using multiple commonly used vectors and multiple resistance markers. This methodology was also successfully applied not only to cloning vectors, but also to expression vectors while keeping the expression characteristics of the vector unaltered. Conclusions Changing the selectable marker of a given vector has a number of advantages and applications. This rapid and efficient method could be used for co-expression of recombinant proteins, optimisation of two phase cloning procedures, as well as multiple genetic manipulations within the same host strain without the need to remove a pre-existing selectable marker in a previously genetically modified strain. PMID:23497512
Automatic Design of Synthetic Gene Circuits through Mixed Integer Non-linear Programming
Huynh, Linh; Kececioglu, John; Köppe, Matthias; Tagkopoulos, Ilias
2012-01-01
Automatic design of synthetic gene circuits poses a significant challenge to synthetic biology, primarily due to the complexity of biological systems, and the lack of rigorous optimization methods that can cope with the combinatorial explosion as the number of biological parts increases. Current optimization methods for synthetic gene design rely on heuristic algorithms that are usually not deterministic, deliver sub-optimal solutions, and provide no guarantees on convergence or error bounds. Here, we introduce an optimization framework for the problem of part selection in synthetic gene circuits that is based on mixed integer non-linear programming (MINLP), which is a deterministic method that finds the globally optimal solution and guarantees convergence in finite time. Given a synthetic gene circuit, a library of characterized parts, and user-defined constraints, our method can find the optimal selection of parts that satisfies the constraints and best approximates the objective function given by the user. We evaluated the proposed method in the design of three synthetic circuits (a toggle switch, a transcriptional cascade, and a band detector), with both experimentally constructed and synthetic promoter libraries. Scalability and robustness analysis shows that the proposed framework scales well with the library size and the solution space. The work described here is a step towards a unifying, realistic framework for the automated design of biological circuits. PMID:22536398
Nwakanma, Davis C.; Duffy, Craig W.; Amambua-Ngwa, Alfred; Oriero, Eniyou C.; Bojang, Kalifa A.; Pinder, Margaret; Drakeley, Chris J.; Sutherland, Colin J.; Milligan, Paul J.; MacInnis, Bronwyn; Kwiatkowski, Dominic P.; Clark, Taane G.; Greenwood, Brian M.; Conway, David J.
2014-01-01
Background. Analysis of genome-wide polymorphism in many organisms has potential to identify genes under recent selection. However, data on historical allele frequency changes are rarely available for direct confirmation. Methods. We genotyped single nucleotide polymorphisms (SNPs) in 4 Plasmodium falciparum drug resistance genes in 668 archived parasite-positive blood samples of a Gambian population between 1984 and 2008. This covered a period before antimalarial resistance was detected locally, through subsequent failure of multiple drugs until introduction of artemisinin combination therapy. We separately performed genome-wide sequence analysis of 52 clinical isolates from 2008 to prospect for loci under recent directional selection. Results. Resistance alleles increased from very low frequencies, peaking in 2000 for chloroquine resistance-associated crt and mdr1 genes and at the end of the survey period for dhfr and dhps genes respectively associated with pyrimethamine and sulfadoxine resistance. Temporal changes fit a model incorporating likely selection coefficients over the period. Three of the drug resistance loci were in the top 4 regions under strong selection implicated by the genome-wide analysis. Conclusions. Genome-wide polymorphism analysis of an endemic population sample robustly identifies loci with detailed documentation of recent selection, demonstrating power to prospectively detect emerging drug resistance genes. PMID:24265439
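The abstract above reports that temporal allele frequency changes fit a model with selection coefficients, without giving the model. A minimal sketch of one standard choice is shown below: a deterministic haploid model in which the resistance allele has relative fitness 1 + s, so its odds p / (1 - p) are multiplied by (1 + s) each generation. The model, function names and numbers are assumptions for illustration, not the authors' fitted model.

```python
import math
from typing import List


def next_frequency(p: float, s: float) -> float:
    """One generation of haploid selection with relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + s * p)


def trajectory(p0: float, s: float, generations: int) -> List[float]:
    """Allele frequency at generation 0, 1, ..., generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


def estimate_s(p_start: float, p_end: float, generations: int) -> float:
    """Selection coefficient implied by two frequencies.

    Uses the fact that logit(p) rises by log(1 + s) per generation under the
    model above; both frequencies must lie strictly between 0 and 1.
    """
    def logit(p: float) -> float:
        return math.log(p / (1.0 - p))

    return math.exp((logit(p_end) - logit(p_start)) / generations) - 1.0


# Hypothetical example: a rare allele under s = 0.1 for 50 generations.
freqs = trajectory(0.01, 0.1, 50)
s_hat = estimate_s(freqs[0], freqs[-1], 50)
```

Fitting real survey data would also need a generations-per-year assumption and a treatment of sampling error, neither of which is attempted in this sketch.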
[The effect of polymorphism F279Y of GHR gene on milk production trait in Chinese Holstein cattle].
Ma, Yan-Nan; He, Peng-Jia; Zhu, Jing; Lei, Zhao-Min; Liu, Zhe; Wu, Jian-Ping
2013-09-01
This study examined the effect of the F279Y polymorphism of the growth hormone receptor (GHR) gene on milk yield and composition in Chinese Holstein cattle. One hundred thirty-two Chinese Holstein cattle were selected; milk yield and composition data were obtained from DHI production performance records, genotypes were detected by PCR-SSCP and sequencing, and associations were analysed by the least squares method. At the F279Y locus of the GHR gene, the A and T allele frequencies were 0.68 and 0.32, respectively, and the experimental group deviated significantly from Hardy-Weinberg equilibrium (P < 0.01). The 305 d milk yield of the AA genotype was significantly higher than that of the AT genotype (P < 0.05), whereas 305 d milk fat yield, 305 d milk protein yield and 305 d lactose tended to be numerically higher in the AT genotype than in the AA genotype. Therefore, allele A was the advantageous allele for high milk yield, and allele T had a positive effect on milk composition. The F279Y mutation of the GHR gene can be used as a genetic marker for marker-assisted selection (MAS) breeding of milk production traits in Chinese Holstein cattle.
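The Hardy-Weinberg check mentioned in the abstract above can be sketched as a chi-square goodness-of-fit test on genotype counts. The abstract reports only allele frequencies and the test outcome, not the genotype counts, so the counts used below are hypothetical and the function name is an assumption.

```python
from typing import Tuple


def hwe_chi_square(n_aa: int, n_at: int, n_tt: int) -> Tuple[float, float]:
    """Return (frequency of allele A, chi-square statistic against HWE)."""
    n = n_aa + n_at + n_tt
    p = (2 * n_aa + n_at) / (2 * n)  # allele A frequency from genotype counts
    q = 1.0 - p
    observed = (n_aa, n_at, n_tt)
    expected = (p * p * n, 2.0 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


# Hypothetical counts, not taken from the study.
p_balanced, chi2_balanced = hwe_chi_square(49, 42, 9)  # exactly HWE for p = 0.7
p_skewed, chi2_skewed = hwe_chi_square(50, 0, 50)      # no heterozygotes at all
```

With one degree of freedom, a statistic above 6.635 corresponds to P < 0.01, the level of deviation the abstract reports; the second hypothetical sample exceeds that threshold and the first does not.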
An Adaptive Genetic Association Test Using Double Kernel Machines.
Zhan, Xiang; Epstein, Michael P; Ghosh, Debashis
2015-10-01
Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study.
Nordeste, Ricardo F; Trainer, Maria A; Charles, Trevor C
2010-01-01
Development of different PHAs as alternatives to petrochemically derived plastics can be facilitated by mining metagenomic libraries for diverse PHA cycle genes that might be useful for synthesis of bioplastics. The specific phenotypes associated with mutations of the PHA synthesis pathway genes in Sinorhizobium meliloti allow for the use of powerful selection and screening tools to identify complementing novel PHA synthesis genes. Identification of novel genes through their function rather than sequence facilitates finding functional proteins that may otherwise have been excluded through sequence-only screening methodology. We present here methods that we have developed for the isolation of clones expressing novel PHA metabolism genes from metagenomic libraries.
Cheng, Jiujun; Nordeste, Ricardo; Trainer, Maria A; Charles, Trevor C
2017-01-01
Development of different PHAs as alternatives to petrochemically derived plastics can be facilitated by mining metagenomic libraries for diverse PHA cycle genes that might be useful for synthesis of bio-plastics. The specific phenotypes associated with mutations of the PHA synthesis pathway genes in Sinorhizobium meliloti and Pseudomonas putida allow the use of powerful selection and screening tools to identify complementing novel PHA synthesis genes. Identification of novel genes through their function rather than sequence facilitates finding functional proteins that may otherwise have been excluded through sequence-only screening methodology. We present here methods that we have developed for the isolation of clones expressing novel PHA metabolism genes from metagenomic libraries.
Agaba, Morris; Cavener, Douglas R.
2017-01-01
Background The capacity of visually oriented species to perceive and respond to visual signal is integral to their evolutionary success. Giraffes are closely related to okapi, but the two species have broad range of phenotypic differences including their visual capacities. Vision studies rank giraffe’s visual acuity higher than all other artiodactyls despite sharing similar vision ecological determinants with many of them. The extent to which the giraffe’s unique visual capacity and its difference with okapi is reflected by changes in their vision genes is not understood. Methods The recent availability of giraffe and okapi genomes provided opportunity to identify giraffe and okapi vision genes. Multiple strategies were employed to identify thirty-six candidate mammalian vision genes in giraffe and okapi genomes. Quantification of selection pressure was performed by a combination of branch-site tests of positive selection and clade models of selection divergence through comparing giraffe and okapi vision genes and orthologous sequences from other mammals. Results Signatures of selection were identified in key genes that could potentially underlie giraffe and okapi visual adaptations. Importantly, some genes that contribute to optical transparency of the eye and those that are critical in light signaling pathway were found to show signatures of adaptive evolution or selection divergence. Comparison between giraffe and other ruminants identifies significant selection divergence in CRYAA and OPN1LW. Significant selection divergence was identified in SAG while positive selection was detected in LUM when okapi is compared with ruminants and other mammals. Sequence analysis of OPN1LW showed that at least one of the sites known to affect spectral sensitivity of the red pigment is uniquely divergent between giraffe and other ruminants. 
Discussion By taking a systemic approach to gene function in vision, the results provide the first molecular clues associated with giraffe and okapi vision adaptations. At least some of the genes that exhibit signature of selection may reflect adaptive response to differences in giraffe and okapi habitat. We hypothesize that requirement for long distance vision associated with predation and communication with conspecifics likely played an important role in the adaptive pressure on giraffe vision genes. PMID:28396824
GeneRIF indexing: sentence selection based on machine learning.
Jimeno-Yepes, Antonio J; Sticco, J Caitlin; Mork, James G; Aronson, Alan R
2013-05-31
A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function. We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it. The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.
Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths
Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T
2014-01-01
How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter and the sizes of these factions are randomly different from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to lengths of their genes, and then find association between the genome rank and various properties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring of overall sortedness of the data. We have demonstrated that all considered methods give consistent results and Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method. PMID:26146586
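The abstract does not define its sortedness measure, so the following is only a minimal sketch under an assumption: sortedness is taken here as the fraction of element pairs that appear in non-decreasing order. The function names `sortedness` and `bubble_sort_rank`, and the idea of ranking genomes by a single summary value per genome, are illustrative and not taken from the paper.

```python
from itertools import combinations


def sortedness(values):
    """Fraction of pairs (i < j) in non-decreasing order (assumed measure)."""
    pairs = list(combinations(values, 2))
    if not pairs:
        return 1.0  # zero or one element is trivially sorted
    ordered = sum(1 for a, b in pairs if a <= b)
    return ordered / len(pairs)


def bubble_sort_rank(genomes, key):
    """Order genome names by key[name] using a plain bubble sort."""
    order = list(genomes)
    for end in range(len(order) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if key[order[i]] > key[order[i + 1]]:
                # Swap neighbours that are out of order.
                order[i], order[i + 1] = order[i + 1], order[i]
                swapped = True
        if not swapped:
            break  # already sorted, stop early
    return order
```

With a hypothetical dictionary of mean gene lengths per genome, `bubble_sort_rank` yields a ranking whose values have a sortedness of 1.0 under this measure; the study itself ranks genomes over many gene families at once, which this sketch does not attempt.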
Zehner, R; Zimmermann, S; Mebs, D
1998-01-01
To identify common animal species by analysis of the cytochrome b gene a method has been developed to obtain PCR products of a large domain of the cytochrome b gene (981 bp out of 1140 bp) in humans, selected mammals and birds using the same specifically designed primers. Species-specific RFLP patterns are generated by co-restriction with the restriction endonucleases ALU I and NCO I. The RFLP patterns obtained are conclusive even in mixtures of two or more species. The results were confirmed by sequence analysis which in addition explained intraspecies variations in the RFLP patterns. The method has been applied to forensic casework studies where the origin of roasted meat, stomach contents and a bone sample has been successfully identified.
Roux, Julien; Liu, Jialin; Robinson-Rechavi, Marc
2017-01-01
The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation. PMID:28981708
Suzuki, Motoshi; Toyoda, Naoya; Takagi, Shin
2014-01-01
Methods for turning on/off gene expression at the experimenter’s discretion would be useful for various biological studies. Recently, we reported on a novel microscope system utilizing an infrared laser-evoked gene operator (IR-LEGO) designed for inducing heat shock response efficiently in targeted single cells in living organisms without cell damage, thereby driving expression of a transgene under the control of a heat shock promoter. Although the original IR-LEGO can be successfully used for gene induction, several limitations hinder its wider application. Here, using the nematode Caenorhabditis elegans (C. elegans) as a subject, we have made improvements in IR-LEGO. For better spatial control of heating, a pulsed irradiation method using an optical chopper was introduced. As a result, single cells of C. elegans embryos as early as the 2-cell stage and single neurons in ganglia can be induced to express genes selectively. In addition, the introduction of site-specific recombination systems to IR-LEGO enables the induction of gene expression controlled by constitutive and cell type-specific promoters. The strategies adopted here will be useful for future applications of IR-LEGO to other organisms. PMID:24465705
Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha
2016-01-01
Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.
Chan, Pek-Lan; Rose, Ray J; Abdul Murad, Abdul Munir; Zainal, Zamri; Low, Eng-Ti Leslie; Ooi, Leslie Cheng-Li; Ooi, Siew-Eng; Yahya, Suzaini; Singh, Rajinder
2014-01-01
The somatic embryogenesis tissue culture process has been utilized to propagate high yielding oil palm. Due to the low callogenesis and embryogenesis rates, molecular studies were initiated to identify genes regulating the process, and their expression levels are usually quantified using reverse transcription quantitative real-time PCR (RT-qPCR). With the recent release of oil palm genome sequences, it is crucial to establish a proper strategy for gene analysis using RT-qPCR. Selection of the most suitable reference genes should be performed for accurate quantification of gene expression levels. In this study, eight candidate reference genes selected from cDNA microarray study and literature review were evaluated comprehensively across 26 tissue culture samples using RT-qPCR. These samples were collected from two tissue culture lines and media treatments, which consisted of leaf explants cultures, callus and embryoids from consecutive developmental stages. Three statistical algorithms (geNorm, NormFinder and BestKeeper) confirmed that the expression stability of novel reference genes (pOP-EA01332, PD00380 and PD00569) outperformed classical housekeeping genes (GAPDH, NAD5, TUBULIN, UBIQUITIN and ACTIN). PD00380 and PD00569 were identified as the most stably expressed genes in total samples, MA2 and MA8 tissue culture lines. Their applicability to validate the expression profiles of a putative ethylene-responsive transcription factor 3-like gene demonstrated the importance of using the geometric mean of two genes for normalization. Systematic selection of the most stably expressed reference genes for RT-qPCR was established in oil palm tissue culture samples. PD00380 and PD00569 were selected for accurate and reliable normalization of gene expression data from RT-qPCR. These data will be valuable to the research associated with the tissue culture process. 
Also, the method described here will facilitate the selection of appropriate reference genes in other oil palm tissues and in the expression profiling of genes relating to yield, biotic and abiotic stresses.
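Normalization against the geometric mean of two reference genes, as mentioned above, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the quantification-cycle (Cq) inputs, and the assumed amplification efficiency of 2.0 (perfect doubling per cycle) are all assumptions made for the example.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_expression(target_cq, reference_cqs, efficiency=2.0):
    """Target quantity divided by the geometric mean of reference quantities.

    Relative quantities are taken as efficiency ** (-Cq); efficiency=2.0
    assumes perfect doubling per cycle (an assumption of this sketch).
    """
    target_quantity = efficiency ** (-target_cq)
    reference_quantities = [efficiency ** (-cq) for cq in reference_cqs]
    return target_quantity / geometric_mean(reference_quantities)
```

Using the geometric rather than the arithmetic mean keeps one highly abundant reference gene from dominating the normalization factor, which is the usual argument for combining several reference genes this way.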
Catto, James W F; Abbod, Maysam F; Wild, Peter J; Linkens, Derek A; Pilarsky, Christian; Rehman, Ishtiaq; Rosario, Derek J; Denzinger, Stefan; Burger, Maximilian; Stoehr, Robert; Knuechel, Ruth; Hartmann, Arndt; Hamdy, Freddie C
2010-03-01
New methods for identifying bladder cancer (BCa) progression are required. Gene expression microarrays can reveal insights into disease biology and identify novel biomarkers. However, these experiments produce large datasets that are difficult to interpret. To develop a novel method of microarray analysis combining two forms of artificial intelligence (AI): neurofuzzy modelling (NFM) and artificial neural networks (ANN) and validate it in a BCa cohort. We used AI and statistical analyses to identify progression-related genes in a microarray dataset (n=66 tumours, n=2800 genes). The AI-selected genes were then investigated in a second cohort (n=262 tumours) using immunohistochemistry. We compared the accuracy of AI and statistical approaches to identify tumour progression. AI identified 11 progression-associated genes (odds ratio [OR]: 0.70; 95% confidence interval [CI], 0.56-0.87; p=0.0004), and these were more discriminate than genes chosen using statistical analyses (OR: 1.24; 95% CI, 0.96-1.60; p=0.09). The expression of six AI-selected genes (LIG3, FAS, KRT18, ICAM1, DSG2, and BRCA2) was determined using commercial antibodies and successfully identified tumour progression (concordance index: 0.66; log-rank test: p=0.01). AI-selected genes were more discriminate than pathologic criteria at determining progression (Cox multivariate analysis: p=0.01). Limitations include the use of statistical correlation to identify 200 genes for AI analysis and that we did not compare regression identified genes with immunohistochemistry. AI and statistical analyses use different techniques of inference to determine gene-phenotype associations and identify distinct prognostic gene signatures that are equally valid. We have identified a prognostic gene signature whose members reflect a variety of carcinogenic pathways that could identify progression in non-muscle-invasive BCa. 2009 European Association of Urology. Published by Elsevier B.V. All rights reserved.
Analysis of the GRNs Inference by Using Tsallis Entropy and a Feature Selection Approach
NASA Astrophysics Data System (ADS)
Lopes, Fabrício M.; de Oliveira, Evaldo A.; Cesar, Roberto M.
An important problem in the bioinformatics field is to understand how genes are regulated and interact through gene networks. This knowledge can be helpful for many applications, such as disease treatment design and drug creation. For this reason, it is very important to uncover the functional relationship among genes and then to construct the gene regulatory network (GRN) from temporal expression data. However, this task usually involves data with a large number of variables and a small number of observations. There is therefore a strong motivation to use pattern recognition and dimensionality reduction approaches. In particular, feature selection is especially important in order to select the most important predictor genes that can explain some phenomena associated with the target genes. This work presents a first study of the sensitivity of entropy methods to the entropy functional form, applied to the problem of topology recovery of GRNs. The generalized entropy proposed by Tsallis is used to study this sensitivity. The inference process is based on a feature selection approach, which is applied to simulated temporal expression data generated by an artificial gene network (AGN) model. The inferred GRNs are validated in terms of global network measures. Some interesting conclusions can be drawn from the experimental results, as reported for the first time in the present paper.
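For reference, the Tsallis entropy of a discrete distribution is S_q = (1 - Σ p_i^q) / (q - 1), which reduces to the Shannon entropy as q approaches 1. The sketch below implements that standard definition only; how the paper plugs it into the feature selection criterion is not described in the abstract, and the function name is an assumption.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q of a discrete probability distribution.

    For q == 1 the Shannon entropy (natural log) is returned, which is
    the limit of S_q as q -> 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)
```

Varying q changes how strongly rare versus frequent states contribute, which is the kind of dependence on the entropic form that the study examines.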
Zhang, Hongkai; Torkamani, Ali; Jones, Teresa M; Ruiz, Diana I; Pons, Jaume; Lerner, Richard A
2011-08-16
Use of large combinatorial antibody libraries and next-generation sequencing of nucleic acids are two of the most powerful methods in modern molecular biology. The libraries are screened using the principles of evolutionary selection, albeit in real time, to enrich for members with a particular phenotype. This selective process necessarily results in the loss of information about less-fit molecules. On the other hand, sequencing of the library, by itself, gives information that is mostly unrelated to phenotype. If the two methods could be combined, the full potential of very large molecular libraries could be realized. Here we report the implementation of a phenotype-information-phenotype cycle that integrates information and gene recovery. After selection for phage-encoded antibodies that bind to targets expressed on the surface of Escherichia coli, the information content of the selected pool is obtained by pyrosequencing. Sequences that encode specific antibodies are identified by a bioinformatic analysis and recovered by a stringent affinity method that is uniquely suited for gene isolation from a highly degenerate collection of nucleic acids. This approach can be generalized for selection of antibodies against targets that are present as minor components of complex systems.
Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji
2018-05-14
Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.
Fujimura, Tatsuya; Takahagi, Yoichi; Shigehisa, Tamotsu; Nagashima, Hiroshi; Miyagawa, Shuji; Shirakura, Ryota; Murakami, Hiroshi
2008-09-01
The objective of the present study was to isolate alpha 1,3-galactosyltransferase (GalGT)-gene double knockout (DKO) cells using a novel, simple method of cell selection. To obtain GalGT-DKO cells, GalGT-gene single knockout (SKO) fetal fibroblast cells were cultured for three to nine passages and GalGT-null cells were separated using a biotin-labeled IB4 lectin attached to streptavidin-coated magnetic beads. After 15-17 days of additional cultivation, seven GalGT-DKO cell colonies were obtained from a total of 2.5 x 10(7) GalGT-SKO cells. A total of 926 somatic nuclear transferred embryos reconstructed with the DKO cells were transferred into eight recipient pigs, producing four farrowed, three liveborns, and six stillborns. Absence of GalGT gene in the cloned pigs was confirmed by PCR and Southern blotting. Flow cytometric analysis revealed that alphaGal antigens were not present in the cells of the cloned DKO pigs.
A Versatile Panel of Reference Gene Assays for the Measurement of Chicken mRNA by Quantitative PCR
Maier, Helena J.; Van Borm, Steven; Young, John R.; Fife, Mark
2016-01-01
Quantitative real-time PCR assays are widely used for the quantification of mRNA within avian experimental samples. Multiple stably-expressed reference genes, selected for the lowest variation in representative samples, can be used to control random technical variation. Reference gene assays must be reliable, have high amplification specificity and efficiency, and not produce signals from contaminating DNA. Whilst recent research papers identify specific genes that are stable in particular tissues and experimental treatments, here we describe a panel of ten avian gene primer and probe sets that can be used to identify suitable reference genes in many experimental contexts. The panel was tested with TaqMan and SYBR Green systems in two experimental scenarios: a tissue collection and virus infection of cultured fibroblasts. GeNorm and NormFinder algorithms were able to select appropriate reference gene sets in each case. We show the effects of using the selected genes on the detection of statistically significant differences in expression. The results are compared with those obtained using 28s ribosomal RNA, the present most widely accepted reference gene in chicken work, identifying circumstances where its use might provide misleading results. Methods for eliminating DNA contamination of RNA reduced, but did not completely remove, detectable DNA. We therefore attached special importance to testing each qPCR assay for absence of signal using DNA template. The assays and analyses developed here provide a useful resource for selecting reference genes for investigations of avian biology. PMID:27537060
Detecting and characterizing genomic signatures of positive selection in global populations.
Liu, Xuanyao; Ong, Rick Twee-Hee; Pillai, Esakimuthu Nisha; Elzein, Abier M; Small, Kerrin S; Clark, Taane G; Kwiatkowski, Dominic P; Teo, Yik-Ying
2013-06-06
Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains unanswered except for the fact that there exist a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome whether a selection signal shared across different populations is the consequence of a single mutation process followed subsequently by gene flow between populations or of convergent evolution due to the occurrence of multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
A polynomial based model for cell fate prediction in human diseases.
Ma, Lichun; Zheng, Jie
2017-12-21
Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decision sheds light on key regulators, facilitates understanding the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we proposed a polynomial based model to predict cell fate. This model was derived from Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, within both the two considered gene selection methods, the prediction accuracies of polynomials of different degrees show little differences. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than others. When comparing the linear polynomials based on the two gene selection methods, it shows that although the accuracy of the linear polynomial that uses correlation analysis outcomes is a little higher (achieves 86.62%), the one within genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical study of cell development related diseases.
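The 10-fold cross-validation step can be illustrated with a small index-splitting helper. This is a sketch under assumptions: the function name is hypothetical, folds are contiguous and unshuffled for reproducibility, and the polynomial fitting itself is left out because the abstract does not give the model's exact form.

```python
def k_fold_splits(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs covering all samples.

    Folds are contiguous and differ in size by at most one; shuffle the
    data beforehand if sample order carries structure.
    """
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits
```

Each sample is used for testing exactly once, so averaging the per-fold accuracies gives a single estimate comparable to the figure reported in the abstract.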
Lepre, Jorge; Rice, J Jeremy; Tu, Yuhai; Stolovitzky, Gustavo
2004-05-01
Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissues types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).
Feeney, Mistianne; Punja, Zamir K
2015-01-01
Hemp (Cannabis sativa L.) suspension culture cells were transformed with Agrobacterium tumefaciens strain EHA101 carrying the binary plasmid pNOV3635. The plasmid contains a phosphomannose isomerase (PMI) selectable marker gene. Cells transformed with PMI are capable of metabolizing the selective agent mannose, whereas cells not expressing the gene are incapable of using the carbon source and will stop growing. Callus masses proliferating on selection medium were screened for PMI expression using a chlorophenol red assay. Genomic DNA was extracted from putatively transformed callus lines, and the presence of the PMI gene was confirmed using PCR and Southern hybridization. Using this method, an average transformation frequency of 31.23% ± 0.14 was obtained for all transformation experiments, with a range of 15.1-55.3%.
Lee, Sang Mi; Kim, Ji Woo; Jeong, Young-Hee; Kim, Se Eun; Kim, Yeong Ji; Moon, Seung Ju; Lee, Ji-Hye; Kim, Keun-Jung; Kim, Min-Kyu; Kang, Man-Jong
2014-11-01
Transgenic animals have become important tools for the production of therapeutic proteins in domestic animals. Production efficiencies of transgenic animals by conventional methods such as microinjection and retrovirus vector methods are low, and the foreign gene expression levels are also low because of their random integration in the host genome. In this study, we investigated the homologous recombination on the porcine β-casein gene locus using a knock-in vector for the β-casein gene locus. We developed the knock-in vector on the porcine β-casein gene locus and isolated knock-in fibroblast for nuclear transfer. The knock-in vector consisted of the neomycin resistance gene (neo) as a positive selectable marker gene, diphtheria toxin-A gene as negative selection marker, and 5' arm and 3' arm from the porcine β-casein gene. The secretion of enhanced green fluorescent protein (EGFP) was more easily detected in the cell culture media than it was by western blot analysis of cell extract of the HC11 mouse mammary epithelial cells transfected with EGFP knock-in vector. These results indicated that a knock-in system using β-casein gene induced high expression of transgene by the gene regulatory sequence of endogenous β-casein gene. These fibroblasts may be used to produce transgenic pigs for the production of therapeutic proteins via the mammary glands.
Lee, Sang Mi; Kim, Ji Woo; Jeong, Young-Hee; Kim, Se Eun; Kim, Yeong Ji; Moon, Seung Ju; Lee, Ji-Hye; Kim, Keun-Jung; Kim, Min-Kyu; Kang, Man-Jong
2014-01-01
Transgenic animals have become important tools for the production of therapeutic proteins in domestic animals. Production efficiencies of transgenic animals by conventional methods such as microinjection and retrovirus vector methods are low, and the foreign gene expression levels are also low because of their random integration in the host genome. In this study, we investigated the homologous recombination on the porcine β-casein gene locus using a knock-in vector for the β-casein gene locus. We developed the knock-in vector on the porcine β-casein gene locus and isolated knock-in fibroblast for nuclear transfer. The knock-in vector consisted of the neomycin resistance gene (neo) as a positive selectable marker gene, diphtheria toxin-A gene as negative selection marker, and 5′ arm and 3′ arm from the porcine β-casein gene. The secretion of enhanced green fluorescent protein (EGFP) was more easily detected in the cell culture media than it was by western blot analysis of cell extract of the HC11 mouse mammary epithelial cells transfected with EGFP knock-in vector. These results indicated that a knock-in system using β-casein gene induced high expression of transgene by the gene regulatory sequence of endogenous β-casein gene. These fibroblasts may be used to produce transgenic pigs for the production of therapeutic proteins via the mammary glands. PMID:25358326
Dos Reis, Mario
2015-04-01
First principles of population genetics are used to obtain formulae relating the non-synonymous to synonymous substitution rate ratio to the selection coefficients acting at codon sites in protein-coding genes. Two theoretical cases are discussed and two examples from real data (a chloroplast gene and a virus polymerase) are given. The formulae give much insight into the dynamics of non-synonymous substitutions and may inform the development of methods to detect adaptive evolution. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
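As background for the kind of relationship discussed here, the classical weak-selection result gives the fixation rate of a mutation relative to a neutral one as S / (1 - e^(-S)), where S is the population-scaled selection coefficient. The sketch below encodes only that textbook expression; it is not claimed to be one of the paper's formulae, which relate the rate ratio to selection coefficients across codon sites, and the function name is an assumption.

```python
import math


def relative_fixation_rate(scaled_s):
    """Fixation rate relative to neutral for scaled selection coefficient S.

    Uses S / (1 - exp(-S)); the neutral limit S -> 0 equals 1.
    """
    if abs(scaled_s) < 1e-8:
        return 1.0  # neutral limit, avoids 0/0
    # -expm1(-S) equals 1 - exp(-S) but is more accurate for small |S|.
    return scaled_s / -math.expm1(-scaled_s)
```

Values above 1 correspond to advantageous changes fixing faster than neutral ones, and values below 1 to deleterious changes, which is the intuition behind reading a rate ratio above 1 as a sign of adaptive evolution.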
Prediction of regulatory gene pairs using dynamic time warping and gene ontology.
Yang, Andy C; Hsu, Hui-Huang; Lu, Ming-Da; Tseng, Vincent S; Shih, Timothy K
2014-01-01
Selecting informative genes is the most important task for data analysis on microarray gene expression data. In this work, we aim at identifying regulatory gene pairs from microarray gene expression data. However, microarray data often contain multiple missing expression values. Missing value imputation is thus needed before further processing for regulatory gene pairs becomes possible. We develop a novel approach to first impute missing values in microarray time series data by combining k-Nearest Neighbour (KNN), Dynamic Time Warping (DTW) and Gene Ontology (GO). After missing values are imputed, we then perform gene regulation prediction based on our proposed DTW-GO distance measurement of gene pairs. Experimental results show that our approach is more accurate when compared with existing missing value imputation methods on real microarray data sets. Furthermore, our approach can also discover more regulatory gene pairs that are known in the literature than other methods.
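Dynamic Time Warping, one of the components combined above, can be sketched with the standard dynamic-programming recurrence. This is plain DTW on two numeric expression profiles with absolute difference as the local cost; the paper's DTW-GO distance additionally uses Gene Ontology information, which is not reproduced here, and the function name is an assumption.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

Because the alignment may stretch or compress the time axis, two profiles with the same shape but a time lag obtain a small distance, which is why DTW suits time-series expression data.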
Reproducibility-optimized test statistic for ranking genes in microarray studies.
Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero
2008-01-01
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.
Identification of selection footprints on the X chromosome in pig.
Ma, Yunlong; Zhang, Haihan; Zhang, Qin; Ding, Xiangdong
2014-01-01
Identifying footprints of selection can provide straightforward insight into the mechanism of artificial selection and help uncover the causal genes related to important traits. In this study, three between-population and two within-population approaches, the Cross Population Extend Haplotype Homozygosity Test (XPEHH), the Cross Population Composite Likelihood Ratio (XPCLR), the F-statistics (Fst), the Integrated Haplotype Score (iHS) and Tajima's D, were implemented to detect selection footprints on the X chromosome in three pig breeds using the Illumina Porcine60K SNP chip. Using between-population methods, 11, 11 and 7 potential selection regions with lengths of 15.62 Mb, 12.32 Mb and 9.38 Mb were identified in Landrace, Chinese Songliao and Yorkshire by XPEHH, respectively; 16, 13 and 17 potential selection regions with lengths of 15.20 Mb, 13.00 Mb and 19.21 Mb by XPCLR; and 4, 2 and 4 potential selection regions with lengths of 3.20 Mb, 1.60 Mb and 3.20 Mb by Fst. Using within-population methods, 7, 10 and 9 potential selection regions with lengths of 8.12 Mb, 8.40 Mb and 9.99 Mb were identified in Landrace, Chinese Songliao and Yorkshire by iHS, and 4, 3 and 2 potential selection regions with lengths of 3.20 Mb, 2.40 Mb and 1.60 Mb by Tajima's D. Moreover, the selection regions from different methods partly overlapped; in particular, the regions around 22∼25 Mb were detected as under selection in Landrace and Yorkshire, but not in Chinese Songliao, by all three between-population methods. Very little overlap was found between the selection regions identified by between-population and within-population methods. Bioinformatics analysis showed that genes related to meat quality, reproduction and immunity were found in potential selection regions. In addition, three out of five significant SNPs associated with hematological traits reported in our genome-wide association study were harbored in potential selection regions.
Genome Engineering in Bacillus anthracis Using Cre Recombinase
Pomerantsev, Andrei P.; Sitaraman, Ramakrishnan; Galloway, Craig R.; Kivovich, Violetta; Leppla, Stephen H.
2006-01-01
Genome engineering is a powerful method for the study of bacterial virulence. With the availability of the complete genomic sequence of Bacillus anthracis, it is now possible to inactivate or delete selected genes of interest. However, many current methods for disrupting or deleting more than one gene require use of multiple antibiotic resistance determinants. In this report we used an approach that temporarily inserts an antibiotic resistance marker into a selected region of the genome and subsequently removes it, leaving the target region (a single gene or a larger genomic segment) permanently mutated. For this purpose, a spectinomycin resistance cassette flanked by bacteriophage P1 loxP sites oriented as direct repeats was inserted within a selected gene. After identification of strains having the spectinomycin cassette inserted by a double-crossover event, a thermo-sensitive plasmid expressing Cre recombinase was introduced at the permissive temperature. Cre recombinase action at the loxP sites excised the spectinomycin marker, leaving a single loxP site within the targeted gene or genomic segment. The Cre-expressing plasmid was then removed by growth at the restrictive temperature. The procedure could then be repeated to mutate additional genes. In this way, we sequentially mutated two pairs of genes: pepM and spo0A, and mcrB and mrr. Furthermore, loxP sites introduced at distant genes could be recombined by Cre recombinase to cause deletion of large intervening regions. In this way, we deleted the capBCAD region of the pXO2 plasmid and the entire 30 kb of chromosomal DNA between the mcrB and mrr genes, and in the latter case we found that the 32 intervening open reading frames were not essential to growth. PMID:16369025
Optimal Reference Gene Selection for Expression Studies in Human Reticulocytes.
Aggarwal, Anu; Jamwal, Manu; Viswanathan, Ganesh K; Sharma, Prashant; Sachdeva, ManUpdesh S; Bansal, Deepak; Malhotra, Pankaj; Das, Reena
2018-05-01
Reference genes are indispensable for normalizing mRNA levels across samples in real-time quantitative PCR. Their expression levels vary under different experimental conditions and because of several inherent characteristics. Appropriate reference gene selection is thus critical for gene-expression studies. This study aimed at selecting optimal reference genes for gene-expression analysis of reticulocytes and at validating them in hereditary spherocytosis (HS) and β-thalassemia intermedia (βTI) patients. Seven reference genes (PGK1, MPP1, HPRT1, ACTB, GAPDH, RN18S1, and SDHA) were selected because of published reports. Real-time quantitative PCR was performed on reticulocytes in 20 healthy volunteers, 15 HS patients, and 10 βTI patients. Threshold cycle values were compared with fold-change method and RefFinder software. The stable reference genes recommended by RefFinder were validated with SLC4A1 and flow cytometric eosin-5'-maleimide binding assay values in HS patients and HBG2 and high performance liquid chromatography-derived percentage of hemoglobin F in βTI. Comprehensive ranking predicted MPP1 and GAPDH as optimal reference genes for reticulocytes that were not affected in HS and βTI. This was further confirmed on validation with eosin-5'-maleimide results and percentage of hemoglobin F in HS and βTI patients, respectively. Hence, MPP1 and GAPDH are good reference genes for reticulocyte expression studies compared with ACTB and RN18S1, the two most commonly used reference genes. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Compartmentalized partnered replication for the directed evolution of genetic parts and circuits.
Abil, Zhanar; Ellefson, Jared W; Gollihar, Jimmy D; Watkins, Ella; Ellington, Andrew D
2017-12-01
Compartmentalized partnered replication (CPR) is an emulsion-based directed evolution method based on a robust and modular phenotype-genotype linkage. In contrast to other in vivo directed evolution approaches, CPR largely mitigates host fitness effects due to a relatively short expression time of the gene of interest. CPR is based on gene circuits in which the selection of a 'partner' function from a library leads to the production of a thermostable polymerase. After library preparation, bacteria produce partner proteins that can potentially lead to enhancement of transcription, translation, gene regulation, and other aspects of cellular metabolism that reinforce thermostable polymerase production. Individual cells are then trapped in water-in-oil emulsion droplets in the presence of primers and dNTPs, followed by the recovery of the partner genes via emulsion PCR. In this step, droplets with cells expressing partner proteins that promote polymerase production will produce higher copy numbers of the improved partner gene. The resulting partner genes can subsequently be recloned for the next round of selection. Here, we present a step-by-step guideline for the procedure by providing examples of (i) selection of T7 RNA polymerases that recognize orthogonal promoters and (ii) selection of tRNA for enhanced amber codon suppression. A single round of CPR should take ∼3-5 d, whereas a whole directed evolution can be performed in 3-10 rounds, depending on selection efficiency.
Detecting signatures of positive selection associated with musical aptitude in the human genome
Liu, Xuanyao; Kanduri, Chakravarthi; Oikkonen, Jaana; Karma, Kai; Raijas, Pirre; Ukkola-Vuoti, Liisa; Teo, Yik-Ying; Järvelä, Irma
2016-02-16
Abilities related to musical aptitude appear to have a long history in human evolution. To elucidate the molecular and evolutionary background of musical aptitude, we compared genome-wide genotyping data (641 K SNPs) of 148 Finnish individuals characterized for musical aptitude. We assigned signatures of positive selection in a case-control setting using three selection methods: haploPS, XP-EHH and FST. Gene ontology classification revealed that the positive selection regions contained genes affecting inner-ear development. Additionally, literature survey has shown that several of the identified genes were known to be involved in auditory perception (e.g. GPR98, USH2A), cognition and memory (e.g. GRIN2B, IL1A, IL1B, RAPGEF5), reward mechanisms (RGS9), and song perception and production of songbirds (e.g. FOXP1, RGS9, GPR98, GRIN2B). Interestingly, genes related to inner-ear development and cognition were also detected in a previous genome-wide association study of musical aptitude. However, the candidate genes detected in this study were not reported earlier in studies of musical abilities. Identification of genes related to language development (FOXP1 and VLDLR) support the popular hypothesis that music and language share a common genetic and evolutionary background. The findings are consistent with the evolutionary conservation of genes related to auditory processes in other species and provide first empirical evidence for signatures of positive selection for abilities that contribute to musical aptitude. PMID:26879527
Navigating the Interface Between Landscape Genetics and Landscape Genomics
Storfer, Andrew; Patton, Austin; Fraik, Alexandra K.
2018-01-01
As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from what used to be a few dozen loci to thousands of loci and even full genomes. Although genome scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions that are addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detection of selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detection of selection, and in particular, outlier differentiation methods and genetic-environment association tests because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and assumptions inherent in analytical methods used, which can lead to high false positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations, and such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species' phylogeographic history. 
To this end, we summarize recent simulation studies that test the power and accuracy of genome scan methods under a variety of demographic scenarios and sampling designs. We conclude with a discussion of additional considerations for future method development, and a summary of methods that show promise for landscape genomics studies but are not yet widely used. PMID:29593776
Development of a Markerless Knockout Method for Actinobacillus succinogenes
Joshi, Rajasi V.; Schindler, Bryan D.; McPherson, Nikolas R.; Tiwari, Kanupriya; Vieille, Claire
2014-05-01
Actinobacillus succinogenes is one of the best natural succinate-producing organisms, but it still needs engineering to further increase succinate yield and productivity. In this study, we developed a markerless knockout method for A. succinogenes using natural transformation or electroporation. The Escherichia coli isocitrate dehydrogenase gene with flanking flippase recognition target sites was used as the positive selection marker, making use of A. succinogenes's auxotrophy for glutamate to select for growth on isocitrate. The Saccharomyces cerevisiae flippase recombinase (Flp) was used to remove the selection marker, allowing its reuse. Finally, the plasmid expressing flp was cured using acridine orange. We demonstrate that at least two consecutive deletions can be introduced into the same strain using this approach, that no more than a total of 1 kb of DNA is needed on each side of the selection cassette to protect from exonuclease activity during transformation, and that no more than 200 bp of homologous DNA is needed on each side for efficient recombination. We also demonstrate that electroporation can be used as an alternative transformation method to obtain knockout mutants and that an enriched defined medium can be used for direct selection of knockout mutants on agar plates with high efficiency. Single-knockout mutants of the fumarate reductase and of the pyruvate formate lyase-encoding genes were obtained using this knockout strategy. Double-knockout mutants were also obtained by deleting the citrate lyase-, β-galactosidase-, and aconitase-encoding genes in the pyruvate formate lyase knockout mutant strain. PMID:24610845
Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J
2008-01-01
The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, the Hotelling T-square test, and, finally, out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps (sets of genes coordinately involved in key biological processes) with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.
Chen, Meng-Yun; Liang, Dan; Zhang, Peng
2015-11-01
Incongruence between different phylogenomic analyses is the main challenge faced by phylogeneticists in the genomic era. To reduce incongruence, phylogenomic studies normally adopt some data filtering approaches, such as reducing missing data or using slowly evolving genes, to improve the signal quality of data. Here, we assembled a phylogenomic data set of 58 jawed vertebrate taxa and 4682 genes to investigate the backbone phylogeny of jawed vertebrates under both concatenation and coalescent-based frameworks. To evaluate the efficiency of extracting phylogenetic signals among different data filtering methods, we chose six highly intractable internodes within the backbone phylogeny of jawed vertebrates as our test questions. We found that our phylogenomic data set exhibits substantial conflicting signal among genes for these questions. Our analyses showed that non-specific data sets that are generated without bias toward specific questions are not sufficient to produce consistent results when there are several difficult nodes within a phylogeny. Moreover, phylogenetic accuracy based on non-specific data is considerably influenced by the size of data and the choice of tree inference methods. To address such incongruences, we selected genes that resolve a given internode but not the entire phylogeny. Notably, not only can this strategy yield correct relationships for the question, but it also reduces inconsistency associated with data sizes and inference methods. Our study highlights the importance of gene selection in phylogenomic analyses, suggesting that simply using a large amount of data cannot guarantee correct results. Constructing question-specific data sets may be more powerful for resolving problematic nodes. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Cultivating Insect Cells To Produce Recombinant Proteins
NASA Technical Reports Server (NTRS)
Spaulding, Glenn; Goodwin, Thomas; Prewett, Tacey; Andrews, Angela; Francis, Karen; O'Connor, Kim
1996-01-01
A method of producing recombinant proteins involves growing insect cells in a nutrient solution in a cylindrical bioreactor that rotates about its horizontally oriented cylindrical axis, and infecting the cells with viruses into which genes of a selected type have been cloned. The genes in question are those encoding the desired proteins. The horizontal rotating bioreactor preferred for use in the method, denoted by the acronym "HARV", is described in "High-Aspect-Ratio Rotating Cell-Culture Vessel" (MSC-21662).
An ensemble rank learning approach for gene prioritization.
Lee, Po-Feng; Soo, Von-Wun
2013-01-01
Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
Gene trap and gene inversion methods for conditional gene inactivation in the mouse
Xin, Hong-Bo; Deng, Ke-Yu; Shui, Bo; Qu, Shimian; Sun, Qi; Lee, Jane; Greene, Kai Su; Wilson, Jason; Yu, Ying; Feldman, Morris; Kotlikoff, Michael I.
2005-01-01
Conditional inactivation of individual genes in mice using site-specific recombinases is an extremely powerful method for determining the complex roles of mammalian genes in developmental and tissue-specific contexts, a major goal of post-genomic research. However, the process of generating mice with recombinase recognition sequences placed at specific locations within a gene, while maintaining a functional allele, is time consuming, expensive and technically challenging. We describe a system that combines gene trap and site-specific DNA inversion to generate mouse embryonic stem (ES) cell clones for the rapid production of conditional knockout mice, and the use of this system in an initial gene trap screen. Gene trapping should allow the selection of thousands of ES cell clones with defined insertions that can be used to generate conditional knockout mice, thereby providing extensive parallelism that eliminates the time-consuming steps of targeting vector construction and homologous recombination for each gene. PMID:15659575
Ludovini, Vienna; Bianconi, Fortunato; Siggillino, Annamaria; Piobbico, Danilo; Vannucci, Jacopo; Metro, Giulio; Chiari, Rita; Bellezza, Guido; Puma, Francesco; Della Fazia, Maria Agnese; Servillo, Giuseppe; Crinò, Lucio
2016-05-24
Risk assessment and treatment choice remain a challenge in early non-small-cell lung cancer (NSCLC). The aim of this study was to identify novel genes involved in the risk of early relapse (ER) compared to no relapse (NR) in resected lung adenocarcinoma (AD) patients using a combination of high throughput technology and computational analysis. We identified 18 patients (13 NR and 5 ER) with stage I AD. Frozen samples of patients in ER, NR and corresponding normal lung (NL) were subjected to microarray technology and quantitative PCR (Q-PCR). A gene network computational analysis was performed to select predictive genes. An independent set of 79 stage I AD samples was used to validate selected genes by Q-PCR. From microarray analysis we selected 50 genes, using the fold change ratio of ER versus NR. They were validated both in pool and individually in patient samples (ER and NR) by Q-PCR. Fourteen increased and 25 decreased genes showed a concordance between the two methods. They were used to perform a computational gene network analysis that identified 4 increased (HOXA10, CLCA2, AKR1B10, FABP3) and 6 decreased (SCGB1A1, PGC, TFF1, PSCA, SPRR1B and PRSS1) genes. Moreover, in an independent dataset of AD samples, we showed that both high FABP3 expression and low SCGB1A1 expression were associated with a worse disease-free survival (DFS). Our results indicate that it is possible to define, through gene expression and computational analysis, a characteristic gene profile of patients with an increased risk of relapse that may become a tool for patient selection for adjuvant therapy.
Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm
NASA Astrophysics Data System (ADS)
Salameh Shreem, Salam; Abdullah, Salwani; Nazri, Mohd Zakree Ahmad
2016-04-01
Microarray technology can be used as an efficient diagnostic system to recognise diseases such as tumours or to discriminate between different types of cancers in normal tissues. This technology has received increasing attention from the bioinformatics community because of its potential in designing powerful decision-making tools for cancer diagnosis. However, the presence of thousands or tens of thousands of genes affects the predictive accuracy of this technology from the perspective of classification. Thus, a key issue in microarray data is identifying or selecting the smallest possible set of genes from the input data that can achieve good predictive accuracy for classification. In this work, we propose a two-stage selection algorithm for gene selection problems in microarray data-sets called the symmetrical uncertainty filter and harmony search algorithm wrapper (SU-HSA). Experimental results show that the SU-HSA is better than HSA in isolation for all data-sets in terms of the accuracy and achieves a lower number of genes on 6 out of 10 instances. Furthermore, the comparison with state-of-the-art methods shows that our proposed approach is able to obtain 5 (out of 10) new best results in terms of the number of selected genes and competitive results in terms of the classification accuracy.
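The filter stage named in the abstract relies on symmetrical uncertainty, which is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The sketch below computes it for discrete (already discretised) values; it is a minimal illustration under that assumption, not the authors' implementation, and the function names are hypothetical. The harmony search wrapper is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant; treat them as uninformative.
        return 0.0
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2.0 * mutual_information / (hx + hy)


# Hypothetical discretised expression levels for one gene versus class labels.
gene = ["low", "low", "high", "high"]
labels = ["normal", "normal", "tumour", "tumour"]
print(symmetrical_uncertainty(gene, labels))
```

In a filter setting, genes would typically be ranked by their SU with the class label and only the top-ranked ones passed on to the wrapper; the cut-off is a design choice that the abstract does not specify.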
iPcc: a novel feature extraction method for accurate disease class discovery and prediction
Ren, Xianwen; Wang, Yong; Zhang, Xiang-Sun; Jin, Qi
2013-01-01
Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define ‘correlation feature space’ for samples based on the gene expression profiles by iterative employment of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles. PMID:23761440
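One way to picture a "correlation feature space" built by repeated use of Pearson's correlation is sketched below. This is an illustrative reading of the abstract, not the authors' implementation: the function names and the default `n_iter` are assumptions, and the sketch assumes no sample profile is constant (which would make the correlation undefined).

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)  # fails on constant profiles (zero variance)


def correlation_matrix(rows):
    """Sample-by-sample matrix of Pearson correlations between rows."""
    return [[pearson(r, s) for s in rows] for r in rows]


def iterated_correlation_features(expr, n_iter=2):
    """Re-describe each sample by its correlations with all samples.

    `expr` is a samples x genes table. After the first pass each sample is
    a vector of correlations with every sample; later passes apply the same
    step to that vector. `n_iter` is a hypothetical tuning parameter.
    """
    features = [[float(v) for v in row] for row in expr]
    for _ in range(n_iter):
        features = correlation_matrix(features)
    return features
```

The output always has one row and one column per sample, so any downstream classifier or clustering method would work on a samples x samples table rather than on the original gene dimensions.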
Validating internal controls for quantitative plant gene expression studies.
Brunner, Amy M; Yakovlev, Igor A; Strauss, Steven H
2004-08-18
Real-time reverse transcription PCR (RT-PCR) has greatly improved the ease and sensitivity of quantitative gene expression studies. However, accurate measurement of gene expression with this method relies on the choice of a valid reference for data normalization. Studies rarely verify that gene expression levels for reference genes are adequately consistent among the samples used, nor compare alternative genes to assess which are most reliable for the experimental conditions analyzed. Using real-time RT-PCR to study the expression of 10 poplar (genus Populus) housekeeping genes, we demonstrate a simple method for determining the degree of stability of gene expression over a set of experimental conditions. Based on a traditional method for analyzing the stability of varieties in plant breeding, it defines measures of gene expression stability from analysis of variance (ANOVA) and linear regression. We found that the potential internal control genes differed widely in their expression stability over the different tissues, developmental stages and environmental conditions studied. Our results support that quantitative comparisons of candidate reference genes are an important part of real-time RT-PCR studies that seek to precisely evaluate variation in gene expression. The method we demonstrated facilitates statistical and graphical evaluation of gene expression stability. Selection of the best reference gene for a given set of experimental conditions should enable detection of biologically significant changes in gene expression that are too small to be revealed by less precise methods, or when highly variable reference genes are unknowingly used in real-time RT-PCR experiments.
Explicit Building Block Multiobjective Evolutionary Computation: Methods and Applications
2005-06-16
which is introduced in 1990 by Richard Dawkins in his book ”The Selfish Gene .” [34] 356 E.5.7 Pareto Envelop-based Selection Algorithm I and II...IGC Intelligent Gene Collector . . . . . . . . . . . . . . . . . 59 OED Orthogonal Experimental Design . . . . . . . . . . . . . 59 MED Main Effect...complete one experiment 74 `′ The string length hold within the computer (can be longer than number of genes
NASA Astrophysics Data System (ADS)
Sasaki, Shota; Hokari, Yutaro; Kanzaki, Makoto; Kaneko, Toshiro
2015-09-01
Gene transfection, which is the process of deliberately introducing nucleic acids into cells, is expected to play an important role in medical treatment because the process is necessary for gene therapy and creation of induced pluripotent stem (iPS) cells. However, the conventional transfection methods have some problems, so we focus attention on promising transfection methods by atmospheric pressure plasma (APP). We have previously reported that the cell membrane permeability, which is closely related with gene transfection, is improved using a cell-solution electrode for generating He-APP. He-APP is irradiated to the solution containing the adherent cells and delivery materials such as fluorescent dyes (YOYO-1) and plasmid DNA (GFP). In case of YOYO-1 delivery, more than 80% of cells can be transferred only in the plasma-irradiated area and the spatially-selective membrane permeabilization is realized by the plasma irradiation. In addition, it is confirmed that plasmid DNA is transfected and the GFP genes are expressed using same APP irradiation system with no obvious cellular damage.
Roux, Julien; Liu, Jialin; Robinson-Rechavi, Marc
2017-11-01
The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Liang, Liqin; Li, Jianqiang; Cheng, Lin; Ling, Jian; Luo, Zhongqin; Bai, Miao; Xie, Bingyan
2014-11-01
The Fusarium oxysporum species complex consists of fungal pathogens that cause serial vascular wilt disease on more than 100 cultivated species throughout the world. Gene function analysis is rapidly becoming more and more important as the whole-genome sequences of various F. oxysporum strains are being completed. Gene-disruption techniques are a common molecular tool for studying gene function, yet are often a limiting step in gene function identification. In this study we have developed a F. oxysporum high-efficiency gene-disruption strategy based on split-marker homologous recombination cassettes with dual selection and electroporation transformation. The method was efficiently used to delete three RNA-dependent RNA polymerase (RdRP) genes. The gene-disruption cassettes of three genes can be constructed simultaneously within a short time using this technique. The optimal condition for electroporation is 10μF capacitance, 300Ω resistance, 4kV/cm field strength, with 1μg of DNA (gene-disruption cassettes). Under these optimal conditions, we were able to obtain 95 transformants per μg DNA. After positive-negative selection, the transformants were efficiently screened by PCR; screening efficiency averaged 85%: 90% (RdRP1), 85% (RdRP2) and 77% (RdRP3). This gene-disruption strategy should pave the way for high-throughput genetic analysis in F. oxysporum. Copyright © 2014 Elsevier GmbH. All rights reserved.
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi
2015-04-22
Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene set selection are derived and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization
Liu, Jin; Huang, Jian; Ma, Shuangge
2013-01-01
Summary In cancer diagnosis studies, high-throughput gene profiling has been extensively conducted, searching for genes whose expressions may serve as markers. Data generated from such studies have the “large d, small n” feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single datasets can be unsatisfactory. A cost-effective remedy is to conduct integrative analysis of multiple heterogeneous datasets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty. Under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulation and analysis of practical cancer datasets, show satisfactory performance of the proposed methods. PMID:24578589
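The minimax concave penalty mentioned in this abstract has a standard closed form: for |t| ≤ γλ it equals λ|t| − t²/(2γ), and beyond that point it is the constant γλ²/2. The sketch below shows that penalty together with the usual univariate update used in coordinate descent for a standardized predictor. It is a generic illustration under those assumptions, with hypothetical function names; the composite (outer MCP with ridge, Lasso or MCP inner penalty) estimators of the paper are not reproduced here.

```python
def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty evaluated at coefficient t (lam >= 0, gamma > 0)."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # flat beyond gamma * lam


def soft_threshold(z, lam):
    """Lasso soft-thresholding operator S(z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_coordinate_update(z, lam, gamma):
    """Univariate MCP solution for a standardized predictor.

    `z` is the unpenalized single-coordinate least squares solution.
    Requires gamma > 1 so that the one-dimensional problem is convex.
    """
    if gamma <= 1:
        raise ValueError("gamma must be greater than 1")
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large coefficients are left unshrunk
```

Unlike the Lasso, the update leaves coefficients larger than γλ untouched, which is the property that makes MCP attractive for marker selection with less shrinkage bias.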
Dong, Niu; Montanez, Belen; Creelman, Robert A; Cornish, Katrina
2006-02-01
A new method has been developed for guayule tissue culture and transformation. Guayule leaf explants have a poor survival rate when placed on normal MS medium and under normal culture room light conditions. Low light and low ammonium treatment greatly improved shoot organogenesis and transformation from leaf tissues. Using this method, a 35S promoter driven BAR gene and an ubiquitin-3 promoter driven GUS gene (with intron) have been successfully introduced into guayule. These transgenic guayule plants were resistant to the herbicide ammonium-glufosinate and were positive to GUS staining. Molecular analysis showed the expected band and signal in all GUS positive transformants. The transformation efficiency with glufosinate selection ranged from 3 to 6%. Transformation with a pBIN19-based plasmid containing a NPTII gene and then selection with kanamycin also works well using this method. The ratio of kanamycin-resistant calli to total starting explants reached 50% in some experiments.
An Adaptive Genetic Association Test Using Double Kernel Machines
Zhan, Xiang; Epstein, Michael P.; Ghosh, Debashis
2014-01-01
Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests considering all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway as well as test for the overall genetic pathway effect. This DKM procedure first uses the garrote kernel machines (GKM) test for the purposes of subset selection and then the least squares kernel machine (LSKM) test for testing the effect of the subset of genes. An appealing feature of the kernel machine framework is that it can provide a flexible and unified method for multi-dimensional modeling of the genetic pathway effect allowing for both parametric and nonparametric components. This DKM approach is illustrated with application to simulated data as well as to data from a neuroimaging genetics study. PMID:26640602
Taye, Mengistie; Lee, Wonseok; Caetano-Anolles, Kelsey; Dessie, Tadelle; Hanotte, Olivier; Mwai, Okeyo Ally; Kemp, Stephen; Cho, Seoae; Oh, Sung Jong; Lee, Hak-Kyo; Kim, Heebal
2017-12-01
As African indigenous cattle evolved in a hot tropical climate, they have developed an inherent thermotolerance; survival mechanisms include a light-colored and shiny coat, increased sweating, and cellular and molecular mechanisms to cope with high environmental temperature. Here, we report the positive selection signature of genes in African cattle breeds which contribute to their heat tolerance mechanisms. We compared the genomes of five indigenous African cattle breeds with the genomes of four commercial cattle breeds using cross-population composite likelihood ratio (XP-CLR) and cross-population extended haplotype homozygosity (XP-EHH) statistical methods. We identified 296 (XP-EHH) and 327 (XP-CLR) positively selected genes. Gene ontology analysis resulted in 41 biological process terms and six Kyoto Encyclopedia of Genes and Genomes pathways. Several genes and pathways were found to be involved in oxidative stress response, osmotic stress response, heat shock response, hair and skin properties, sweat gland development and sweating, feed intake and metabolism, and reproduction functions. The genes and pathways identified directly or indirectly contribute to the superior heat tolerance mechanisms in African cattle populations. The results will improve our understanding of the biological mechanisms of heat tolerance in African cattle breeds and open an avenue for further study. © 2017 Japanese Society of Animal Science.
A novel feature extraction approach for microarray data based on multi-algorithm fusion
Jiang, Zhu; Xu, Rong
2015-01-01
Feature extraction is one of the most important and effective methods to reduce dimension in data mining, given the emergence of high dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms, ranking-based feature extraction and set-based feature extraction, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, generally without considering the inter-relationship between features, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Like learning methods, feature extraction has a problem in its generalization ability, namely robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested against gene expression datasets including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions. PMID:25780277
Xi, Jianing; Wang, Minghui; Li, Ao
2018-06-05
Discovery of mutated driver genes is one of the primary objectives in studying tumorigenesis. To discover relatively infrequently mutated driver genes from somatic mutation data, many existing methods incorporate an interaction network as prior information. However, prior information from mRNA expression patterns, which has also been shown to be highly informative of cancer progression, is not exploited by these existing network-based methods. To incorporate prior information from both interaction network and mRNA expressions, we propose a robust and sparse co-regularized nonnegative matrix factorization to discover driver genes from mutation data. Furthermore, our framework also applies Frobenius norm regularization to overcome overfitting. A sparsity-inducing penalty is employed to obtain sparse scores in gene representations, of which the top scored genes are selected as driver candidates. Evaluation experiments with known benchmarking genes indicate that the performance of our method benefits from the two types of prior information. Our method also outperforms the existing network-based methods, and detects some driver genes that are not predicted by the competing methods. In summary, our proposed method can improve the performance of driver gene discovery by effectively incorporating prior information from interaction network and mRNA expression patterns into a robust and sparse co-regularized matrix factorization framework.
Stekel, Dov J.; Sarti, Donatella; Trevino, Victor; Zhang, Lihong; Salmon, Mike; Buckley, Chris D.; Stevens, Mark; Pallen, Mark J.; Penn, Charles; Falciani, Francesco
2005-01-01
A key step in the analysis of microarray data is the selection of genes that are differentially expressed. Ideally, such experiments should be properly replicated in order to infer both technical and biological variability, and the data should be subjected to rigorous hypothesis tests to identify the differentially expressed genes. However, in microarray experiments involving the analysis of very large numbers of biological samples, replication is not always practical. Therefore, there is a need for a method to select differentially expressed genes in a rational way from insufficiently replicated data. In this paper, we describe a simple method that uses bootstrapping to generate an error model from a replicated pilot study that can be used to identify differentially expressed genes in subsequent large-scale studies on the same platform, but in which there may be no replicated arrays. The method builds a stratified error model that includes array-to-array variability, feature-to-feature variability and the dependence of error on signal intensity. We apply this model to the characterization of the host response in a model of bacterial infection of human intestinal epithelial cells. We demonstrate the effectiveness of error model based microarray experiments and propose this as a general strategy for a microarray-based screening of large collections of biological samples. PMID:15800204
Stress-Driven Selection of Novel Phenotypes
NASA Technical Reports Server (NTRS)
Fox, George E.; Stepaov, Victor G.; Liu, Yamei
2011-01-01
A process has been developed that can confer novel properties, such as metal resistance, to a host bacterium. This same process can also be used to produce RNAs and peptides that have novel properties, such as the ability to bind particular compounds. It is inherent in the method that the peptide or RNA will behave as expected in the target organism. Plasmid-born mini-gene libraries coding for either a population of combinatorial peptides or stable, artificial RNAs carrying random inserts are produced. These libraries, which have no bias towards any biological function, are used to transform the organism of interest and to serve as an initial source of genetic variation for stress-driven evolution. The transformed bacteria are propagated under selective pressure in order to obtain variants with the desired properties. The process is highly distinct from in vitro methods because the variants are selected in the context of the cell while it is experiencing stress. Hence, the selected peptide or RNA will, by definition, work as expected in the target cell as the cell adapts to its presence during the selection process. Once the novel gene, which produces the sought phenotype, is obtained, it can be transferred to the main genome to increase the genetic stability in the organism. Alternatively, the cell line can be used to produce novel RNAs or peptides with selectable properties in large quantity for separate purposes. The system allows for easy, large-scale purification of the RNAs or peptide products. The process has been reduced to practice by imposing sub-inhibitory concentrations of NiCl2 on cells of the bacterium Escherichia coli that were transformed separately with the peptide library and RNA library. The evolved resistant clones were isolated, and sequences of the selected mini-gene variants were established. 
Clones resistant to NiCl2 were found to carry identical plasmid variants with a functional mini-gene that specifically conferred significant nickel tolerance on the host cells. Sequencing of the selected mini-gene revealed a propensity of the encoded peptide to bind transient metal ions. Expression of the mini-gene markedly improved growth parameters of the evolved clones at sub-inhibitory concentrations of NiCl2 while being slightly detrimental in the absence of stress. Similar results have been obtained with the RNA libraries. Overall, the results demonstrate a very natural outcome of the selection experiments in which the mini-genes were expected to be either successfully integrated into bacterial genetic networks, or rejected depending upon their effect on host fitness. This described approach can be useful as a laboratory model to study the dynamics of bacterial adaptive evolution on the molecular level. It can also provide a strategy for screening expressed DNA libraries in search of novel genes with desirable properties.
Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried
2015-01-01
A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotides mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archivation and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961
Glebes, Tirzah Y; Sandoval, Nicholas R; Gillis, Jacob H; Gill, Ryan T
2015-01-01
Engineering both feedstock and product tolerance is important for transitioning towards next-generation biofuels derived from renewable sources. Tolerance to chemical inhibitors typically results in complex phenotypes, for which multiple genetic changes must often be made to confer tolerance. Here, we performed a genome-wide search for furfural-tolerant alleles using the TRackable Multiplex Recombineering (TRMR) method (Warner et al. (2010), Nature Biotechnology), which uses chromosomally integrated mutations directed towards increased or decreased expression of virtually every gene in Escherichia coli. We employed various growth selection strategies to assess the role of selection design towards growth enrichments. We also compared genes with increased fitness from our TRMR selection to those from a previously reported genome-wide identification study of furfural tolerance genes using a plasmid-based genomic library approach (Glebes et al. (2014) PLOS ONE). In several cases, growth improvements were observed for the chromosomally integrated promoter/RBS mutations but not for the plasmid-based overexpression constructs. Through this assessment, four novel tolerance genes, ahpC, yhjH, rna, and dicA, were identified and confirmed for their effect on improving growth in the presence of furfural. © 2014 Wiley Periodicals, Inc.
Li, Ziyi; Safo, Sandra E; Long, Qi
2017-07-11
Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related with glioblastoma. The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.
Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei
2014-12-10
Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by considering the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous in its performance with highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had comparable performance with that of LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage in that it produces more interpretable posterior probabilities for each variable unlike LASSO and other penalized regression methods.
Two-Way Gene Interaction From Microarray Data Based on Correlation Methods.
Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh
2016-06-01
Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman's rank correlation coefficient and Blomqvist's measure, and compared them with Pearson's correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson's correlation, Spearman's rank correlation, and Blomqvist's coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist's coefficient was not confirmed by visual methods. Some results of the correlation coefficients are not the same with visualization. The reason may be due to the small number of data.
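The two construction steps described above (score every gene pair, then connect pairs whose score exceeds a threshold) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R/Bioconductor code. The gene names and expression values are invented, and thresholding the absolute value of the score is an assumption, since the abstract does not say how negative correlations are handled.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def blomqvist(x, y):
    """Blomqvist's medial coefficient: mean sign agreement around the medians."""
    mx, my = median(x), median(y)
    products = [(a - mx) * (b - my) for a, b in zip(x, y)]
    return mean((p > 0) - (p < 0) for p in products)


def coexpression_edges(expr, score, threshold):
    """Step 1: score all gene pairs. Step 2: keep pairs with |score| > threshold.

    Using the absolute value is an assumption made for this sketch.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        s = score(expr[g1], expr[g2])
        if abs(s) > threshold:
            edges.append((g1, g2, s))
    return edges


# Hypothetical expression vectors (one value per sample); not data from the study.
expr = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 12],
    "geneC": [6, 5, 4, 3, 2, 1],
    "geneD": [3, 1, 4, 1, 5, 2],
}
pearson_edges = coexpression_edges(expr, pearson, 0.8)
```

Passing `spearman` or `blomqvist` as the `score` argument gives the corresponding nonparametric networks, which can then be compared edge by edge with the Pearson network.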
Simultaneous grouping pursuit and feature selection over an undirected graph
Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei
2013-01-01
Summary In high-dimensional regression, grouping pursuit and feature selection have their own merits while complementing each other in battling the curse of dimensionality. To seek a parsimonious model, we perform simultaneous grouping pursuit and feature selection over an arbitrary undirected graph with each node corresponding to one predictor. When the corresponding nodes are reachable from each other over the graph, regression coefficients can be grouped, whose absolute values are the same or close. This is motivated from gene network analysis, where genes tend to work in groups according to their biological functionalities. Through a nonconvex penalty, we develop a computational strategy and analyze the proposed method. Theoretical analysis indicates that the proposed method reconstructs the oracle estimator, that is, the unbiased least squares estimator given the true grouping, leading to consistent reconstruction of grouping structures and informative features, as well as to optimal parameter estimation. Simulation studies suggest that the method combines the benefit of grouping pursuit with that of feature selection, and compares favorably against its competitors in selection accuracy and predictive performance. An application to eQTL data is used to illustrate the methodology, where a network is incorporated into analysis through an undirected graph. PMID:24098061
2013-01-01
Background High-throughput (HT) technologies provide huge amounts of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. a gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on a mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such a mapping is typically used to assess the coverage of the gene signature by biological concepts, an assessment that is either score-based or requires tunable parameters as well, limiting its power. Results We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices that are easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Unlike the standard approach, where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold-dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method.
Conclusions We showed that KDVS not only enables the accurate selection of known biological functionalities, but also the identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, the KDVS approach can be considered a viable alternative to enrichment-based approaches. PMID:23302187
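The matrix transformation step described in this abstract (one expression matrix split into smaller, knowledge-defined matrices) can be illustrated with a short sketch. It assumes that the prior knowledge takes the form of a mapping from biological concepts to gene names. The function name, concept identifiers, and gene names below are hypothetical and are not taken from the KDVS implementation.

```python
def split_by_concept(expression, concept_to_genes):
    """Split one expression matrix into one sub-matrix per biological concept.

    expression: dict mapping gene name -> list of per-sample values.
    concept_to_genes: dict mapping concept id -> iterable of gene names.
    Returns a dict mapping concept id -> {gene: values}, restricted to genes
    that were actually measured. Concepts with no measured gene are dropped.
    """
    submatrices = {}
    for concept, genes in concept_to_genes.items():
        sub = {g: expression[g] for g in genes if g in expression}
        if sub:
            submatrices[concept] = sub
    return submatrices


# Hypothetical data: four genes measured in three samples.
expression = {
    "gene1": [5.1, 4.8, 6.0],
    "gene2": [2.2, 2.9, 2.4],
    "gene3": [7.7, 7.1, 8.3],
    "gene4": [1.0, 1.2, 0.9],
}
# Hypothetical annotation: a gene may belong to more than one concept.
concept_to_genes = {
    "concept_A": ["gene1", "gene2"],
    "concept_B": ["gene2", "gene3", "gene_not_measured"],
    "concept_C": ["gene_not_measured"],
}
submatrices = split_by_concept(expression, concept_to_genes)
```

Each sub-matrix can then be analyzed on its own, so that no concept is excluded before the statistical step; how KDVS scores each sub-matrix is not specified in the abstract and is not modeled here.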
Antkowiak, Maciej; Torres-Mapa, Maria Leilani; Witts, Emily C.; Miles, Gareth B.; Dholakia, Kishan; Gunn-Moore, Frank J.
2013-01-01
A prevailing problem in neuroscience is the fast and targeted delivery of DNA into selected neurons. The development of an appropriate methodology would enable the transfection of multiple genes into the same cell or different genes into different neighboring cells as well as rapid cell selective functionalization of neurons. Here, we show that optimized femtosecond optical transfection fulfills these requirements. We also demonstrate successful optical transfection of channelrhodopsin-2 in single selected neurons. We extend the functionality of this technique for wider uptake by neuroscientists by using fast three-dimensional laser beam steering enabling an image-guided “point-and-transfect” user-friendly transfection of selected cells. A sub-second transfection timescale per cell makes this method more rapid by at least two orders of magnitude when compared to alternative single-cell transfection techniques. This novel technology provides the ability to carry out large-scale cell selective genetic studies on neuronal ensembles and perform rapid genetic programming of neural circuits. PMID:24257461
Antkowiak, Maciej; Torres-Mapa, Maria Leilani; Witts, Emily C; Miles, Gareth B; Dholakia, Kishan; Gunn-Moore, Frank J
2013-11-21
A prevailing problem in neuroscience is the fast and targeted delivery of DNA into selected neurons. The development of an appropriate methodology would enable the transfection of multiple genes into the same cell or different genes into different neighboring cells as well as rapid cell selective functionalization of neurons. Here, we show that optimized femtosecond optical transfection fulfills these requirements. We also demonstrate successful optical transfection of channelrhodopsin-2 in single selected neurons. We extend the functionality of this technique for wider uptake by neuroscientists by using fast three-dimensional laser beam steering enabling an image-guided "point-and-transfect" user-friendly transfection of selected cells. A sub-second transfection timescale per cell makes this method more rapid by at least two orders of magnitude when compared to alternative single-cell transfection techniques. This novel technology provides the ability to carry out large-scale cell selective genetic studies on neuronal ensembles and perform rapid genetic programming of neural circuits.
Tagliavia, Marcello; Cuttitta, Angela
2016-01-01
High rates of plasmid instability are associated with the use of some expression vectors in Escherichia coli, resulting in the loss of recombinant protein expression. This is due to sequence alterations in vector promoter elements caused by the background expression of the cloned gene, which leads to the selection of fast-growing, plasmid-containing cells that do not express the target protein. This phenomenon, which is worsened when expressing toxic proteins, results in preparations containing very little or no recombinant protein, or even in clone loss; however, no methods to prevent loss of recombinant protein expression are currently available. We have exploited the phenomenon of translational coupling, a mechanism of prokaryotic gene expression regulation, in order to select cells containing plasmids still able to express recombinant proteins. Here we designed an expression vector in which the cloned gene and selection marker are co-expressed. Our approach allowed for the selection of the recombinant protein-expressing cells and proved effective even for clones encoding toxic proteins.
Sternburg, Erin L; Dias, Kristen C; Karginov, Fedor V
2017-06-16
The CRISPR/Cas9 genome engineering system has revolutionized biology by allowing for precise genome editing with little effort. Guided by a single guide RNA (sgRNA) that confers specificity, the Cas9 protein cleaves both DNA strands at the targeted locus. The DNA break can trigger either non-homologous end joining (NHEJ) or homology directed repair (HDR). NHEJ can introduce small deletions or insertions which lead to frame-shift mutations, while HDR allows for larger and more precise perturbations. Here, we present protocols for generating knockout cell lines by coupling established CRISPR/Cas9 methods with two options for downstream selection/screening. The NHEJ approach uses a single sgRNA cut site and selection-independent screening, where protein production is assessed by dot immunoblot in a high-throughput manner. The HDR approach uses two sgRNA cut sites that span the gene of interest. Together with a provided HDR template, this method can achieve deletion of tens of kb, aided by the inserted selectable resistance marker. The appropriate applications and advantages of each method are discussed.
Development of a Universal RNA Beacon for Exogenous Gene Detection
Guo, Yuanjian; Lu, Zhongju; Cohen, Ira Stephen
2015-01-01
Stem cell therapy requires a nontoxic and high-throughput method to achieve a pure cell population to prevent teratomas that can occur if even one cell in the implant has not been transformed. A promising method to detect and separate cells expressing a particular gene is RNA beacon technology. However, developing a successful, specific beacon to a particular transfected gene can take months to develop and in some cases is impossible. Here, we report on an off-the-shelf universal beacon that decreases the time and cost of applying beacon technology to select any living cell population transfected with an exogenous gene. PMID:25769653
Development of a universal RNA beacon for exogenous gene detection.
Guo, Yuanjian; Lu, Zhongju; Cohen, Ira Stephen; Scarlata, Suzanne
2015-05-01
Stem cell therapy requires a nontoxic and high-throughput method to achieve a pure cell population to prevent teratomas that can occur if even one cell in the implant has not been transformed. A promising method to detect and separate cells expressing a particular gene is RNA beacon technology. However, developing a successful, specific beacon to a particular transfected gene can take months to develop and in some cases is impossible. Here, we report on an off-the-shelf universal beacon that decreases the time and cost of applying beacon technology to select any living cell population transfected with an exogenous gene. ©AlphaMed Press.
Hey, Jody; Nielsen, Rasmus
2004-01-01
The genetic study of diverging, closely related populations is required for basic questions on demography and speciation, as well as for biodiversity and conservation research. However, it is often unclear whether divergence is due simply to separation or whether populations have also experienced gene flow. These questions can be addressed with a full model of population separation with gene flow, by applying a Markov chain Monte Carlo method for estimating the posterior probability distribution of model parameters. We have generalized this method and made it applicable to data from multiple unlinked loci. These loci can vary in their modes of inheritance, and inheritance scalars can be implemented either as constants or as parameters to be estimated. By treating inheritance scalars as parameters it is also possible to address variation among loci in the impact via linkage of recurrent selective sweeps or background selection. These methods are applied to a large multilocus data set from Drosophila pseudoobscura and D. persimilis. The species are estimated to have diverged approximately 500,000 years ago. Several loci have nonzero estimates of gene flow since the initial separation of the species, with considerable variation in gene flow estimates among loci, in both directions between the species. PMID:15238526
Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.
Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K
2017-11-01
Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.
Ander, Bradley P.; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R.; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed. PMID:23844055
Peng, Bin; Zhu, Dianwen; Ander, Bradley P; Zhang, Xiaoshuai; Xue, Fuzhong; Sharp, Frank R; Yang, Xiaowei
2013-01-01
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions in dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data in predicting stroke status are used in validating the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiencies are also discussed.
Zhu, Li-Ping; Yue, Xin-Jing; Han, Kui; Li, Zhi-Feng; Zheng, Lian-Shuai; Yi, Xiu-Nan; Wang, Hai-Long; Zhang, You-Ming; Li, Yue-Zhong
2015-07-22
Exotic genes, especially clustered multiple-genes for a complex pathway, are normally integrated into chromosome for heterologous expression. The influences of insertion sites on heterologous expression and allotropic expressions of exotic genes on host remain mostly unclear. We compared the integration and expression efficiencies of single and multiple exotic genes that were inserted into Myxococcus xanthus genome by transposition and attB-site-directed recombination. While the site-directed integration had a rather stable chloramphenicol acetyl transferase (CAT) activity, the transposition produced varied CAT enzyme activities. We attempted to integrate the 56-kb gene cluster for the biosynthesis of antitumor polyketides epothilones into M. xanthus genome by site-direction but failed, which was determined to be due to the insertion size limitation at the attB site. The transposition technique produced many recombinants with varied production capabilities of epothilones, which, however, were not paralleled to the transcriptional characteristics of the local sites where the genes were integrated. Comparative transcriptomics analysis demonstrated that the allopatric integrations caused selective changes of host transcriptomes, leading to varied expressions of epothilone genes in different mutants. With the increase of insertion fragment size, transposition is a more practicable integration method for the expression of exotic genes. Allopatric integrations selectively change host transcriptomes, which lead to varied expression efficiencies of exotic genes.
Liu, Jing; Wang, Qun; Sun, Minying; Zhu, Linlin; Yang, Michael; Zhao, Yu
2014-01-01
Quantitative real-time reverse transcription PCR (qRT-PCR) has become a widely used method for gene expression analysis; however, its data interpretation largely depends on the stability of reference genes. The transcriptomics of Panax ginseng, one of the most popular and traditional ingredients used in Chinese medicines, is increasingly being studied. Furthermore, it is vital to establish a series of reliable reference genes when qRT-PCR is used to assess the gene expression profile of ginseng. In this study, we screened out candidate reference genes for ginseng using gene expression data generated by a high-throughput sequencing platform. Based on the statistical tests, 20 reference genes (10 traditional housekeeping genes and 10 novel genes) were selected. These genes were tested for the normalization of expression levels in five growth stages and three distinct plant organs of ginseng by qPCR. These genes were subsequently ranked and compared according to the stability of their expressions using geNorm, NormFinder, and BestKeeper computational programs. Although the best reference genes were found to vary across different samples, CYP and EF-1α were the most stable genes amongst all samples. GAPDH/30S RPS20, CYP/60S RPL13 and CYP/QCR were the optimum pair of reference genes in the roots, stems, and leaves. CYP/60S RPL13, CYP/eIF-5A, aTUB/V-ATP, eIF-5A/SAR1, and aTUB/pol IIa were the most stably expressed combinations in each of the five developmental stages. Our study serves as a foundation for developing an accurate method of qRT-PCR and will benefit future studies on gene expression profiles of Panax Ginseng.
Two-Way Gene Interaction From Microarray Data Based on Correlation Methods
Alavi Majd, Hamid; Talebi, Atefeh; Gilany, Kambiz; Khayyer, Nasibeh
2016-01-01
Background Gene networks have generated a massive explosion in the development of high-throughput techniques for monitoring various aspects of gene activity. Networks offer a natural way to model interactions between genes, and extracting gene network information from high-throughput genomic data is an important and difficult task. Objectives The purpose of this study is to construct a two-way gene network based on parametric and nonparametric correlation coefficients. The first step in constructing a Gene Co-expression Network is to score all pairs of gene vectors. The second step is to select a score threshold and connect all gene pairs whose scores exceed this value. Materials and Methods In the foundation-application study, we constructed two-way gene networks using nonparametric methods, such as Spearman’s rank correlation coefficient and Blomqvist’s measure, and compared them with Pearson’s correlation coefficient. We surveyed six genes of venous thrombosis disease, made a matrix entry representing the score for the corresponding gene pair, and obtained two-way interactions using Pearson’s correlation, Spearman’s rank correlation, and Blomqvist’s coefficient. Finally, these methods were compared with Cytoscape, based on BIND, and Gene Ontology, based on molecular function visual methods; R software version 3.2 and Bioconductor were used to perform these methods. Results Based on the Pearson and Spearman correlations, the results were the same and were confirmed by Cytoscape and GO visual methods; however, Blomqvist’s coefficient was not confirmed by visual methods. Conclusions Some results of the correlation coefficients are not the same with visualization. The reason may be due to the small number of data. PMID:27621916
Chan, Pek-Lan; Rose, Ray J.; Abdul Murad, Abdul Munir; Zainal, Zamri; Leslie Low, Eng-Ti; Ooi, Leslie Cheng-Li; Ooi, Siew-Eng; Yahya, Suzaini; Singh, Rajinder
2014-01-01
Background The somatic embryogenesis tissue culture process has been utilized to propagate high yielding oil palm. Due to the low callogenesis and embryogenesis rates, molecular studies were initiated to identify genes regulating the process, and their expression levels are usually quantified using reverse transcription quantitative real-time PCR (RT-qPCR). With the recent release of oil palm genome sequences, it is crucial to establish a proper strategy for gene analysis using RT-qPCR. Selection of the most suitable reference genes should be performed for accurate quantification of gene expression levels. Results In this study, eight candidate reference genes selected from cDNA microarray study and literature review were evaluated comprehensively across 26 tissue culture samples using RT-qPCR. These samples were collected from two tissue culture lines and media treatments, which consisted of leaf explants cultures, callus and embryoids from consecutive developmental stages. Three statistical algorithms (geNorm, NormFinder and BestKeeper) confirmed that the expression stability of novel reference genes (pOP-EA01332, PD00380 and PD00569) outperformed classical housekeeping genes (GAPDH, NAD5, TUBULIN, UBIQUITIN and ACTIN). PD00380 and PD00569 were identified as the most stably expressed genes in total samples, MA2 and MA8 tissue culture lines. Their applicability to validate the expression profiles of a putative ethylene-responsive transcription factor 3-like gene demonstrated the importance of using the geometric mean of two genes for normalization. Conclusions Systematic selection of the most stably expressed reference genes for RT-qPCR was established in oil palm tissue culture samples. PD00380 and PD00569 were selected for accurate and reliable normalization of gene expression data from RT-qPCR. These data will be valuable to the research associated with the tissue culture process. 
Also, the method described here will facilitate the selection of appropriate reference genes in other oil palm tissues and in the expression profiling of genes relating to yield, biotic and abiotic stresses. PMID:24927412
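The stability ranking and the geometric-mean normalization mentioned in this abstract can be sketched as follows. The sketch follows the commonly described definition of the geNorm stability measure M (for each gene, the mean over all other candidates of the standard deviation of the log2 expression ratios across samples). It is not the geNorm program itself, and the gene names and quantities are invented for illustration.

```python
from math import log2, prod
from statistics import mean, stdev


def genorm_m(quantities):
    """Return the stability measure M for each candidate reference gene.

    quantities: dict mapping gene -> list of relative quantities (all > 0),
    one per sample, in the same sample order for every gene.
    A lower M means a more stably expressed gene.
    """
    m_values = {}
    for gene in quantities:
        pairwise_variations = []
        for other in quantities:
            if other == gene:
                continue
            log_ratios = [log2(a / b)
                          for a, b in zip(quantities[gene], quantities[other])]
            pairwise_variations.append(stdev(log_ratios))
        m_values[gene] = mean(pairwise_variations)
    return m_values


def normalization_factors(quantities, reference_genes):
    """Per-sample normalization factor: geometric mean of the reference genes."""
    n = len(reference_genes)
    per_sample = zip(*(quantities[g] for g in reference_genes))
    return [prod(values) ** (1 / n) for values in per_sample]


# Hypothetical relative quantities for three candidates across three samples.
quantities = {
    "ref_a": [1.0, 2.0, 4.0],
    "ref_b": [2.0, 4.0, 8.0],   # constant ratio to ref_a
    "ref_c": [1.0, 4.0, 1.0],   # varies independently of the other two
}
m_values = genorm_m(quantities)
factors = normalization_factors(quantities, ["ref_a", "ref_b"])
```

Dividing a target gene's quantity in each sample by the matching entry of `factors` gives its normalized expression. Using two references rather than one dampens the effect of any single gene's fluctuation, which is the point the abstract makes about the geometric mean.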
Genome engineering using a synthetic gene circuit in Bacillus subtilis.
Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun
2015-03-31
Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac-chloramphenicol resistant genes (cat) were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression, the LacIs repressed the Pspac promoter and the cells become chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily under the chloramphenicol absent condition. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Interpreting the genomic landscape of speciation: a road map for finding barriers to gene flow.
Ravinet, M; Faria, R; Butlin, R K; Galindo, J; Bierne, N; Rafajlović, M; Noor, M A F; Mehlig, B; Westram, A M
2017-08-01
Speciation, the evolution of reproductive isolation among populations, is continuous, complex, and involves multiple, interacting barriers. Until it is complete, the effects of this process vary along the genome and can lead to a heterogeneous genomic landscape with peaks and troughs of differentiation and divergence. When gene flow occurs during speciation, barriers restricting gene flow locally in the genome lead to patterns of heterogeneity. However, genomic heterogeneity can also be produced or modified by variation in factors such as background selection and selective sweeps, recombination and mutation rate variation, and heterogeneous gene density. Extracting the effects of gene flow, divergent selection and reproductive isolation from such modifying factors presents a major challenge to speciation genomics. We argue one of the principal aims of the field is to identify the barrier loci involved in limiting gene flow. We first summarize the expected signatures of selection at barrier loci, at the genomic regions linked to them and across the entire genome. We then discuss the modifying factors that complicate the interpretation of the observed genomic landscape. Finally, we end with a road map for future speciation research: a proposal for how to account for these modifying factors and to progress towards understanding the nature of barrier loci. Despite the difficulties of interpreting empirical data, we argue that the availability of promising technical and analytical methods will shed further light on the important roles that gene flow and divergent selection have in shaping the genomic landscape of speciation. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang; Sun, Ming
2012-07-01
We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
Annotating novel genes by integrating synthetic lethals and genomic information
Schöner, Daniel; Kalisch, Markus; Leisner, Christian; Meier, Lukas; Sohrmann, Marc; Faty, Mahamadou; Barral, Yves; Peter, Matthias; Gruissem, Wilhelm; Bühlmann, Peter
2008-01-01
Background Large scale screening for synthetic lethality serves as a common tool in yeast genetics to systematically search for genes that play a role in specific biological processes. Often the amounts of data resulting from a single large scale screen far exceed the capacities of experimental characterization of every identified target. Thus, there is need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size. Results We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel members in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Different from existing methods that require large amounts of synthetic lethal data, our method merely relies on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features that assign the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates and we present she1 (YBL031W) as a novel gene involved in spindle migration. We applied the statistical methodology also to TOR2 signaling as another example. Conclusion We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process. PMID:18194531
Wang, Tao; Li, Hua; Wang, Hua; Su, Jing
2015-04-16
The present study established a typing method with NotI-based pulsed-field gel electrophoresis (PFGE) and stress response gene schemed multilocus sequence typing (MLST) for 55 Oenococcus oeni strains isolated from six individual regions in China and two model strains PSU-1 (CP000411) and ATCC BAA-1163 (AAUV00000000). Seven stress response genes, cfa, clpL, clpP, ctsR, mleA, mleP and omrA, were selected for MLST testing, and positive selective pressure was detected for these genes. Furthermore, both methods separated the strains into two clusters. The PFGE clusters are correlated with the region, whereas the sequence types (STs) formed by the MLST confirm the two clusters identified by PFGE. In addition, the population structure was a mixture of evolutionary pathways, and the strains exhibited both clonal and panmictic characteristics. Copyright © 2015 Elsevier B.V. All rights reserved.
Houtz, Robert L.
1998-01-01
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed.
Houtz, Robert L.
1999-01-01
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed.
Houtz, R.L.
1998-03-03
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed. 5 figs.
Houtz, R.L.
1999-02-02
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed. 8 figs.
Novel sulI binary vectors enable an inexpensive foliar selection method in Arabidopsis
USDA-ARS's Scientific Manuscript database
Sulfonamide resistance is conferred by the sulI gene found on many Enterobacteriaceae R plasmids and Tn21 type transposons. The sulI gene encodes a sulfonamide insensitive dihydropteroate synthase enzyme required for folate biosynthesis. Transformation of tobacco, potato or Arabidopsis using sulI as...
Reference genes for quantitative PCR in the adipose tissue of mice with metabolic disease.
Almeida-Oliveira, Fernanda; Leandro, João G B; Ausina, Priscila; Sola-Penna, Mauro; Majerowicz, David
2017-04-01
Obesity and diabetes are metabolic diseases and they are increasing in prevalence. The dynamics of gene expression associated with these diseases is fundamental to identifying genes involved in related biological processes. qPCR is a sensitive technique for mRNA quantification and the most commonly used method in gene-expression studies. However, the reliability of these results is directly influenced by data normalization. As reference genes are the major normalization method used, this work aims to identify reference genes for qPCR in adipose tissues of mice with type-I diabetes or obesity. We selected 12 genes that are commonly used as reference genes. The expression of these genes in the adipose tissues of mice was analyzed in the context of three different experimental protocols: 1) untreated animals; 2) high-fat-diet animals; and 3) streptozotocin-treated animals. Gene-expression stability was analyzed using four different algorithms. Our data indicate that TATA-binding protein is stably expressed across adipose tissues in control animals. This gene was also a useful reference when the brown adipose tissues of control and obese mice were analyzed. The mitochondrial ATP synthase F1 complex gene exhibits stable expression in subcutaneous and perigonadal adipose tissue from control and obese mice. Moreover, this gene is the best reference for qPCR normalization in adipose tissue from streptozotocin-treated animals. These results show that there is no perfect stable gene suited for use under all experimental conditions. In conclusion, the selection of appropriate genes is a prerequisite to ensure qPCR reliability and must be performed separately for different experimental protocols. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
High degree of genetic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus).
Defaveri, Jacquelin; Shikano, Takahito; Shimada, Yukinori; Merilä, Juha
2013-09-01
Populations of widespread marine organisms are typically characterized by a low degree of genetic differentiation in neutral genetic markers, but much less is known about differentiation in genes whose functional roles are associated with specific selection regimes. To uncover possible adaptive population divergence and heterogeneous genomic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus), we used a candidate gene-based genome-scan approach to analyse variability in 138 microsatellite loci located within/close to (<6 kb) functionally important genes in samples collected from ten geographic locations. The degree of genetic differentiation in markers classified as neutral or under balancing selection (as determined with several outlier detection methods) was low (FST = 0.033 or 0.011, respectively), whereas average FST for directionally selected markers was significantly higher (FST = 0.097). Clustering analyses provided support for genomic and geographic heterogeneity in selection: six genetic clusters were identified based on allele frequency differences in the directionally selected loci, whereas four were identified with the neutral loci. Allelic variation in several loci exhibited significant associations with environmental variables, supporting the conjecture that temperature and salinity, but not optic conditions, are important drivers of adaptive divergence among populations. In general, these results suggest that in spite of the high degree of physical connectivity and gene flow as inferred from neutral marker genes, marine stickleback populations are strongly genetically structured in loci associated with functionally relevant genes. © 2013 John Wiley & Sons Ltd.
Evolutionary maintenance of filovirus-like genes in bat genomes
2011-01-01
Background Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13.4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appear to be, or to have been, functionally maintained copies of such genes in mammals. "Living fossils" of filoviruses appear to be selectively maintained in a diverse mammalian genus (Myotis). PMID:22093762
2013-01-01
Background Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima’s D, Fay and Wu’s H and Fu and Li’s D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome-wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Results Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. Conclusions We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia.
Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation. PMID:23848512
Hider, Jessica L; Gittelman, Rachel M; Shah, Tapan; Edwards, Melissa; Rosenbloom, Arnold; Akey, Joshua M; Parra, Esteban J
2013-07-12
Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima's D, Fay and Wu's H and Fu and Li's D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome-wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions.
However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation.
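The empirical-outlier step described in this abstract (rank every 25 kb window by a statistic, then keep the top 0.1% or 1%) can be sketched as follows. This is only an illustration: the function name, the window labels, and the choice to treat larger values as more extreme are assumptions, and for statistics such as Tajima's D the lower tail may be the relevant one.

```python
def top_fraction_outliers(scores, fraction):
    """Return window names whose score is in the top `fraction` of the empirical distribution.

    scores   -- dict mapping a (hypothetical) window label to its statistic
    fraction -- tail size, e.g. 0.001 for the top 0.1%
    Assumes larger values are more extreme; negate the scores to scan the lower tail.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank windows from most to least extreme.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Keep at least one window so very small inputs still return a result.
    n_keep = max(1, int(round(len(ranked) * fraction)))
    return [name for name, _ in ranked[:n_keep]]


if __name__ == "__main__":
    windows = {f"w{i}": float(i) for i in range(1000)}  # toy statistic values
    print(top_fraction_outliers(windows, 0.001))
    print(len(top_fraction_outliers(windows, 0.01)))
```

A gene would then be flagged when any window overlapping it appears in the returned list; that mapping from windows to genes is not shown here.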
Validating internal controls for quantitative plant gene expression studies
Brunner, Amy M; Yakovlev, Igor A; Strauss, Steven H
2004-01-01
Background Real-time reverse transcription PCR (RT-PCR) has greatly improved the ease and sensitivity of quantitative gene expression studies. However, accurate measurement of gene expression with this method relies on the choice of a valid reference for data normalization. Studies rarely verify that gene expression levels for reference genes are adequately consistent among the samples used, nor compare alternative genes to assess which are most reliable for the experimental conditions analyzed. Results Using real-time RT-PCR to study the expression of 10 poplar (genus Populus) housekeeping genes, we demonstrate a simple method for determining the degree of stability of gene expression over a set of experimental conditions. Based on a traditional method for analyzing the stability of varieties in plant breeding, it defines measures of gene expression stability from analysis of variance (ANOVA) and linear regression. We found that the potential internal control genes differed widely in their expression stability over the different tissues, developmental stages and environmental conditions studied. Conclusion Our results support that quantitative comparisons of candidate reference genes are an important part of real-time RT-PCR studies that seek to precisely evaluate variation in gene expression. The method we demonstrated facilitates statistical and graphical evaluation of gene expression stability. Selection of the best reference gene for a given set of experimental conditions should enable detection of biologically significant changes in gene expression that are too small to be revealed by less precise methods, or when highly variable reference genes are unknowingly used in real-time RT-PCR experiments. PMID:15317655
Comparison of statistical tests for association between rare variants and binary traits.
Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C
2012-01-01
Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to include binary traits that can accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and ii) SKAT and EREC have good performance under all tested scenarios but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy by i) selecting promising genes by faster methods and ii) analyzing selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of trait variability and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.
Cooper, Tara E.; Krause, David J.
2013-01-01
Sulfolobus species have become the model organisms for studying the unique biology of the crenarchaeal division of the archaeal domain. In particular, Sulfolobus islandicus provides a powerful opportunity to explore natural variation via experimental functional genomics. To support these efforts, we further expanded genetic tools for S. islandicus by developing a stringent positive selection for agmatine prototrophs in strains in which the argD gene, encoding arginine decarboxylase, has been deleted. Strains with deletions in argD were shown to be auxotrophic for agmatine even in nutrient-rich medium, but growth could be restored by either supplementation of exogenous agmatine or reintroduction of a functional copy of the argD gene from S. solfataricus P2 into the ΔargD host. Using this stringent selection, a robust targeted gene knockout system was established via an improved next generation of the MID (marker insertion and unmarked target gene deletion) method. Application of this novel system was validated by targeted knockout of the upsEF genes involved in UV-inducible cell aggregation formation. PMID:23835176
Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues
Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Cai, Yu-Dong
2017-01-01
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein–protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method. PMID:28974058
Chen, Lei; Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Huang, Tao; Cai, Yu-Dong
2017-10-02
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein-protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.
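The "searching" procedure in this abstract (shortest paths between regeneration genes of two tissues, with the nodes along those paths taken as candidates) can be sketched with a breadth-first search. This is a minimal illustration on an unweighted graph; the abstract does not say whether the interaction network was weighted, and the gene labels and function names below are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search on an unweighted graph; returns one shortest path or None."""
    if source == target:
        return [source]
    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr in parent:
                continue
            parent[nbr] = node
            if nbr == target:
                # Walk back through the parents to rebuild the path.
                path = [nbr]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nbr)
    return None


def candidate_genes(graph, genes_a, genes_b):
    """Collect interior nodes of shortest paths linking two seed gene sets."""
    seeds = set(genes_a) | set(genes_b)
    found = set()
    for a in genes_a:
        for b in genes_b:
            path = shortest_path(graph, a, b)
            if path:
                found.update(n for n in path[1:-1] if n not in seeds)
    return found


if __name__ == "__main__":
    # Toy interaction network with made-up node names.
    ppi = {"A": ["X"], "X": ["A", "B", "Y"], "B": ["X"], "Y": ["X"]}
    print(candidate_genes(ppi, ["A"], ["B"]))
```

The permutation test and the two screening rules mentioned in the abstract would then filter this candidate set; they are not reproduced here.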
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.
Pehkonen, Petri; Wong, Garry; Törönen, Petri
2010-01-01
Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
Eckert, Andrew J; Wegrzyn, Jill L; Pande, Barnaly; Jermstad, Kathleen D; Lee, Jennifer M; Liechty, John D; Tearse, Brandon R; Krutovsky, Konstantin V; Neale, David B
2009-09-01
Forest trees exhibit remarkable adaptations to their environments. The genetic basis for phenotypic adaptation to climatic gradients has been established through a long history of common garden, provenance, and genecological studies. The identities of genes underlying these traits, however, have remained elusive and thus so have the patterns of adaptive molecular diversity in forest tree genomes. Here, we report an analysis of diversity and divergence for a set of 121 cold-hardiness candidate genes in coastal Douglas fir (Pseudotsuga menziesii var. menziesii). Application of several different tests for neutrality, including those that incorporated demographic models, revealed signatures of selection consistent with selective sweeps at three to eight loci, depending upon the severity of a bottleneck event and the method used to detect selection. Given the high levels of recombination, these candidate genes are likely to be closely linked to the target of selection if not the genes themselves. Putative homologs in Arabidopsis act primarily to stabilize the plasma membrane and protect against denaturation of proteins at freezing temperatures. These results indicate that surveys of nucleotide diversity and divergence, when framed within the context of further association mapping experiments, will come full circle with respect to their utility in the dissection of complex phenotypic traits into their genetic components.
Cai, Jing; Li, Pengfei; Luo, Xiao; Chang, Tianliang; Li, Jiaxing; Zhao, Yuwei; Xu, Yao
2018-01-01
Hulless barley (Hordeum vulgare L. var. nudum Hook. f.) has been cultivated as a major crop in the Qinghai-Tibet plateau of China for thousands of years. Compared to other cereal crops, the Tibetan hulless barley has developed stronger endogenous resistances to survive in the severe environment of its habitat. To understand the unique resistant mechanisms of this plant, detailed genetic studies need to be performed. The quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) is the most commonly used method in detecting gene expression. However, the selection of stable reference genes under limited experimental conditions is considered to be an essential step for obtaining accurate results in qRT-PCR. In this study, 10 candidate reference genes were selected from the NCBI gene database of barley: ACT (Actin), E2 (Ubiquitin conjugating enzyme 2), TUBα (Alpha-tubulin), TUBβ6 (Beta-tubulin 6), GAPDH (Glyceraldehyde 3-phosphate dehydrogenase), EF-1α (Elongation factor 1-alpha), SAMDC (S-adenosylmethionine decarboxylase), PKABA1 (Gene for protein kinase HvPKABA1), PGK (Phosphoglycerate kinase), and HSP90 (Heat shock protein 90). Following qRT-PCR amplifications of all candidate reference genes in Tibetan hulless barley seedlings under various stressed conditions, the stabilities of these candidates were analyzed by three software packages: geNorm, NormFinder, and BestKeeper. The results demonstrated that TUBβ6, E2, TUBα, and HSP90 were generally the most suitable sets under all tested conditions; similarly, TUBα and HSP90 showed peak stability under salt stress, TUBα and EF-1α were the most suitable reference genes under cold stress, and ACT and E2 were the most stable under drought stress. Finally, a known circadian gene, CCA1, was used to verify the serviceability of the chosen reference genes.
The results confirmed that all reference genes recommended by the three software packages were suitable for gene expression analysis under the tested stress conditions by the qRT-PCR method.
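As an illustration of how a stability ranking of this kind can be computed, the sketch below implements the pairwise-variation measure M commonly associated with geNorm: for each gene, the average over all other genes of the standard deviation of the log2 expression ratio across samples. It is a simplified sketch, not the geNorm software itself; the input layout and function name are assumptions, and the toy values are invented.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Return a dict gene -> stability value M (lower means more stable).

    expr -- dict mapping gene name to a list of relative expression
            quantities on a linear scale, one value per sample.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


if __name__ == "__main__":
    toy = {
        "g1": [1.0, 2.0, 4.0],
        "g2": [2.0, 4.0, 8.0],  # constant ratio to g1, so the pair is stable
        "g3": [1.0, 4.0, 1.0],  # varies independently of g1 and g2
    }
    for gene, m in sorted(genorm_m(toy).items(), key=lambda kv: kv[1]):
        print(gene, round(m, 3))
```

NormFinder and BestKeeper use different criteria, which is one reason the abstract reports rankings from all three tools.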
Identification of Selection Footprints on the X Chromosome in Pig
Zhang, Qin; Ding, Xiangdong
2014-01-01
Identifying footprints of selection can provide straightforward insight into the mechanism of artificial selection and help uncover the causal genes related to important traits. In this study, three between-population and two within-population approaches, the Cross Population Extended Haplotype Homozygosity Test (XPEHH), the Cross Population Composite Likelihood Ratio (XPCLR), the F-statistics (Fst), the Integrated Haplotype Score (iHS) and Tajima's D, were implemented to detect selection footprints on the X chromosome in three pig breeds using the Illumina Porcine60K SNP chip. In the detection of selection footprints using between-population methods, 11, 11 and 7 potential selection regions with lengths of 15.62 Mb, 12.32 Mb and 9.38 Mb were identified in Landrace, Chinese Songliao and Yorkshire, respectively, by XPEHH; 16, 13 and 17 potential selection regions with lengths of 15.20 Mb, 13.00 Mb and 19.21 Mb by XPCLR; and 4, 2 and 4 potential selection regions with lengths of 3.20 Mb, 1.60 Mb and 3.20 Mb by Fst. For within-population methods, 7, 10 and 9 potential selection regions with lengths of 8.12 Mb, 8.40 Mb and 9.99 Mb were identified in Landrace, Chinese Songliao and Yorkshire by iHS, and 4, 3 and 2 potential selection regions with lengths of 3.20 Mb, 2.40 Mb and 1.60 Mb by Tajima's D. Moreover, the selection regions from different methods partly overlapped; in particular, the regions around 22 to 25 Mb were detected as under selection in Landrace and Yorkshire, but not in Chinese Songliao, by all three between-population methods. Only a few overlaps were found between the selection regions identified by between-population and within-population methods. Bioinformatics analysis showed that genes relevant to meat quality, reproduction and immunity were found in potential selection regions.
In addition, three out of five significant SNPs associated with hematological traits reported in our genome-wide association study were harbored in potential selection regions. PMID:24740293
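Of the statistics listed in this abstract, Tajima's D is compact enough to sketch directly. The version below uses the standard constants from Tajima (1989) and takes aligned haplotypes as equal-length strings; the input format and function name are assumptions, and real SNP-chip analyses would normally compute D per window with dedicated software.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for aligned, equal-length haplotype strings.

    Returns None when fewer than 4 sequences are given (the variance
    term vanishes for n < 4) or when no site is segregating.
    """
    n = len(haplotypes)
    if n < 4:
        return None
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if s == 0:
        return None
    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print(tajimas_d(["1000", "0100", "0010", "0001"]))  # all singletons
    print(tajimas_d(["1100", "1100", "0011", "0011"]))  # intermediate frequencies
```

An excess of rare variants pushes D below zero, as in the first toy input, while variants at intermediate frequency push it above zero, as in the second.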
Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein
2016-06-01
This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter approach using Fisher criterion which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach which is based on cellular learning automata (CLA) optimized with ant colony method (ACO) is used to find the set of features which improve the classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The selected features from the last phase are evaluated using ROC curve and the most effective while smallest feature subset is determined. The classifiers which are evaluated in the proposed framework are K-nearest neighbor; support vector machine and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
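The filter stage described in this abstract can be illustrated with one common two-class form of the Fisher criterion, (mean1 - mean0)^2 / (var1 + var0), computed per gene and used to keep the top k genes. The abstract does not state which variant the authors used, so this formula, the function names, and the toy data are assumptions; the CLA/ACO wrapper stage is not sketched.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score of one gene; labels are 0 or 1."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # No within-class variation: perfectly separated or uninformative.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def top_genes(expr, labels, k):
    """Rank genes by Fisher score and return the k best names."""
    scores = {gene: fisher_score(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    expr = {
        "informative": [1.0, 1.2, 5.0, 5.2],  # separates the two classes
        "noise": [1.0, 5.0, 1.2, 5.2],        # similar mean in both classes
    }
    print(top_genes(expr, labels, 1))
```

Reducing the gene list this way before the wrapper search is what shrinks the search space, as the abstract notes.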
Building synthetic gene circuits from combinatorial libraries: screening and selection strategies.
Schaerli, Yolanda; Isalan, Mark
2013-07-01
The promise of wide-ranging biotechnology applications inspires synthetic biologists to design novel genetic circuits. However, building such circuits rationally is still not straightforward and often involves painstaking trial-and-error. Mimicking the process of natural selection can help us to bridge the gap between our incomplete understanding of nature's design rules and our desire to build functional networks. By adopting the powerful method of directed evolution, which is usually applied to protein engineering, functional networks can be obtained through screening or selecting from randomised combinatorial libraries. This review first highlights the practical options to introduce combinatorial diversity into gene circuits and then examines strategies for identifying the potentially rare library members with desired functions, either by screening or selection.
Dynamic association rules for gene expression data analysis.
Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung
2015-10-14
The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses on microarray data have been developed to resolve the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical way, based on constructing a one-sided confidence interval and hypothesis testing, to determine if an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of Leukemia patients, Microarray Quality Control (MAQC) data set and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of Leukemia patients was conducted. We developed a statistical way, based on the concept of confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in one single step. The DAR algorithm was then developed for gene expression data analysis. 
Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. In the paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concept of confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
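To make the support and confidence terminology concrete, here is a minimal sketch of scoring one rule of the form "gene is up-regulated, therefore phenotype is present". It is not the DAR algorithm: the abstract does not specify how the confidence interval is constructed, so a normal-approximation one-sided lower bound compared against the baseline phenotype rate is assumed, and all function names are hypothetical.

```python
import math

def rule_stats(antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    Both arguments are equal-length sequences of 0/1 flags, one per sample.
    """
    n = len(antecedent)
    n_a = sum(antecedent)
    n_ab = sum(1 for a, c in zip(antecedent, consequent) if a and c)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence, n_a

def lower_bound(p_hat, n, z=1.645):
    """One-sided lower confidence bound for a proportion (normal approximation)."""
    if n == 0:
        return 0.0
    return p_hat - z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def rule_is_meaningful(antecedent, consequent, z=1.645):
    """Assumed decision rule: the lower bound must exceed the baseline rate."""
    _, confidence, n_a = rule_stats(antecedent, consequent)
    baseline = sum(consequent) / len(consequent)
    return lower_bound(confidence, n_a, z) > baseline

# Toy data: 40 samples, gene up in the first 20, phenotype mostly co-occurs.
gene_up = [1] * 20 + [0] * 20
phenotype = [1] * 18 + [0] * 2 + [1] * 2 + [0] * 18
support, confidence, _ = rule_stats(gene_up, phenotype)
```

With z = 1.645 the bound corresponds to a one-sided 95% level; a different level or an exact binomial bound could be substituted without changing the structure.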
Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel
2018-01-16
The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plant cells). Here we describe a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping of up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. © 2018 by John Wiley & Sons, Inc.
Fan, Yue; Wang, Xiao; Peng, Qinke
2017-01-01
Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.
Bragalini, Claudia; Ribière, Céline; Parisot, Nicolas; Vallon, Laurent; Prudent, Elsa; Peyretaillade, Eric; Girlanda, Mariangela; Peyret, Pierre; Marmeisse, Roland; Luis, Patricia
2014-01-01
Eukaryotic microbial communities play key functional roles in soil biology and potentially represent a rich source of natural products including biocatalysts. Culture-independent molecular methods are powerful tools to isolate functional genes from uncultured microorganisms. However, none of the methods used in environmental genomics allow for a rapid isolation of numerous functional genes from eukaryotic microbial communities. We developed an original adaptation of the solution hybrid selection (SHS) for an efficient recovery of functional complementary DNAs (cDNAs) synthesized from soil-extracted polyadenylated mRNAs. This protocol was tested on the Glycoside Hydrolase 11 gene family encoding endo-xylanases, for which we designed 35 explorative 31-mer capture probes. SHS was implemented on four soil eukaryotic cDNA pools. After two successive rounds of capture, >90% of the resulting cDNAs were GH11 sequences, of which 70% (38 among 53 sequenced genes) were full length. Between 1.5 and 25% of the cloned captured sequences were expressed in Saccharomyces cerevisiae. Sequencing of polymerase chain reaction-amplified GH11 gene fragments from the captured sequences highlighted hundreds of phylogenetically diverse sequences that were not yet described in public databases. This protocol offers the possibility of performing exhaustive exploration of eukaryotic gene families within microbial communities thriving in any type of environment. PMID:25281543
Zhou, Xionghui; Liu, Juan
2014-01-01
Although many methods have been proposed to reconstruct gene regulatory networks, most of them, when applied to sample-based data, cannot reveal the gene regulatory relations underlying the phenotypic change (e.g. normal versus cancer). In this paper, we adopt phenotype as a variable when constructing the gene regulatory network, whereas previous studies either neglected it or only used it to select the differentially expressed genes as the inputs to construct the gene regulatory network. To be specific, we integrate phenotype information with gene expression data to identify the gene dependency pairs by using the method of conditional mutual information. A gene dependency pair (A,B) means that the influence of gene A on the phenotype depends on gene B. All identified gene dependency pairs constitute a directed network underlying the phenotype, namely the gene dependency network. In this way, we have constructed a gene dependency network of breast cancer from gene expression data along with two different phenotype states (metastasis and non-metastasis). Moreover, we have found the network to be scale-free, indicating that its hub genes with high out-degrees may play critical roles in the network. After functional investigation, these hub genes are found to be biologically significant and specifically related to breast cancer, which suggests that our gene dependency network is meaningful. The validity has also been justified by literature investigation. From the network, we have selected 43 discriminative hubs as a signature to build the classification model for distinguishing the distant metastasis risks of breast cancer patients, and the result outperforms those classification models with published signatures. 
In conclusion, we have proposed a promising way to construct the gene regulatory network by using sample-based data, which has been shown to be effective and accurate in uncovering the hidden mechanism of the biological process and identifying the gene signature for phenotypic change.
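Conditional mutual information, the quantity named in this abstract, can be estimated for discretized data with a simple plug-in formula. The sketch below computes I(X; Y | Z) in bits from observed counts; how the authors discretize expression values or turn the estimate into a dependency pair is not stated in the abstract, so the function name and the toy variables are assumptions for illustration only.

```python
import math
from collections import Counter

def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences.

    Uses I = sum p(x,y,z) * log2( p(x,y,z) p(z) / (p(x,z) p(y,z)) ),
    with probabilities replaced by observed frequencies.
    """
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), n_xyz in c_xyz.items():
        ratio = n_xyz * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (n_xyz / n) * math.log2(ratio)
    return cmi

# Toy data: gene A (discretized) fully determines the phenotype within each
# level of gene B, so I(A; phenotype | B) equals 1 bit.
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
phenotype = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = [0, 0, 0, 0, 1, 1, 1, 1]
cmi_dependent = conditional_mutual_information(gene_a, phenotype, gene_b)
```

Plug-in estimates are biased upward for small samples, so in practice a permutation test or bias correction would usually accompany the raw value.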
Soares Medeiros, Lia Carolina; South, Lilith; Peng, Duo; Bustamante, Juan M; Wang, Wei; Bunkofske, Molly; Perumal, Natasha; Sanchez-Valdez, Fernando; Tarleton, Rick L
2017-11-07
Trypanosomatids (order Kinetoplastida), including the human pathogens Trypanosoma cruzi (agent of Chagas disease), Trypanosoma brucei (African sleeping sickness), and Leishmania (leishmaniasis), affect millions of people and animals globally. T. cruzi is considered one of the least studied and most poorly understood tropical disease-causing parasites, in part because of the relative lack of facile genetic engineering tools. This situation has improved recently through the application of clustered regularly interspaced short palindromic repeats-CRISPR-associated protein 9 (CRISPR-Cas9) technology, but a number of limitations remain, including the toxicity of continuous Cas9 expression and the long drug marker selection times. In this study, we show that the delivery of ribonucleoprotein (RNP) complexes composed of recombinant Cas9 from Staphylococcus aureus (SaCas9), but not from the more routinely used Streptococcus pyogenes Cas9 (SpCas9), and in vitro-transcribed single guide RNAs (sgRNAs) results in rapid gene edits in T. cruzi and other kinetoplastids at frequencies approaching 100%. The highly efficient genome editing via SaCas9/sgRNA RNPs was obtained for both reporter and endogenous genes and observed in multiple parasite life cycle stages in various strains of T. cruzi, as well as in T. brucei and Leishmania major. RNP complex delivery was also used to successfully tag proteins at endogenous loci and to assess the biological functions of essential genes. Thus, the use of SaCas9 RNP complexes for gene editing in kinetoplastids provides a simple, rapid, and cloning- and selection-free method to assess gene function in these important human pathogens. IMPORTANCE Protozoan parasites remain some of the highest-impact human and animal pathogens, with very limited treatment and prevention options. 
The development of improved therapeutics and vaccines depends on a better understanding of the unique biology of these organisms, and understanding their biology, in turn, requires the ability to track and manipulate the products of genes. In this work, we describe new methods that are available to essentially any laboratory and applicable to any parasite isolate for easily and rapidly editing the genomes of kinetoplastid parasites. We demonstrate that these methods provide the means to quickly assess function, including that of the products of essential genes and potential targets of drugs, and to tag gene products at their endogenous loci. This is all achieved without gene cloning or drug selection. We expect this advance to enable investigations, especially in Trypanosoma cruzi and Leishmania spp., that have eluded investigators for decades. Copyright © 2017 Soares Medeiros et al.
Kakui, Yasutaka; Sunaga, Tomonari; Arai, Kunio; Dodgson, James; Ji, Liang; Csikász-Nagy, Attila; Carazo-Salas, Rafael; Sato, Masamitsu
2015-01-01
Integration of an external gene into a fission yeast chromosome is useful to investigate the effect of the gene product. An easy way to knock in a gene construct is to use an integration plasmid, which can be targeted and inserted into a chromosome through homologous recombination. Despite the advantage of integration, construction of integration plasmids is energy- and time-consuming, because there is no systematic library of integration plasmids with various promoters, fluorescent protein tags, terminators and selection markers; therefore, researchers are often forced to make appropriate ones through multiple rounds of cloning procedures. Here, we establish materials and methods to easily construct integration plasmids. We introduce a convenient cloning system based on Golden Gate DNA shuffling, which enables the connection of multiple DNA fragments at once: any kind of promoters and terminators, the gene of interest, in combination with any fluorescent protein tag genes and any selection markers. Each of those DNA fragments, called a ‘module’, can be tandemly ligated in the order we desire in a single reaction, which yields a circular plasmid in a one-step manner. The resulting plasmids can be integrated through standard methods for transformation. Thus, these materials and methods facilitate the construction of knock-in strains, and this will further increase the value of fission yeast as a model organism. PMID:26108218
Gene regulatory network identification from the yeast cell cycle based on a neuro-fuzzy system.
Wang, B H; Lim, J W; Lim, J S
2016-08-30
Many studies exist for reconstructing gene regulatory networks (GRNs). In this paper, we propose a method based on an advanced neuro-fuzzy system for gene regulatory network reconstruction from microarray time-series data. This approach uses a neural network with a weighted fuzzy function to model the relationships between genes. Fuzzy rules, which determine the regulators of genes, are greatly simplified through this method. Additionally, a regulator selection procedure is proposed, which extracts the exact dynamic relationship between genes, using the information obtained from the weighted fuzzy function. Time-series related features are extracted from the original data to employ the characteristics of temporal data that are useful for accurate GRN reconstruction. The microarray dataset of the yeast cell cycle was used for our study. We measured the mean squared prediction error for the efficiency of the proposed approach and evaluated the accuracy in terms of precision, sensitivity, and F-score. The proposed method outperformed the other existing approaches.
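The accuracy measures named in this abstract (precision, sensitivity, F-score) have standard definitions when a predicted network is compared with a reference network edge by edge. The sketch below shows that calculation on directed regulator-target pairs; the gene labels and the function name `edge_metrics` are hypothetical, and the paper's own evaluation protocol may differ in detail.

```python
def edge_metrics(predicted_edges, true_edges):
    """Precision, sensitivity (recall) and F-score for predicted network edges.

    Edges are (regulator, target) tuples; direction matters.
    """
    predicted, true = set(predicted_edges), set(true_edges)
    true_positives = len(predicted & true)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(true) if true else 0.0
    total = precision + sensitivity
    f_score = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f_score

# Toy example with placeholder gene labels.
predicted = [("A", "B"), ("B", "C"), ("C", "D")]
reference = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")]
precision, sensitivity, f_score = edge_metrics(predicted, reference)
```

Here two of three predicted edges are correct and two of four reference edges are recovered, so the F-score is the harmonic mean of 2/3 and 1/2.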
Efficient production of artificially designed gelatins with a Bacillus brevis system.
Kajino, T; Takahashi, H; Hirai, M; Yamada, Y
2000-01-01
Artificially designed gelatins comprising tandemly repeated 30-amino-acid peptide units derived from human alphaI collagen were successfully produced with a Bacillus brevis system. The DNA encoding the peptide unit was synthesized by taking into consideration the codon usage of the host cells, but no clones having a tandemly repeated gene were obtained through this strategy. Minirepeat genes could be selected in vivo from a mixture of every possible sequence encoding an artificial gelatin by randomly ligating the mixed sequence unit and transforming it into Escherichia coli. Larger repeat genes constructed by connecting minirepeat genes obtained by in vivo selection were also stable in the expression host cells. Gelatins derived from the eight-unit and six-unit repeat genes were extracellularly produced at the level of 0.5 g/liter and easily purified by ammonium sulfate fractionation and anion-exchange chromatography. The purified artificial gelatins had the predicted N-terminal sequences and amino acid compositions and a sol-gel property similar to that of the native gelatin. These results suggest that the selection of a repeat unit sequence stable in an expression host is a shortcut for the efficient production of repetitive proteins and that it can conveniently be achieved by the in vivo selection method. This study revealed the possible industrial application of artificially designed repetitive proteins.
Gene-culture coevolution in the age of genomics
Richerson, Peter J.; Boyd, Robert; Henrich, Joseph
2010-01-01
The use of socially learned information (culture) is central to human adaptations. We investigate the hypothesis that the process of cultural evolution has played an active, leading role in the evolution of genes. Culture normally evolves more rapidly than genes, creating novel environments that expose genes to new selective pressures. Many human genes that have been shown to be under recent or current selection are changing as a result of new environments created by cultural innovations. Some changed in response to the development of agricultural subsistence systems in the Early and Middle Holocene. Alleles coding for adaptations to diets rich in plant starch (e.g., amylase copy number) and to epidemic diseases evolved as human populations expanded (e.g., sickle cell and G6PD deficiency alleles that provide protection against malaria). Large-scale scans using patterns of linkage disequilibrium to detect recent selection suggest that many more genes evolved in response to agriculture. Genetic change in response to the novel social environment of contemporary modern societies is also likely to be occurring. The functional effects of most of the alleles under selection during the last 10,000 years are currently unknown. Also unknown is the role of paleoenvironmental change in regulating the tempo of hominin evolution. Although the full extent of culture-driven gene-culture coevolution is thus far unknown for the deeper history of the human lineage, theory and some evidence suggest that such effects were profound. Genomic methods promise to have a major impact on our understanding of gene-culture coevolution over the span of hominin evolutionary history. PMID:20445092
Zhang, Bo; Zhang, Lin; Dai, Ruixue; Yu, Meiying; Zhao, Guoping; Ding, Xiaoming
2013-01-01
Streptomyces bacteria are known for producing important natural compounds by secondary metabolism, especially antibiotics with novel biological activities. Functional studies of antibiotic-biosynthesizing gene clusters are generally performed through homologous genomic recombination using gene-targeting vectors. Here, we present a rapid and efficient method for construction of gene-targeting vectors. This approach is based on Streptomyces phage φBT1 integrase-mediated multisite in vitro site-specific recombination. Four 'entry clones' were assembled into a circular plasmid to generate the destination gene-targeting vector by a one-step reaction. The four 'entry clones' contained two clones of the upstream and downstream flanks of the target gene, a selectable marker and an E. coli-Streptomyces shuttle vector. After targeted modification of the genome, the selectable markers were removed by φC31 integrase-mediated in vivo site-specific recombination between pre-placed attB and attP sites. Using this method, part of the calcium-dependent antibiotic (CDA) and actinorhodin (Act) biosynthetic gene clusters were deleted, and the rrdA encoding RrdA, a negative regulator of Red production, was also deleted. The final prodiginine production of the engineered strain was over five times that of the wild-type strain. This straightforward φBT1 and φC31 integrase-based strategy provides an alternative approach for rapid gene-targeting vector construction and marker removal in streptomycetes.
van der Geize, R.; de Jong, W.; Hessels, G. I.; Grommen, A. W. F.; Jacobs, A. A. C.; Dijkhuizen, L.
2008-01-01
A novel method to efficiently generate unmarked in-frame gene deletions in Rhodococcus equi was developed, exploiting the cytotoxic effect of 5-fluorocytosine (5-FC) by the action of cytosine deaminase (CD) and uracil phosphoribosyltransferase (UPRT) enzymes. The opportunistic, intracellular pathogen R. equi is resistant to high concentrations of 5-FC. Introduction of Escherichia coli genes encoding CD and UPRT conferred conditional lethality to R. equi cells incubated with 5-FC. To exemplify the use of the codA::upp cassette as counter-selectable marker, an unmarked in-frame gene deletion mutant of R. equi was constructed. The supA and supB genes, part of a putative cholesterol catabolic gene cluster, were efficiently deleted from the R. equi wild-type genome. Phenotypic analysis of the generated ΔsupAB mutant confirmed that supAB are essential for growth of R. equi on cholesterol. Macrophage survival assays revealed that the ΔsupAB mutant is able to survive and proliferate in macrophages comparable to wild type. Thus, cholesterol metabolism does not appear to be essential for macrophage survival of R. equi. The CD-UPRT based 5-FC counter-selection may become a useful asset in the generation of unmarked in-frame gene deletions in other actinobacteria as well, as actinobacteria generally appear to be 5-FC resistant and 5-FU sensitive. PMID:18984616
Loudig, Olivier; Brandwein-Gensler, Margaret; Kim, Ryung S; Lin, Juan; Isayeva, Tatyana; Liu, Christina; Segall, Jeffrey E; Kenny, Paraic A; Prystowsky, Michael B
2011-12-01
High-throughput gene expression profiling from formalin-fixed, paraffin-embedded tissues has become a reality, and several methods are now commercially available. The Illumina whole-genome complementary DNA-mediated annealing, selection, extension and ligation assay (Illumina, Inc) is a full-transcriptome version of the original 512-gene complementary DNA-mediated annealing, selection, extension and ligation assay, allowing high-throughput profiling of 24,526 annotated genes from degraded and formalin-fixed, paraffin-embedded RNA. This assay has the potential to allow identification of novel gene signatures associated with clinical outcome using banked archival pathology specimen resources. We tested the reproducibility of the whole-genome complementary DNA-mediated annealing, selection, extension and ligation assay and its sensitivity for detecting differentially expressed genes in RNA extracted from matched fresh and formalin-fixed, paraffin-embedded cells, after 1 and 13 months of storage, using the human breast cell lines MCF7 and MCF10A. Then, using tumor worst pattern of invasion as a classifier, 1 component of the "risk model," we selected 12 formalin-fixed, paraffin-embedded oral squamous cell carcinomas for whole-genome complementary DNA-mediated annealing, selection, extension and ligation assay analysis. We profiled 5 tumors with nonaggressive, nondispersed pattern of invasion, and 7 tumors with aggressive dispersed pattern of invasion and satellites scattered at least 1 mm apart. To minimize variability, the formalin-fixed, paraffin-embedded specimens were prepared from snap-frozen tissues, and RNA was obtained within 24 hours of fixation. One hundred four down-regulated genes and 72 up-regulated genes in tumors with aggressive dispersed pattern of invasion were identified. We performed quantitative reverse transcriptase polymerase chain reaction validation of 4 genes using Taqman assays and in situ protein detection of 1 gene by immunohistochemistry. 
Functional cluster analysis of genes up-regulated in tumors with aggressive pattern of invasion suggests the presence of genes involved in cellular cytoarchitecture, some of which are already associated with tumor invasion. Identification of these genes provides biologic rationale for our histologic classification, with regard to tumor invasion, and demonstrates that the whole-genome complementary DNA-mediated annealing, selection, extension and ligation assay is a powerful assay for profiling degraded RNA from archived specimens when combined with quantitative reverse transcriptase polymerase chain reaction validation. Copyright © 2011 Elsevier Inc. All rights reserved.
Screening of duplicated loci reveals hidden divergence patterns in a complex salmonid genome
Limborg, Morten T.; Larson, Wesley; Seeb, Lisa W.; Seeb, James E.
2017-01-01
A whole-genome duplication (WGD) doubles the entire genomic content of a species and is thought to have catalysed adaptive radiation in some polyploid-origin lineages. However, little is known about general consequences of a WGD because gene duplicates (i.e., paralogs) are commonly filtered in genomic studies; such filtering may remove substantial portions of the genome in data sets from polyploid-origin species. We demonstrate a new method that enables genome-wide scans for signatures of selection at both nonduplicated and duplicated loci by taking locus-specific copy number into account. We apply this method to RAD sequence data from different ecotypes of a polyploid-origin salmonid (Oncorhynchus nerka) and reveal signatures of divergent selection that would have been missed if duplicated loci were filtered. We also find conserved signatures of elevated divergence at pairs of homeologous chromosomes with residual tetrasomic inheritance, suggesting that joint evolution of some nondiverged gene duplicates may affect the adaptive potential of these genes. These findings illustrate that including duplicated loci in genomic analyses enables novel insights into the evolutionary consequences of WGDs and local segmental gene duplications.
Integrative Analysis of High-throughput Cancer Studies with Contrasted Penalization
Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Shia, BenChang; Ma, Shuangge
2015-01-01
In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms “classic” meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances from the published ones by introducing the contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those using the benchmark and has better prediction performance. PMID:24395534
Wang, Genhong; Chen, Yanfei; Zhang, Xiaoying; Bai, Bingchuan; Yan, Hao; Qin, Daoyuan; Xia, Qingyou
2018-06-01
The silkworm, Bombyx mori, is one of the world's most economically important insects. Surveying variations in gene expression among multiple tissue/organ samples will provide clues for gene function assignments and will be helpful for identifying genes related to economic traits or specific cellular processes. To ensure their accuracy, commonly used gene expression quantification methods require a set of stable reference genes for data normalization. In this study, 24 candidate reference genes were assessed in 10 tissue/organ samples of day 3 fifth-instar B. mori larvae using geNorm and NormFinder. The results revealed that using the combination of the expression of BGIBMGA003186 and BGIBMGA008209 was the optimum choice for normalizing the expression data of the B. mori tissue/organ samples. The most stable gene, BGIBMGA003186, is recommended if just one reference gene is used. Moreover, the commonly used reference gene encoding cytoplasmic actin was the least appropriate reference gene of the samples investigated. The reliability of the selected reference genes was further confirmed by evaluating the expression profiles of two cathepsin genes. Our results may be useful for future studies involving the quantification of relative gene expression levels of different tissue/organ samples in B. mori. © 2018 Wiley Periodicals, Inc.
Selecting and validating reference genes for quantitative real-time PCR in Plutella xylostella (L.).
You, Yanchun; Xie, Miao; Vasseur, Liette; You, Minsheng
2018-05-01
Gene expression analysis provides important clues regarding gene functions, and quantitative real-time PCR (qRT-PCR) is a widely used method in gene expression studies. Reference genes are essential for normalizing and accurately assessing gene expression. In the present study, 16 candidate reference genes (ACTB, CyPA, EF1-α, GAPDH, HSP90, NDPk, RPL13a, RPL18, RPL19, RPL32, RPL4, RPL8, RPS13, RPS4, α-TUB, and β-TUB) from Plutella xylostella were selected to evaluate gene expression stability across different experimental conditions using five statistical algorithms (geNorm, NormFinder, Delta Ct, BestKeeper, and RefFinder). The results suggest that different reference genes or combinations of reference genes are suitable for normalization in gene expression studies of P. xylostella according to the different developmental stages, strains, tissues, and insecticide treatments. Based on the given experimental sets, the most stable reference genes were RPS4 across different developmental stages, RPL8 across different strains and tissues, and EF1-α across different insecticide treatments. A comprehensive and systematic assessment of potential reference genes for gene expression normalization is essential for post-genomic functional research in P. xylostella, a notorious pest with worldwide distribution and a high capacity to adapt and develop resistance to insecticides.
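As a rough illustration of how expression stability can be ranked, the sketch below computes a geNorm-style stability value M for each candidate gene: the average, over all other candidates, of the standard deviation across samples of the pairwise log-ratio. It is a simplified stand-in and not the geNorm software; it assumes Ct input with 100% amplification efficiency (so a Ct difference equals a log2 expression ratio up to sign), and the toy values and the function name `stability_m` are made up for illustration.

```python
import numpy as np

def stability_m(ct):
    """geNorm-style stability value per gene (lower means more stable).

    ct: samples x genes matrix of Ct values. Assumes 100% PCR efficiency,
    so Ct differences stand in for log2 expression ratios.
    """
    ct = np.asarray(ct, dtype=float)
    n_genes = ct.shape[1]
    m_values = np.zeros(n_genes)
    for j in range(n_genes):
        pairwise_sd = [
            np.std(ct[:, j] - ct[:, k], ddof=1)
            for k in range(n_genes)
            if k != j
        ]
        m_values[j] = np.mean(pairwise_sd)
    return m_values

# Toy Ct matrix: genes 0 and 1 move together, gene 2 fluctuates.
ct_values = [
    [20.0, 22.0, 25.0],
    [21.0, 23.0, 30.0],
    [19.0, 21.0, 24.0],
    [22.0, 24.0, 33.0],
]
m = stability_m(ct_values)
least_stable = int(np.argmax(m))
```

The full geNorm procedure additionally removes the least stable gene and recomputes M iteratively; that step is omitted here.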
Zhang, Shutao; Chen, Chun; Xie, Tingna; Ye, Sudan
2017-01-01
The selection of stable reference genes is a critical step for the accurate quantification of gene expression. To identify and validate the reference genes in Pandora neoaphidis, an obligate aphid-pathogenic fungus, the expression of 13 classical candidate reference genes was evaluated by quantitative real-time reverse transcriptase polymerase chain reaction (qPCR) at four developmental stages (conidia, conidia with germ tubes, short hyphae and elongated hyphae). Four statistical algorithms (geNorm, NormFinder, BestKeeper and the Delta Ct method) were used to rank putative reference genes according to their expression stability and indicate the best reference gene or combination of reference genes for accurate normalization. The comprehensive ranking revealed that ACT1 and 18S were the most stably expressed genes throughout the developmental stages. To validate the suitability of the reference genes identified in this study, the expression of the cell division control protein 25 (CDC25) and chitinase 1 (CHI1) genes was used to confirm the candidate reference genes. Our study presents the first systematic study of reference gene selection for P. neoaphidis and provides guidelines to obtain more accurate qPCR results for future developmental efforts.
A Penalized Robust Method for Identifying Gene-Environment Interactions
Shi, Xingjie; Liu, Jin; Huang, Jian; Zhou, Yong; Xie, Yang; Ma, Shuangge
2015-01-01
In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model mis-specification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt the rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computation feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications. PMID:24616063
Lu, Tao
2016-01-01
The gene regulation network (GRN) evaluates the interactions between genes and looks for models to describe gene expression behavior. These models have many applications; for instance, by characterizing the gene expression mechanisms that cause certain disorders, it would be possible to target those genes to block the progress of the disease. Many biological processes are driven by nonlinear dynamic GRNs. In this article, we propose a nonparametric ordinary differential equation (ODE) to model the nonlinear dynamic GRN. Specifically, we address the following questions simultaneously: (i) extract information from noisy time course gene expression data; (ii) model the nonlinear ODE through a nonparametric smoothing function; (iii) identify the important regulatory gene(s) through a group smoothly clipped absolute deviation (SCAD) approach; (iv) test the robustness of the model against possible shortening of experimental duration. We illustrate the usefulness of the model and associated statistical methods through a simulation and a real application example.
Boscariol, R L; Almeida, W A B; Derbyshire, M T V C; Mourão Filho, F A A; Mendes, B M J
2003-09-01
A new method for obtaining transgenic sweet orange plants was developed in which positive selection (Positech) based on the Escherichia coli phosphomannose-isomerase (PMI) gene as the selectable marker gene and mannose as the selective agent was used. Epicotyl segments from in vitro-germinated plants of Valencia, Hamlin, Natal and Pera sweet oranges were inoculated with Agrobacterium tumefaciens EHA101-pNOV2116 and subsequently selected on medium supplemented with different concentrations of mannose or with a combination of mannose and sucrose as a carbon source. Genetic transformation was confirmed by PCR and Southern blot. The transgene expression was evaluated using a chlorophenol red assay and isoenzymes. The transformation efficiency rate ranged from 3% to 23.8%, depending on cultivar. This system provides an efficient manner for selecting transgenic sweet orange plants without using antibiotics or herbicides.
Lin, Liyuan; Han, Xiaojiao; Chen, Yicun; Wu, Qingke; Wang, Yangdong
2013-12-01
Quantitative real-time PCR has emerged as a highly sensitive and widely used method for detecting gene expression profiles, and accurate detection depends on reliable normalization. Since no single control is appropriate for all experimental treatments, it is generally advocated to select suitable internal controls prior to use for normalization. This study evaluated the expression stability of twelve potential reference genes in different tissues/organs and six fruit developmental stages of Litsea cubeba in order to screen for superior internal reference genes for data normalization. Two software packages, geNorm and NormFinder, were used to assess the stability of these candidate genes. The cycle threshold difference and coefficient of variation were also calculated to evaluate the expression stability of candidate genes. F-BOX, EF1α, UBC, and TUA were selected as the most stable reference genes across 11 sample pools. F-BOX, EF1α, and EIF4α exhibited the highest expression stability in different tissues/organs and different fruit developmental stages. Moreover, a combination of two stable reference genes would be sufficient for gene expression normalization in different fruit developmental stages. In addition, the relative expression profiles of DXS and DXR were evaluated using EF1α, UBC, and SAMDC. The results further validated the reliability of the stable reference genes and also highlighted the importance of selecting suitable internal controls for L. cubeba. These reference genes will be of great importance for transcript normalization in future gene expression studies on L. cubeba.
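The coefficient-of-variation criterion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the simplest reading of that criterion (sample standard deviation of Ct divided by mean Ct, lower meaning more stable); the gene names and Ct values are invented for illustration and are not taken from the study, and the sketch does not reproduce geNorm or NormFinder.

```python
from statistics import mean, stdev


def rank_by_cv(ct_values):
    """Rank candidate reference genes by the coefficient of variation (%)
    of their Ct values; a lower CV is read here as more stable expression."""
    cv = {
        gene: stdev(values) / mean(values) * 100.0
        for gene, values in ct_values.items()
    }
    # Sort from most stable (smallest CV) to least stable.
    return sorted(cv.items(), key=lambda item: item[1])


# Hypothetical Ct values across four samples (illustrative only).
example_ct = {
    "GENE_A": [20.1, 20.3, 19.9, 20.2],
    "GENE_B": [22.0, 24.5, 21.0, 25.5],
    "GENE_C": [18.0, 18.6, 17.5, 18.4],
}

ranking = rank_by_cv(example_ct)
for gene, cv_percent in ranking:
    print(f"{gene}: CV = {cv_percent:.2f}%")
```

A ranking of this kind is only one of several stability measures; the abstract combines it with dedicated tools rather than relying on it alone.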
Qiu, Huazhang; Wu, Namei; Zheng, Yanjie; Chen, Min; Weng, Shaohuang; Chen, Yuanzhong; Lin, Xinhua
2015-01-01
A robust and versatile signal-on fluorescence sensing strategy was developed to provide label-free detection of various target analytes. The strategy used SYBR Green I dye and graphene oxide as signal reporter and signal-to-background ratio enhancer, respectively. Multidrug resistance protein 1 (MDR1) gene and mercury ion (Hg2+) were selected as target analytes to investigate the generality of the method. The linear relationship and specificity of the detections showed that the sensitive and selective analyses of target analytes could be achieved by the proposed strategy with low detection limits of 0.5 and 2.2 nM for MDR1 gene and Hg2+, respectively. Moreover, the strategy was used to detect real samples. Analytical results of MDR1 gene in the serum indicated that the developed method is a promising alternative approach for real applications in complex systems. Furthermore, the recovery of the proposed method for Hg2+ detection was acceptable. Thus, the developed label-free signal-on fluorescence sensing strategy exhibited excellent universality, sensitivity, and handling convenience. PMID:25565810
Xue, Hongqi; Wu, Shuang; Wu, Yichao; Ramirez Idarraga, Juan C; Wu, Hulin
2018-05-02
Mechanism-driven low-dimensional ordinary differential equation (ODE) models are often used to model viral dynamics at cellular levels and epidemics of infectious diseases. However, low-dimensional mechanism-based ODE models are limited for modeling infectious diseases at molecular levels such as transcriptomic or proteomic levels, which is critical to understand pathogenesis of diseases. Although linear ODE models have been proposed for gene regulatory networks (GRNs), nonlinear regulations are common in GRNs. The reconstruction of large-scale nonlinear networks from time-course gene expression data remains an unresolved issue. Here, we use high-dimensional nonlinear additive ODEs to model GRNs and propose a 4-step procedure to efficiently perform variable selection for nonlinear ODEs. To tackle the challenge of high dimensionality, we couple the 2-stage smoothing-based estimation method for ODEs and a nonlinear independence screening method to perform variable selection for the nonlinear ODE models. We have shown that our method possesses the sure screening property and it can handle problems with non-polynomial dimensionality. Numerical performance of the proposed method is illustrated with simulated data and a real data example for identifying the dynamic GRN of Saccharomyces cerevisiae. Copyright © 2018 John Wiley & Sons, Ltd.
Taherikalani, Morovat; Mohammadzad, Mohammad Reza; Soroush, Setareh; Maleki, Mohammad Hossein; Azizi-Jalilian, Farid; Pakzad, Iraj; Sadeghifard, Nourkhoda; Asadollahi, Parisa; Emaneini, Mohammad; Monjezi, Aazam; Alikhani, Mohammad Yousef
2016-04-01
Methicillin-resistant Staphylococcus aureus (MRSA) is one of the most important pathogens worldwide and, compared to other staphylococcal species, is associated with a higher mortality rate. A total of 500 Staphylococcus spp. isolates were collected from selected hospitals in Ilam, Kermanshah, Khorram Abad and Hamadan cities and were assessed by phenotypic and genotypic methods to find MRSA. The presence or absence of prevalent antibiotic resistance genes and virulence genes was evaluated among MRSA isolates using the polymerase chain reaction (PCR) method, and the SCCmec typing of these isolates was then assayed by multiplex PCR. A total of 372 (74.4%) Staphylococcus spp. isolates were identified as S. aureus, among which 200 (53.8%) possessed the mecA gene and were distinguished as MRSA. All of the MRSA isolates contained the blaZ gene. The frequency of the ermA and ermC genes among erythromycin-resistant MRSA isolates was 21.6% and 66.7%, respectively. The frequency of the virulence genes eta, hla and sea among MRSA isolates was 10%, 80.5% and 100%, respectively. SCCmec type IV accounted for 30.6% of the MRSA isolates, and SCCmec type III, SCCmec type II and SCCmec type I accounted for 30%, 22% and 17.5% of the isolates, respectively. The antibiotic resistance genes and virulence genes blaZ, hla, sea, eta and ermC had high frequencies among the MRSA isolates. This study showed that the antibiotic resistance genes had higher frequencies among SCCmec types I and IV, which confirms previous reports in this field.
SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate
Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon
2016-01-01
Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.
Computational selection of antibody-drug conjugate targets for breast cancer
Fauteux, François; Hill, Jennifer J.; Jaramillo, Maria L.; Pan, Youlian; Phan, Sieu; Famili, Fazel; O'Connor-McCourt, Maureen
2016-01-01
The selection of therapeutic targets is a critical aspect of antibody-drug conjugate research and development. In this study, we applied computational methods to select candidate targets overexpressed in three major breast cancer subtypes as compared with a range of vital organs and tissues. Microarray data corresponding to over 8,000 tissue samples were collected from the public domain. Breast cancer samples were classified into molecular subtypes using an iterative ensemble approach combining six classification algorithms and three feature selection techniques, including a novel kernel density-based method. This feature selection method was used in conjunction with differential expression and subcellular localization information to assemble a primary list of targets. A total of 50 cell membrane targets were identified, including one target for which an antibody-drug conjugate is in clinical use, and six targets for which antibody-drug conjugates are in clinical trials for the treatment of breast cancer and other solid tumors. In addition, 50 extracellular proteins were identified as potential targets for non-internalizing strategies and alternative modalities. Candidate targets linked with the epithelial-to-mesenchymal transition were identified by analyzing differential gene expression in epithelial and mesenchymal tumor-derived cell lines. Overall, these results show that mining human gene expression data has the power to select and prioritize breast cancer antibody-drug conjugate targets, and the potential to lead to new and more effective cancer therapeutics. PMID:26700623
A Simple Test Identifies Selection on Complex Traits.
Beissinger, Tim; Kruppa, Jochen; Cavero, David; Ha, Ngoc-Thuy; Erbe, Malena; Simianer, Henner
2018-05-01
Important traits in agricultural, natural, and human populations are increasingly being shown to be under the control of many genes that individually contribute only a small proportion of genetic variation. However, the majority of modern tools in quantitative and population genetics, including genome-wide association studies and selection-mapping protocols, are designed to identify individual genes with large effects. We have developed an approach to identify traits that have been under selection and are controlled by large numbers of loci. In contrast to existing methods, our technique uses additive-effects estimates from all available markers, and relates these estimates to allele-frequency change over time. Using this information, we generate a composite statistic, denoted [Formula: see text] which can be used to test for significant evidence of selection on a trait. Our test requires pre- and postselection genotypic data but only a single time point with phenotypic information. Simulations demonstrate that [Formula: see text] is powerful for identifying selection, particularly in situations where the trait being tested is controlled by many genes, which is precisely the scenario where classical approaches for selection mapping are least powerful. We apply this test to breeding populations of maize and chickens, where we demonstrate the successful identification of selection on traits that are documented to have been under selection. Copyright © 2018 Beissinger et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy
Phylogenetic analyses were done for the Shewanella strains isolated from the Baltic Sea (38 strains), the US DOE Hanford Uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains), with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of the 16S rRNA gene and gyrB gene, and sequence similarities of 6 loci of the Shewanella genome selected from a shared gene list of the Shewanella strains with whole genomes sequenced, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and on DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains, share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain level within the Shewanella genus. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and that it is superior to phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.
Methods in Molecular Biology Mouse Genetics: Methods and Protocols | Center for Cancer Research
Mouse Genetics: Methods and Protocols provides selected mouse genetic techniques and their application in modeling varieties of human diseases. The chapters are mainly focused on the generation of different transgenic mice to accomplish the manipulation of genes of interest, tracing cell lineages, and modeling human diseases.
Sensory trait variation in an echolocating bat suggests roles for both selection and plasticity
2014-01-01
Background Across heterogeneous environments selection and gene flow interact to influence the rate and extent of adaptive trait evolution. This complex relationship is further influenced by the rarely considered role of phenotypic plasticity in the evolution of adaptive population variation. Plasticity can be adaptive if it promotes colonization and survival in novel environments and in doing so may increase the potential for future population differentiation via selection. Gene flow between selectively divergent environments may favour the evolution of phenotypic plasticity or conversely, plasticity itself may promote gene flow, leading to a pattern of trait differentiation in the presence of gene flow. Variation in sensory traits is particularly informative in testing the role of environment in trait and population differentiation. Here we test the hypothesis of ‘adaptive differentiation with minimal gene flow’ in resting echolocation frequencies (RF) of Cape horseshoe bats (Rhinolophus capensis) across a gradient of increasingly cluttered habitats. Results Our analysis reveals a geographically structured pattern of increasing RF from open to highly cluttered habitats in R. capensis; however genetic drift appears to be a minor player in the processes influencing this pattern. Although Bayesian analysis of population structure uncovered a number of spatially defined mitochondrial groups and coalescent methods revealed regional-scale gene flow, phylogenetic analysis of mitochondrial sequences did not correlate with RF differentiation. Instead, habitat discontinuities between biomes, and not genetic and geographic distances, best explained echolocation variation in this species. We argue that both selection for increased detection distance in relatively less cluttered habitats and adaptive phenotypic plasticity may have influenced the evolution of matched echolocation frequencies and habitats across different populations. 
Conclusions Our study reveals significant sensory trait differentiation in the presence of historical gene flow and suggests roles for both selection and plasticity in the evolution of echolocation variation in R. capensis. These results highlight the importance of population level analyses to i) illuminate the subtle interplay between selection, plasticity and gene flow in the evolution of adaptive traits and ii) demonstrate that evolutionary processes may act simultaneously and that their relative influence may vary across different environments. PMID:24674227
Sensory trait variation in an echolocating bat suggests roles for both selection and plasticity.
Odendaal, Lizelle J; Jacobs, David S; Bishop, Jacqueline M
2014-03-27
Across heterogeneous environments selection and gene flow interact to influence the rate and extent of adaptive trait evolution. This complex relationship is further influenced by the rarely considered role of phenotypic plasticity in the evolution of adaptive population variation. Plasticity can be adaptive if it promotes colonization and survival in novel environments and in doing so may increase the potential for future population differentiation via selection. Gene flow between selectively divergent environments may favour the evolution of phenotypic plasticity or conversely, plasticity itself may promote gene flow, leading to a pattern of trait differentiation in the presence of gene flow. Variation in sensory traits is particularly informative in testing the role of environment in trait and population differentiation. Here we test the hypothesis of 'adaptive differentiation with minimal gene flow' in resting echolocation frequencies (RF) of Cape horseshoe bats (Rhinolophus capensis) across a gradient of increasingly cluttered habitats. Our analysis reveals a geographically structured pattern of increasing RF from open to highly cluttered habitats in R. capensis; however genetic drift appears to be a minor player in the processes influencing this pattern. Although Bayesian analysis of population structure uncovered a number of spatially defined mitochondrial groups and coalescent methods revealed regional-scale gene flow, phylogenetic analysis of mitochondrial sequences did not correlate with RF differentiation. Instead, habitat discontinuities between biomes, and not genetic and geographic distances, best explained echolocation variation in this species. We argue that both selection for increased detection distance in relatively less cluttered habitats and adaptive phenotypic plasticity may have influenced the evolution of matched echolocation frequencies and habitats across different populations. 
Our study reveals significant sensory trait differentiation in the presence of historical gene flow and suggests roles for both selection and plasticity in the evolution of echolocation variation in R. capensis. These results highlight the importance of population level analyses to i) illuminate the subtle interplay between selection, plasticity and gene flow in the evolution of adaptive traits and ii) demonstrate that evolutionary processes may act simultaneously and that their relative influence may vary across different environments.
Cotton transformation via pollen tube pathway.
Wang, Min; Zhang, Baohong; Wang, Qinglian
2013-01-01
Although many gene transfer methods have been employed for successfully obtaining transgenic cotton, the major constraint in cotton improvement is the limitation of genotype, because the majority of transgenic methods require plant regeneration from a single transformed cell, which is limited by cotton tissue culture. Compared with other plant species, it is difficult to induce plant regeneration from cotton; currently, only a limited number of cotton cultivars can be cultured for obtaining regenerated plants. Thus, development of a simple and genotype-independent genetic transformation method is particularly important for the cotton community. In this chapter, we present a simple, cost-efficient, and genotype-independent cotton transformation method: pollen tube pathway-mediated transformation. This method uses the pollen tube pathway to deliver the transgene into cotton embryo sacs and then insert foreign genes into the cotton genome. There are three major steps in pollen tube pathway-mediated genetic transformation: injection of foreign genes into the pollen tube, integration of foreign genes into the plant genome, and selection of transgenic plants.
In silico identification of novel ligands for G-quadruplex in the c-MYC promoter
NASA Astrophysics Data System (ADS)
Kang, Hyun-Jin; Park, Hyun-Ju
2015-04-01
G-quadruplex DNA formed in the NHEIII1 region of an oncogene promoter inhibits transcription of the gene. In this study, virtual screening combining pharmacophore-based search and structure-based docking screening was conducted to discover ligands binding to the G-quadruplex in the promoter region of c-MYC. Several hit ligands showed selective PCR-arresting effects for an oligonucleotide containing the c-MYC G-quadruplex-forming sequence. Among them, three hits selectively inhibited cell proliferation and decreased the c-MYC mRNA level in Ramos cells, where NHEIII1 is included in the translocated c-MYC gene for overexpression. A promoter assay using two kinds of constructs with wild-type and mutant sequences showed that interaction of these ligands with the G-quadruplex resulted in turning off of the reporter gene. In conclusion, combined virtual screening methods were successfully used for the discovery of selective c-MYC promoter G-quadruplex binders with anticancer activity.
Zhou, Bangjun; Zeng, Lirong
2017-01-01
Virus-induced gene silencing (VIGS) has been used in many plant species as an attractive post-transcriptional gene silencing (PTGS) method for studying gene function either individually or at large scale in a high-throughput manner. However, the specificity and efficiency of knocking down members of a highly homologous gene family have remained to date a significant challenge in VIGS due to silencing of off-targets. Here we present an improved method for the selection and evaluation of gene fragments used for VIGS to specifically and efficiently knock down members of a highly homologous gene family. Using this method, we knocked down twelve and four members, respectively, of group III of the gene family encoding ubiquitin-conjugating enzymes (E2) in Nicotiana benthamiana. Assays using these VIGS-treated plants revealed that the group III E2s are essential for plant development, plant immunity-associated reactive oxygen species (ROS) production, expression of the gene NbRbohB that is required for ROS production, and suppression of immunity-associated programmed cell death (PCD) by AvrPtoB, an effector protein of the bacterial pathogen Pseudomonas syringae. Moreover, functional redundancy for plant development and ROS production was found to exist among members of group III E2s. We have found that employment of a gene fragment as short as approximately 70 base pairs (bp) that contains at least three mismatched nucleotides to other genes within any 21-bp sequence prevents silencing of off-target(s) in VIGS. This improved approach to the selection and evaluation of gene fragments allows for specific and efficient knocking down of highly homologous members of a gene family. Using this approach, we implicated N. benthamiana group III E2s in plant development, immunity-associated ROS production, and suppression of multiple immunity-associated PCD by AvrPtoB.
We also unraveled functional redundancy among group III members in their requirement for plant development and plant immunity-associated ROS production.
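The fragment-selection rule stated in this abstract (at least three mismatched nucleotides to other genes within any 21-bp sequence) can be sketched as a simple sliding-window check. This is an illustrative reading of the rule, not the authors' implementation: the function names are hypothetical, only ungapped same-strand comparisons are made, and reverse complements are ignored.

```python
def min_mismatches(window, other_seq):
    """Smallest Hamming distance between `window` and any same-length
    stretch of `other_seq` (ungapped, same strand only)."""
    k = len(window)
    if len(other_seq) < k:
        return k  # assumption: a shorter sequence cannot match the window
    return min(
        sum(a != b for a, b in zip(window, other_seq[i:i + k]))
        for i in range(len(other_seq) - k + 1)
    )


def fragment_is_specific(fragment, off_targets, window=21, min_mm=3):
    """Return True if every `window`-bp stretch of `fragment` differs from
    every off-target sequence by at least `min_mm` nucleotides."""
    if len(fragment) < window:
        raise ValueError("fragment is shorter than the window size")
    for start in range(len(fragment) - window + 1):
        stretch = fragment[start:start + window]
        for seq in off_targets:
            if min_mismatches(stretch, seq) < min_mm:
                return False
    return True


# Toy sequences invented for illustration.
candidate = "ACGT" * 6                                   # 24-bp candidate fragment
print(fragment_is_specific(candidate, ["ACGT" * 6]))     # identical off-target
print(fragment_is_specific(candidate, ["T" * 30]))       # unrelated off-target
```

The quadratic scan is adequate for fragments of roughly 70 bp against a handful of family members; a genome-wide check would need an indexed search instead.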
Phylogeny and evolution of Newcastle disease virus genotypes isolated in Asia during 2008-2011.
Ebrahimi, Mohammad Majid; Shahsavandi, Shahla; Moazenijula, Gholamreza; Shamsara, Mahdi
2012-08-01
The full-length fusion (F) genes of 51 Newcastle disease (ND) virus strains isolated from chickens in Asia during the period 2008-2011 were genetically analyzed. Phylogenetic analysis showed that genotype VII of NDV is still predominant in the domestic poultry of Asia. Sub-genotype VIIb circulated in Iran and the countries of the Indian subcontinent, whereas sub-genotype VIId existed in Far East countries. The ratio of non-synonymous to synonymous substitutions was calculated as 0.27 for sub-genotype VIId and 0.51 for sub-genotype VIIb, indicating purifying/stabilizing selection, which resulted in a low evolution rate in the F gene of sub-genotype VIIb. There is evidence of localized positive selection when comparing the protein sequences of these sub-genotypes. Five codons in the F gene of ND viruses had a posterior probability >90% using the Bayesian method, indicating that these sites were under positive selection. To identify sites under positive selection, amino acid substitutions were classified according to their radicalism and neutrality. The results indicate that although most positions were under purifying selection and can be eliminated, a few positions located in sub-genotype-specific regions were subject to positive selection.
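As a small worked illustration of how the reported ratios are read, the sketch below applies the conventional interpretation of the non-synonymous to synonymous ratio (a value below 1 suggests purifying selection, about 1 neutrality, above 1 positive selection). The function name and the tolerance parameter are assumptions made for this example; only the two ratio values come from the abstract.

```python
def classify_selection(omega, tolerance=0.0):
    """Classify a dN/dS ratio using the conventional thresholds around 1.
    `tolerance` widens the band treated as neutral (hypothetical parameter)."""
    if omega < 0:
        raise ValueError("dN/dS ratio cannot be negative")
    if omega < 1.0 - tolerance:
        return "purifying"
    if omega > 1.0 + tolerance:
        return "positive"
    return "neutral"


# Ratios reported in the abstract for the two sub-genotypes.
reported = {"VIId": 0.27, "VIIb": 0.51}
for sub_genotype, omega in reported.items():
    print(sub_genotype, omega, classify_selection(omega))
```

A gene-wide ratio below 1 does not rule out positive selection at individual codons, which is why the abstract also reports a site-level Bayesian analysis.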
Bessonov, Kyrylo; Walkey, Christopher J.; Shelp, Barry J.; van Vuuren, Hennie J. J.; Chiu, David; van der Merwe, George
2013-01-01
Analyzing time-course expression data captured in microarray datasets is a complex undertaking as the vast and complex data space is represented by a relatively low number of samples as compared to thousands of available genes. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships that exist among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values related to gene expression profiles extracted from given microarray datasets. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The validity of the method was verified by accurate identifications of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene; specifically, NSF1 is a negative regulator for the expression of sulfur metabolism genes, the nuclear localization of Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The inter-disciplinary approach adopted here highlighted the accuracy and relevancy of the ICC method in mining for novel gene functions using complex microarray datasets with a limited number of samples. PMID:24130853
Master, Adam; Wójcicka, Anna; Giżewska, Kamilla; Popławski, Piotr; Williams, Graham R.; Nauman, Alicja
2016-01-01
Background Translational control is a mechanism of protein synthesis regulation emerging as an important target for new therapeutics. Naturally occurring microRNAs and synthetic small inhibitory RNAs (siRNAs) are the most recognized regulatory molecules acting via RNA interference. Surprisingly, recent studies have shown that interfering RNAs may also activate gene transcription via the newly discovered phenomenon of small RNA-induced gene activation (RNAa). Thus far, the small activating RNAs (saRNAs) have only been demonstrated as promoter-specific transcriptional activators. Findings We demonstrate that oligonucleotide-based trans-acting factors can also specifically enhance gene expression at the level of protein translation by acting at sequence-specific targets within the messenger RNA 5’-untranslated region (5’UTR). We designed a set of short synthetic oligonucleotides (dGoligos), specifically targeting alternatively spliced 5’UTRs in transcripts expressed from the THRB and CDKN2A suppressor genes. The in vitro translation efficiency of reporter constructs containing alternative TRβ1 5’UTRs was increased by up to more than 55-fold following exposure to specific dGoligos. Moreover, we found that the most folded 5’UTR has higher translational regulatory potential when compared to the weakly folded TRβ1 variant. This suggests such a strategy may be especially applied to enhance translation from relatively inactive transcripts containing long 5’UTRs of complex structure. Significance This report represents the first method for gene-specific translation enhancement using selective trans-acting factors designed to target specific 5’UTR cis-acting elements. This simple strategy may be developed further to complement other available methods for gene expression regulation including gene silencing. 
The dGoligo-mediated translation-enhancing approach has the potential to be transferred to increase the translation efficiency of any suitable target gene and may have future application in gene therapy strategies to enhance expression of proteins including tumor suppressors. PMID:27171412
A trait stacking system via intra-genomic homologous recombination.
Kumar, Sandeep; Worden, Andrew; Novak, Stephen; Lee, Ryan; Petolino, Joseph F
2016-11-01
A gene targeting method has been developed, which allows the conversion of 'breeding stacks', containing unlinked transgenes into a 'molecular stack' and thereby circumventing the breeding challenges associated with transgene segregation. A gene targeting method has been developed for converting two unlinked trait loci into a single locus transgene stack. The method utilizes intra-genomic homologous recombination (IGHR) between stably integrated target and donor loci which share sequence homology and nuclease cleavage sites whereby the donor contains a promoterless herbicide resistance transgene. Upon crossing with a zinc finger nuclease (ZFN)-expressing plant, double-strand breaks (DSB) are created in both the stably integrated target and donor loci. DSBs flanking the donor locus result in intra-genomic mobilization of a promoterless selectable marker-containing donor sequence, which can be utilized as a template for homology-directed repair of a concomitant DSB at the target locus resulting in a functional selectable marker via nuclease-mediated cassette exchange (NMCE). The method was successfully demonstrated in maize using a glyphosate tolerance gene as a donor whereby up to 3.3 % of the resulting progeny embryos cultured on selection medium regenerated plants with the donor sequence integrated into the target locus. The process could be extended to multiple cycles of trait stacking by virtue of a unique intron sequence homology for NMCE between the target and the donor loci. This is the first report that describes NMCE via IGHR, thereby enabling trait stacking using conventional crossing.
Khalil, M A; Sonbol, F I
2014-01-01
The objective was to investigate the biofilm-forming capacity of methicillin-resistant Staphylococcus aureus (MRSA) isolated from eye lenses of infected patients. A total of 32 MRSA isolates from contact lenses of patients with ocular infections were screened for their biofilm-forming capacity using the tube method (TM), Congo red agar (CRA), and microtiter plate (MtP) methods. The effect of some stress factors on biofilm formation was studied. The biofilm-forming related genes, icaA, icaD and 10 microbial surface components that recognize adhesive matrix molecule (MSCRAMM), of the selected MRSA were also detected using polymerase chain reaction. Of 32 MRSA isolates, 34.37%, 59.37%, and 81.25% showed positive results using CRA, TM or MtP, respectively. Biofilm production was found to be reduced in the presence of ethanol or ethylenediaminetetraacetic acid and at extreme pH values. On the other hand, glucose or heparin led to a concentration-dependent increase of biofilm production by the isolates. The selected biofilm-producing MRSA isolate was found to harbor the icaA, icaD and up to nine of 10 tested MSCRAMM genes, whereas the selected non-biofilm-producing MRSA isolate did not carry any of the tested genes. The MtP method was found to be the most effective phenotypic screening method for detection of biofilm formation by MRSA. Furthermore, the molecular approach should be taken into consideration for the rapid and correct diagnosis of virulent bacteria associated with contact eye lenses.
van Eck, Herman J; Vos, Peter G; Valkonen, Jari P T; Uitdewilligen, Jan G A M L; Lensing, Hellen; de Vetten, Nick; Visser, Richard G F
2017-03-01
The method of graphical genotyping is applied to a panel of tetraploid potato cultivars to visualize haplotype sharing. The method allowed the mapping of genes involved in virus and nematode resistance. The physical coordinates of the amount of linkage drag surrounding these genes are easily interpretable. Graphical genotyping is a visually attractive and easily interpretable method to represent genetic marker data. In this paper, the method is extended from diploids to a panel of tetraploid potato cultivars. Application of filters to select a subset of SNPs allows one to visualize haplotype sharing between individuals that also share a specific locus. The method is illustrated with cultivars resistant to Potato virus Y (PVY), while simultaneously selecting for the absence of the SNPs in susceptible clones. SNP data will then merge into an image which displays the coordinates of a distal genomic region on the northern arm of chromosome 11 where a specific haplotype is introgressed from the wild potato species S. stoloniferum (CPC 2093) carrying a gene (Ny(o,n)sto) conferring resistance to two PVY strains, PVY O and PVY NTN. Graphical genotyping was also successful in showing the haplotypes on chromosome 12 carrying Ry-f sto, another resistance gene derived from S. stoloniferum conferring broad-spectrum resistance to PVY, as well as chromosome 5 haplotypes from S. vernei, with the Gpa5 locus involved in resistance against Globodera pallida cyst nematodes. The image also shows shortening of linkage drag by meiotic recombination of the introgression segment in more recent breeding material. Identity-by-descent was found to be a requirement for using graphical genotyping, which is proposed as a non-statistical alternative method for gene discovery, as compared with genome-wide association studies. The potential and limitations of the method are discussed.
Analysis of high-throughput biological data using their rank values.
Dembélé, Doulaye
2018-01-01
High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature become more specialized and often require high computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra, the Perron-Frobenius theorem and also extend a method presented earlier for searching differentially expressed genes for the detection of recurrent copy number aberration. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros.
Meta-Analysis of Tumor Stem-Like Breast Cancer Cells Using Gene Set and Network Analysis
Lee, Won Jun; Kim, Sang Cheol; Yoon, Jung-Ho; Yoon, Sang Jun; Lim, Johan; Kim, You-Sun; Kwon, Sung Won; Park, Jeong Hill
2016-01-01
Generally, cancer stem cells have epithelial-to-mesenchymal-transition characteristics and other aggressive properties that cause metastasis. However, there have been no confident markers for the identification of cancer stem cells and comparative methods examining adherent and sphere cells are widely used to investigate mechanism underlying cancer stem cells, because sphere cells have been known to maintain cancer stem cell characteristics. In this study, we conducted a meta-analysis that combined gene expression profiles from several studies that utilized tumorsphere technology to investigate tumor stem-like breast cancer cells. We used our own gene expression profiles along with the three different gene expression profiles from the Gene Expression Omnibus, which we combined using the ComBat method, and obtained significant gene sets using the gene set analysis of our datasets and the combined dataset. This experiment focused on four gene sets such as cytokine-cytokine receptor interaction that demonstrated significance in both datasets. Our observations demonstrated that among the genes of four significant gene sets, six genes were consistently up-regulated and satisfied the p-value of < 0.05, and our network analysis showed high connectivity in five genes. From these results, we established CXCR4, CXCL1 and HMGCS1, the intersecting genes of the datasets with high connectivity and p-value of < 0.05, as significant genes in the identification of cancer stem cells. Additional experiment using quantitative reverse transcription-polymerase chain reaction showed significant up-regulation in MCF-7 derived sphere cells and confirmed the importance of these three genes. Taken together, using meta-analysis that combines gene set and network analysis, we suggested CXCR4, CXCL1 and HMGCS1 as candidates involved in tumor stem-like breast cancer cells. 
Distinct from other meta-analyses, by using gene set analysis, we selected possible markers which can explain the biological mechanisms and suggested network analysis as an additional criterion for selecting candidates. PMID:26870956
Effects of threshold on the topology of gene co-expression networks.
Couto, Cynthia Martins Villar; Comin, César Henrique; Costa, Luciano da Fontoura
2017-09-26
Several developments regarding the analysis of gene co-expression profiles using complex network theory have been reported recently. Such approaches usually start with the construction of an unweighted gene co-expression network, therefore requiring the selection of a suitable threshold defining which pairs of vertices will be connected. We aimed at addressing such an important problem by suggesting and comparing five different approaches for threshold selection. Each of the methods considers a respective biologically-motivated criterion for electing a potentially suitable threshold. A set of 21 microarray experiments from different biological groups was used to investigate the effect of applying the five proposed criteria to several biological situations. For each experiment, we used the Pearson correlation coefficient to measure the relationship between each gene pair, and the resulting weight matrices were thresholded considering several values, generating respective adjacency matrices (co-expression networks). Each of the five proposed criteria was then applied in order to select the respective threshold value. The effects of these thresholding approaches on the topology of the resulting networks were compared by using several measurements, and we verified that, depending on the database, the impact on the topological properties can be large. However, a group of databases was verified to be similarly affected by most of the considered criteria. Based on such results, it can be suggested that when the generated networks present similar measurements, the thresholding method can be chosen with greater freedom. If the generated networks are markedly different, the thresholding method that better suits the interests of each specific research study represents a reasonable choice.
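The thresholding step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the toy expression values, the function names, and the choice to threshold the absolute value of the Pearson coefficient are all assumptions made for illustration. It only shows how one weight matrix yields different unweighted networks as the threshold changes.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_network(expr, threshold):
    """Return a 0/1 adjacency matrix; genes i and j are linked when |r_ij| >= threshold.

    Using the absolute value is an assumption of this sketch.
    """
    g = len(expr)
    adj = [[0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            if abs(pearson(expr[i], expr[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj

def edge_count(adj):
    """Number of undirected edges in a symmetric 0/1 adjacency matrix."""
    return sum(sum(row) for row in adj) // 2

# Hypothetical expression profiles (rows = genes, columns = samples).
expr = [
    [1.0, 2.0, 3.0, 4.0],  # gene A
    [2.0, 4.0, 6.0, 8.1],  # gene B, nearly proportional to A
    [4.0, 3.0, 2.0, 1.0],  # gene C, anti-correlated with A
    [1.0, 3.0, 2.0, 2.5],  # gene D, only moderately related to the others
]

for t in (0.3, 0.6, 0.9):
    print(t, edge_count(coexpression_network(expr, t)))
```

Comparing the edge counts (or any other topological measurement) across thresholds is the kind of check the abstract refers to when it says the impact on topology can be large for some data sets.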
Kuramae, Eiko E; Robert, Vincent; Echavarri-Erasun, Carlos; Boekhout, Teun
2007-01-01
Background The construction of robust and well resolved phylogenetic trees is important for our understanding of many, if not all biological processes, including speciation and origin of higher taxa, genome evolution, metabolic diversification, multicellularity, origin of life styles, pathogenicity and so on. Many older phylogenies were not well supported due to insufficient phylogenetic signal present in the single or few genes used in phylogenetic reconstructions. Importantly, single gene phylogenies were not always found to be congruent. The phylogenetic signal may, therefore, be increased by enlarging the number of genes included in phylogenetic studies. Unfortunately, concatenation of many genes does not take into consideration the evolutionary history of each individual gene. Here, we describe an approach to select informative phylogenetic proteins to be used in the Tree of Life (TOL) and barcoding projects by comparing the cophenetic correlation coefficients (CCC) among individual protein distance matrices of proteins, using the fungi as an example. The method demonstrated that the quality and number of concatenated proteins is important for a reliable estimation of TOL. Approximately 40–45 concatenated proteins seem needed to resolve fungal TOL. Results In total 4852 orthologous proteins (KOGs) were assigned among 33 fungal genomes from the Asco- and Basidiomycota and 70 of these represented single copy proteins. The individual protein distance matrices based on 531 concatenated proteins that have been used for phylogeny reconstruction before [14] were compared one with another in order to select those with the highest CCC, which then was used as a reference. This reference distance matrix was compared with those of the 70 single copy proteins selected and their CCC values were calculated. Sixty-four KOGs showed a CCC above 0.50 and these were further considered for their phylogenetic potential. 
Proteins belonging to the cellular processes and signaling KOG category seem more informative than those belonging to the other three categories: information storage and processing; metabolism; and the poorly characterized category. After concatenation of 40 proteins the topology of the phylogenetic tree remained stable, but after concatenation of 60 or more proteins the bootstrap support values of some branches decreased, most likely due to the inclusion of proteins with lower CCC values. The selection of protein sequences to be used in various TOL projects remains a critical and important process. The method described in this paper will contribute to a more objective selection of phylogenetically informative protein sequences. Conclusion This study provides candidate protein sequences to be considered as phylogenetic markers in different branches of fungal TOL. The selection procedure described here will be useful to select informative protein sequences to resolve branches of TOL that contain few or no species with completely sequenced genomes. The robust phylogenetic trees resulting from this method may contribute to our understanding of organismal diversification processes. The method proposed can be extended easily to other branches of TOL. PMID:17688684
Chee-Sanford, Joanne C; Mackie, Roderick I; Koike, Satoshi; Krapac, Ivan G; Lin, Yu-Feng; Yannarell, Anthony C; Maxwell, Scott; Aminov, Rustam I
2009-01-01
Antibiotics are used in animal livestock production for therapeutic treatment of disease and at subtherapeutic levels for growth promotion and improvement of feed efficiency. It is estimated that approximately 75% of antibiotics are not absorbed by animals and are excreted in waste. Antibiotic resistance selection occurs among gastrointestinal bacteria, which are also excreted in manure and stored in waste holding systems. Land application of animal waste is a common disposal method used in the United States and is a means for environmental entry of both antibiotics and genetic resistance determinants. Concerns for bacterial resistance gene selection and dissemination of resistance genes have prompted interest about the concentrations and biological activity of drug residues and break-down metabolites, and their fate and transport. Fecal bacteria can survive for weeks to months in the environment, depending on species and temperature, however, genetic elements can persist regardless of cell viability. Phylogenetic analyses indicate antibiotic resistance genes have evolved, although some genes have been maintained in bacteria before the modern antibiotic era. Quantitative measurements of drug residues and levels of resistance genes are needed, in addition to understanding the environmental mechanisms of genetic selection, gene acquisition, and the spatiotemporal dynamics of these resistance genes and their bacterial hosts. This review article discusses an accumulation of findings that address aspects of the fate, transport, and persistence of antibiotics and antibiotic resistance genes in natural environments, with emphasis on mechanisms pertaining to soil environments following land application of animal waste effluent.
CHAI, Lian En; LAW, Chow Kuan; MOHAMAD, Mohd Saberi; CHONG, Chuii Khim; CHOON, Yee Wen; DERIS, Safaai; ILLIAS, Rosli Md
2014-01-01
Background: Gene expression data often contain missing expression values. Therefore, several imputation methods have been applied to solve the missing values, which include k-nearest neighbour (kNN), local least squares (LLS), and Bayesian principal component analysis (BPCA). However, the effects of these imputation methods on the modelling of gene regulatory networks from gene expression data have rarely been investigated and analysed using a dynamic Bayesian network (DBN). Methods: In the present study, we separately imputed datasets of the Escherichia coli S.O.S. DNA repair pathway and the Saccharomyces cerevisiae cell cycle pathway with kNN, LLS, and BPCA, and subsequently used these to generate gene regulatory networks (GRNs) using a discrete DBN. We made comparisons on the basis of previous studies in order to select the gene network with the least error. Results: We found that BPCA and LLS performed better on larger networks (based on the S. cerevisiae dataset), whereas kNN performed better on smaller networks (based on the E. coli dataset). Conclusion: The results suggest that the performance of each imputation method is dependent on the size of the dataset, and this subsequently affects the modelling of the resultant GRNs using a DBN. In addition, on the basis of these results, a DBN has the capacity to discover potential edges, as well as display interactions, between genes. PMID:24876803
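Of the three imputation methods named in the abstract above, kNN is the simplest to sketch. The following is a generic illustration, not the implementation used in the study: the distance measure (root-mean-square difference over shared observed columns), the unweighted averaging, the function name, and the toy matrix are all assumptions.

```python
from math import sqrt

def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows (genes) that have that column observed."""
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                # Compare the two genes only on columns observed in both.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical expression matrix (rows = genes, columns = time points).
data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [5.0, 6.0, 9.0],
]

imputed = knn_impute(data, k=2)
print(imputed[0])
```

Here the missing value in the first gene is filled from the two genes with the most similar observed profile, and the distant fourth gene is ignored. The imputed matrix would then be discretized and passed to the network-inference step, which this sketch does not cover.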
Moreira, Viviane S; Soares, Virgínia L F; Silva, Raner J S; Sousa, Aurizangela O; Otoni, Wagner C; Costa, Marcio G C
2018-05-01
Bixa orellana L., popularly known as annatto, produces several secondary metabolites of pharmaceutical and industrial interest, including bixin, whose molecular basis of biosynthesis remains to be determined. Gene expression analysis by quantitative real-time PCR (qPCR) is an important tool to advance such knowledge. However, correct interpretation of qPCR data requires the use of suitable reference genes in order to reduce experimental variations. In the present study, we have selected four different candidates for reference genes in B. orellana, coding for 40S ribosomal protein S9 (RPS9), histone H4 (H4), 60S ribosomal protein L38 (RPL38) and 18S ribosomal RNA (18SrRNA). Their expression stabilities in different tissues (e.g. flower buds, flowers, leaves and seeds at different developmental stages) were analyzed using five statistical tools (NormFinder, geNorm, BestKeeper, ΔCt method and RefFinder). The results indicated that RPL38 is the most stable gene in different tissues and stages of seed development and 18SrRNA is the most unstable among the analyzed genes. In order to validate the candidate reference genes, we have analyzed the relative expression of a target gene coding for carotenoid cleavage dioxygenase 1 (CCD1) using the stable RPL38 and the least stable gene, 18SrRNA, for normalization of the qPCR data. The results demonstrated significant differences in the interpretation of the CCD1 gene expression data, depending on the reference gene used, reinforcing the importance of the correct selection of reference genes for normalization.
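Why the choice of reference gene changes the interpretation can be shown with a small worked example. The abstract does not state which quantification model was used; the sketch below assumes the widely used 2^(-ΔΔCt) model with roughly 100% amplification efficiency, and every Ct value in it is hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) model (assumes ~100% PCR efficiency)."""
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    return 2 ** -(d_ct_sample - d_ct_control)

# Hypothetical Ct values for one target gene and two candidate reference genes.
target = {"leaf": 24.0, "seed": 22.0}
stable_ref = {"leaf": 18.0, "seed": 18.0}    # same Ct in both tissues
unstable_ref = {"leaf": 10.0, "seed": 12.0}  # Ct shifts between tissues

fold_stable = relative_expression(
    target["seed"], stable_ref["seed"], target["leaf"], stable_ref["leaf"])
fold_unstable = relative_expression(
    target["seed"], unstable_ref["seed"], target["leaf"], unstable_ref["leaf"])

print(fold_stable, fold_unstable)
```

With these made-up numbers the same target-gene measurements give a 4-fold change against the stable reference and a 16-fold change against the unstable one. That is the kind of discrepancy the abstract reports for CCD1 when normalized to RPL38 versus 18SrRNA.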
Van den Eynden, Jimmy; Fierro, Ana Carolina; Verbeke, Lieven P C; Marchal, Kathleen
2015-04-23
With the advances in high throughput technologies, increasing amounts of cancer somatic mutation data are being generated and made available. Only a small number of (driver) mutations occur in driver genes and are responsible for carcinogenesis, while the majority of (passenger) mutations do not influence tumour biology. In this study, SomInaClust is introduced, a method that accurately identifies driver genes based on their mutation pattern across tumour samples and then classifies them into oncogenes or tumour suppressor genes respectively. SomInaClust starts from the observation that oncogenes mainly contain mutations that, due to positive selection, cluster at similar positions in a gene across patient samples, whereas tumour suppressor genes contain a high number of protein-truncating mutations throughout the entire gene length. The method was shown to prioritize driver genes in 9 different solid cancers. Furthermore it was found to be complementary to existing similar-purpose methods with the additional advantages that it has a higher sensitivity, also for rare mutations (occurring in less than 1% of all samples), and it accurately classifies candidate driver genes in putative oncogenes and tumour suppressor genes. Pathway enrichment analysis showed that the identified genes belong to known cancer signalling pathways, and that the distinction between oncogenes and tumour suppressor genes is biologically relevant. SomInaClust was shown to detect candidate driver genes based on somatic mutation patterns of inactivation and clustering and to distinguish oncogenes from tumour suppressor genes. The method could be used for the identification of new cancer genes or to filter mutation data for further data-integration purposes.
The complexity of selection at the major primate beta-defensin locus.
Semple, Colin A M; Maxwell, Alison; Gautier, Philippe; Kilanowski, Fiona M; Eastwood, Hayden; Barran, Perdita E; Dorin, Julia R
2005-05-18
We have examined the evolution of the genes at the major human beta-defensin locus and the orthologous loci in a range of other primates and mouse. For the first time these data allow us to examine selective episodes in the more recent evolutionary history of this locus as well as the ancient past. We have used a combination of maximum likelihood based tests and a maximum parsimony based sliding window approach to give a detailed view of the varying modes of selection operating at this locus. We provide evidence for strong positive selection soon after the duplication of these genes within an ancestral mammalian genome. Consequently variable selective pressures have acted on beta-defensin genes in different evolutionary lineages, with episodes both of negative, and more rarely positive selection, during the divergence of primates. Positive selection appears to have been more common in the rodent lineage, accompanying the birth of novel, rodent-specific beta-defensin genes. These observations allow a fuller understanding of the evolution of mammalian innate immunity. In both the rodent and primate lineages, sites in the second exon have been subject to positive selection and by implication are important in functional diversity. A small number of sites in the mature human peptides were found to have undergone repeated episodes of selection in different primate lineages. Particular sites were consistently implicated by multiple methods at positions throughout the mature peptides. These sites are clustered at positions predicted to be important for the specificity of the antimicrobial or chemoattractant properties of beta-defensins. Surprisingly, sites within the prepropeptide region were also implicated as being subject to significant positive selection, suggesting previously unappreciated functional significance for this region. 
Identification of these putatively functional sites has important implications for our understanding of beta-defensin function and for novel antibiotic design.
Genomewide analysis of TCP transcription factor gene family in Malus domestica.
Xu, Ruirui; Sun, Peng; Jia, Fengjuan; Lu, Longtao; Li, Yuanyuan; Zhang, Shizhong; Huang, Jinguang
2014-12-01
Teosinte branched 1/cycloidea/proliferating cell factor 1 (TCP) proteins are a large family of transcriptional regulators in angiosperms. They are involved in various biological processes, including development and plant metabolism pathways. In this study, a total of 52 TCP genes were identified in the apple (Malus domestica) genome. Bioinformatic methods were employed to predict and analyse their relevant gene classification, gene structure, chromosome location, sequence alignment and conserved domains of MdTCP proteins. Expression analysis from microarray data showed that the expression levels of 28 and 51 MdTCP genes changed during the ripening and rootstock-scion interaction processes, respectively. The expression patterns of 12 selected MdTCP genes were analysed in different tissues and in response to abiotic stresses. All of the selected genes were detected in at least one of the tissues tested, and most of them were modulated by adverse treatments, indicating that the MdTCPs were involved in various developmental and physiological processes. To the best of our knowledge, this is the first genomewide analysis of the apple TCP gene family. These results provide valuable information for studies on functions of the TCP transcription factor genes in apple.
Statistical Inference of Selection and Divergence of the Rice Blast Resistance Gene Pi-ta
Amei, Amei; Lee, Seonghee; Mysore, Kirankumar S.; Jia, Yulin
2014-01-01
The resistance gene Pi-ta has been effectively used to control rice blast disease, but some populations of cultivated and wild rice have evolved resistance. Insights into the evolutionary processes that led to this resistance during crop domestication may be inferred from the population history of domesticated and wild rice strains. In this study, we applied a recently developed statistical method, time-dependent Poisson random field model, to examine the evolution of the Pi-ta gene in cultivated and weedy rice. Our study suggests that the Pi-ta gene may have more recently introgressed into cultivated rice, indica and japonica, and U.S. weedy rice from the wild species, O. rufipogon. In addition, the Pi-ta gene is under positive selection in japonica, tropical japonica, U.S. cultivars and U.S. weedy rice. We also found that sequences of two domains of the Pi-ta gene, the nucleotide binding site and leucine-rich repeat domain, are highly conserved among all rice accessions examined. Our results provide a valuable analytical tool for understanding the evolution of disease resistance genes in crop plants. PMID:25335927
The development of a cisgenic apple plant.
Vanblaere, Thalia; Szankowski, Iris; Schaart, Jan; Schouten, Henk; Flachowsky, Henryk; Broggini, Giovanni A L; Gessler, Cesare
2011-07-20
Cisgenesis represents a step toward a new generation of GM crops. The lack of selectable genes (e.g. antibiotic or herbicide resistance) in the final product and the fact that the inserted gene(s) derive from organisms sexually compatible with the target crop should raise fewer environmental concerns and increase consumer acceptance. Here we report the generation of a cisgenic apple plant by inserting the endogenous apple scab resistance gene HcrVf2 under the control of its own regulatory sequences into the scab-susceptible apple cultivar Gala. A previously developed method based on Agrobacterium-mediated transformation combined with a positive and negative selection system and a chemically inducible recombination machinery allowed the generation of apple cv. Gala carrying the scab resistance gene HcrVf2 under its native regulatory sequences and no foreign genes. Three cisgenic lines were chosen for detailed investigation and were shown to carry a single T-DNA insertion and express the target gene HcrVf2. This is the first report of the generation of a true cisgenic plant. Copyright © 2011 Elsevier B.V. All rights reserved.
Molecular Biology of Anaerobic Aromatic Biodegradation.
1992-08-14
manipulate and clone genes for aromatic acid degradation from the bacterium, Rhodopseudomonas palustris. These tools have enabled us to identify genes...anaerobic degradation of two selected aromatic acids - benzoate and 4-hydroxybenzoate - by one bacterial species - Rhodopseudomonas palustris. Our...PUBLICATIONS. Papers: Gibson, J., J. F. Geissler, and C. S. Harwood. 1990. Benzoate-coenzyme A ligase from Rhodopseudomonas palustris. Methods in Enzymology
Guyon, Laurent; Lajaunie, Christian; Fer, Frédéric; Bhajun, Ricky; Sulpice, Eric; Pinna, Guillaume; Campalans, Anna; Radicella, J Pablo; Rouillier, Philippe; Mary, Mélissa; Combe, Stéphanie; Obeid, Patricia; Vert, Jean-Philippe; Gidrol, Xavier
2015-09-18
Phenotypic screening monitors phenotypic changes induced by perturbations, including those generated by drugs or RNA interference. Currently-used methods for scoring screen hits have proven to be problematic, particularly when applied to physiologically relevant conditions such as low cell numbers or inefficient transfection. Here, we describe the Φ-score, which is a novel scoring method for the identification of phenotypic modifiers or hits in cell-based screens. Φ-score performance was assessed with simulations, a validation experiment and its application to gene identification in a large-scale RNAi screen. Using robust statistics and a variance model, we demonstrated that the Φ-score showed better sensitivity, selectivity and reproducibility compared to classical approaches. The improved performance of the Φ-score paves the way for cell-based screening of primary cells, which are often difficult to obtain from patients in sufficient numbers. We also describe a dedicated merging procedure to pool scores from small interfering RNAs targeting the same gene so as to provide improved visualization and hit selection.
Guyon, Laurent; Lajaunie, Christian; Fer, Frédéric; Bhajun, Ricky; Sulpice, Eric; Pinna, Guillaume; Campalans, Anna; Radicella, J. Pablo; Rouillier, Philippe; Mary, Mélissa; Combe, Stéphanie; Obeid, Patricia; Vert, Jean-Philippe; Gidrol, Xavier
2015-01-01
Phenotypic screening monitors phenotypic changes induced by perturbations, including those generated by drugs or RNA interference. Currently-used methods for scoring screen hits have proven to be problematic, particularly when applied to physiologically relevant conditions such as low cell numbers or inefficient transfection. Here, we describe the Φ-score, which is a novel scoring method for the identification of phenotypic modifiers or hits in cell-based screens. Φ-score performance was assessed with simulations, a validation experiment and its application to gene identification in a large-scale RNAi screen. Using robust statistics and a variance model, we demonstrated that the Φ-score showed better sensitivity, selectivity and reproducibility compared to classical approaches. The improved performance of the Φ-score paves the way for cell-based screening of primary cells, which are often difficult to obtain from patients in sufficient numbers. We also describe a dedicated merging procedure to pool scores from small interfering RNAs targeting the same gene so as to provide improved visualization and hit selection. PMID:26382112
A statistical approach to identify, monitor, and manage incomplete curated data sets.
Howe, Douglas G
2018-04-02
Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete and how incomplete they are remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation, helps to avoid flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here. In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. One hundred and twenty-two and 165 of the 7387 tested genes were identified as missing expression annotations based on their residuals being outside the model lower or upper 95% confidence interval respectively. The model had precision of 0.97 and recall of 0.71 at the negative 95% confidence interval and precision of 0.76 and recall of 0.73 at the positive 95% confidence interval. 
This method can be used to identify data sets that are incompletely curated, as demonstrated using the gene expression data set from ZFIN. This information can help both database resources and data consumers gauge when it may be useful to look further for published data to augment the existing expertly curated information.
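The residual-based flagging described in the abstract above can be sketched in a much reduced form. This is not the ZFIN model: it assumes a single predictor instead of the multivariate model, fits and flags on the same toy data instead of a train/test split, and uses a band of 1.96 residual standard deviations as a stand-in for the 95% interval. All names and numbers are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def flag_outliers(x, y, z=1.96):
    """Return indices whose residual falls below or above a +/- z * sd band."""
    intercept, slope = fit_line(x, y)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = sqrt(sum(r ** 2 for r in residuals) / (len(residuals) - 2))
    low = [i for i, r in enumerate(residuals) if r < -z * sd]
    high = [i for i, r in enumerate(residuals) if r > z * sd]
    return low, high

def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical data: publications per gene (x) and curated experiments per gene (y).
# Gene index 5 has far fewer curated experiments than its publication count suggests.
publications = list(range(1, 13))
experiments = [2 * p for p in publications]
experiments[5] = 2

low, high = flag_outliers(publications, experiments)
print(low, high)
print(precision_recall(tp=8, fp=2, fn=2))
```

Genes flagged on the low side are the candidates for incomplete curation; precision and recall would then be estimated by manually checking a sample of flagged and unflagged genes, as the abstract reports for each side of the interval.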
2010-01-01
Background Feral sheep are considered to be a source of genetic variation that has been lost from their domestic counterparts through selection. Methods This study investigates variation in the genes KRTAP1-1, KRT33, ADRB3 and DQA2 in Merino-like feral sheep populations from New Zealand and its offshore islands. These genes have previously been shown to influence wool, lamb survival and animal health. Results All the genes were polymorphic, but no new allele was identified in the feral populations. In some of these populations, allele frequencies differed from those observed in commercial Merino sheep and other breeds found in New Zealand. Heterozygosity levels were comparable to those observed in other studies on feral sheep. Our results suggest that some of the feral populations may have been either inbred or outbred over the duration of their apparent isolation. Conclusion The variation described here allows us to draw some conclusions about the likely genetic origin of the populations and selective pressures that may have acted upon them, but they do not appear to be a source of new genetic material, at least for these four genes. PMID:21176141
Kim, Seungjin; Krajmalnik-Brown, Rosa; Kim, Jong-Oh; Chung, Jinwook
2014-11-01
The application of effective remediation technologies can benefit from adequate preliminary testing, such as in lab-scale and pilot-scale systems. Bioremediation technologies have demonstrated tremendous potential with regard to cost, but they cannot be used for all contaminated sites due to limitations in biological activity. The purpose of this study was to develop a DNA diagnostic method that reduces the time to select contaminated sites that are good candidates for bioremediation. We applied an oligonucleotide microarray method to detect and monitor genes that lead to aliphatic and aromatic degradation. Further, the bioremediation of a contaminated site, selected based on the results of the genetic diagnostic method, was achieved successfully by applying bioslurping in field tests. This gene-based diagnostic technique is a powerful tool to evaluate the potential for bioremediation in petroleum hydrocarbon contaminated soil. Copyright © 2014 Elsevier B.V. All rights reserved.
Organic acid-tolerant microorganisms and uses thereof for producing organic acids
Pfleger, Brian Frederick; Begemann, Matthew Brett
2014-05-06
Organic acid-tolerant microorganisms and methods of using same. The organic acid-tolerant microorganisms comprise modifications that reduce or ablate AcsA activity or AcsA homolog activity. The modifications increase tolerance of the microorganisms to such organic acids as 3-hydroxypropionic acid (3HP), acrylic acid, and propionic acid. Further modifications to the microorganisms such as increasing expression of malonyl-CoA reductase and/or acetyl-CoA carboxylase provide or increase the ability of the microorganisms to produce 3HP. Methods of generating an organic acid with the modified microorganisms are provided. Methods of using acsA or homologs thereof as counter-selectable markers include replacing acsA or homologs thereof in cells with genes of interest and selecting for the cells comprising the genes of interest with amounts of organic acids effective to inhibit growth of cells harboring acsA or the homologs.
Agrobacterium tumefaciens-mediated transformation of oleaginous yeast Lipomyces species
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dai, Ziyu; Deng, Shuang; Culley, David E.
Background: Because of interest in the production of renewable bio-hydrocarbon fuels, various living organisms have been explored for their potential use in producing fuels and chemicals. The oil-producing (oleaginous) yeast Lipomyces starkeyi is the subject of active research regarding the production of lipids using a wide variety of carbon and nutrient sources. The genome of L. starkeyi has been published, which opens the door to production strain improvements using the tools of synthetic biology and metabolic engineering. However, using these tools for strain improvement requires the establishment of effective and reliable transformation methods with suitable selectable markers (antibiotic resistance or auxotrophic marker genes) and the necessary genetic elements (promoters and terminators) for expression of introduced genes. Chemical-based methods have been published, but suffer from low efficiency or the requirement for targeting to rRNA loci. To address these problems, Agrobacterium-mediated transformation was investigated as an alternative method for L. starkeyi and other Lipomyces species. Results: In this study, Agrobacterium-mediated transformation was demonstrated to be effective in the transformation of both L. starkeyi and other Lipomyces species, and the introduced DNA could be reliably integrated into the chromosomes of these species. The gene deletion of Ku70 and Pex10 was also demonstrated in L. starkeyi. In addition to the bacterial antibiotic selection marker gene hygromycin B phosphotransferase, the bacterial β-glucuronidase reporter gene under the control of L. starkeyi translation elongation factor 1α promoter was also stably expressed in seven different Lipomyces species. Conclusion: The results from this study clearly demonstrate that Agrobacterium-mediated transformation is a reliable genetic tool for gene deletion and integration and expression of heterologous genes in L. starkeyi and other Lipomyces species.
Agrobacterium tumefaciens-mediated transformation of oleaginous yeast Lipomyces species.
Dai, Ziyu; Deng, Shuang; Culley, David E; Bruno, Kenneth S; Magnuson, Jon K
2017-08-01
Interest in using renewable sources of carbon, especially lignocellulosic biomass, for the production of hydrocarbon fuels and chemicals has fueled interest in exploring various organisms capable of producing hydrocarbon biofuels and chemicals or their precursors. The oleaginous (oil-producing) yeast Lipomyces starkeyi is the subject of active research regarding the production of triacylglycerides as hydrocarbon fuel precursors using a variety of carbohydrate and nutrient sources. The genome of L. starkeyi has been published, which opens the door to production strain improvements through the development and use of the tools of synthetic biology for this oleaginous species. The first step in establishment of synthetic biology tools for an organism is the development of effective and reliable transformation methods with suitable selectable marker genes and demonstration of the utility of the genetic elements needed for expression of introduced genes or deletion of endogenous genes. Chemical-based methods of transformation have been published but suffer from low efficiency. To address these problems, Agrobacterium-mediated transformation was investigated as an alternative method for L. starkeyi and other Lipomyces species. In this study, Agrobacterium-mediated transformation was demonstrated to be effective in the transformation of both L. starkeyi and other Lipomyces species. The deletion of the peroxisomal biogenesis factor 10 gene was also demonstrated in L. starkeyi. In addition to the bacterial antibiotic selection marker gene hygromycin B phosphotransferase, the bacterial β-glucuronidase reporter gene under the control of L. starkeyi translation elongation factor 1α promoter was also stably expressed in six different Lipomyces species. The results from this study demonstrate that Agrobacterium-mediated transformation is a reliable and effective genetic tool for homologous recombination and expression of heterologous genes in L. starkeyi and other Lipomyces species.
Kimura, Hiroyuki; Ishibashi, Jun-Ichiro; Masuda, Harue; Kato, Kenji; Hanada, Satoshi
2007-04-01
International drilling projects for the study of microbial communities in the deep-subsurface hot biosphere have been expanded. Core samples obtained by deep drilling are commonly contaminated with mesophilic microorganisms in the drilling fluid, making it difficult to examine the microbial community by 16S rRNA gene clone library analysis. To eliminate mesophilic organism contamination, we previously developed a new method (selective phylogenetic analysis [SePA]) based on the strong correlation between the guanine-plus-cytosine (G+C) contents of the 16S rRNA genes and the optimal growth temperatures of prokaryotes, and we verified the method's effectiveness (H. Kimura, M. Sugihara, K. Kato, and S. Hanada, Appl. Environ. Microbiol. 72:21-27, 2006). In the present study we ascertained SePA's ability to eliminate contamination by archaeal rRNA genes, using deep-sea hydrothermal fluid (117 degrees C) and surface seawater (29.9 degrees C) as substitutes for deep-subsurface geothermal samples and drilling fluid, respectively. Archaeal 16S rRNA gene fragments, PCR amplified from the surface seawater, were denatured at 82 degrees C and completely digested with exonuclease I (Exo I), while gene fragments from the deep-sea hydrothermal fluid remained intact after denaturation at 84 degrees C because of their high G+C contents. An examination using mixtures of DNAs from the two environmental samples showed that denaturation at 84 degrees C and digestion with Exo I completely eliminated archaeal 16S rRNA genes from the surface seawater. Our method was quite useful for culture-independent community analysis of hyperthermophilic archaea in core samples recovered from deep-subsurface geothermal environments.
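SePA relies on the G+C content of 16S rRNA genes, so a small sketch of that quantity may help. It is only an in silico proxy, not the denaturation and Exo I digestion procedure itself. The function names and the `gc_cutoff` parameter are hypothetical, and the abstract does not state any cutoff value.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(1 for b in bases if b in "GC") / len(bases)


def split_by_gc(sequences, gc_cutoff):
    """Partition named sequences into high-GC and low-GC groups.

    Assumption: gc_cutoff is an illustrative threshold chosen by the user;
    the source gives denaturation temperatures, not a G+C cutoff.
    """
    high, low = [], []
    for name, seq in sequences.items():
        (high if gc_content(seq) >= gc_cutoff else low).append(name)
    return high, low
```

In this simplified picture, the high-GC group corresponds to fragments expected to resist denaturation (candidate thermophile sequences) and the low-GC group to likely mesophilic contaminants.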
Hernández-Hernández, Tania; Martínez-Castilla, León Patricio; Alvarez-Buylla, Elena R
2007-02-01
B-class MADS-box genes have been shown to be the key regulators of petal and stamen specification in several eudicot model species such as Arabidopsis thaliana, Antirrhinum majus, and Petunia hybrida. Orthologs of these genes have been found across angiosperms and gymnosperms, and it is thought that the basic regulatory function of B proteins is conserved in seed plant lineages. The evolution of B genes is characterized by numerous duplications that might represent key elements fostering the functional diversification of duplicates with a deep impact on their role in the evolution of the floral developmental program. To evaluate this, we performed a rigorous statistical analysis with B gene sequences. Using maximum likelihood and Bayesian methods, we estimated molecular substitution rates and determined the selective regimes operating at each residue of B proteins. We implemented tests that rely on phylogenetic hypotheses and codon substitution models to detect significant differences in substitution rates (DSRs) and sites under positive adaptive selection (PS) in specific lineages before and after duplication events. With these methods, we identified several protein residues fixed by PS shortly after the origin of PISTILLATA-like and APETALA3-like lineages in angiosperms and shortly after the origin of the euAP3-like lineage in core eudicots, the 2 main B gene duplications. The residues inferred to have been fixed by positive selection lie mostly within the K domain of the protein, which is key to promote heterodimerization. Additionally, we used a likelihood method that accommodates DSRs among lineages to estimate duplication dates for AP3-PI and euAP3-TM6, calibrating with data from the fossil record. The dates obtained are consistent with angiosperm origins and diversification of core eudicots. Our results strongly suggest that novel multimer formation with other MADS proteins could have been crucial for the functional divergence of B MADS-box genes. 
We thus propose a mechanism of functional diversification and persistence of gene duplicates by the appearance of novel multimerization capabilities after duplications. Multimer formation in different combinations of regulatory proteins can be a mechanistic basis for the origin of novel regulatory functions and a gene regulatory mechanism for the appearance of morphological innovations.
Cabiati, Manuela; Raucci, Serena; Caselli, Chiara; Guzzardi, Maria Angela; D'Amico, Andrea; Prescimone, Tommaso; Giannessi, Daniela; Del Ry, Silvia
2012-06-01
Obesity is a complex pathology with interacting and confounding causes due to the environment, hormonal signaling patterns, and genetic predisposition. At present, the Zucker rat is an eligible genetic model for research on obesity and metabolic syndrome, allowing scrutiny of gene expression profiles. Real-time PCR is the benchmark method for measuring mRNA expressions, but the accuracy and reproducibility of its data greatly depend on appropriate normalization strategies. In the Zucker rat model, no specific reference genes have been identified in myocardium, kidney, and lung, the main organs involved in this syndrome. The aim of this study was to select among ten candidates (Actb, Gapdh, Polr2a, Ywhag, Rpl13a, Sdha, Ppia, Tbp, Hprt1 and Tfrc) a set of reference genes that can be used for the normalization of mRNA expression data obtained by real-time PCR in obese and lean Zucker rats both at fasting and during acute hyperglycemia. The most stable genes in the heart were Sdha, Tbp, and Hprt1; in kidney, Tbp, Actb, and Gapdh were chosen, while Actb, Ywhag, and Sdha were selected as the most stably expressed set for pulmonary tissue. The normalization strategy was used to analyze mRNA expression of tumor necrosis factor α, the main inflammatory mediator in obesity, whose variations were more significant when normalized with the appropriately selected reference genes. The findings obtained in this study underline the importance of having three stably expressed reference gene sets for use in the cardiac, renal, and pulmonary tissues of an experimental model of obese and hyperglycemic Zucker rats.
2013-01-01
Background Coffee production in Africa represents a significant share of the total export revenues and influences the lives of millions of people, yet severe socio-economic repercussions are annually felt as a result of the overall losses caused by the coffee berry disease (CBD). This quarantine disease is caused by the fungus Colletotrichum kahawae Waller and Bridge, which remains one of the most devastating threats to Coffea arabica production in Africa at high altitude, and its dispersal to Latin America and Asia represents a serious concern. Understanding the molecular genetic basis of coffee resistance to this disease is of high priority to support breeding strategies. Selection and validation of suitable reference genes presenting stable expression in the system studied is the first step to engage studies of gene expression profiling. Results In this study, a set of ten genes (S24, 14-3-3, RPL7, GAPDH, UBQ9, VATP16, SAND, UQCC, IDE and β-Tub9) was evaluated to identify reference genes during the first hours of interaction (12, 48 and 72 hpi) between resistant and susceptible coffee genotypes and C. kahawae. Three analyses were done for the selection of these genes, considering the entire dataset and the two genotypes (resistant and susceptible) separately. The three statistical methods applied (GeNorm, NormFinder, and BestKeeper) identified IDE as one of the most stable genes for all datasets analysed and, in contrast, GAPDH and UBQ9 as the least stable ones. In addition, the expression of two defense-related transcripts, encoding a receptor-like kinase and a pathogenesis-related protein 10, was used to validate the reference genes selected. Conclusion Taken together, our results provide guidelines for reference gene(s) selection towards a more accurate and widespread use of qPCR to study the interaction between Coffea spp. and C. kahawae. PMID:24073624
Evaluating intra- and inter-individual variation in the human placental transcriptome.
Hughes, David A; Kircher, Martin; He, Zhisong; Guo, Song; Fairbrother, Genevieve L; Moreno, Carlos S; Khaitovich, Philipp; Stoneking, Mark
2015-03-19
Gene expression variation is a phenotypic trait of particular interest as it represents the initial link between genotype and other phenotypes. Analyzing how such variation apportions among and within groups allows for the evaluation of how genetic and environmental factors influence such traits. It also provides opportunities to identify genes and pathways that may have been influenced by non-neutral processes. Here we use a population genetics framework and next generation sequencing to evaluate how gene expression variation is apportioned among four human groups in a natural biological tissue, the placenta. We estimate that on average, 33.2%, 58.9%, and 7.8% of the placental transcriptome is explained by variation within individuals, among individuals, and among human groups, respectively. Additionally, when technical and biological traits are included in models of gene expression they each account for roughly 2% of total gene expression variation. Notably, the variation that is significantly different among groups is enriched in biological pathways associated with immune response, cell signaling, and metabolism. Many biological traits demonstrate correlated changes in expression in numerous pathways of potential interest to clinicians and evolutionary biologists. Finally, we estimate that the majority of the human placental transcriptome exhibits expression profiles consistent with neutrality; the remainder are consistent with stabilizing selection, directional selection, or diversifying selection. We apportion placental gene expression variation into individual, population, and biological trait factors and identify how each influences the transcriptome. Additionally, we advance methods to associate expression profiles with different forms of selection.
Teste, Marie-Ange; Duquenne, Manon; François, Jean M; Parrou, Jean-Luc
2009-01-01
Background Real-time RT-PCR is the recommended method for quantitative gene expression analysis. A compulsory step is the selection of good reference genes for normalization. A few genes often referred to as HouseKeeping Genes (HSK), such as ACT1, RDN18 or PDA1 are among the most commonly used, as their expression is assumed to remain unchanged over a wide range of conditions. Since this assumption is very unlikely, a geometric averaging of multiple, carefully selected internal control genes is now strongly recommended for normalization to avoid this problem of expression variation of single reference genes. The aim of this work was to search for a set of reference genes for reliable gene expression analysis in Saccharomyces cerevisiae. Results From public microarray datasets, we selected potential reference genes whose expression remained apparently invariable during long-term growth on glucose. Using the algorithm geNorm, ALG9, TAF10, TFC1 and UBC6 turned out to be genes whose expression remained stable, independent of the growth conditions and the strain backgrounds tested in this study. We then showed that the geometric averaging of any subset of three genes among the six most stable genes resulted in very similar normalized data, which contrasted with inconsistent results among various biological samples when the normalization was performed with ACT1. Normalization with multiple selected genes was therefore applied to transcriptional analysis of genes involved in glycogen metabolism. We determined an induction ratio of 100-fold for GPH1 and 20-fold for GSY2 between the exponential phase and the diauxic shift on glucose. There was no induction of these two genes at this transition phase on galactose, although in both cases, the kinetics of glycogen accumulation was similar. In contrast, SGA1 expression was independent of the carbon source and increased by 3-fold in stationary phase. 
Conclusion In this work, we provided a set of genes that are suitable reference genes for quantitative gene expression analysis by real-time RT-PCR in yeast biological samples covering a large panel of physiological states. In contrast, we invalidated and discourage the use of ACT1 as well as other commonly used reference genes (PDA1, TDH3, RDN18, etc) as internal controls for quantitative gene expression analysis in yeast. PMID:19874630
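The geometric averaging of several reference genes recommended in the abstract above can be sketched as follows. This is a minimal illustration, assuming the inputs are already relative expression quantities (not raw Cq values) for one sample; the function names are hypothetical and this is not the geNorm implementation.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive relative quantities."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize(target_quantity, reference_quantities):
    """Divide a target gene's quantity by the geometric mean of the
    reference genes measured in the same sample (the normalization factor)."""
    return target_quantity / geometric_mean(reference_quantities)
```

For example, a target quantity of 8 measured alongside reference quantities 1, 4 and 16 gives a normalization factor of 4 and a normalized value of 2. The geometric mean is used because it damps the influence of a single reference gene whose expression drifts.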
Teste, Marie-Ange; Duquenne, Manon; François, Jean M; Parrou, Jean-Luc
2009-10-30
Real-time RT-PCR is the recommended method for quantitative gene expression analysis. A compulsory step is the selection of good reference genes for normalization. A few genes often referred to as HouseKeeping Genes (HSK), such as ACT1, RDN18 or PDA1 are among the most commonly used, as their expression is assumed to remain unchanged over a wide range of conditions. Since this assumption is very unlikely, a geometric averaging of multiple, carefully selected internal control genes is now strongly recommended for normalization to avoid this problem of expression variation of single reference genes. The aim of this work was to search for a set of reference genes for reliable gene expression analysis in Saccharomyces cerevisiae. From public microarray datasets, we selected potential reference genes whose expression remained apparently invariable during long-term growth on glucose. Using the algorithm geNorm, ALG9, TAF10, TFC1 and UBC6 turned out to be genes whose expression remained stable, independent of the growth conditions and the strain backgrounds tested in this study. We then showed that the geometric averaging of any subset of three genes among the six most stable genes resulted in very similar normalized data, which contrasted with inconsistent results among various biological samples when the normalization was performed with ACT1. Normalization with multiple selected genes was therefore applied to transcriptional analysis of genes involved in glycogen metabolism. We determined an induction ratio of 100-fold for GPH1 and 20-fold for GSY2 between the exponential phase and the diauxic shift on glucose. There was no induction of these two genes at this transition phase on galactose, although in both cases, the kinetics of glycogen accumulation was similar. In contrast, SGA1 expression was independent of the carbon source and increased by 3-fold in stationary phase. 
In this work, we provided a set of genes that are suitable reference genes for quantitative gene expression analysis by real-time RT-PCR in yeast biological samples covering a large panel of physiological states. In contrast, we invalidated and discourage the use of ACT1 as well as other commonly used reference genes (PDA1, TDH3, RDN18, etc) as internal controls for quantitative gene expression analysis in yeast.
Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs
Freedman, Adam H.; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Davis, Brian W.; Gronau, Ilan; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Marques-Bonet, Tomas; Ostrander, Elaine A.; Wayne, Robert K.; Novembre, John
2016-01-01
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in the domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. PMID:26943675
Enhanced transgene expression in rice following selection controlled by weak promoters.
Zhou, Jie; Yang, Yong; Wang, Xuming; Yu, Feibo; Yu, Chulang; Chen, Juan; Cheng, Ye; Yan, Chenqi; Chen, Jianping
2013-03-27
Techniques that enable high levels of transgene expression in plants are attractive for the commercial production of plant-made recombinant pharmaceutical proteins or other gene transfer related strategies. The conventional way to increase the yield of desired transgenic products is to use strong promoters to control the expression of the transgene. Although many such promoters have been identified and characterized, the increase obtainable from a single promoter is ultimately limited to a certain extent. In this study, we report a method to magnify the effect of a single promoter by using a weak promoter-based selection system in transgenic rice. tCUP1, a fragment derived from the tobacco cryptic promoter (tCUP), was tested for its activity in rice by fusion to both a β-glucuronidase (GUS) reporter and a hygromycin phosphotransferase (HPT) selectable marker. The tCUP1 promoter allowed the recovery of transformed rice plants and conferred tissue specific expression of the GUS reporter, but was much weaker than the CaMV 35S promoter in driving a selectable marker for growth of resistant calli. However, in the resistant calli and regenerated transgenic plants selected by the use of tCUP1, the constitutive expression of green fluorescent protein (GFP) was dramatically increased as a result of the additive effect of multiple T-DNA insertions. The correlation between attenuated selection by a weak promoter and elevation of copy number and foreign gene expression was confirmed by using another relatively weak promoter from nopaline synthase (Nos). The use of weak promoter derived selectable markers leads to a high T-DNA copy number and then greatly increases the expression of the foreign gene. The method described here provides an effective approach to robustly enhance the expression of heterogenous transgenes through copy number manipulation in rice.
A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.
Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong
2017-01-01
To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. We adopted support vector machines (SVM) as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
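The first stage of the algorithm described above, ranking genes by a two-sample K-S test between classes, can be sketched in plain Python. This covers only the K-S filter, not the CFS stage or the SVM evaluation. It assumes binary class labels 0 and 1, and the names `ks_statistic` and `rank_genes` are hypothetical.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distribution functions of samples a and b."""
    a, b = sorted(a), sorted(b)
    largest_gap = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def rank_genes(expr_by_gene, labels, top_k):
    """Return the top_k gene names ordered by decreasing K-S statistic.

    expr_by_gene maps gene name -> list of expression values, one per sample.
    labels holds the class (0 or 1) of each sample, in the same order.
    """
    scores = {}
    for gene, values in expr_by_gene.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = ks_statistic(class0, class1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A gene whose expression distributions barely overlap between the two classes scores close to 1 and is kept; the retained genes would then be passed to a redundancy-aware selector such as CFS.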
Genome-wide scan for selection signatures in six cattle breeds in South Africa.
Makina, Sithembile O; Muchadeyi, Farai C; van Marle-Köster, Este; Taylor, Jerry F; Makgahlela, Mahlako L; Maiwashe, Azwihangwisi
2015-11-26
The detection of selection signatures in breeds of livestock species can contribute to the identification of regions of the genome that are, or have been, functionally important and, as a consequence, have been targeted by selection. This study used two approaches to detect signatures of selection within and between six cattle breeds in South Africa, including Afrikaner (n = 44), Nguni (n = 54), Drakensberger (n = 47), Bonsmara (n = 44), Angus (n = 31) and Holstein (n = 29). The first approach was based on the detection of genomic regions in which haplotypes have been driven towards complete fixation within breeds. The second approach identified regions of the genome that had very different allele frequencies between populations (FST). Forty-seven candidate genomic regions were identified as harbouring putative signatures of selection using both methods. Twelve of these candidate selected regions were shared among the breeds and ten were validated by previous studies. Thirty-three of these regions were successfully annotated and candidate genes were identified. Among these genes the keratin genes (KRT222, KRT24, KRT25, KRT26, and KRT27) and one heat shock protein gene (HSPB9) on chromosome 19 between 42,896,570 and 42,897,840 bp were detected for the Nguni breed. These genes were previously associated with adaptation to tropical environments in Zebu cattle. In addition, a number of candidate genes associated with the nervous system (WNT5B, FMOD, PRELP, and ATP2B), immune response (CYM, CDC6, and CDK10), production (MTPN, IGFBP4, TGFB1, and AJAP1) and reproductive performance (ADIPOR2, OVOS2, and RBBP8) were also detected as being under selection. The results presented here provide a foundation for detecting mutations that underlie genetic variation of traits that have economic importance for cattle breeds in South Africa.
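The second approach above compares allele frequencies between populations. A minimal sketch of one textbook form of the fixation index at a single biallelic locus is given below. The abstract does not say which estimator the authors used, so this formula (heterozygosity-based, equal population weights, no sample-size correction) and the function name `fst` are assumptions for illustration.

```python
def fst(allele_freqs):
    """Fixation index at one biallelic locus from per-population
    frequencies of the same allele: (H_T - H_S) / H_T.

    H_T is the expected heterozygosity at the pooled mean frequency;
    H_S is the mean expected heterozygosity within populations.
    Assumption: populations are weighted equally.
    """
    mean_freq = sum(allele_freqs) / len(allele_freqs)
    h_total = 2 * mean_freq * (1 - mean_freq)
    if h_total == 0:
        return 0.0  # locus is monomorphic everywhere, so no differentiation
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / len(allele_freqs)
    return (h_total - h_within) / h_total
```

Values near 1 indicate populations fixed for different alleles, while values near 0 indicate similar frequencies. A genome scan would compute this per marker and look for windows of unusually high values.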
López-Arredondo, Damar L; Herrera-Estrella, Luis
2013-05-01
Antibiotic and herbicide resistance genes are currently the most frequently used selectable marker genes for plant research and crop development. However, the use of antibiotics and herbicides must be carefully controlled because the degree of susceptibility to these compounds varies widely among plant species and because they can also affect plant regeneration. Therefore, new selectable marker systems that are effective for a broad range of plant species are still needed. Here, we report a simple and inexpensive system based on providing transgenic plant cells the capacity to convert a nonmetabolizable compound (phosphite, Phi) into an essential nutrient for cell growth (phosphate) through the expression of a bacterial gene encoding a phosphite oxidoreductase (PTXD). This system is effective for the selection of Arabidopsis transgenic plants by germinating T0 seeds directly on media supplemented with Phi and to select transgenic tobacco shoots from cocultivated leaf disc explants using nutrient media supplemented with Phi as both a source of phosphorus and selective agent. Because the ptxD/Phi system also allows the establishment of large-scale screening systems under greenhouse conditions completely eliminating false transformation events, it should facilitate the development of novel plant transformation methods. © 2013 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A
2006-01-01
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943
Genes under positive selection in a model plant pathogenic fungus, Botrytis.
Aguileta, Gabriela; Lengelle, Juliette; Chiapello, Hélène; Giraud, Tatiana; Viaud, Muriel; Fournier, Elisabeth; Rodolphe, François; Marthey, Sylvain; Ducasse, Aurélie; Gendrault, Annie; Poulain, Julie; Wincker, Patrick; Gout, Lilian
2012-07-01
The rapid evolution of particular genes is essential for the adaptation of pathogens to new hosts and new environments. Powerful methods have been developed for detecting targets of selection in the genome. Here we used divergence data to compare genes among four closely related fungal pathogens adapted to different hosts to elucidate the functions putatively involved in adaptive processes. For this goal, ESTs were sequenced in the specialist fungal pathogens Botrytis tulipae and Botrytis ficariarum, and compared with genome sequences of Botrytis cinerea and Sclerotinia sclerotiorum, responsible for diseases on over 200 plant species. A maximum likelihood-based analysis of 642 predicted orthologs detected 21 genes showing footprints of positive selection. These results were validated by resequencing nine of these genes in additional Botrytis species, showing they have also been rapidly evolving in other related species. Twenty of the 21 genes had not previously been identified as pathogenicity factors in B. cinerea, but some had functions related to plant-fungus interactions. The putative functions were involved in respiratory and energy metabolism, protein and RNA metabolism, signal transduction or virulence, similarly to what was detected in previous studies using the same approach in other pathogens. Mutants of B. cinerea were generated for four of these genes as a first attempt to elucidate their functions. Copyright © 2012 Elsevier B.V. All rights reserved.
Safe genetically engineered plants
NASA Astrophysics Data System (ADS)
Rosellini, D.; Veronesi, F.
2007-10-01
The application of genetic engineering to plants has provided genetically modified plants (GMPs, or transgenic plants) that are cultivated worldwide on increasing areas. The most widespread GMPs are herbicide-resistant soybean and canola and insect-resistant corn and cotton. New GMPs that produce vaccines, pharmaceutical or industrial proteins, and fortified food are approaching the market. The techniques employed to introduce foreign genes into plants allow a fairly good degree of predictability of the results, and the plant genome is minimally modified. However, some aspects of GMPs have raised concern: (a) control of the insertion site of the introduced DNA sequences into the plant genome and of its mutagenic effect; (b) presence of selectable marker genes conferring resistance to an antibiotic or an herbicide, linked to the useful gene; (c) insertion of undesired bacterial plasmid sequences; and (d) gene flow from transgenic plants to non-transgenic crops or wild plants. In response to public concerns, genetic engineering techniques are continuously being improved. Techniques to direct foreign gene integration into chosen genomic sites, to avoid the use of selectable genes or to remove them from the cultivated plants, to reduce the transfer of undesired bacterial sequences, and to make use of alternative, safer selectable genes are all fields of active research. In our laboratory, some of these new techniques are applied to alfalfa, an important forage plant. These emerging methods for plant genetic engineering are briefly reviewed in this work.
Shang, Shuai; Zhong, Huaming; Wu, Xiaoyang; Wei, Qinguo; Zhang, Huanxin; Chen, Jun; Chen, Yao; Tang, Xuexi; Zhang, Honghai
2018-04-01
Toll-like receptors (TLRs) encoded by the TLR multigene family play an important role in initial pathogen recognition in vertebrates. Among the TLRs, TLR2 and TLR4 may be of particular importance to reptiles. In order to study the evolutionary patterns and structural characteristics of TLRs, we explored the available genomes of several representative members of reptiles. 25 TLR2 genes and 19 TLR4 genes from reptiles were obtained in this study. Phylogenetic results showed that the TLR2 gene duplication occurred in several species. Evolutionary analysis by at least two methods identified 30 and 13 common positively selected codons in TLR2 and TLR4, respectively. Most positively selected sites of TLR2 and TLR4 were located in the Leucine-rich repeat (LRRs). Branch model analysis showed that TLR2 genes were under different evolutionary forces in reptiles, while the TLR4 genes showed no significant selection pressure. The different evolutionary adaptation of TLR2 and TLR4 among the reptiles might be due to their different function in recognizing bacteria. Overall, we explored the structure and evolution of TLR2 and TLR4 genes in reptiles for the first time. Our study revealed valuable information regarding TLR2 and TLR4 in reptiles, and provided novel insights into the conservation concern of natural populations. Copyright © 2017 Elsevier B.V. All rights reserved.
Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun
2014-01-01
As an abstract mapping of the gene regulations in the cell, gene regulatory network is important to both biological research study and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets as well as to take the robustness to large error or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm which combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.
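The two ingredients named in the abstract, a Huber loss for robustness to outliers and a group LASSO penalty for shared sparsity, can be sketched in a few lines. This is a generic textbook form of both terms, not the authors' objective function; the threshold `delta`, the weight `lam`, and the function names are assumptions.

```python
import math


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def group_lasso_penalty(groups, lam=1.0):
    """Sum of Euclidean norms of coefficient groups, scaled by lam.

    Each group could hold the coefficients of one candidate edge across
    several datasets, so that the edge is kept or dropped jointly.
    """
    return lam * sum(math.sqrt(sum(b * b for b in g)) for g in groups)
```

For a residual of 3.0 with `delta=1.0`, the Huber loss is 2.5 where the squared-error loss `0.5 * r**2` would be 4.5, which is why large errors pull the fit less.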
Wojtas, Bartosz; Pfeifer, Aleksandra; Oczko-Wojciechowska, Malgorzata; Krajewska, Jolanta; Czarniecka, Agnieszka; Kukulska, Aleksandra; Eszlinger, Markus; Musholt, Thomas; Stokowy, Tomasz; Swierniak, Michal; Stobiecka, Ewa; Chmielik, Ewa; Rusinek, Dagmara; Tyszkiewicz, Tomasz; Halczok, Monika; Hauptmann, Steffen; Lange, Dariusz; Jarzab, Michal; Paschke, Ralf; Jarzab, Barbara
2017-01-01
Distinguishing between follicular thyroid cancer (FTC) and follicular thyroid adenoma (FTA) constitutes a long-standing diagnostic problem resulting in equivocal histopathological diagnoses. There is therefore a need for additional molecular markers. To identify molecular differences between FTC and FTA, we analyzed the gene expression microarray data of 52 follicular neoplasms. We also performed a meta-analysis involving 14 studies employing high throughput methods (365 follicular neoplasms analyzed). Based on these two analyses, we selected 18 genes differentially expressed between FTA and FTC. We validated them by quantitative real-time polymerase chain reaction (qRT-PCR) in an independent set of 71 follicular neoplasms from formaldehyde-fixed paraffin embedded (FFPE) tissue material. We confirmed differential expression for 7 genes (CPQ, PLVAP, TFF3, ACVRL1, ZFYVE21, FAM189A2, and CLEC3B). Finally, we created a classifier that distinguished between FTC and FTA with an accuracy of 78%, sensitivity of 76%, and specificity of 80%, based on the expression of 4 genes (CPQ, PLVAP, TFF3, ACVRL1). In our study, we have demonstrated that meta-analysis is a valuable method for selecting possible molecular markers. Based on our results, we conclude that there might exist a plausible limit of gene classifier accuracy of approximately 80%, when follicular tumors are discriminated based on formalin-fixed postoperative material. PMID:28574441
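Accuracy, sensitivity, and specificity as reported above are derived from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts in the usage note are made up for illustration and are not the study's data.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: positive cases (e.g. carcinomas) classified correctly/incorrectly.
    tn/fp: negative cases (e.g. adenomas) classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

With hypothetical counts `tp=8, fn=2, tn=9, fp=1`, the function returns an accuracy of 0.85, a sensitivity of 0.8, and a specificity of 0.9.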
Why Choose This One? Factors in Scientists' Selection of Bioinformatics Tools
ERIC Educational Resources Information Center
Bartlett, Joan C.; Ishimura, Yusuke; Kloda, Lorie A.
2011-01-01
Purpose: The objective was to identify and understand the factors involved in scientists' selection of preferred bioinformatics tools, such as databases of gene or protein sequence information (e.g., GenBank) or programs that manipulate and analyse biological data (e.g., BLAST). Methods: Eight scientists maintained research diaries for a two-week…
Zhang, Lei; Zhao, Xihua; Zhang, Guoxiu; Zhang, Jiajia; Wang, Xuedong; Zhang, Suping; Wang, Wei; Wei, Dongzhi
2016-02-09
Filamentous fungi play important roles in the production of plant cell-wall degrading enzymes. In recent years, homologous recombinant technologies have contributed significantly to improved enzyme production and system design of genetically manipulated strains. When introducing multiple gene deletions, we need a robust and convenient way to control selectable marker genes, especially when only a limited number of markers are available in filamentous fungi. Integration after transformation is predominantly nonhomologous in most fungi other than yeast. Fungal strains deficient in the non-homologous end-joining (NHEJ) pathway have limitations associated with gene function analyses, although they are excellent recipient strains for gene targeting. We describe strategies and methods to address the challenges above and leverage the power of resilient NHEJ-deficient strains. We have established a foolproof light-inducible platform for one-step unmarked genetic modification in industrial eukaryotic microorganisms designated as 'LML 3.0', and an on-off control protocol of NHEJ pathway called 'OFN 1.0', using a synthetic light-switchable transactivation to control Cre recombinase-based excision and inversion. The methods provide a one-step strategy to sequentially modify genes without introducing selectable markers and NHEJ-deficiency. The strategies can be used to manipulate many biological processes in a wide range of eukaryotic cells.
Matoušková, Petra; Bártíková, Hana; Boušová, Iva; Hanušová, Veronika; Szotáková, Barbora; Skálová, Lenka
2014-01-01
Obesity and metabolic syndrome are an increasing health problem worldwide. Among other approaches, nutritional intervention using phytochemicals is an important method for the treatment and prevention of this disease. Recent studies have shown that certain phytochemicals could alter the expression of specific genes and microRNAs (miRNAs) that play a fundamental role in the pathogenesis of obesity. For the study of obesity and its treatment, monosodium glutamate (MSG)-injected mice with developed central obesity, insulin resistance and liver lipid accumulation are frequently used animal models. To understand the mechanism of phytochemical action in obese animals, the study of the expression of selected genes together with miRNA quantification is extremely important. For this purpose, real-time quantitative PCR is a sensitive and reproducible method, but it depends entirely on proper normalization. The aim of the present study was to identify the appropriate reference genes for mRNA and miRNA quantification in MSG mice treated with green tea catechins, potential anti-obesity phytochemicals. Two sets of reference genes were tested: the first set contained seven commonly used genes for normalization of messenger RNA, and the second set of candidate reference genes included ten small RNAs for normalization of miRNA. The expression stability of these reference genes was tested upon treatment of mice with catechins using the geNorm, NormFinder and BestKeeper algorithms. Selected normalizers for mRNA quantification were tested and validated on the expression of quinone oxidoreductase, a biotransformation enzyme known to be modified by catechins. The effect of selected normalizers for miRNA quantification was tested on two obesity- and diabetes-related miRNAs, miR-221 and miR-29b, respectively. Finally, the combinations of B2M/18S/HPRT1 and miR-16/sno234 were validated as optimal reference genes for mRNA and miRNA quantification in liver and 18S/RPlP0/HPRT1 and sno234/miR-186 in small intestine of MSG mice. These reference genes will be used for mRNA and miRNA normalization in further study of green tea catechins action in obese mice.
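Why normalization matters can be seen in the common delta-Ct calculation, sketched below for a target gene normalized against several reference genes. This is a generic illustration that assumes 100% amplification efficiency (a doubling per cycle); it is not the geNorm, NormFinder, or BestKeeper procedure, and all Ct values in the usage note are invented.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs):
    """Expression of a target relative to the reference genes.

    Averaging the reference Ct values corresponds to taking the geometric
    mean of the reference quantities when efficiency is assumed to be 100%.
    """
    delta_ct = ct_target - mean(ct_refs)
    return 2.0 ** (-delta_ct)


def fold_change(ct_target_treated, ct_refs_treated,
                ct_target_control, ct_refs_control):
    """Fold change of treated over control (the delta-delta-Ct method)."""
    treated = relative_expression(ct_target_treated, ct_refs_treated)
    control = relative_expression(ct_target_control, ct_refs_control)
    return treated / control
```

For example, a target at Ct 24.0 with three references at 20.0, 22.0 and 21.0 gives a relative expression of 0.125; if treatment lowers the target Ct to 23.0 with unchanged references, the fold change is 2.0. An unstable reference gene would shift `mean(ct_refs)` and distort this result, which is the problem the study addresses.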
Jenkins, Paul A.; Song, Yun S.; Brem, Rachel B.
2012-01-01
Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance. PMID:23226196
Personalized gene silencing therapeutics for Huntington disease.
Kay, C; Skotte, N H; Southwell, A L; Hayden, M R
2014-07-01
Gene silencing offers a novel therapeutic strategy for dominant genetic disorders. In specific diseases, selective silencing of only one copy of a gene may be advantageous over non-selective silencing of both copies. Huntington disease (HD) is an autosomal dominant disorder caused by an expanded CAG trinucleotide repeat in the Huntingtin gene (HTT). Silencing both expanded and normal copies of HTT may be therapeutically beneficial, but preservation of normal HTT expression is preferred. Allele-specific methods can selectively silence the mutant HTT transcript by targeting either the expanded CAG repeat or single nucleotide polymorphisms (SNPs) in linkage disequilibrium with the expansion. Both approaches require personalized treatment strategies based on patient genotypes. We compare the prospect of safe treatment of HD by CAG- and SNP-specific silencing approaches and review HD population genetics used to guide target identification in the patient population. Clinical implementation of allele-specific HTT silencing faces challenges common to personalized genetic medicine, requiring novel solutions from clinical scientists and regulatory authorities. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Wang, Pengxia; Zhu, Yiguang; Zhang, Yuyang; Zhang, Chunyi; Xu, Jianyi; Deng, Yun; Peng, Donghai; Ruan, Lifang; Sun, Ming
2016-06-10
Bacillus thuringiensis and Bacillus cereus are two important species in B. cereus group. The intensive study of these strains at the molecular level and construction of genetically modified bacteria requires the development of efficient genetic tools. To insert genes into or delete genes from bacterial chromosomes, marker-less manipulation methods were employed. We present a novel genetic manipulation method for B. thuringiensis and B. cereus strains that does not leave selection markers. Our approach takes advantage of the relaxase Mob02281 encoded by plasmid pBMB0228 from Bacillus thuringiensis. In addition to its mobilization function, this Mob protein can mediate recombination between oriT sites. The Mob02281 mobilization module was associated with a spectinomycin-resistance gene to form a Mob-Spc cassette, which was flanked by the core 24-bp oriT sequences from pBMB0228. A strain in which the wild-type chromosome was replaced with the modified copy containing the Mob-Spc cassette at the target locus was obtained via homologous recombination. Thus, the spectinomycin-resistance gene can be used to screen for Mob-Spc cassette integration mutants. Recombination between the two oriT sequences mediated by Mob02281, encoded by the Mob-Spc cassette, resulted in the excision of the Mob-Spc cassette, producing the desired chromosomal alteration without introducing unwanted selection markers. We used this system to generate an in-frame deletion of a target gene in B. thuringiensis as well as a gene located in an operon of B. cereus. Moreover, we demonstrated that this system can be used to introduce a single gene or an expression cassette of interest in B. thuringiensis. The Mob/oriT recombination system provides an efficient method for unmarked genetic manipulation and for constructing genetically modified bacteria of B. thuringiensis and B. cereus. Our method extends the available genetic tools for B. thuringiensis and B. cereus strains.
RNAi-mediated resistance to rice black-streaked dwarf virus in transgenic rice.
Ahmed, Mohamed M S; Bian, Shiquan; Wang, Muyue; Zhao, Jing; Zhang, Bingwei; Liu, Qiaoquan; Zhang, Changquan; Tang, Shuzhu; Gu, Minghong; Yu, Hengxiu
2017-04-01
Rice black-streaked dwarf virus (RBSDV), a member of the genus Fijivirus in the family Reoviridae, causes significant economic losses in rice production in China and many other Asian countries. Development of resistant varieties by using conventional breeding methods is limited, as germplasm with a high level of resistance to RBSDV has not yet been found. One of the most promising methods to confer resistance against RBSDV is the use of RNA interference (RNAi) technology. RBSDV non-structural protein P7-2, encoded by the S7-2 gene, is a potential F-box protein and is involved in the plant-virus interaction through the ubiquitination pathway. P8, encoded by the S8 gene, is the minor core protein that possesses potent transcriptional repression activity. In this study, we transformed rice calli using a mini-twin T-DNA vector harboring RNAi constructs of the RBSDV genes S7-2 or S8, and obtained plants harboring the target gene constructs and the selectable marker gene, hygromycin phosphotransferase (HPT). From the offspring of these transgenic plants, we obtained selectable marker (HPT gene)-free plants. Homozygous T5 transgenic lines which harbored either S7-2-RNAi or S8-RNAi exhibited high-level resistance against RBSDV under field infection pressure from indigenous viruliferous small brown planthoppers. Thus, our results showed that RNA interference with the expression of S7-2 or S8 genes seemed an effective way to induce high-level resistance in rice against RBSD disease.
Selection of Phototransduction Genes in Homo sapiens.
Christopher, Mark; Scheetz, Todd E; Mullins, Robert F; Abràmoff, Michael D
2013-08-13
We investigated the evidence of recent positive selection in the human phototransduction system at single nucleotide polymorphism (SNP) and gene level. SNP genotyping data from the International HapMap Project for European, Eastern Asian, and African populations was used to discover differences in haplotype length and allele frequency between these populations. Numeric selection metrics were computed for each SNP and aggregated into gene-level metrics to measure evidence of recent positive selection. The level of recent positive selection in phototransduction genes was evaluated and compared to a set of genes shown previously to be under recent selection, and a set of highly conserved genes as positive and negative controls, respectively. Six of 20 phototransduction genes evaluated had gene-level selection metrics above the 90th percentile: RGS9, GNB1, RHO, PDE6G, GNAT1, and SLC24A1. The selection signal across these genes was found to be of similar magnitude to the positive control genes and much greater than the negative control genes. There is evidence for selective pressure in the genes involved in retinal phototransduction, and traces of this selective pressure can be demonstrated using SNP-level and gene-level metrics of allelic variation. We hypothesize that the selective pressure on these genes was related to their role in low light vision and retinal adaptation to ambient light changes. Uncovering the underlying genetics of evolutionary adaptations in phototransduction not only allows greater understanding of vision and visual diseases, but also the development of patient-specific diagnostic and intervention strategies.
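The step of aggregating SNP-level metrics into a gene-level metric and comparing it against a percentile threshold can be sketched as follows. The abstract does not state the aggregation rule, so taking the maximum absolute SNP score per gene is an assumption, as are the function names and the toy values used in the usage note.

```python
def gene_level_scores(snp_scores):
    """Collapse per-SNP selection metrics into one score per gene.

    snp_scores maps a gene name to a list of SNP-level metrics. The maximum
    absolute value is used here as a hypothetical aggregation rule.
    """
    return {gene: max(abs(s) for s in scores)
            for gene, scores in snp_scores.items() if scores}


def percentile_rank(value, background):
    """Percentage of background scores strictly below value."""
    below = sum(1 for b in background if b < value)
    return 100.0 * below / len(background)
```

A gene would then be flagged when `percentile_rank(score, genome_wide_scores)` exceeds 90, mirroring the 90th-percentile cutoff mentioned above.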
Moretti, Stefano; van Leeuwen, Danitsja; Gmuender, Hans; Bonassi, Stefano; van Delft, Joost; Kleinjans, Jos; Patrone, Fioravante; Merlo, Domenico Franco
2008-01-01
Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a Bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (shortly, CASh), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact in the regulation of complex pathways. PMID:18764936
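The Shapley value that underlies CASh can be computed exactly for a small coalitional game by averaging each player's marginal contribution over all orderings. The sketch below is a generic implementation of that definition, not the CASh method or its microarray game; the three-gene toy game in the usage note is hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values for a small game.

    value is a function taking a frozenset of players and returning the worth
    of that coalition. The cost grows factorially with the number of players,
    so this is only practical for a handful of genes.
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}
```

In a toy game where a coalition is worth 1 only if it contains both `g1` and `g2`, the result is 0.5 for `g1`, 0.5 for `g2`, and 0.0 for an uninvolved `g3`, which shows how the value credits genes for their joint behavior.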
SoFoCles: feature filtering for microarray classification based on gene ontology.
Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A
2010-02-01
Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
2014-01-01
Background: Signatures of selection are regions in the genome that have been preferentially increased in frequency and fixed in a population because of their functional importance in specific processes. These regions can be detected because of their lower genetic variability and specific regional linkage disequilibrium (LD) patterns. Methods: By comparing the differences in regional LD variation between dairy and beef cattle types, and between indicine and taurine subspecies, we aim at finding signatures of selection for production and adaptation in cattle breeds. The VarLD method was applied to compare the LD variation in the autosomal genome between breeds, including Angus and Brown Swiss, representing taurine breeds, and Nelore and Gir, representing indicine breeds. Genomic regions containing the top 0.01 and 0.1 percentile of signals were characterized using the UMD3.1 Bos taurus genome assembly to identify genes in those regions and compared with previously reported selection signatures and regions with copy number variation. Results: For all comparisons, the top 0.01 and 0.1 percentile included 26 and 165 signals and 17 and 125 genes, respectively, including TECRL, BT.23182 or FPPS, CAST, MYOM1, UVRAG and DNAJA1. Conclusions: The VarLD method is a powerful tool to identify differences in linkage disequilibrium between cattle populations and putative signatures of selection with potential adaptive and productive importance. PMID:24592996
[Effect of EMP-1 gene on human esophageal cancer cell line].
Wang, Hai-tao; Liu, Zhi-hua; Wang, Xiu-qin; Wu, Min
2002-03-01
EMP-1 was selected from a series of differentially expressed genes obtained from cDNA microarray in the authors' lab. The epithelial membrane protein-1 gene (EMP-1) was expressed 6-fold lower in esophageal cancer than in normal tissue. The authors further designed an experiment to study the effect of the human EMP-1 gene on a human esophageal cancer cell line in order to explain the function of this gene in the carcinogenesis and progression of esophageal cancer. The EMP-1 gene was cloned into a eukaryotic vector and transfected into the human esophageal cancer cell line. The transfection effect was assessed by Western blot and RT-PCR. The cell growth curve was observed and the cell cycle was checked by FACS. EMP-1 was transfected into the EC9706 cell line and its expression was up-regulated. The cell growth is accelerated and expression of EMP-1 is linked to induction of S phase arrest. The EMP-1 gene has some relationship with carcinogenesis of the esophagus.
Exploring Wound-Healing Genomic Machinery with a Network-Based Approach
Vitali, Francesca; Marini, Simone; Balli, Martina; Grosemans, Hanne; Sampaolesi, Maurilio; Lussier, Yves A.; Cusella De Angelis, Maria Gabriella; Bellazzi, Riccardo
2017-01-01
The molecular mechanisms underlying tissue regeneration and wound healing are still poorly understood despite their importance. In this paper we develop a bioinformatics approach, combining biology and network theory to drive experiments for better understanding the genetic underpinnings of wound healing mechanisms and for selecting potential drug targets. We start by selecting literature-relevant genes in murine wound healing, and inferring from them a Protein-Protein Interaction (PPI) network. Then, we analyze the network to rank wound healing-related genes according to their topological properties. Lastly, we perform a procedure for in-silico simulation of a treatment action in a biological pathway. The findings obtained by applying the developed pipeline, including gene expression analysis, confirm that a network-based bioinformatics method is able to prioritize candidate genes for in vitro analysis, thus speeding up the understanding of molecular mechanisms and supporting the discovery of potential drug targets. PMID:28635674
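Ranking genes by a topological property of a PPI network can be illustrated with the simplest such property, node degree. The abstract does not say which properties the authors used, so degree is an assumption here, and the edge list in the usage note uses placeholder gene names.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Rank nodes of an undirected network by number of distinct neighbors.

    edges is an iterable of (node_a, node_b) pairs. Self-loops are ignored
    and ties are broken alphabetically so the output is deterministic.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(neighbors, key=lambda n: (-len(neighbors[n]), n))
```

With the hypothetical edges `[("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]`, the ranking is `["A", "B", "C", "D"]`, placing the hub `A` first as the leading candidate for follow-up.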
Removal of Heterologous Sequences from Plasmodium falciparum Mutants Using FLPe-Recombinase
van Schaijk, Ben C. L.; Vos, Martijn W.; Janse, Chris J.; Sauerwein, Robert W.; Khan, Shahid M.
2010-01-01
Genetically-modified mutants are now indispensable Plasmodium gene-function reagents, which are also being pursued as genetically attenuated parasite vaccines. Currently, the generation of transgenic malaria-parasites requires the use of drug-resistance markers. Here we present the development of an FRT/FLP-recombinase system that enables the generation of transgenic parasites free of resistance genes. We demonstrate in the human malaria parasite, P. falciparum, the complete and efficient removal of the introduced resistance gene. We targeted two neighbouring genes, p52 and p36, using a construct that has a selectable marker cassette flanked by FRT-sequences. This permitted the subsequent removal of the selectable marker cassette by transient transfection of a plasmid that expressed a 37°C thermostable and enhanced FLP-recombinase. This method of removing heterologous DNA sequences from the genome opens up new possibilities in Plasmodium research to sequentially target multiple genes and for using genetically-modified parasites as live, attenuated malaria vaccines. PMID:21152048
High-Content Analysis of CRISPR-Cas9 Gene-Edited Human Embryonic Stem Cells.
Carlson-Stevermer, Jared; Goedland, Madelyn; Steyer, Benjamin; Movaghar, Arezoo; Lou, Meng; Kohlenberg, Lucille; Prestil, Ryan; Saha, Krishanu
2016-01-12
CRISPR-Cas9 gene editing of human cells and tissues holds much promise to advance medicine and biology, but standard editing methods require weeks to months of reagent preparation and selection where much or all of the initial edited samples are destroyed during analysis. ArrayEdit, a simple approach utilizing surface-modified multiwell plates containing one-pot transcribed single-guide RNAs, separates thousands of edited cell populations for automated, live, high-content imaging and analysis. The approach lowers the time and cost of gene editing and produces edited human embryonic stem cells at high efficiencies. Edited genes can be expressed in both pluripotent stem cells and differentiated cells. This preclinical platform adds important capabilities to observe editing and selection in situ within complex structures generated by human cells, ultimately enabling optical and other molecular perturbations in the editing workflow that could refine the specificity and versatility of gene editing. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Arunkumar, Ramesh; Josephs, Emily B; Williamson, Robert J; Wright, Stephen I
2013-11-01
Selection on the gametophyte can be a major force shaping plant genomes as 7-11% of genes are expressed only in that phase and 60% of genes are expressed in both the gametophytic and sporophytic phases. The efficacy of selection on gametophytic tissues is likely to be influenced by sexual selection acting on male and female functions of hermaphroditic plants. Moreover, the haploid nature of the gametophytic phase allows selection to be efficient in removing recessive deleterious mutations and fixing recessive beneficial mutations. To assess the importance of gametophytic selection, we compared the strength of purifying selection and extent of positive selection on gametophyte- and sporophyte-specific genes in the highly outcrossing plant Capsella grandiflora. We found that pollen-exclusive genes had a larger fraction of sites under strong purifying selection, a greater proportion of adaptive substitutions, and faster protein evolution compared with seedling-exclusive genes. In contrast, sperm cell-exclusive genes had a smaller fraction of sites under strong purifying selection, a lower proportion of adaptive substitutions, and slower protein evolution compared with seedling-exclusive genes. Observations of strong selection acting on pollen-expressed genes are likely explained by sexual selection resulting from pollen competition aided by the haploid nature of that tissue. The relaxation of selection in sperm might be due to the reduced influence of intrasexual competition, but reduced gene expression may also be playing an important role.
Nishida, I; Sugiura, M; Enju, A; Nakamura, M
2000-12-01
A new isogene for acyl-(acyl-carrier-protein):glycerol-3-phosphate acyltransferase (GPAT; EC 2.3.1.15) in squash has been cloned and the gene product was identified as oleate-selective GPAT. Using PCR primers that could hybridise with exons for a previously cloned squash GPAT, we obtained two PCR products of different size: one coded for a previously cloned squash GPAT corresponding to non-selective isoforms AT2 and AT3, and the other for a new isozyme, probably the oleate-selective isoform AT1. Full-length amino acid sequences of respective isozymes were deduced from the nucleotide sequences of genomic genes and cDNAs, which were cloned by a series of PCR-based methods. Thus, we designated the new gene CmATS1;1 and the other one CmATS1;2. Genome blot analysis revealed that the squash genome contained the two isogenes at non-allelic loci. AT1-active fractions were partially purified, and three polypeptide bands were identified as being AT1 polypeptides, which exhibited relative molecular masses of 39.5-40.5 kDa, pI values of 6.75-7.15, and oleate selectivity over palmitate. Partial amino-terminal sequences obtained from two of these bands verified that the new isogene codes for AT1 polypeptides.
The role of green fluorescent protein (GFP) in transgenic plants to reduce gene silencing phenomena.
El-Shemy, Hany A; Khalafalla, Mutasim M; Ishimoto, Masao
2009-01-01
The green fluorescent protein (GFP) of jellyfish (Aequorea victoria) has significant advantages over other reporter genes, because expression can be detected in living cells without any substrates. Epigenetic phenomena have recently become important to consider in plant biotechnology experiments aimed at elucidating unknown mechanisms. Therefore, embryogenic cells were generated from soybean immature cotyledons and engineered with two different gene constructs (pHV and pHVS) using the gene gun method. Both constructs contain a gene conferring resistance to hygromycin (hpt) as a selective marker and a modified glycinin (11S globulin) gene (V3-1) as a target. However, sGFP(S65T) was used as a reporter gene only in pHVS, to study the relation between the use of sGFP(S65T) and gene silencing phenomena. Fluorescence microscopy, used for screening after hygromycin selection, clearly identified the expression of sGFP(S65T) in the transformed soybean embryos bombarded with the pHVS construct. Protein analysis by SDS-PAGE was used to detect gene expression in seeds. The percentage of gene down-regulation was higher with the pHV construct than with pHVS. Thus, sGFP(S65T) as a reporter gene in a vector system may play a useful role in transgenic evaluation and help avoid gene silencing in plants, to the benefit of plant transformation systems.
Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G
2017-01-01
Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
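The idea of a quadratic normalization fitted on a subset of invariant genes can be sketched as follows. This is not the moose2 implementation (which is written in C++ and selects the invariant genes by dynamic programming); it is a minimal illustration that assumes the invariant gene indices are already given, and all function names and numbers are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_quadratic(xs, ys):
    """Least-squares fit y ~ c0 + c1*x + c2*x^2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]            # sums of x^0..x^4
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    A = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(A)
    if d == 0:
        raise ValueError("need at least three distinct x values")
    coeffs = []
    for col in range(3):                                        # Cramer's rule
        Ak = [row[:] for row in A]
        for r in range(3):
            Ak[r][col] = t[r]
        coeffs.append(det3(Ak) / d)
    return coeffs

def normalize(sample, reference, invariant_idx):
    """Map all sample values onto the reference scale using invariant genes only."""
    xs = [sample[i] for i in invariant_idx]
    ys = [reference[i] for i in invariant_idx]
    c0, c1, c2 = fit_quadratic(xs, ys)
    return [c0 + c1 * x + c2 * x * x for x in sample]

# Hypothetical expression values; the gene at index 5 is treated as truly changed
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reference = [0.5 + 0.8 * x + 0.05 * x * x for x in sample]
reference[5] = 10.0
normalized = normalize(sample, reference, invariant_idx=[0, 1, 2, 3, 4])
```

Because the curve is fitted only on the invariant genes, the changed gene keeps its difference from the reference after normalization instead of being absorbed into the fit.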
Identifying biologically relevant putative mechanisms in a given phenotype comparison
Hanoudi, Samer; Donato, Michele; Draghici, Sorin
2017-01-01
A major challenge in life science research is understanding the mechanism involved in a given phenotype. The ability to identify the correct mechanisms is needed in order to understand fundamental and very important phenomena such as mechanisms of disease, immune systems responses to various challenges, and mechanisms of drug action. The current data analysis methods focus on the identification of the differentially expressed (DE) genes using their fold change and/or p-values. Major shortcomings of this approach are that: i) it does not consider the interactions between genes; ii) its results are sensitive to the selection of the threshold(s) used, and iii) the set of genes produced by this approach is not always conducive to formulating mechanistic hypotheses. Here we present a method that can construct networks of genes that can be considered putative mechanisms. The putative mechanisms constructed by this approach are not limited to the set of DE genes, but also considers all known and relevant gene-gene interactions. We analyzed three real datasets for which both the causes of the phenotype, as well as the true mechanisms were known. We show that the method identified the correct mechanisms when applied on microarray datasets from mouse. We compared the results of our method with the results of the classical approach, showing that our method produces more meaningful biological insights. PMID:28486531
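The threshold sensitivity named in shortcoming (ii) can be made concrete with a small sketch of the classical approach. The gene records and cut-off values below are hypothetical, chosen only to show how the selected set shifts when the thresholds move.

```python
import math

def select_de_genes(genes, min_abs_log2_fc, max_p):
    """Return names of genes passing both the fold-change and p-value cut-offs."""
    selected = []
    for name, fold_change, p_value in genes:
        if abs(math.log2(fold_change)) >= min_abs_log2_fc and p_value <= max_p:
            selected.append(name)
    return selected

# Hypothetical (name, fold change, p-value) records
toy_genes = [("g1", 4.0, 0.001), ("g2", 1.9, 0.01),
             ("g3", 0.25, 0.04), ("g4", 1.1, 0.5)]

strict = select_de_genes(toy_genes, min_abs_log2_fc=1.0, max_p=0.01)
loose = select_de_genes(toy_genes, min_abs_log2_fc=0.9, max_p=0.05)
print(strict, loose)
```

A small change in either cut-off moves genes in or out of the DE list, which is the instability the network-based method is meant to avoid.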
Milnthorpe, Andrew T; Soloviev, Mikhail
2011-04-15
The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been used widely to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (one third of libraries were incorrect on average, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reported genes from outside the selected libraries and may omit genes found within the libraries. Other errors include the incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocated a revised approach to finding libraries associated with tissues. In doing so, libraries from dependent or irrelevant tissues do not get included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised calculation of statistical significance and sorted the library cut-off filter. Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used. PMID:21496233
Zhang, RuiJie; Li, Xia; Jiang, YongShuai; Liu, GuiYou; Li, ChuanXing; Zhang, Fan; Xiao, Yun; Gong, BinSheng
2009-02-01
High-throughput single nucleotide polymorphism detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, first, we apply four haplotype identification methods (Confidence Intervals, Four Gamete Tests, Solid Spine of LD, and a fusing method of haplotype blocks) to high-throughput SNP genotype data to identify blocks, then use cluster analysis to verify the effectiveness of the four methods, and select the alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and gene databases to locate the blocks and identify the candidate genes. Finally, we perform gene function annotation using the KEGG, Biocarta, and GO databases. We find 159 haplotype blocks on chromosomes 1-22 that are most likely related to alcoholism, comprising 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability by biological functional annotation. In short, by combining our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework, we can not only handle SNP data easily but also locate disease-related genes precisely.
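Of the four block-identification methods named above, the four-gamete test is the simplest to sketch. The version below assumes phased biallelic haplotypes encoded as strings of '0' and '1' and ignores the minimum-frequency cut-off that practical implementations usually apply; the function name and data are hypothetical.

```python
def four_gamete_test(haplotypes, i, j):
    """Return True if all four allele combinations occur at SNP positions i and j.

    Observing all four gametes is taken as evidence of historical recombination
    between the two sites, so the pair would not be placed in the same block.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4

# Hypothetical phased haplotypes over two SNPs
recombined = ["00", "01", "10", "11"]
linked = ["00", "01", "11", "00"]

print(four_gamete_test(recombined, 0, 1))
print(four_gamete_test(linked, 0, 1))
```

A block can then be grown by extending a run of consecutive SNPs for as long as no pair within the run shows all four gametes.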
A novel sgRNA selection system for CRISPR-Cas9 in mammalian cells.
Zhang, Haiwei; Zhang, Xixi; Fan, Cunxian; Xie, Qun; Xu, Chengxian; Zhao, Qun; Liu, Yongbo; Wu, Xiaoxia; Zhang, Haibing
2016-03-18
CRISPR-Cas9 mediated genome editing system has been developed as a powerful tool for elucidating the function of genes through genetic engineering in multiple cells and organisms. This system takes advantage of a single guide RNA (sgRNA) to direct the Cas9 endonuclease to a specific DNA site to generate mutant alleles. Since the targeting efficiency of sgRNAs to distinct DNA loci can vary widely, there remains a need for a rapid, simple and efficient sgRNA selection method to overcome this limitation of the CRISPR-Cas9 system. Here we report a novel system to select sgRNA with high efficacy for DNA sequence modification by a luciferase assay. Using this sgRNAs selection system, we further demonstrated successful examples of one sgRNA for generating one gene knockout cell lines where the targeted genes are shown to be functionally defective. This system provides a potential application to optimize the sgRNAs in different species and to generate a powerful CRISPR-Cas9 genome-wide screening system with minimum amounts of sgRNAs. Copyright © 2016 Elsevier Inc. All rights reserved.
Chung, In-Hyuk; Yoo, Hye Sook; Eah, Jae-Yong; Yoon, Hyun-Kyu; Jung, Jin-Wook; Hwang, Seung Yong; Kim, Chang-Bae
2010-10-01
DNA barcoding with the gene encoding cytochrome c oxidase I (COI) in the mitochondrial genome has been proposed as a standard marker to identify and discover animal species. Some migratory wild birds are suspected of transmitting avian influenza and pose a threat to aircraft safety because of bird strikes. We have previously reported the COI gene sequences of 92 Korean bird species. In the present study, we developed a DNA microarray to identify 17 selected bird species on the basis of nucleotide diversity. We designed and synthesized 19 specific oligonucleotide probes; these probes were arrayed on a silylated glass slide. The length of the probes was 19-24 bps. The COI sequences amplified from the tissues of the selected birds were labeled with a fluorescent probe for microarray hybridization, and unique hybridization patterns were detected for each selected species. These patterns may be considered diagnostic patterns for species identification. This microarray system will provide a sensitive and a high-throughput method for identification of Korean birds.
Genomic Signature of Kin Selection in an Ant with Obligately Sterile Workers
Warner, Michael R.; Mikheyev, Alexander S.
2017-01-01
Kin selection is thought to drive the evolution of cooperation and conflict, but the specific genes and genome-wide patterns shaped by kin selection are unknown. We identified thousands of genes associated with the sterile ant worker caste, the archetype of an altruistic phenotype shaped by kin selection, and then used population and comparative genomic approaches to study patterns of molecular evolution at these genes. Consistent with population genetic theoretical predictions, worker-upregulated genes experienced reduced selection compared with genes upregulated in reproductive castes. Worker-upregulated genes included more taxonomically restricted genes, indicating that the worker caste has recruited more novel genes, yet these genes also experienced reduced selection. Our study identifies a putative genomic signature of kin selection and helps to integrate emerging sociogenomic data with longstanding social evolution theory. PMID:28419349
2011-01-01
Background Changes in transcriptional orientation (“CTOs”) occur frequently in prokaryotic genomes. Such changes usually result from genomic inversions, which may cause a conflict between the directions of replication and transcription and an increase in mutation rate. However, CTOs do not always lead to the replication-transcription confrontation. Furthermore, CTOs may cause deleterious disruptions of operon structure and/or gene regulations. The currently existing CTOs may indicate relaxation of selection pressure. Therefore, it is of interest to investigate whether CTOs have an independent effect on the evolutionary rates of the affected genes, and whether these genes are subject to any type of selection pressure in prokaryotes. Methods Three closely related enterobacteria, Escherichia coli, Klebsiella pneumoniae and Salmonella enterica serovar Typhimurium, were selected for comparisons of synonymous (dS) and nonsynonymous (dN) substitution rates between the genes that have experienced changes in transcriptional orientation (changed-orientation genes, “COGs”) and those that have not (same-orientation genes, “SOGs”). The dN/dS ratio was also derived to evaluate the selection pressure on the analyzed genes. Confounding factors in the estimation of evolutionary rates, such as gene essentiality, gene expression level, replication-transcription confrontation, and decreased dS at gene terminals were controlled in the COG-SOG comparisons. Results We demonstrate that COGs have significantly higher dN and dS than SOGs when a series of confounding factors are controlled. However, the dN/dS ratios are similar between the two gene groups, suggesting that the increase in dS can sufficiently explain the increase in dN in COGs. Therefore, the increases in evolutionary rates in COGs may be mainly mutation-driven. Conclusions Here we show that CTOs can increase the evolutionary rates of the affected genes.
This effect is independent of the replication-transcription confrontation, which is suggested to be the major cause of inversion-associated evolutionary rate increases. The real cause of such evolutionary rate increases remains unclear but is worth further explorations. PMID:22152004
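The reasoning in the Results (higher dN and dS but a similar dN/dS ratio pointing to a mutation-driven increase) can be followed with a small numeric sketch. The per-gene values below are hypothetical and chosen only to show the pattern; they are not estimates from the study.

```python
def dn_ds(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds

def mean(values):
    return sum(values) / len(values)

# Hypothetical per-gene (dN, dS) estimates for the two gene groups
sog = [(0.010, 0.10), (0.020, 0.20)]   # same-orientation genes
cog = [(0.020, 0.20), (0.040, 0.40)]   # changed-orientation genes

sog_ratio = mean([dn_ds(dn, ds) for dn, ds in sog])
cog_ratio = mean([dn_ds(dn, ds) for dn, ds in cog])
print(sog_ratio, cog_ratio)
```

In this toy case both rates double in the changed-orientation group while the ratio stays the same, which is the signature expected when an elevated mutation rate, rather than a shift in selection, drives the change.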
Árnason, Einar
2015-01-01
Natural selection, the most important force in evolution, comes in three forms. Negative purifying selection removes deleterious variation and maintains adaptations. Positive directional selection fixes beneficial variants, producing new adaptations. Balancing selection maintains variation in a population. Important mechanisms of balancing selection include heterozygote advantage, frequency-dependent advantage of rarity, and local and fluctuating episodic selection. A rare pathogen gains an advantage because host defenses are predominantly effective against prevalent types. Similarly, a rare immune variant gives its host an advantage because the prevalent pathogens cannot escape the host’s apostatic defense. Due to the stochastic nature of evolution, neutral variation may accumulate on genealogical branches, but trans-species polymorphisms are rare under neutrality and are strong evidence for balancing selection. Balanced polymorphism maintains diversity at the major histocompatibility complex (MHC) in vertebrates. The Atlantic cod is missing genes for both MHC-II and CD4, vital parts of the adaptive immune system. Nevertheless, cod are healthy in their ecological niche, maintaining large populations that support major commercial fisheries. Innate immunity is of interest from an evolutionary perspective, particularly in taxa lacking adaptive immunity. Here, we analyze extensive amino acid and nucleotide polymorphisms of the cathelicidin gene family in Atlantic cod and closely related taxa. There are three major clusters, Cath1, Cath2, and Cath3, that we consider to be paralogous genes. There is extensive nucleotide and amino acid allelic variation between and within clusters. The major feature of the results is that the variation clusters by alleles and not by species in phylogenetic trees and discriminant analysis of principal components. 
Variation within the three groups shows trans-species polymorphism that is older than speciation and that is suggestive of balancing selection maintaining the variation. Using Bayesian and likelihood methods positive and negative selection is evident at sites in the conserved part of the genes and, to a larger extent, in the active part which also shows episodic diversifying selection, further supporting the argument for balancing selection. PMID:26038731
2012-01-01
High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data and provide links to software implementations and tools and address also the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods. Reviewers This article was reviewed by Arcady Mushegian, Byung-Soo Kim and Joel Bader. PMID:23227854
Natural Selection in a Bangladeshi Population from the Cholera-Endemic Ganges River Delta
Karlsson, Elinor K.; Harris, Jason B.; Tabrizi, Shervin; Rahman, Atiqur; Shlyakhter, Ilya; Patterson, Nick; O'Dushlaine, Colm; Schaffner, Stephen F.; Gupta, Sameer; Chowdhury, Fahima; Sheikh, Alaullah; Shin, Ok Sarah; Ellis, Crystal; Becker, Christine E.; Stuart, Lynda M.; Calderwood, Stephen B.; Ryan, Edward T.; Qadri, Firdausi; Sabeti, Pardis C.; LaRocque, Regina C.
2015-01-01
As an ancient disease with high fatality, cholera has likely exerted strong selective pressure on affected human populations. We performed a genome-wide study of natural selection in a population from the Ganges River Delta, the historic geographic epicenter of cholera. We identified 305 candidate selected regions using the Composite of Multiple Signals (CMS) method. The regions were enriched for potassium channel genes involved in cyclic AMP-mediated chloride secretion and for components of the innate immune system involved in NF-κB signaling. We demonstrate that a number of these strongly selected genes are associated with cholera susceptibility in two separate cohorts. We further identify repeated examples of selection and association in an NF-κB/inflammasome-dependent pathway that is activated in vitro by Vibrio cholerae. Our findings shed light on the genetic basis of cholera resistance in a population from the Ganges River Delta and present a promising approach for identifying genetic factors influencing susceptibility to infectious diseases. PMID:23825302
Scuba: scalable kernel-based gene prioritization.
Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio
2018-01-25
The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge, however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to efficiently deal both with a large amount of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba integrates also a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba .
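The integration of several data sources through kernels can be illustrated by the combination step alone: each source contributes a gene-by-gene similarity matrix, and the sources are merged as a weighted sum. The sketch below assumes fixed weights, whereas Scuba learns them; the function name and matrices are hypothetical and this is not the code from the linked repository.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep a valid kernel")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)]
            for i in range(n)]

# Hypothetical similarity matrices for two genes from two data sources
k_expression = [[1.0, 0.5], [0.5, 1.0]]
k_interaction = [[1.0, 0.0], [0.0, 1.0]]

combined = combine_kernels([k_expression, k_interaction], [0.5, 0.5])
print(combined)
```

Non-negative weights are required because a non-negative combination of positive semidefinite kernels is itself a valid kernel.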
Analysis of SCN5A Gene Variants in East Slovak Patients with Cardiomyopathy.
Priganc, Mariana; Zigová, Michaela; Boroňová, Iveta; Bernasovská, Jarmila; Dojčáková, Dana; Szabadosová, Viktória; Mydlárová Blaščáková, Marta; Tóthová, Iveta; Kmec, Ján; Bernasovský, Ivan
2017-03-01
Mutations in ion channel genes are a potential cause of cardiomyopathy. The SCN5A gene (sodium channel, voltage gated, type V alpha subunit gene; 3p21) belongs to the family of cardiac sodium channel genes. Mutations in the SCN5A gene lead to decreased Na+ current and ion imbalance. SCN5A gene mutations are found in approximately 2% of patients with dilated cardiomyopathy (DCM), and they may be potential phenotype modifiers in hypertrophic cardiomyopathy (HCM). The role of SCN5A gene mutations in cardiomyopathy is not fully elucidated. Three selected exons (12, 20, and 21) of the SCN5A gene in the cohort of 58 East Slovak patients with DCM and HCM were analyzed by the Sanger sequencing method in order to detect etiopathogenic mutations associated with DCM and HCM. The mutation screening of three selected exons of the SCN5A gene in the cohort of 27 DCM, 12 HCM patients, and 16 controls identified 10 missense genetic variants. Three of them (T1247I, A1260D, and G1262S), all in exon 21 of the SCN5A gene, were potentially damaging and disease-causing variants. Data from this study demonstrate that SCN5A gene variants have an important role in the etiopathogenesis of DCM and HCM. © 2016 Wiley Periodicals, Inc.
Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections
Jrad, Nisrine; Grall-Maës, Edith; Beauseroy, Pierre
2009-01-01
Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis, based on selected gene profiles, is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure a higher reliability while reducing time and expense costs. Moreover, this classifier takes into account asymmetric penalties dependent on each class and on each wrong or partially correct decision. It is based on ν-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered a particular case of the proposed algorithm where the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out in the Bayesian and the class-selective rejection frameworks. Five gene-selected datasets are used to assess the performance of the proposed method. Results are discussed and accuracies are compared with those computed by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machines classifiers. PMID:19584932
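The notion of a class-selective decision, where the output is a subset of classes rather than a single label, can be sketched with a simple threshold rule on class probabilities. This is a deliberate simplification and not the ν-1-SVM method of the paper; the threshold, function name and probabilities are hypothetical.

```python
def class_selective_decision(posteriors, threshold):
    """Return the set of classes the sample is NOT rejected from.

    A single class is an ordinary decision, several classes is a partial
    rejection, and an empty set means the sample is rejected from all classes.
    """
    return {label for label, p in posteriors.items() if p >= threshold}

# Hypothetical class probability estimates for three samples
confident = {"A": 0.90, "B": 0.07, "C": 0.03}
ambiguous = {"A": 0.48, "B": 0.45, "C": 0.07}
unclear = {"A": 0.34, "B": 0.33, "C": 0.33}

print(class_selective_decision(confident, 0.4))
print(class_selective_decision(ambiguous, 0.4))
print(class_selective_decision(unclear, 0.4))
```

In the ambiguous case the rule excludes class C but leaves the choice between A and B to further clinical assessment, which is the kind of partially correct decision the asymmetric penalties are meant to price.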
Evaluation of the effect and profitability of gene-assisted selection in pig breeding system.
Li, Ya-Lan; Zhang, Qin; Chen, Yao-Sheng
2007-11-01
To evaluate the effect and profitability of using the quantitative trait loci (QTL)-linked direct marker (DR marker) in gene-assisted selection (GAS). Three populations (100, 200, or 300 sows plus 10 boars within each group) with segregating QTL were simulated stochastically. Five economic traits were investigated, including number of born alive (NBA), average daily gain to 100 kg body weight (ADG), feed conversion ratio (FCR), back fat at 100 kg body weight (BF) and intramuscular fat (IMF). Selection was based on the estimated breeding value (EBV) of each trait. The starting frequencies of the QTL's favorable allele were 0.1, 0.3 and 0.5, respectively. The economic return was calculated by gene flow method. The selection efficiency was higher than 100% when DR markers were used in GAS for 5 traits. The selection efficiency for NBA was the highest, and the lowest was for ADG whose QTL had the lowest variance. The mixed model applied DR markers and obtained higher extra genetic gain and extra economic returns. We also found that the lower the frequency of the favorable allele of the QTL, the higher the extra return obtained. GAS is an effective selection scheme to increase the genetic gain and the economic returns in pig breeding.
Chen, Yunjia; Qiu, Shihong; Luan, Chi-Hao; Luo, Ming
2007-01-01
Background Expression of higher eukaryotic genes as soluble, stable recombinant proteins is still a bottleneck step in biochemical and structural studies of novel proteins today. Correct identification of stable domains/fragments within the open reading frame (ORF), combined with proper cloning strategies, can greatly enhance the success rate when higher eukaryotic proteins are expressed as these domains/fragments. Furthermore, a HTP cloning pipeline incorporated with bioinformatics domain/fragment selection methods will be beneficial to studies of structure and function genomics/proteomics. Results With bioinformatics tools, we developed a domain/domain boundary prediction (DDBP) method, which was trained by available experimental data. Combined with an improved cloning strategy, DDBP had been applied to 57 proteins from C. elegans. Expression and purification results showed there was a 10-fold increase in terms of obtaining purified proteins. Based on the DDBP method, the improved GATEWAY cloning strategy and a robotic platform, we constructed a high throughput (HTP) cloning pipeline, including PCR primer design, PCR, BP reaction, transformation, plating, colony picking and entry clones extraction, which have been successfully applied to 90 C. elegans genes, 88 Brucella genes, and 188 human genes. More than 97% of the targeted genes were obtained as entry clones. This pipeline has a modular design and can adopt different operations for a variety of cloning/expression strategies. Conclusion The DDBP method and improved cloning strategy were satisfactory. The cloning pipeline, combined with our recombinant protein HTP expression pipeline and the crystal screening robots, constitutes a complete platform for structure genomics/proteomics. This platform will increase the success rate of purification and crystallization dramatically and promote the further advancement of structure genomics/proteomics. PMID:17663785
Wang, Yan; Qian, Guoliang; Liu, Fengquan; Li, Yue-Zhong; Shen, Yuemao; Du, Liangcheng
2013-11-15
Lysobacter is a genus of Gram-negative gliding bacteria that have emerged as novel biocontrol agents and new sources of bioactive natural products. The bacteria are naturally resistant to many antibiotics commonly used in transformant selection, which has hampered genetic manipulation. Here, we describe a facile method for quick-and-easy identification of the target transformants from a large population of wild-type and nontarget transformants. The method is based on a distinct yellow-to-black color change as a visual selection marker for site-specific integration of the gene of interest. Through transposon random mutagenesis, we identified a black-colored strain from the yellow-colored L. enzymogenes. The black strain resulted from a disruption of hmgA, a gene required for tyrosine/phenylalanine metabolism. The disruption of hmgA led to accumulation of dark brown pigments. As proof of principle, we constructed a series of expression vectors for a regulator gene found within the WAP-8294A biosynthetic gene cluster. The yield of WAP-8294A in the black strains increased 2-fold compared to the wild type. Interestingly, the yield of another antibiotic (HSAF) increased up to 7-fold in the black strains. WAP-8294A is a family of potent anti-MRSA antibiotics and is currently in clinical studies, and HSAF is an antifungal compound with distinct structural features and a novel mode of action. This work represents the first successful metabolic engineering in Lysobacter. The development of this facile method opens a way toward manipulating antibiotic production in these largely unexplored sources.
Wang, Yan; Qian, Guoliang; Liu, Fengquan; Li, Yue-Zhong; Shen, Yuemao; Du, Liangcheng
2013-01-01
Lysobacter is a genus of Gram-negative gliding bacteria that have emerged as novel biocontrol agents and new sources of bioactive natural products. The bacteria are naturally resistant to many antibiotics commonly used in transformant selection, which has hampered genetic manipulation. Here, we describe a facile method for quick-and-easy identification of the target transformants from a large population of wild-type and non-target transformants. The method is based on a distinct yellow-to-black color change as a visual selection marker for site-specific integration of the gene of interest. Through transposon random mutagenesis, we identified a black-colored strain from the yellow-colored L. enzymogenes. The black strain resulted from a disruption of hmgA, a gene required for tyrosine/phenylalanine metabolism. The disruption of hmgA led to accumulation of dark brown pigments. As proof of principle, we constructed a series of expression vectors for a regulator gene found within the WAP-8294A biosynthetic gene cluster. The yield of WAP-8294A in the black strains increased 2-fold compared to the wild type. Interestingly, the yield of another antibiotic (HSAF) increased up to 7-fold in the black strains. WAP-8294A is a family of potent anti-MRSA antibiotics and is currently in clinical studies, and HSAF is an antifungal compound with distinct structural features and a novel mode of action. This work represents the first successful metabolic engineering in Lysobacter. The development of this facile method opens a way toward manipulating antibiotic production in these largely unexplored sources. PMID:23937053
Advances in methods for detection of anaerobic ammonium oxidizing (anammox) bacteria.
Li, Meng; Gu, Ji-Dong
2011-05-01
Anaerobic ammonium oxidation (anammox), the biochemical process oxidizing ammonium into dinitrogen gas using nitrite as an electron acceptor, has only recently been recognized for its significant role in the global nitrogen cycle, and its ubiquitous distribution in a wide range of environments has changed our knowledge about the contributors to the global nitrogen cycle. Currently, several groups of methods are used to detect anammox bacteria based on their physiological and biochemical characteristics, cellular chemical composition, and both the 16S rRNA gene and selected functional genes as biomarkers, including the hydrazine oxidoreductase and nitrite reductase encoding genes hzo and nirS, respectively. Results from these methods, coupled with advances in quantitative PCR, reverse transcription of mRNA genes, and stable isotope labeling, have improved our understanding of the distribution, diversity, and activity of anammox bacteria in different environments, both natural and engineered. In this review, we summarize the methods used to detect anammox bacteria from various environments, highlight the strengths and weaknesses of these methods, and discuss the potential for future development of existing and new techniques.
Yang, C; Hamel, C; Vujanovic, V; Gan, Y
2012-01-01
Aims This study explores nontarget effects of fungicide application on field-grown chickpea. Methods and Results Molecular methods were used to test the effects of foliar application of fungicide on the diversity and distribution of nifH genes associated with two chickpea cultivars and their nodulation. Treatments were replicated four times in a split-plot design in the field, in 2008 and 2009. Chemical disease control did not change the richness of the nifH genes associated with chickpea, but selected different dominant nifH gene sequences in 2008, as revealed by correspondence analysis. Disease control strategies had no significant effect on disease severity or nifH gene distribution in 2009. Dry weather conditions rather than disease restricted plant growth that year, suggesting that reduced infection rather than the fungicide is the factor modifying the distribution of nifH gene in chickpea rhizosphere. Reduced nodule size and enhanced N2-fixation in protected plants indicate that disease control affects plant physiology, which may in turn influence rhizosphere bacteria. The genotypes of chickpea also affected the diversity of the nifH gene in the rhizosphere, illustrating the importance of plant selective effects on bacterial communities. Conclusions We conclude that the chemical disease control affects nodulation and the diversity of nifH gene in chickpea rhizosphere, by modifying host plant physiology. A direct effect of fungicide on the bacteria cannot be ruled out, however, as residual amounts of fungicide were found to accumulate in the rhizosphere soil of protected plants. Significance and Impact of the Study Systemic nontarget effect of phytoprotection on nifH gene diversity in chickpea rhizosphere is reported for the first time. This result suggests the possibility of manipulating associative biological nitrogen fixation in the field. PMID:22335393
USDA-ARS's Scientific Manuscript database
Genetic manipulation is an essential technique to analyze gene function; however, limited methods are available for Babesia bovis, a causative pathogen of the globally important cattle disease, bovine babesiosis. To date, two stable transfection systems have been developed for B. bovis, using select...
The locus of sexual selection: moving sexual selection studies into the post-genomics era.
Wilkinson, G S; Breden, F; Mank, J E; Ritchie, M G; Higginson, A D; Radwan, J; Jaquiery, J; Salzburger, W; Arriero, E; Barribeau, S M; Phillips, P C; Renn, S C P; Rowe, L
2015-04-01
Sexual selection drives fundamental evolutionary processes such as trait elaboration and speciation. Despite this importance, there are surprisingly few examples of genes unequivocally responsible for variation in sexually selected phenotypes. This lack of information inhibits our ability to predict phenotypic change due to universal behaviours, such as fighting over mates and mate choice. Here, we discuss reasons for this apparent gap and provide recommendations for how it can be overcome by adopting contemporary genomic methods, exploiting underutilized taxa that may be ideal for detecting the effects of sexual selection and adopting appropriate experimental paradigms. Identifying genes that determine variation in sexually selected traits has the potential to improve theoretical models and reveal whether the genetic changes underlying phenotypic novelty utilize common or unique molecular mechanisms. Such a genomic approach to sexual selection will help answer questions in the evolution of sexually selected phenotypes that were first asked by Darwin and can furthermore serve as a model for the application of genomics in all areas of evolutionary biology. © 2015 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2015 European Society For Evolutionary Biology.
Adaptations to Climate in Candidate Genes for Common Metabolic Disorders
Hancock, Angela M; Witonsky, David B; Gordon, Adam S; Eshel, Gidon; Pritchard, Jonathan K; Coop, Graham; Di Rienzo, Anna
2008-01-01
Evolutionary pressures due to variation in climate play an important role in shaping phenotypic variation among and within species and have been shown to influence variation in phenotypes such as body shape and size among humans. Genes involved in energy metabolism are likely to be central to heat and cold tolerance. To test the hypothesis that climate shaped variation in metabolism genes in humans, we used a bioinformatics approach based on network theory to select 82 candidate genes for common metabolic disorders. We genotyped 873 tag SNPs in these genes in 54 worldwide populations (including the 52 in the Human Genome Diversity Project panel) and found correlations with climate variables using rank correlation analysis and a newly developed method termed Bayesian geographic analysis. In addition, we genotyped 210 carefully matched control SNPs to provide an empirical null distribution for spatial patterns of allele frequency due to population history alone. For nearly all climate variables, we found an excess of genic SNPs in the tail of the distributions of the test statistics compared to the control SNPs, implying that metabolic genes as a group show signals of spatially varying selection. Among our strongest signals were several SNPs (e.g., LEPR R109K, FABP2 A54T) that had previously been associated with phenotypes directly related to cold tolerance. Since variation in climate may be correlated with other aspects of environmental variation, it is possible that some of the signals that we detected reflect selective pressures other than climate. Nevertheless, our results are consistent with the idea that climate has been an important selective pressure acting on candidate genes for common metabolic disorders. PMID:18282109
Shi, Weimin; Zhang, Xiaoya; Shen, Qi
2010-01-01
Quantitative structure-activity relationship (QSAR) studies of the chemokine receptor 5 (CCR5) binding affinity of substituted 1-(3,3-diphenylpropyl)-piperidinyl amides and ureas and of the toxicity of aromatic compounds have been performed. Gene expression programming (GEP) was used to select variables and simultaneously produce nonlinear QSAR models using the selected variables. In our GEP implementation, a simple and convenient method was proposed to infer the K-expression from the number of arguments of the function in a gene, without building the expression tree. The results were compared to those obtained by artificial neural network (ANN) and support vector machine (SVM). It has been demonstrated that GEP is a useful tool for QSAR modeling. Copyright 2009 Elsevier Masson SAS. All rights reserved.
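The abstract does not spell out how the K-expression is inferred from function arities, so the following is only a sketch of the standard counting rule used in gene expression programming, which may or may not match the authors' exact procedure. Reading the gene left to right, one open slot is needed at the start; every symbol fills one slot and opens as many new ones as it has arguments. The K-expression ends where the count of open slots reaches zero. The function name, the `arity` table, and the example genes are hypothetical.

```python
def k_expression_length(gene, arity):
    """Return the length of the coding region (K-expression) of a gene.

    gene  : sequence of symbols (functions and terminals)
    arity : dict mapping each function symbol to its number of arguments;
            symbols missing from the dict are treated as terminals (arity 0)
    """
    open_slots = 1  # the root of the expression still has to be filled
    for position, symbol in enumerate(gene):
        # The symbol fills one slot and opens `arity` new ones.
        open_slots += arity.get(symbol, 0) - 1
        if open_slots == 0:
            return position + 1
    raise ValueError("gene ends before the expression is complete")


ARITY = {"+": 2, "*": 2, "Q": 1}  # Q stands for a unary function, e.g. sqrt

# "+*aabb": + needs 2 args, * needs 2 args, then three terminals close it.
length_1 = k_expression_length("+*aabb", ARITY)   # coding region "+*aab"
length_2 = k_expression_length("Q*+abcd", ARITY)  # coding region "Q*+abc"
```

Symbols after the returned length form the non-coding tail of the gene, which is why no expression tree has to be built to find the boundary.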
Using RNA-seq data to select reference genes for normalizing gene expression in apple roots.
Zhou, Zhe; Cong, Peihua; Tian, Yi; Zhu, Yanmin
2017-01-01
Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes was evaluated for their potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues from a recent RNA-seq data set, together with a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes (MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025) was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor-like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that for those carefully validated reference genes, a single reference gene is sufficient for reliable normalization of the quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization.
Using RNA-seq data to select reference genes for normalizing gene expression in apple roots
Zhou, Zhe; Cong, Peihua; Tian, Yi
2017-01-01
Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes was evaluated for their potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues from a recent RNA-seq data set, together with a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes (MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025) was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor-like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that for those carefully validated reference genes, a single reference gene is sufficient for reliable normalization of the quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization. PMID:28934340
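Of the four stability methods named above, the comparative Delta Ct approach is the simplest to sketch. The version below assumes the usual formulation: for every pair of candidate genes, take the per-sample Ct difference, compute its standard deviation across samples, and score each gene by the mean of those standard deviations (lower means more stable). The gene names and Ct values are invented for illustration and are not data from the study.

```python
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Rank candidate reference genes by the comparative Delta Ct method.

    ct : dict mapping gene name -> list of Ct values, one per sample,
         with all lists in the same sample order.
    Returns a list of (gene, score) pairs sorted from most to least stable.
    """
    scores = {}
    for gene, values in ct.items():
        pairwise_sds = []
        for other, other_values in ct.items():
            if other == gene:
                continue
            # Delta Ct of the pair in each sample; a stable pair keeps
            # a nearly constant difference across samples.
            deltas = [a - b for a, b in zip(values, other_values)]
            pairwise_sds.append(stdev(deltas))
        scores[gene] = mean(pairwise_sds)
    return sorted(scores.items(), key=lambda item: item[1])


# Invented Ct values for three hypothetical candidates across three samples.
example_ct = {
    "refA": [20.0, 21.0, 22.0],
    "refB": [22.0, 23.0, 24.0],   # tracks refA exactly, offset by 2 cycles
    "noisy": [25.0, 20.0, 28.0],  # varies independently of the others
}
ranking = delta_ct_stability(example_ct)
```

In this toy input `refA` and `refB` score equally well and `noisy` ranks last; geNorm, NormFinder and BestKeeper use different statistics and can order genes differently.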
Frimodt-Møller, Jakob; Charbon, Godefroid; Krogfelt, Karen A; Løbner-Olesen, Anders
2017-09-11
The optimal chromosomal position(s) of a given DNA element was/were determined by transposon-mediated random insertion followed by fitness selection. In bacteria, the impact of the genetic context on the function of a genetic element can be difficult to assess. Several mechanisms, including topological effects, transcriptional interference from neighboring genes, and/or replication-associated gene dosage, may affect the function of a given genetic element. Here, we describe a method that permits the random integration of a DNA element into the chromosome of Escherichia coli and select the most favorable locations using a simple growth competition experiment. The method takes advantage of a well-described transposon-based system of random insertion, coupled with a selection of the fittest clone(s) by growth advantage, a procedure that is easily adjustable to experimental needs. The nature of the fittest clone(s) can be determined by whole-genome sequencing on a complex multi-clonal population or by easy gene walking for the rapid identification of selected clones. Here, the non-coding DNA region DARS2, which controls the initiation of chromosome replication in E. coli, was used as an example. The function of DARS2 is known to be affected by replication-associated gene dosage; the closer DARS2 gets to the origin of DNA replication, the more active it becomes. DARS2 was randomly inserted into the chromosome of a DARS2-deleted strain. The resultant clones containing individual insertions were pooled and competed against one another for hundreds of generations. Finally, the fittest clones were characterized and found to contain DARS2 inserted in close proximity to the original DARS2 location.
Hegde, Shivanand; Hegde, Shrilakshmi; Zimmermann, Martina; Flöck, Martina; Spergser, Joachim; Rosengarten, Renate
2015-01-01
Mycoplasmas possess complex pathogenicity determinants that are largely unknown at the molecular level. Mycoplasma agalactiae serves as a useful model to study the molecular basis of mycoplasma pathogenicity. The generation and in vivo screening of a transposon mutant library of M. agalactiae were employed to unravel its host colonization factors. Tn4001mod mutants were sequenced using a novel sequencing method, and functionally heterogeneous pools containing 15 to 19 selected mutants were screened simultaneously through two successive cycles of sheep intramammary infections. A PCR-based negative selection method was employed to identify mutants that failed to colonize the udders and draining lymph nodes in the animals. A total of 14 different mutants found to be absent from ≥95% of samples were identified and subsequently verified via a second round of stringent confirmatory screening where 100% absence was considered attenuation. Using this criterion, seven mutants with insertions in genes MAG1050, MAG2540, MAG3390, uhpT, eutD, adhT, and MAG4460 were not recovered from any of the infected animals. Among the attenuated mutants, many contain disruptions in hypothetical genes, implying their previously unknown role in M. agalactiae pathogenicity. These data indicate the putative role of functionally different genes, including hypothetical ones, in the pathogenesis of M. agalactiae. Defining the precise functions of the identified genes is anticipated to increase our understanding of M. agalactiae infections and to develop successful intervention strategies against it. PMID:25916984
A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.
Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh
2018-04-26
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
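The contrast between random CV and clustering-based CV can be sketched as two ways of building folds. The sketch assumes that cluster labels for the samples (for example, one label per experimental condition) have already been obtained; how the authors cluster samples and their simulated annealing partitioner are not reproduced here. Function names and the greedy balancing rule are illustrative choices.

```python
import random


def random_folds(n_samples, k, seed=0):
    """Random CV: shuffle sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def clustered_folds(cluster_labels, k):
    """Clustering-based CV: keep every cluster inside a single fold, so the
    test fold never shares a cluster with the training folds."""
    clusters = {}
    for index, label in enumerate(cluster_labels):
        clusters.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    # Greedy balancing: place the largest clusters first, each into the
    # fold that currently holds the fewest samples.
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


labels = ["c1"] * 4 + ["c2"] * 3 + ["c3"] * 3 + ["c4"] * 2
ccv = clustered_folds(labels, k=2)
rcv = random_folds(len(labels), k=2)
```

With `random_folds`, samples from the same cluster typically land in both training and test sets, which is the situation the abstract identifies as producing over-optimistic estimates.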
Zhang, Cheng; Ni, Pan; Ahmad, Hafiz Ishfaq; Gemingguli, M; Baizilaitibei, A; Gulibaheti, D; Fang, Yaping; Wang, Haiyang; Asif, Akhtar Rasool; Xiao, Changyi; Chen, Jianhai; Ma, Yunlong; Liu, Xiangdong; Du, Xiaoyong; Zhao, Shuhong
2018-01-01
Animal domestication gives rise to gradual changes at the genomic level through selection in populations. Selective sweeps have been traced in the genomes of many animal species, including humans, cattle, and dogs. However, little is known regarding positional candidate genes and genomic regions that exhibit signatures of selection in domestic horses. In addition, an understanding of the genetic processes underlying horse domestication, especially the origin of Chinese native populations, is still lacking. In our study, we generated whole genome sequences from 4 Chinese native horses and combined them with 48 publicly available full genome sequences, from which 15 341 213 high-quality unique single-nucleotide polymorphism variants were identified. Kazakh and Lichuan horses are 2 typical Asian native breeds that were formed in Kazakh or Northwest China and South China, respectively. We detected 1390 loss-of-function (LoF) variants in protein-coding genes, and gene ontology (GO) enrichment analysis revealed that some LoF-affected genes were overrepresented in GO terms related to the immune response. Bayesian clustering, distance analysis, and principal component analysis demonstrated that the population structure of these breeds largely reflected weak geographic patterns. Kazakh and Lichuan horses were assigned to the same lineage with other Asian native breeds, in agreement with previous studies on the genetic origin of Chinese domestic horses. We applied the composite likelihood ratio method to scan for genomic regions showing signals of recent selection in the horse genome. A total of 1052 genomic windows of 10 kB, corresponding to 933 distinct core regions, significantly exceeded neutral simulations. 
The GO enrichment analysis revealed that the genes under selective sweeps were overrepresented with GO terms, including “negative regulation of canonical Wnt signaling pathway,” “muscle contraction,” and “axon guidance.” Frequent exercise training in domestic horses may have resulted in changes in the expression of genes related to metabolism, muscle structure, and the nervous system.
Hara-Kudo, Yukiko; Konishi, Noriko; Ohtsuka, Kayoko; Iwabuchi, Kaori; Kikuchi, Rie; Isobe, Junko; Yamazaki, Takumiko; Suzuki, Fumie; Nagai, Yuhki; Yamada, Hiroko; Tanouchi, Atsuko; Mori, Tetsuya; Nakagawa, Hiroshi; Ueda, Yasufumi; Terajima, Jun
2016-08-02
To establish an efficient detection method for Shiga toxin (Stx)-producing Escherichia coli (STEC) O26, O103, O111, O121, O145, and O157 in food, an interlaboratory study using all the serogroups of detection targets was conducted for the first time. We employed a series of tests including enrichment, real-time PCR assays, and concentration by immunomagnetic separation, followed by plating onto selective agar media (IMS-plating methods). This study focused particularly on the efficiencies of real-time PCR assays in detecting stx and O-antigen genes of the six serogroups and of IMS-plating methods onto selective agar media including chromogenic agar. Ground beef and radish sprout samples were inoculated with the six STEC serogroups either at 4-6 CFU/25 g (low levels) or at 22-29 CFU/25 g (high levels). The sensitivity of stx detection in ground beef at both levels of inoculation with all six STEC serogroups was 100%. The sensitivity of stx detection was also 100% in radish sprouts at high levels of inoculation with all six STEC serogroups, and 66.7%-91.7% at low levels of inoculation. The sensitivity of detection of O-antigen genes was 100% in both ground beef and radish sprouts at high inoculation levels, while at low inoculation levels, it was 95.8%-100% in ground beef and 66.7%-91.7% in radish sprouts. The sensitivity of detection with IMS-plating was either the same as or lower than that of the real-time PCR assays targeting stx and O-antigen genes. The relationship between the results of IMS-plating methods and Ct values of real-time PCR assays was analyzed in detail for the first time. Ct values in most samples that tested negative in the IMS-plating method were higher than the maximum Ct values in samples that tested positive in the IMS-plating method. This study indicates that all six STEC serogroups in food contaminated with more than 29 CFU/25 g were detected by real-time PCR assays targeting stx and O-antigen genes and IMS-plating onto selective agar media. 
Therefore, screening of stx and O-antigen genes followed by isolation of STECs by IMS-plating methods may be an efficient method to detect the six STEC serogroups. Copyright © 2016 Elsevier B.V. All rights reserved.
Primary Airway Epithelial Cell Gene Editing Using CRISPR-Cas9.
Everman, Jamie L; Rios, Cydney; Seibold, Max A
2018-01-01
The adaptation of the clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR associated endonuclease 9 (CRISPR-Cas9) machinery from prokaryotic organisms has resulted in a gene editing system that is highly versatile, easily constructed, and can be leveraged to generate human cells knocked out (KO) for a specific gene. While standard transfection techniques can be used for the introduction of CRISPR-Cas9 expression cassettes to many cell types, delivery by this method is not efficient in many primary cell types, including primary human airway epithelial cells (AECs). More efficient delivery in AECs can be achieved through lentiviral-mediated transduction, allowing the CRISPR-Cas9 system to be integrated into the genome of the cell, resulting in stable expression of the nuclease machinery and increasing editing rates. In parallel, advancements have been made in the culture, expansion, selection, and differentiation of AECs, which allow the robust generation of a bulk edited AEC population from transduced cells. Applying these methods, we detail here our latest protocol to generate mucociliary epithelial cultures knocked out for a specific gene from donor-isolated primary human basal airway epithelial cells. This protocol includes methods to: (1) design and generate lentivirus which targets a specific gene for KO with CRISPR-Cas9 machinery, (2) efficiently transduce AECs, (3) culture and select for a bulk edited AEC population, (4) molecularly screen AECs for Cas9 cutting and specific sequence edits, and (5) further expand and differentiate edited cells to a mucociliary airway epithelial culture. The AEC knockouts generated using this protocol provide an excellent primary cell model system with which to characterize the function of genes involved in airway dysfunction and disease.
Ruduś, Izabela; Kępczyński, Jan
2018-01-01
Molecular studies of primary and secondary dormancy in Avena fatua L., a serious weed of cereal and other crops, are intended to reveal the species-specific details of the underlying molecular mechanisms, which in turn may be usable in weed management. Among others, quantitative real-time PCR (RT-qPCR) data from comparative gene expression analysis may give some insight into the involvement of particular wild oat genes in dormancy release, maintenance, or induction by unfavorable conditions. To ensure biologically significant results with this method, the expression stability of selected candidate reference genes in different data subsets was evaluated using four statistical algorithms, i.e., geNorm, NormFinder, BestKeeper, and the ΔCt method. Although some discrepancies in their ranking outputs were noticed, two ubiquitin-conjugating enzyme homologs, AfUBC1 and AfUBC2, as well as one homolog of glyceraldehyde 3-phosphate dehydrogenase, AfGAPDH1, and TATA-binding protein AfTBP2 were clearly more stably expressed than AfEF1a (translation elongation factor 1α), AfGAPDH2, or the least stable α-tubulin homolog AfTUA1 in caryopses and seedlings of A. fatua. Gene expression analysis of a dormancy-related wild oat transcription factor, VIVIPAROUS1 (AfVP1), allowed the performance of the candidate reference genes to be validated. Based on the obtained results, it can be recommended that a normalization factor calculated as the geometric mean of the Cq values of AfUBC1, AfUBC2, and AfGAPDH1 would be optimal for normalizing RT-qPCR results in experiments comprising A. fatua caryopses of different dormancy status.
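The recommended normalization factor is a geometric mean of three reference-gene Cq values, which is simple to compute. The sketch below shows that step and, as an assumption not stated in the abstract, uses the factor to form a ΔCq for a target gene by subtraction. All Cq values are invented for illustration.

```python
from statistics import geometric_mean  # available in Python 3.8+


def normalization_factor(reference_cq):
    """Geometric mean of the reference-gene Cq values for one sample."""
    return geometric_mean(reference_cq)


def delta_cq(target_cq, reference_cq):
    """Assumed usage: target Cq minus the normalization factor."""
    return target_cq - normalization_factor(reference_cq)


# Invented Cq values for AfUBC1, AfUBC2 and AfGAPDH1 in a single sample.
reference_values = [18.0, 24.0, 32.0]
factor = normalization_factor(reference_values)  # cube root of 18*24*32
target_delta = delta_cq(27.0, reference_values)
```

The geometric mean is less affected by a single outlying reference gene than the arithmetic mean, which is one common reason for preferring it when several references are combined.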
Shekh, Kamran; Tang, Song; Niyogi, Som; Hecker, Markus
2017-09-01
Gene expression analysis represents a powerful approach to characterize the specific mechanisms by which contaminants interact with organisms. One of the key considerations when conducting gene expression analyses using quantitative real-time reverse transcription-polymerase chain reaction (qPCR) is the selection of appropriate reference genes, which is often overlooked. Specifically, to reach meaningful conclusions when using relative quantification approaches, expression levels of reference genes must be highly stable and cannot vary as a function of experimental conditions. However, to date, information on the stability of commonly used reference genes across developmental stages, tissues and after exposure to contaminants such as metals is lacking for many vertebrate species including teleost fish. Therefore, in this study, we assessed the stability of expression of 8 reference gene candidates in the gills and skin of three different early life-stages of rainbow trout after acute exposure (24h) to two metals, cadmium (Cd) and copper (Cu) using qPCR. Candidate housekeeping genes were: beta actin (b-actin), DNA directed RNA polymerase II subunit I (DRP2), elongation factor-1 alpha (EF1a), glyceraldehyde 3-phosphate dehydrogenase (GAPDH), glucose-6-phosphate dehydrogenase (G6PD), hypoxanthine phosphoribosyltransferase (HPRT), ribosomal protein L8 (RPL8), and 18S ribosomal RNA (18S). Four algorithms, geNorm, NormFinder, BestKeeper, and the comparative ΔCt method were employed to systematically evaluate the expression stability of these candidate genes under control and exposed conditions as well as across three different life-stages. Finally, stability of genes was ranked by taking geometric means of the ranks established by the different methods. Stability of reference genes was ranked in the following order (from lower to higher stability): HPRT
Rough set soft computing cancer classification and network: one stone, two birds.
Zhang, Yue
2010-07-15
Gene expression profiling provides tremendous information to help unravel the complexity of cancer. The selection of the most informative genes from huge noise for cancer classification has taken centre stage, along with predicting the function of such identified genes and the construction of direct gene regulatory networks at different system levels with a tuneable parameter. A new study by Wang and Gotoh described a novel Variable Precision Rough Sets-rooted robust soft computing method to successfully address these problems and has yielded some new insights. The significance of this progress and its perspectives will be discussed in this article.
Martin-Ortigosa, Susana; Peterson, David J.; Valenstein, Justin S.; Lin, Victor S.-Y.; Trewyn, Brian G.; Lyznik, L. Alexander; Wang, Kan
2014-01-01
The delivery of proteins instead of DNA into plant cells allows for a transient presence of the protein or enzyme that can be useful for biochemical analysis or genome modifications. This may be of particular interest for genome editing, because it can avoid DNA (transgene) integration into the genome and generate precisely modified “nontransgenic” plants. In this work, we explore direct protein delivery to plant cells using mesoporous silica nanoparticles (MSNs) as carriers to deliver Cre recombinase protein into maize (Zea mays) cells. Cre protein was loaded inside the pores of gold-plated MSNs, and these particles were delivered by the biolistic method to plant cells harboring loxP sites flanking a selection gene and a reporter gene. Cre protein was released inside the cell, leading to recombination of the loxP sites and elimination of both genes. Visual selection was used to select recombination events from which fertile plants were regenerated. Up to 20% of bombarded embryos produced calli with the recombined loxP sites under our experimental conditions. This direct and reproducible technology offers an alternative for DNA-free genome-editing technologies in which MSNs can be tailored to accommodate the desired enzyme and to reach the desired tissue through the biolistic method. PMID:24376280
Examination of Global Methylation and Targeted Imprinted Genes in Prader-Willi Syndrome.
Manzardo, A M; Butler, M G
2016-01-01
Context Methylation changes observed in Prader-Willi syndrome (PWS) may impact global methylation as well as regional methylation status of imprinted genes on chromosome 15 (in cis) or other imprinted obesity-related genes on other chromosomes (in trans), leading to differential effects on gene expression impacting the obesity phenotype unique to PWS. Objective Characterize the global methylation profiles and methylation status for select imprinted genes associated with obesity phenotype in a well-characterized imprinted, obesity-related syndrome (PWS) relative to a cohort of obese and non-obese individuals. Design Global methylation was assayed using two methodologies: 1) enriched LINE-1 repeat sequences by EpigenDx and 2) ELISA-based immunoassay method sensitive to genomic 5-methylcytosine by Epigentek. Target gene methylation patterns at selected candidate obesity gene loci were determined using methylation-specific PCR. Setting Study participants were recruited as part of an ongoing research program on obesity-related genomics and Prader-Willi syndrome. Participants Individuals with non-syndromic obesity (N=26), leanness (N=26) and PWS (N=39). Results A detailed characterization of the imprinting status of select target genes within the critical PWS 15q11-q13 genomic region showed enhanced cis but not trans methylation of imprinted genes. No significant differences in global methylation were found between non-syndromic obese, PWS or non-obese controls. Intervention None. Main outcome measures Percentage methylation and the methylation index. Conclusion The methylation abnormality in PWS due to errors of genomic imprinting affects both upstream and downstream effectors in the 15q11-q13 region, showing enhanced cis but not trans methylation of imprinted genes. Obesity in our subject cohorts did not appear to impact global methylation levels using the described methodology.
Examination of Global Methylation and Targeted Imprinted Genes in Prader-Willi Syndrome
Manzardo, AM; Butler, MG
2016-01-01
Context Methylation changes observed in Prader-Willi syndrome (PWS) may impact global methylation as well as regional methylation status of imprinted genes on chromosome 15 (in cis) or other imprinted obesity-related genes on other chromosomes (in trans), leading to differential effects on gene expression impacting the obesity phenotype unique to PWS. Objective Characterize the global methylation profiles and methylation status for select imprinted genes associated with obesity phenotype in a well-characterized imprinted, obesity-related syndrome (PWS) relative to a cohort of obese and non-obese individuals. Design Global methylation was assayed using two methodologies: 1) enriched LINE-1 repeat sequences by EpigenDx and 2) ELISA-based immunoassay method sensitive to genomic 5-methylcytosine by Epigentek. Target gene methylation patterns at selected candidate obesity gene loci were determined using methylation-specific PCR. Setting Study participants were recruited as part of an ongoing research program on obesity-related genomics and Prader-Willi syndrome. Participants Individuals with non-syndromic obesity (N=26), leanness (N=26) and PWS (N=39). Results A detailed characterization of the imprinting status of select target genes within the critical PWS 15q11-q13 genomic region showed enhanced cis but not trans methylation of imprinted genes. No significant differences in global methylation were found between non-syndromic obese, PWS or non-obese controls. Intervention None. Main outcome measures Percentage methylation and the methylation index. Conclusion The methylation abnormality in PWS due to errors of genomic imprinting affects both upstream and downstream effectors in the 15q11-q13 region, showing enhanced cis but not trans methylation of imprinted genes. Obesity in our subject cohorts did not appear to impact global methylation levels using the described methodology. PMID:28111641
Simplified Identification of mRNA or DNA in Whole Cells
NASA Technical Reports Server (NTRS)
Almeida, Eduardo; Kadambi, Geeta
2007-01-01
A recently invented method of detecting a selected messenger ribonucleic acid (mRNA) or deoxyribonucleic acid (DNA) sequence offers two important advantages over prior such methods: it is simpler and can be implemented by means of compact equipment. The simplification and miniaturization achieved by this invention are such that this method is suitable for use outside laboratories, in field settings in which space and power supplies may be limited. The present method is based partly on hybridization of nucleic acid, which is a powerful technique for detection of specific complementary nucleic acid sequences and is increasingly being used for detection of changes in gene expression in microarrays containing thousands of gene probes.
AUCTSP: an improved biomarker gene pair class predictor.
Kagaris, Dimitri; Khamesipour, Alireza; Yiannoutsos, Constantin T
2018-06-26
The Top Scoring Pair (TSP) classifier, based on the concept of relative ranking reversals in the expressions of pairs of genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. The idea that differences in gene expression ranking are associated with presence or absence of disease is compelling and has strong biological plausibility. Nevertheless, the TSP formulation ignores significant available information which can improve classification accuracy and is vulnerable to selecting genes which do not have differential expression in the two conditions ("pivot" genes). We introduce the AUCTSP classifier as an alternative rank-based estimator of the magnitude of the ranking reversals involved in the original TSP. The proposed estimator is based on the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and as such, takes into account the separation of the entire distribution of gene expression levels in gene pairs under the conditions considered, as opposed to comparing gene rankings within individual subjects as in the original TSP formulation. Through extensive simulations and case studies involving classification in ovarian, leukemia, colon, breast and prostate cancers and diffuse large b-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative (pivot) genes. The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects as opposed to each individual subject results in significant performance gains in classification accuracy. 
In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair.
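For orientation, the score used by the original TSP classifier (not the AUC-based estimator proposed in this abstract) can be sketched as below. It assumes the usual formulation in which a gene pair (i, j) is scored by the absolute difference, between the two classes, of the frequency with which gene i is expressed below gene j. The expression values and function names are hypothetical.

```python
def tsp_score(class_a, class_b, i, j):
    """Original TSP score: |P(x_i < x_j | class A) - P(x_i < x_j | class B)|."""
    def frequency(samples):
        # Fraction of samples in which gene i is expressed below gene j.
        return sum(1 for s in samples if s[i] < s[j]) / len(samples)
    return abs(frequency(class_a) - frequency(class_b))


def top_scoring_pair(class_a, class_b):
    """Return (score, i, j) for the first pair reaching the maximal score."""
    n_genes = len(class_a[0])
    candidates = (
        (tsp_score(class_a, class_b, i, j), i, j)
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
    )
    return max(candidates, key=lambda item: item[0])


# Hypothetical expression profiles: rows are subjects, columns are genes.
# Genes 0 and 1 reverse their order between classes; gene 2 never changes.
class_a = [[1, 5, 3], [2, 6, 3], [1, 4, 3]]
class_b = [[5, 1, 3], [6, 2, 3], [4, 2, 3]]

print(top_scoring_pair(class_a, class_b))
print(tsp_score(class_a, class_b, 0, 2))
```

In this toy data the pair (0, 2) scores as highly as the pair (0, 1), even though gene 2 is identical in both classes. That is the "pivot" gene weakness of within-subject rank comparisons that the abstract describes.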
Joyce, Priya; Kuwahata, Melissa; Turner, Nicole; Lakshmanan, Prakash
2010-02-01
A reproducible method for transformation of sugarcane using various strains of Agrobacterium tumefaciens (A. tumefaciens) (AGL0, AGL1, EHA105 and LBA4404) has been developed. The selection system and co-cultivation medium were the most important factors determining the success of transformation and transgenic plant regeneration. Plant regeneration at a frequency of 0.8-4.8% occurred only when callus was transformed with A. tumefaciens carrying a newly constructed superbinary plasmid containing neomycin phosphotransferase (nptII) and beta-glucuronidase (gusA) genes, both driven by the maize ubiquitin (ubi-1) promoter. Regeneration was successful in plants carrying the nptII gene but not the hygromycin phosphotransferase (hph) gene. NptII gene selection was imposed at a concentration of 150 mg/l paromomycin sulphate and applied either immediately or 4 days after the co-cultivation period. Co-cultivation on Murashige and Skoog (MS)-based medium for a period of 4 days produced the highest number of transgenic plants. Over 200 independent transgenic lines were created using this protocol. Regenerated plants appeared phenotypically normal and contained both gusA and nptII genes. Southern blot analysis revealed 1-3 transgene insertion events that were randomly integrated in the majority of the plants produced.
Zhu, Xun; Wan, Hu; Shakeel, Muhammad; Zhan, Sha; Jin, Byung-Rae; Li, Jianhong
2014-01-01
The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is one of the most important rice pests. Abundant genetic studies on BPH have been conducted using reverse-transcription quantitative real-time PCR (qRT-PCR). Using qRT-PCR, the expression levels of target genes are calculated on the basis of endogenous controls. These genes need to be appropriately selected by experimentally assessing whether they are stably expressed under different conditions. However, such studies on potential reference genes in N. lugens are lacking. In this paper, we presented a systematic exploration of eight candidate reference genes in N. lugens, namely, actin 1 (ACT), muscle actin (MACT), ribosomal protein S11 (RPS11), ribosomal protein S15e (RPS15), alpha 2-tubulin (TUB), elongation factor 1 delta (EF), 18S ribosomal RNA (18S), and arginine kinase (AK) and used four alternative methods (BestKeeper, geNorm, NormFinder, and the delta Ct method) to evaluate the suitability of these genes as endogenous controls. We examined their expression levels among different experimental factors (developmental stage, body part, geographic population, temperature variation, pesticide exposure, diet change, and starvation) following the MIQE (Minimum Information for publication of Quantitative real time PCR Experiments) guidelines. Based on the results of RefFinder, which integrates four currently available major software programs to compare and rank the tested candidate reference genes, RPS15, RPS11, and TUB were found to be the most suitable reference genes in different developmental stages, body parts, and geographic populations, respectively. RPS15 was the most suitable gene under different temperature and diet conditions, while RPS11 was the most suitable gene under different pesticide exposure and starvation conditions. This work sheds light on establishing a standardized qRT-PCR procedure in N. lugens, and serves as a starting point for screening for reference genes for expression studies of related insects. PMID:24466124
Yuan, Miao; Lu, Yanhui; Zhu, Xun; Wan, Hu; Shakeel, Muhammad; Zhan, Sha; Jin, Byung-Rae; Li, Jianhong
2014-01-01
The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is one of the most important rice pests. Abundant genetic studies on BPH have been conducted using reverse-transcription quantitative real-time PCR (qRT-PCR). Using qRT-PCR, the expression levels of target genes are calculated on the basis of endogenous controls. These genes need to be appropriately selected by experimentally assessing whether they are stably expressed under different conditions. However, such studies on potential reference genes in N. lugens are lacking. In this paper, we presented a systematic exploration of eight candidate reference genes in N. lugens, namely, actin 1 (ACT), muscle actin (MACT), ribosomal protein S11 (RPS11), ribosomal protein S15e (RPS15), alpha 2-tubulin (TUB), elongation factor 1 delta (EF), 18S ribosomal RNA (18S), and arginine kinase (AK) and used four alternative methods (BestKeeper, geNorm, NormFinder, and the delta Ct method) to evaluate the suitability of these genes as endogenous controls. We examined their expression levels among different experimental factors (developmental stage, body part, geographic population, temperature variation, pesticide exposure, diet change, and starvation) following the MIQE (Minimum Information for publication of Quantitative real time PCR Experiments) guidelines. Based on the results of RefFinder, which integrates four currently available major software programs to compare and rank the tested candidate reference genes, RPS15, RPS11, and TUB were found to be the most suitable reference genes in different developmental stages, body parts, and geographic populations, respectively. RPS15 was the most suitable gene under different temperature and diet conditions, while RPS11 was the most suitable gene under different pesticide exposure and starvation conditions. This work sheds light on establishing a standardized qRT-PCR procedure in N. lugens, and serves as a starting point for screening for reference genes for expression studies of related insects.
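The abstract does not spell out how the four rankings are combined. One simple consensus scheme, assumed here purely for illustration, is to take the geometric mean of each gene's rank across the methods. The rankings below are hypothetical, and consensus_ranking is not a RefFinder function.

```python
import math


def consensus_ranking(rankings):
    """Combine per-method rankings by the geometric mean of ranks.

    rankings maps a method name to a list of genes, most stable first.
    Returns (gene, score) pairs sorted from most to least stable.
    """
    genes = next(iter(rankings.values()))
    scores = {}
    for gene in genes:
        ranks = [order.index(gene) + 1 for order in rankings.values()]
        scores[gene] = math.exp(sum(math.log(r) for r in ranks) / len(ranks))
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical rankings of four candidate genes (illustration only).
rankings = {
    "geNorm": ["RPS15", "RPS11", "TUB", "ACT"],
    "NormFinder": ["RPS11", "RPS15", "TUB", "ACT"],
    "BestKeeper": ["RPS15", "TUB", "RPS11", "ACT"],
    "deltaCt": ["RPS15", "RPS11", "ACT", "TUB"],
}

for gene, score in consensus_ranking(rankings):
    print(gene, round(score, 3))
```

A lower score means a gene was ranked near the top by most methods, so disagreements between individual algorithms are smoothed out rather than resolved in favor of any one of them.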
Efficient experimental design for uncertainty reduction in gene regulatory networks.
Dehghannasiri, Roozbeh; Yoon, Byung-Jun; Dougherty, Edward R
2015-01-01
An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, the experiments with high priority are suggested to be conducted first. The authors have already proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one which leads to the minimum expected remaining MOCU subsequent to the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the size of the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks. Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method has close performance to that of the optimal method but at lower computational cost. The proposed approximate method also outperforms the random selection policy significantly. A MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/.
Efficient experimental design for uncertainty reduction in gene regulatory networks
2015-01-01
Background An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, the experiments with high priority are suggested to be conducted first. Results The authors have already proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one which leads to the minimum expected remaining MOCU subsequent to the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the size of the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks. Conclusions Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method has close performance to that of the optimal method but at lower computational cost. The proposed approximate method also outperforms the random selection policy significantly. A MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/. PMID:26423515
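The MOCU idea can be sketched on a toy problem, assuming a finite set of candidate models, a finite set of interventions, and experiments that each reveal which block of a partition contains the true model. All names, costs and the partition-based experiment model are hypothetical simplifications; this is not the authors' MATLAB implementation and it omits their network reduction scheme.

```python
def mocu(prior, cost):
    """Mean objective cost of uncertainty for a discrete uncertainty class.

    prior maps model -> probability; cost maps model -> {action: cost}.
    """
    actions = list(next(iter(cost.values())))
    # Robust action: minimizes the expected cost over all candidate models.
    robust = min(actions, key=lambda a: sum(p * cost[m][a] for m, p in prior.items()))
    # Expected extra cost of the robust action versus each model's own optimum.
    return sum(p * (cost[m][robust] - min(cost[m].values())) for m, p in prior.items())


def expected_remaining_mocu(prior, cost, partition):
    """Expected MOCU after an experiment that reveals the block of the true model."""
    total = 0.0
    for block in partition:
        p_block = sum(prior[m] for m in block)
        if p_block == 0:
            continue
        posterior = {m: prior[m] / p_block for m in block}
        total += p_block * mocu(posterior, cost)
    return total


# Hypothetical uncertainty class: four models, two candidate interventions.
prior = {"m1": 0.25, "m2": 0.25, "m3": 0.25, "m4": 0.25}
cost = {
    "m1": {"a": 0.0, "b": 1.0},
    "m2": {"a": 0.0, "b": 1.0},
    "m3": {"a": 1.0, "b": 0.0},
    "m4": {"a": 1.0, "b": 0.0},
}
experiments = {
    "E1": [["m1", "m3"], ["m2", "m4"]],  # does not separate the cost patterns
    "E2": [["m1", "m2"], ["m3", "m4"]],  # separates them completely
}

best = min(experiments, key=lambda e: expected_remaining_mocu(prior, cost, experiments[e]))
print(mocu(prior, cost), best)
```

The experiment chosen first is the one with the smallest expected remaining MOCU. Here only E2 distinguishes models that call for different interventions, so it removes all objective-relevant uncertainty while E1 removes none.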
Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data
Zhao, Xin; Cheung, Leo Wang-Kit
2007-01-01
Background Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which however are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods usually also bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potentials to achieve this goal. Results A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences. 
Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound, not only in the case with a linear Bayesian classifier but also in the case with a highly non-linear Bayesian classifier. This suggests broader usability for microarray data analysis problems, especially those for which linear methods perform poorly. The KIGP was also applied to four published microarray datasets, and the results showed that it performed better than, or at least as well as, any of the referenced state-of-the-art methods in all of these cases. Conclusion Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently. PMID:17328811
Spectral biclustering of microarray data: coclustering genes and conditions.
Kluger, Yuval; Basri, Ronen; Chang, Joseph T; Gerstein, Mark
2003-04-01
Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to find "marker genes" that are differentially expressed in particular sets of "conditions." We have developed a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of gene expression data, if they exist. In a cancer context, these checkerboards correspond to genes that are markedly up- or downregulated in patients with particular types of tumors. Our method, spectral biclustering, is based on the observation that checkerboard structures in matrices of expression data can be found in eigenvectors corresponding to characteristic expression patterns across genes or conditions. In addition, these eigenvectors can be readily identified by commonly used linear algebra approaches, in particular the singular value decomposition (SVD), coupled with closely integrated normalization steps. We present a number of variants of the approach, depending on whether the normalization over genes and conditions is done independently or in a coupled fashion. We then apply spectral biclustering to a selection of publicly available cancer expression data sets, and examine the degree to which the approach is able to identify checkerboard structures. Furthermore, we compare the performance of our biclustering methods against a number of reasonable benchmarks (e.g., direct application of SVD or normalized cuts to raw data).
Latent feature decompositions for integrative analysis of multi-platform genomic data
Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran
2015-01-01
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. 
On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492
Koczula, A; Willenborg, J; Bertram, R; Takamatsu, D; Valentin-Weigand, P; Goethe, R
2014-12-01
The lack of knowledge about pathogenicity mechanisms of Streptococcus (S.) suis is, at least partially, attributed to limited methods for its genetic manipulation. Here, we established a Cre-lox based recombination system for markerless gene deletions in S. suis serotype 2 with high selective pressure and without undesired side effects. Copyright © 2014 Elsevier B.V. All rights reserved.
Identifying key genes in glaucoma based on a benchmarked dataset and the gene regulatory network.
Chen, Xi; Wang, Qiao-Ling; Zhang, Meng-Hui
2017-10-01
The current study aimed to identify key genes in glaucoma based on a benchmarked dataset and gene regulatory network (GRN). Local and global noise was added to the gene expression dataset to produce a benchmarked dataset. Differentially-expressed genes (DEGs) between patients with glaucoma and normal controls were identified utilizing the Linear Models for Microarray Data (Limma) package based on the benchmarked dataset. A total of 5 GRN inference methods, including Zscore, GeneNet, context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT) and GEne Network Inference with Ensemble of Trees (Genie3), were evaluated using receiver operating characteristic (ROC) and precision and recall (PR) curves. The inference method with the best performance was selected to construct the GRN. Subsequently, topological centrality analysis (degree, closeness and betweenness) was conducted to identify key genes in the GRN of glaucoma. Finally, the key genes were validated by performing reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 176 DEGs were detected from the benchmarked dataset. The ROC and PR curves of the 5 methods were analyzed and it was determined that Genie3 had a clear advantage over the other methods; thus, Genie3 was used to construct the GRN. Following topological centrality analysis, 14 key genes for glaucoma were identified, including IL6, EPHA2 and GSTT1, and 5 of these 14 key genes were validated by RT-qPCR. Therefore, the current study identified 14 key genes in glaucoma, which may be potential biomarkers to use in the diagnosis of glaucoma and aid in identifying the molecular mechanism of this disease.
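Two of the three centrality measures named above can be sketched on a small undirected network, assuming an unweighted adjacency-set representation. The network and node names are hypothetical, and betweenness is left out to keep the example short; this is not the analysis pipeline used in the study.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path distances."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in unweighted graphs.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical star-shaped network: g1 is the hub.
network = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1"},
    "g3": {"g1"},
    "g4": {"g1"},
}

degree = degree_centrality(network)
closeness = closeness_centrality(network)
print(max(degree, key=degree.get), max(closeness, key=closeness.get))
```

Genes that score highly on several such measures at once are the natural "key gene" candidates, since each measure captures a different notion of being central to the network.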
Yang, Chunxiao; Li, Hui; Pan, Huipeng; Ma, Yabin; Zhang, Deyong; Liu, Yong; Zhang, Zhanhong; Zheng, Changying; Chu, Dong
2015-01-01
Reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) is a reliable technique for measuring and evaluating gene expression during variable biological processes. To facilitate gene expression studies, normalization of genes of interest relative to stable reference genes is crucial. The western flower thrips Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae), the main vector of tomato spotted wilt virus (TSWV), is a destructive invasive species. In this study, the expression profiles of 11 candidate reference genes from nonviruliferous and viruliferous F. occidentalis were investigated. Five distinct algorithms, geNorm, NormFinder, BestKeeper, the ΔCt method, and RefFinder, were used to determine the performance of these genes. geNorm, NormFinder, BestKeeper, and RefFinder identified heat shock protein 70 (HSP70), heat shock protein 60 (HSP60), elongation factor 1 α, and ribosomal protein l32 (RPL32) as the most stable reference genes, and the ΔCt method identified HSP60, HSP70, RPL32, and heat shock protein 90 as the most stable reference genes. Additionally, two reference genes were sufficient for reliable normalization in nonviruliferous and viruliferous F. occidentalis. This work provides a foundation for investigating the molecular mechanisms of TSWV and F. occidentalis interactions.
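The comparative ΔCt method mentioned above can be sketched as follows, assuming its usual form: for each pair of candidate genes, take the per-sample Ct difference and its standard deviation across samples, then score each gene by the mean of those standard deviations (lower means more stable). The Ct values and gene labels are hypothetical.

```python
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Mean standard deviation of pairwise delta-Ct values for each gene.

    ct maps a gene name to its Ct values, one per sample, in the same order.
    """
    stability = {}
    for gene, values in ct.items():
        deviations = []
        for other_gene, other_values in ct.items():
            if other_gene == gene:
                continue
            # Per-sample Ct difference between the two genes.
            diffs = [a - b for a, b in zip(values, other_values)]
            deviations.append(stdev(diffs))
        stability[gene] = mean(deviations)
    return stability


# Hypothetical Ct values across three samples (illustration only).
ct_values = {
    "geneA": [20.0, 21.0, 22.0],
    "geneB": [22.0, 23.0, 24.0],
    "geneC": [18.0, 18.0, 24.0],
}

stability = delta_ct_stability(ct_values)
for gene in sorted(stability, key=stability.get):
    print(gene, round(stability[gene], 3))
```

Here geneA and geneB move in parallel across samples, so their difference is constant and both score as more stable than geneC, whose Ct shifts independently of the others.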
A large-scale benchmark of gene prioritization methods.
Guala, Dimitri; Sonnhammer, Erik L L
2017-04-21
In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in the construction of robust benchmarks that are objective with respect to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion (NetRank and two implementations of Random Walk with Restart), and MaxLink, which utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
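As background for the diffusion-based tools compared here, a generic Random Walk with Restart can be sketched on a toy undirected network. It assumes the standard update in which the walker either follows a random edge or, with probability restart, jumps back to a seed gene. The network, the parameter values and the function name are hypothetical, and this is not the implementation of any of the benchmarked tools.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at seeds."""
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass returns to the seeds; the rest spreads along edges.
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path-shaped network a - b - c - d with one known disease gene.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = random_walk_with_restart(network, seeds={"a"})

# Candidate genes are prioritized by score, excluding the seed itself.
candidates = sorted((v for v in scores if v != "a"), key=scores.get, reverse=True)
print(candidates)
```

Candidates closer to the seed in the network receive higher scores, which is the sense in which such methods prioritize genes by network proximity to known disease genes.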
Effective Feature Selection for Classification of Promoter Sequences.
K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish
2016-01-01
Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
Jasim, B; Jimtha John, C; Shimil, V; Jyothis, M; Radhakrishnan, E K
2014-09-01
The study mainly aimed at quantitative analysis of IAA produced by endophytic bacteria under various conditions, including the presence of extract from Piper nigrum. The genetic basis of IAA production was also analysed by studying the presence and diversity of the ipdc gene among the selected isolates. Five endophytic bacteria isolated previously from P. nigrum were used for the study. The effect of temperature, pH, agitation, tryptophan concentration and plant extract on modulating IAA production of the selected isolates was analysed by a colorimetric method. Comparative and quantitative analysis of IAA production by the selected isolates under optimal culture conditions was performed by HPTLC. The presence of the ipdc gene, and thereby the biosynthetic basis of IAA production among the selected isolates, was studied by PCR-based amplification and subsequent in silico analysis of the sequences obtained. Among the selected bacterial isolates from P. nigrum, isolate PnB 8 (Klebsiella pneumoniae) was found to have the maximum yield of IAA under the various optimized conditions, as confirmed by colorimetric, HPLC and HPTLC analysis. Notably, the study showed a stimulating effect of phytochemicals from P. nigrum on IAA production by endophytic bacteria isolated from the same plant. This study is unique because of the selection of endophytes from the same source for comparative and quantitative analysis of IAA production under various conditions. The study of the stimulatory effect of phytochemicals on bacterial IAA production, as explained here, is a novel approach. Studies on the molecular basis of IAA production, which was confirmed by sequence analysis of the ipdc gene, make the study scientifically attractive. Even though microbial production of IAA is well known, the current report on detailed optimization, the effect of plant extract and molecular confirmation of IAA biosynthesis is comparatively novel in its approach. © 2014 The Society for Applied Microbiology.
Bikel, Shirley; Jacobo-Albavera, Leonor; Sánchez-Muñoz, Fausto; Cornejo-Granados, Fernanda; Canizales-Quinteros, Samuel; Soberón, Xavier; Sotelo-Mundo, Rogerio R.; del Río-Navarro, Blanca E.; Mendoza-Vargas, Alfredo; Sánchez, Filiberto
2017-01-01
Background In spite of the emergence of RNA sequencing (RNA-seq), microarrays remain in widespread use for gene expression analysis in the clinic. There are over 767,000 RNA microarrays from human samples in public repositories, which are an invaluable resource for biomedical research and personalized medicine. The absolute gene expression analysis allows the transcriptome profiling of all expressed genes under a specific biological condition without the need of a reference sample. However, the background fluorescence represents a challenge to determine the absolute gene expression in microarrays. Given that the Y chromosome is absent in female subjects, we used it as a new approach for absolute gene expression analysis in which the fluorescence of the Y chromosome genes of female subjects was used as the background fluorescence for all the probes in the microarray. This fluorescence was used to establish an absolute gene expression threshold, allowing the differentiation between expressed and non-expressed genes in microarrays. Methods We extracted the RNA from 16 children leukocyte samples (nine males and seven females, ages 6–10 years). An Affymetrix Gene Chip Human Gene 1.0 ST Array was carried out for each sample and the fluorescence of 124 genes of the Y chromosome was used to calculate the absolute gene expression threshold. After that, several expressed and non-expressed genes according to our absolute gene expression threshold were compared against the expression obtained using real-time quantitative polymerase chain reaction (RT-qPCR). Results From the 124 genes of the Y chromosome, three genes (DDX3Y, TXLNG2P and EIF1AY) that displayed significant differences between sexes were used to calculate the absolute gene expression threshold. Using this threshold, we selected 13 expressed and non-expressed genes and confirmed their expression level by RT-qPCR. 
Then, we selected the top 5% most expressed genes and found that several KEGG pathways were significantly enriched. Interestingly, these pathways were related to the typical functions of leukocytes cells, such as antigen processing and presentation and natural killer cell mediated cytotoxicity. We also applied this method to obtain the absolute gene expression threshold in already published microarray data of liver cells, where the top 5% expressed genes showed an enrichment of typical KEGG pathways for liver cells. Our results suggest that the three selected genes of the Y chromosome can be used to calculate an absolute gene expression threshold, allowing a transcriptome profiling of microarray data without the need of an additional reference experiment. Discussion Our approach based on the establishment of a threshold for absolute gene expression analysis will allow a new way to analyze thousands of microarrays from public databases. This allows the study of different human diseases without the need of having additional samples for relative expression experiments. PMID:29230367
Yi, Ming; Stephens, Robert M.
2008-01-01
Analysis of microarray and other high throughput data often involves identification of genes consistently up or down-regulated across samples as the first step in extraction of biological meaning. This gene-level paradigm can be limited as a result of valid sample fluctuations and biological complexities. In this report, we describe a novel method, SLEPR, which eliminates this limitation by relying on pathway-level consistencies. Our method first selects the sample-level differentiated genes from each individual sample, capturing genes missed by other analysis methods, ascertains the enrichment levels of associated pathways from each of those lists, and then ranks annotated pathways based on the consistency of enrichment levels of individual samples from both sample classes. As a proof of concept, we have used this method to analyze three public microarray datasets with a direct comparison with the GSEA method, one of the most popular pathway-level analysis methods in the field. We found that our method was able to reproduce the earlier observations with significant improvements in depth of coverage for validated or expected biological themes, but also produced additional insights that make biological sense. This new method extends existing analyses approaches and facilitates integration of different types of HTP data. PMID:18818771
Croze, Myriam; Živković, Daniel; Stephan, Wolfgang; Hutter, Stephan
2016-08-01
Balancing selection has been widely assumed to be an important evolutionary force, yet even today little is known about its abundance and its impact on the patterns of genetic diversity. Several studies have shown examples of balancing selection in humans, plants or parasites, and many genes under balancing selection are involved in immunity. It has been proposed that host-parasite coevolution is one of the main forces driving immune genes to evolve under balancing selection. In this paper, we review the literature on balancing selection on immunity genes in several organisms, including Drosophila. Furthermore, we performed a genome scan for balancing selection in an African population of Drosophila melanogaster using coalescent simulations of a demographic model with and without selection. We find very few genes under balancing selection and only one novel candidate gene related to immunity. Finally, we discuss the possible causes of the low number of genes under balancing selection. Copyright © 2016 The Authors. Published by Elsevier GmbH.. All rights reserved.
Wen, Shuxiang; Chen, Xiaoling; Xu, Fuzhou; Sun, Huiling
2016-01-01
Real-time quantitative reverse transcription PCR (qRT-PCR) offers a robust method for measurement of gene expression levels. Selection of reliable reference gene(s) for gene expression study is conducive to reduce variations derived from different amounts of RNA and cDNA, the efficiency of the reverse transcriptase or polymerase enzymes. Until now reference genes identified for other members of the family Pasteurellaceae have not been validated for Avibacterium paragallinarum. The aim of this study was to validate nine reference genes of serovars A, B, and C strains of A. paragallinarum in different growth phase by qRT-PCR. Three of the most widely used statistical algorithms, geNorm, NormFinder and ΔCT method were used to evaluate the expression stability of reference genes. Data analyzed by overall rankings showed that in exponential and stationary phase of serovar A, the most stable reference genes were gyrA and atpD respectively; in exponential and stationary phase of serovar B, the most stable reference genes were atpD and recN respectively; in exponential and stationary phase of serovar C, the most stable reference genes were rpoB and recN respectively. This study provides recommendations for stable endogenous control genes for use in further studies involving measurement of gene expression levels.
Peng, Jiajie; Zhang, Xuanshuo; Hui, Weiwei; Lu, Junya; Li, Qianqian; Liu, Shuhui; Shang, Xuequn
2018-03-19
Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to decrease the noise information. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, an evaluation test shows that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
A double built-in containment strategy for production of recombinant proteins in transgenic rice.
Zhang, Xianwen; Wang, Dongfang; Zhao, Sinan; Shen, Zhicheng
2014-01-01
Using transgenic rice as a bioreactor for mass production of pharmaceutical proteins could potentially reduce the cost of production significantly. However, a major concern over the bioreactor transgenic rice is the risk of its unintended spreading into environment and into food or feed supplies. Here we report a mitigating method to prevent unwanted transgenic rice spreading by a double built-in containment strategy, which sets a selectively termination method and a visual tag technology in the T-DNA for transformation. We created transgenic rice with an inserted T-DNA that harbors a human proinsulin gene fused with the far-red fluorescent protein gene mKate_S158A, an RNAi cassette suppressing the expression of the rice bentazon detoxification enzyme CYP81A6, and an EPSPS gene as the selection marker for transformation. Herbicide spray tests indicated that such transgenic rice plants can be killed selectively by a spray of bentazon at regular field application dosage for rice weed control. Moreover, the transgenic rice seeds were bright red in color due to the fused far-red fluorescent protein, and could be easily visualized under daylight by naked eyes. Thus, the transgenic rice plants reported in this study could be selectively killed by a commonly used herbicide during their growth stage, and their seeds may be detected visually during processing and consumption after harvest. This double built-in containment strategy may greatly enhance the confinement of the transgenic rice.
NASA Astrophysics Data System (ADS)
Kuo, C.; Hsu, B.; Shen, T.; Tseng, S.; Tsai, J.; Huang, K.; Kao, P.; Chen, J.
2013-12-01
Salmonella spp. are common water-borne pathogens, and the genus comprises more than 2,500 serotypes. Major pathogenic genotypes which cause typhoid fever, enteritis and other intestinal-type diseases are S. Typhimurium, S. Enteritidis, S. Stanley, S. Agona, S. Albany, S. Schwarzengrund, S. Newport, S. Choleraesuis, and S. Derby. Hence, the identification of the serotypes of Salmonella spp. is important. In the present study, the analytical procedures include a direct concentration method, a non-selective pre-enrichment method and a selective enrichment method for Salmonella spp. Both the selective enrichment products and the cultured bacteria were tested with specific primers for Salmonella spp. by polymerase chain reaction (PCR). Finally, the serotypes of Salmonella were confirmed by using MLST (multilocus sequence typing) with the aroC, dnaN, hemD, hisD, purE, sucA and thrA housekeeping genes to identify the strains in positive samples. This study covers 121 samples from three different types of water sources: drinking water (51), streams (45), and swine wastewater (25). Thirteen samples positive for the invA gene were isolated by the culture method. The strains identified in these positive samples by MLST are S. Albany, S. Typhimurium, S. Newport, S. Bareilly, and S. Derby. Some of these serotypes, S. Albany, S. Typhimurium and S. Newport, are highly pathogenic and are associated with human diarrhea. Our results indicate that MLST is a useful method to identify the strains of Salmonella spp. Keywords: Salmonella, PCR, MLST.
Reference genes for measuring mRNA expression.
Dundas, Jitesh; Ling, Maurice
2012-12-01
The aim of this review is to find answers to some of the questions surrounding reference genes and their reliability for quantitative experiments. Reference genes are assumed to be at a constant expression level over a range of conditions such as temperature. These genes, such as GAPDH and beta-actin, are used extensively for gene expression studies using techniques like quantitative PCR. There have been several studies carried out on identifying reference genes. However, a lot of evidence indicates issues with the general suitability of these genes. Recent studies have shown that different factors, including the environment and methods, play an important role in changing the expression levels of the reference genes. Thus, we conclude that there is no reference gene that can be deemed suitable for all experimental conditions. In addition, we believe that every experiment will require the scientific evaluation and selection of the best candidate gene for use as a reference gene to obtain reliable scientific results.
A model of directional selection applied to the evolution of drug resistance in HIV-1.
Seoighe, Cathal; Ketwaroo, Farahnaz; Pillay, Visva; Scheffler, Konrad; Wood, Natasha; Duffet, Rodger; Zvelebil, Marketa; Martinson, Neil; McIntyre, James; Morris, Lynn; Hide, Winston
2007-04-01
Understanding how pathogens acquire resistance to drugs is important for the design of treatment strategies, particularly for rapidly evolving viruses such as HIV-1. Drug treatment can exert strong selective pressures and sites within targeted genes that confer resistance frequently evolve far more rapidly than the neutral rate. Rapid evolution at sites that confer resistance to drugs can be used to help elucidate the mechanisms of evolution of drug resistance and to discover or corroborate novel resistance mutations. We have implemented standard maximum likelihood methods that are used to detect diversifying selection and adapted them for use with serially sampled reverse transcriptase (RT) coding sequences isolated from a group of 300 HIV-1 subtype C-infected women before and after single-dose nevirapine (sdNVP) to prevent mother-to-child transmission. We have also extended the standard models of codon evolution for application to the detection of directional selection. Through simulation, we show that the directional selection model can provide a substantial improvement in sensitivity over models of diversifying selection. Five of the sites within the RT gene that are known to harbor mutations that confer resistance to nevirapine (NVP) strongly supported the directional selection model. There was no evidence that other mutations that are known to confer NVP resistance were selected in this cohort. The directional selection model, applied to serially sampled sequences, also had more power than the diversifying selection model to detect selection resulting from factors other than drug resistance. Because inference of selection from serial samples is unlikely to be adversely affected by recombination, the methods we describe may have general applicability to the analysis of positive selection affecting recombining coding sequences when serially sampled data are available.
Zinser, Erik R; Schneider, Dominique; Blot, Michel; Kolter, Roberto
2003-01-01
The loss of preexisting genes or gene activities during evolution is a major mechanism of ecological specialization. Evolutionary processes that can account for gene loss or inactivation have so far been restricted to one of two mechanisms: direct selection for the loss of gene activities that are disadvantageous under the conditions of selection (i.e., antagonistic pleiotropy) and selection-independent genetic drift of neutral (or nearly neutral) mutations (i.e., mutation accumulation). In this study we demonstrate with an evolved strain of Escherichia coli that a third, distinct mechanism exists by which gene activities can be lost. This selection-dependent mechanism involves the expropriation of one gene's upstream regulatory element by a second gene via a homologous recombination event. Resulting from this genetic exchange is the activation of the second gene and a concomitant inactivation of the first gene. This gene-for-gene expression tradeoff provides a net fitness gain, even if the forfeited activity of the first gene can play a positive role in fitness under the conditions of selection. PMID:12930738
Zhang, Lei; Zhao, Xihua; Zhang, Guoxiu; Zhang, Jiajia; Wang, Xuedong; Zhang, Suping; Wang, Wei; Wei, Dongzhi
2016-01-01
Filamentous fungi play important roles in the production of plant cell-wall degrading enzymes. In recent years, homologous recombinant technologies have contributed significantly to improved enzyme production and system design of genetically manipulated strains. When introducing multiple gene deletions, we need a robust and convenient way to control selectable marker genes, especially when only a limited number of markers are available in filamentous fungi. Integration after transformation is predominantly nonhomologous in most fungi other than yeast. Fungal strains deficient in the non-homologous end-joining (NHEJ) pathway have limitations associated with gene function analyses, although they are excellent recipient strains for gene targeting. We describe strategies and methods to address these challenges and leverage the power of resilient NHEJ-deficient strains. We have established a foolproof light-inducible platform for one-step unmarked genetic modification in industrial eukaryotic microorganisms, designated ‘LML 3.0’, and an on-off control protocol for the NHEJ pathway called ‘OFN 1.0’, using a synthetic light-switchable transactivation to control Cre recombinase-based excision and inversion. The methods provide a one-step strategy to sequentially modify genes without introducing selectable markers and NHEJ-deficiency. The strategies can be used to manipulate many biological processes in a wide range of eukaryotic cells. PMID:26857594
Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.
Xu, Lijing; Furlotte, Nicholas; Lin, Yunyue; Heinrich, Kevin; Berry, Michael W; George, Ebenezer O; Homayouni, Ramin
2011-04-14
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. GCAT is freely available at http://binf1.memphis.edu/gcat.
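The abstract does not spell out how the contingency table behind the literature cohesion p-value is built, so the following is only a generic sketch of a one-sided Fisher's exact test, written with the standard library. The table layout (gene pairs above or below a similarity cutoff, inside the gene set versus in a background) and the example counts are assumptions made for illustration, not the GCAT procedure.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` counts in the top-left cell given fixed margins.
    """
    n_total = a + b + c + d
    row1 = a + b          # e.g. gene pairs inside the gene set (assumed layout)
    col1 = a + c          # e.g. pairs above the similarity cutoff (assumed layout)
    denom = comb(n_total, row1)
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n_total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


# Invented counts: 8 of 10 gene-set pairs exceed the cutoff,
# versus 30 of 90 background pairs
p_value = fisher_exact_greater(8, 2, 30, 60)
is_cohesive = p_value < 0.05  # same 0.05 cutoff the abstract uses for LPv
```

A small p-value here would mean the gene set contains more high-similarity pairs than expected from the background, which is the intuition behind calling a set functionally cohesive.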
2012-01-01
Background Water stress limits plant survival and production in many parts of the world. Identification of genes and alleles responding to water stress conditions is important in breeding plants better adapted to drought. Currently there are no studies examining the transcriptome wide gene and allelic expression patterns under water stress conditions. We used RNA sequencing (RNA-seq) to identify the candidate genes and alleles and to explore the evolutionary signatures of selection. Results We studied the effect of water stress on gene expression in Eucalyptus camaldulensis seedlings derived from three natural populations. We used reference-guided transcriptome mapping to study gene expression. Several genes showed differential expression between control and stress conditions. Gene ontology (GO) enrichment tests revealed up-regulation of 140 stress-related gene categories and down-regulation of 35 metabolic and cell wall organisation gene categories. More than 190,000 single nucleotide polymorphisms (SNPs) were detected and 2737 of these showed differential allelic expression. Allelic expression of 52% of these variants was correlated with differential gene expression. Signatures of selection patterns were studied by estimating the proportion of nonsynonymous to synonymous substitution rates (Ka/Ks). The average Ka/Ks ratio among the 13,719 genes was 0.39 indicating that most of the genes are under purifying selection. Among the positively selected genes (Ka/Ks > 1.5) apoptosis and cell death categories were enriched. Of the 287 positively selected genes, ninety genes showed differential expression and 27 SNPs from 17 positively selected genes showed differential allelic expression between treatments. Conclusions Correlation of allelic expression of several SNPs with total gene expression indicates that these variants may be the cis-acting variants or in linkage disequilibrium with such variants. 
Enrichment of apoptosis and cell death gene categories among the positively selected genes reveals the past selection pressures experienced by the populations used in this study. PMID:22853646
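The Ka/Ks screening described in this abstract can be sketched as a simple classification of per-gene substitution rates. The cutoff of 1.5 for positive selection is taken from the abstract; the function name, the handling of genes with Ks of zero, the "ambiguous" label for ratios between 1 and 1.5, and the example rates are assumptions for illustration only.

```python
def classify_selection(ka, ks, positive_cutoff=1.5):
    """Label a gene from its nonsynonymous (Ka) and synonymous (Ks) rates.

    Returns None when Ks is zero, because the ratio is undefined.
    """
    if ks <= 0:
        return None
    ratio = ka / ks
    if ratio > positive_cutoff:
        return "positive"      # cutoff used in the abstract
    if ratio < 1.0:
        return "purifying"     # nonsynonymous changes are selected against
    return "ambiguous"         # between 1 and the cutoff (assumed label)


# Invented rates for three hypothetical genes
genes = {
    "gene1": (0.02, 0.10),   # ratio 0.2
    "gene2": (0.30, 0.15),   # ratio 2.0
    "gene3": (0.05, 0.00),   # Ks of zero, ratio undefined
}
labels = {name: classify_selection(ka, ks) for name, (ka, ks) in genes.items()}
```

An average ratio well below 1, such as the 0.39 reported in the abstract, corresponds to most genes falling in the "purifying" class.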
Campos, José Luis; Charlesworth, Brian
2017-01-01
We used whole-genome resequencing data from a population of Drosophila melanogaster to investigate the causes of the negative correlation between the within-population synonymous nucleotide site diversity (πS) of a gene and its degree of divergence from related species at nonsynonymous nucleotide sites (KA). By using the estimated distributions of mutational effects on fitness at nonsynonymous and UTR sites, we predicted the effects of background selection at sites within a gene on πS and found that these could account for only part of the observed correlation between πS and KA. We developed a model of the effects of selective sweeps that included gene conversion as well as crossing over. We used this model to estimate the average strength of selection on positively selected mutations in coding sequences and in UTRs, as well as the proportions of new mutations that are selectively advantageous. Genes with high levels of selective constraint on nonsynonymous sites were found to have lower strengths of positive selection and lower proportions of advantageous mutations than genes with low levels of constraint. Overall, background selection and selective sweeps within a typical gene reduce its synonymous diversity to ∼75% of its value in the absence of selection, with larger reductions for genes with high KA. Gene conversion has a major effect on the estimates of the parameters of positive selection, such that the estimated strength of selection on favorable mutations is greatly reduced if it is ignored. PMID:28559322
Zhang, Songdou; An, Shiheng; Li, Zhen; Wu, Fengming; Yang, Qingpo; Liu, Yichen; Cao, Jinjun; Zhang, Huaijiang; Zhang, Qingwen; Liu, Xiaoxia
2015-01-25
Recent studies have focused on determining functional genes and microRNAs in the pest Helicoverpa armigera (Lepidoptera: Noctuidae). Most of these studies used quantitative real-time PCR (qRT-PCR). Suitable reference genes are necessary to normalize gene expression data of qRT-PCR. However, a comprehensive study on the reference genes in H. armigera remains lacking. Twelve candidate reference genes of H. armigera were selected and evaluated for their expression stability under different biotic and abiotic conditions. The comprehensive stability ranking of candidate reference genes was recommended by RefFinder and the optimal number of reference genes was calculated by geNorm. Two target genes, thioredoxin (TRX) and Cu/Zn superoxide dismutase (SOD), were used to validate the selection of reference genes. Results showed that the most suitable candidate combinations of reference genes were as follows: 28S and RPS15 for developmental stages; RPS15 and RPL13 for larvae tissues; EF and RPL27 for adult tissues; GAPDH, RPL27, and β-TUB for nuclear polyhedrosis virus infection; RPS15 and RPL32 for insecticide treatment; RPS15 and RPL27 for temperature treatment; and RPL32, RPS15, and RPL27 for all samples. This study not only establishes an accurate method for normalizing qRT-PCR data in H. armigera but also serves as a reference for further study on gene transcription in H. armigera and other insects. Copyright © 2014 Elsevier B.V. All rights reserved.
Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J
2009-01-01
The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL. PMID:18640719
Zhang, Jie; Li, Yongxiang; Zheng, Jun; Zhang, Hongwei; Yang, Xiaohong; Wang, Jianhua; Wang, Guoying
2017-01-01
The extensive genetic variation present in maize (Zea mays) germplasm makes it possible to detect signatures of positive artificial selection that occurred during temperate and tropical maize improvement. Here we report an analysis of 532,815 polymorphisms from a maize association panel consisting of 368 diverse temperate and tropical inbred lines. We developed a gene-oriented approach adapting exonic polymorphisms to identify recently selected alleles by comparing haplotypes across the maize genome. This analysis revealed evidence of selection for more than 1100 genomic regions during recent improvement, and included regulatory genes and key genes with visible mutant phenotypes. We find that selected candidate target genes in temperate maize are enriched in biosynthetic processes, and further examination of these candidates highlights two cases, sucrose flux and oil storage, in which multiple genes in a common pathway can be cooperatively selected. Finally, based on available parallel gene expression data, we hypothesize that some genes were selected for regulatory variations, resulting in altered gene expression. PMID:28099470
Genetic tools for the investigation of Roseobacter clade bacteria
2009-01-01
Background The Roseobacter clade represents one of the most abundant, metabolically versatile and ecologically important bacterial groups found in marine habitats. A detailed molecular investigation of the regulatory and metabolic networks of these organisms is currently limited for many strains by missing suitable genetic tools. Results Conjugation and electroporation methods for the efficient and stable genetic transformation of selected Roseobacter clade bacteria including Dinoroseobacter shibae, Oceanibulbus indolifex, Phaeobacter gallaeciensis, Phaeobacter inhibens, Roseobacter denitrificans and Roseobacter litoralis were tested. For this purpose an antibiotic resistance screening was performed and suitable genetic markers were selected. Based on these transformation protocols stably maintained plasmids were identified. A plasmid encoded oxygen-independent fluorescent system was established using the flavin mononucleotide-based fluorescent protein FbFP. Finally, a chromosomal gene knockout strategy was successfully employed for the inactivation of the anaerobic metabolism regulatory gene dnr from D. shibae DFL12T. Conclusion A genetic toolbox for members of the Roseobacter clade was established. This provides a solid methodical basis for the detailed elucidation of gene regulatory and metabolic networks underlying the ecological success of this group of marine bacteria. PMID:20021642
Seo, Hogyu David; Lee, Daeyoup
2018-05-15
Random mutagenesis of a target gene is commonly used to identify mutations that yield the desired phenotype. Of the methods that may be used to achieve random mutagenesis, error-prone PCR is a convenient and efficient strategy for generating a diverse pool of mutants (i.e., a mutant library). Error-prone PCR is the method of choice when a researcher seeks to mutate a pre-defined region, such as the coding region of a gene while leaving other genomic regions unaffected. After the mutant library is amplified by error-prone PCR, it must be cloned into a suitable plasmid. The size of the library generated by error-prone PCR is constrained by the efficiency of the cloning step. However, in the fission yeast, Schizosaccharomyces pombe, the cloning step can be replaced by the use of a highly efficient one-step fusion PCR to generate constructs for transformation. Mutants of desired phenotypes may then be selected using appropriate reporters. Here, we describe this strategy in detail, taking as an example, a reporter inserted at centromeric heterochromatin.
The complexity of selection at the major primate β-defensin locus
Semple, Colin AM; Maxwell, Alison; Gautier, Philippe; Kilanowski, Fiona M; Eastwood, Hayden; Barran, Perdita E; Dorin, Julia R
2005-01-01
Background We have examined the evolution of the genes at the major human β-defensin locus and the orthologous loci in a range of other primates and mouse. For the first time these data allow us to examine selective episodes in the more recent evolutionary history of this locus as well as the ancient past. We have used a combination of maximum likelihood based tests and a maximum parsimony based sliding window approach to give a detailed view of the varying modes of selection operating at this locus. Results We provide evidence for strong positive selection soon after the duplication of these genes within an ancestral mammalian genome. Consequently variable selective pressures have acted on β-defensin genes in different evolutionary lineages, with episodes both of negative, and more rarely positive selection, during the divergence of primates. Positive selection appears to have been more common in the rodent lineage, accompanying the birth of novel, rodent-specific β-defensin genes. These observations allow a fuller understanding of the evolution of mammalian innate immunity. In both the rodent and primate lineages, sites in the second exon have been subject to positive selection and by implication are important in functional diversity. A small number of sites in the mature human peptides were found to have undergone repeated episodes of selection in different primate lineages. Particular sites were consistently implicated by multiple methods at positions throughout the mature peptides. These sites are clustered at positions predicted to be important for the specificity of the antimicrobial or chemoattractant properties of β-defensins. Surprisingly, sites within the prepropeptide region were also implicated as being subject to significant positive selection, suggesting previously unappreciated functional significance for this region. 
Conclusions Identification of these putatively functional sites has important implications for our understanding of β-defensin function and for novel antibiotic design. PMID:15904491
The Impact of Normalization Methods on RNA-Seq Data Analysis
Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.
2015-01-01
High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014
Combining multiple tools outperforms individual methods in gene set enrichment analyses.
Alhamdoosh, Monther; Ng, Milica; Wilson, Nicholas J; Sheridan, Julie M; Huynh, Huy; Wilson, Michael J; Ritchie, Matthew E
2017-02-01
Gene set enrichment (GSE) analysis allows researchers to efficiently extract biological insight from long lists of differentially expressed genes by interrogating them at a systems level. In recent years, there has been a proliferation of GSE analysis methods and hence it has become increasingly difficult for researchers to select an optimal GSE tool based on their particular dataset. Moreover, the majority of GSE analysis methods do not allow researchers to simultaneously compare gene set level results between multiple experimental conditions. The ensemble of genes set enrichment analyses (EGSEA) is a method developed for RNA-sequencing data that combines results from twelve algorithms and calculates collective gene set scores to improve the biological relevance of the highest ranked gene sets. EGSEA's gene set database contains around 25 000 gene sets from sixteen collections. It has multiple visualization capabilities that allow researchers to view gene sets at various levels of granularity. EGSEA has been tested on simulated data and on a number of human and mouse datasets and, based on biologists' feedback, consistently outperforms the individual tools that have been combined. Our evaluation demonstrates the superiority of the ensemble approach for GSE analysis, and its utility to effectively and efficiently extrapolate biological functions and potential involvement in disease processes from lists of differentially regulated genes. EGSEA is available as an R package at http://www.bioconductor.org/packages/EGSEA/ . The gene sets collections are available in the R package EGSEAdata from http://www.bioconductor.org/packages/EGSEAdata/ . monther.alhamdoosh@csl.com.au mritchie@wehi.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Qanbari, Saber; Strom, Tim M.; Haberer, Georg; Weigend, Steffen; Gheyas, Almas A.; Turner, Frances; Burt, David W.; Preisinger, Rudolf; Gianola, Daniel; Simianer, Henner
2012-01-01
In most studies aimed at localizing footprints of past selection, outliers at tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are subjectively determined, rather than being related to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified from re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows by steps of one SNP only, and suggest to call this a “creeping window” strategy. The approach confirmed selective sweeps in the region of previously described candidate genes, i.e. TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1 when used as positive controls. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value<0.001), including genes known to be associated with eggshell structure and immune system such as CALB1 and GAL cluster, respectively. A substantial proportion of signals was found in poor gene content regions including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes. PMID:23209582
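The "creeping window" scan and the permutation-based empirical p-value described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of per-SNP heterozygosity values as input, and the simple shuffling scheme are all assumptions made for the example, and the gap handling mentioned in the abstract is omitted.

```python
import random


def creeping_window_means(het, window):
    """Mean heterozygosity of every window of `window` consecutive SNPs.

    The window advances by exactly one SNP per step (the "creeping" part).
    """
    if window <= 0 or window > len(het):
        return []
    total = sum(het[:window])
    means = [total / window]
    for i in range(window, len(het)):
        total += het[i] - het[i - window]  # slide forward by one SNP
        means.append(total / window)
    return means


def empirical_p_values(het, window, n_perm=200, seed=0):
    """Genome-wide empirical p-value for each window (hypothetical scheme).

    Null distribution: the minimum window mean observed after shuffling
    the per-SNP values, repeated n_perm times.
    """
    rng = random.Random(seed)
    observed = creeping_window_means(het, window)
    shuffled = list(het)
    null_minima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_minima.append(min(creeping_window_means(shuffled, window)))
    # Share of permutations whose most extreme window is at least as low.
    return [
        (1 + sum(m <= obs for m in null_minima)) / (n_perm + 1)
        for obs in observed
    ]
```

Windows with a local reduction in heterozygosity receive the smallest p-values, and a cutoff such as the genome-wide p < 0.001 quoted above would then be applied to that list.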
Sima, Chao; Amundson, Sally A.; Zenhausern, Frederic
2018-01-01
Purpose To compile a list of genes that have been reported to be affected by external ionizing radiation (IR) and to assess their performance as candidate biomarkers for individual human radiation dosimetry. Methods Eligible studies were identified through extensive searches of the online databases from 1978 to 2017. Original English-language publications of microarray studies assessing radiation-induced changes in gene expression levels in human blood after external IR were included. Genes identified in at least half of the selected studies were retained for bio-statistical analysis in order to evaluate their diagnostic ability. Results 24 studies met the criteria and were included in this study. Radiation-induced expression of 10,170 unique genes was identified and the 31 genes that have been identified in at least 50% of studies (12/24 studies) were selected for diagnostic power analysis. Twenty-seven genes showed a significant Spearman’s correlation with radiation dose. Individually, TNFSF4, FDXR, MYC, ZMAT3 and GADD45A provided the best discrimination of radiation dose < 2 Gy and dose ≥ 2 Gy according to their maximized Youden’s index (0.67, 0.55, 0.55, 0.55 and 0.53 respectively). Moreover, 12 combinations of three genes display an area under the receiver operating characteristic (ROC) curve (AUC) of 1, reinforcing the concept of biomarker combinations instead of looking for an ideal and unique biomarker. Conclusion Gene expression is a promising approach for radiation dosimetry assessment. A list of robust candidate biomarkers has been identified from analysis of the studies published to date, confirming for example the potential of well-known genes such as FDXR and TNFSF4 or highlighting other promising genes such as ZMAT3. However, heterogeneity in protocols and analysis methods will require additional studies to confirm these results. PMID:29879226
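For readers unfamiliar with the statistic, Youden's index is J = sensitivity + specificity − 1, and the maximized index is the largest J over all candidate cutoffs. The sketch below shows one way to compute it for a single gene; the function name, the "positive if score ≥ threshold" convention, and the toy data are assumptions for illustration and are not taken from the study.

```python
def max_youden(scores, labels):
    """Return (best_J, threshold) for a binary classification.

    scores: expression values for one gene (hypothetical input).
    labels: 1 for dose >= 2 Gy, 0 for dose < 2 Gy.
    A sample is called positive when its score is >= threshold.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        j = tp / pos + tn / neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_j, best_t


# Toy example: two low-dose and two high-dose samples, perfectly separated.
print(max_youden([1, 2, 3, 4], [0, 0, 1, 1]))  # (1.0, 3)
```

A gene whose expression decreases with dose would need its scores negated (or the comparison reversed) before applying this convention.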
A sub-space greedy search method for efficient Bayesian Network inference.
Zhang, Qing; Cao, Yong; Li, Yong; Zhu, Yanming; Sun, Samuel S M; Guo, Dianjing
2011-09-01
Bayesian networks (BNs) have been successfully used to infer the regulatory relationships of genes from microarray datasets. However, one major limitation of the BN approach is its computational cost, because the calculation time grows more than exponentially with the dimension of the dataset. In this paper, we propose a sub-space greedy search method for efficient Bayesian network inference. In particular, this method limits the greedy search space by selecting only gene pairs with higher partial correlation coefficients. Using both synthetic and real data, we demonstrate that the proposed method achieved results comparable to those of the standard greedy search method yet saved ∼50% of the computational time. We believe that the sub-space search method can be widely used for efficient BN inference in systems biology. Copyright © 2011 Elsevier Ltd. All rights reserved.
Thyagarajan, Bhaskar; Scheyhing, Kelly; Xue, Haipeng; Fontes, Andrew; Chesnut, Jon; Rao, Mahendra; Lakshmipathy, Uma
2009-03-01
Stable expression of transgenes in stem cells has been a challenge due to the nonavailability of efficient transfection methods and the inability of transgenes to support sustained gene expression. Several methods have been reported to stably modify both embryonic and adult stem cells. These methods rely on integration of the transgene into the genome of the host cell, which could result in an expression pattern dependent on the number of integrations and the genomic locus of integration. To overcome this issue, site-specific integration methods mediated by integrase, adeno-associated virus or via homologous recombination have been used to generate stable human embryonic stem cell (hESC) lines. In this study, we describe a vector that is maintained episomally in hESCs. The vector used in this study is based on components derived from the Epstein-Barr virus, containing the Epstein-Barr virus nuclear antigen 1 expression cassette and the OriP origin of replication. The vector also expresses the drug-resistance marker gene hygromycin, which allows for selection and long-term maintenance of cells harboring the plasmid. Using this vector system, we show sustained expression of green fluorescent protein in undifferentiated hESCs and their differentiating embryoid bodies. In addition, the stable hESC clones show comparable expression with and without drug selection. Consistent with this observation, bulk-transfected adipose tissue-derived mesenchymal stem cells showed persistent marker gene expression as they differentiate into adipocytes, osteoblasts and chondroblasts. Episomal vectors offer a fast and efficient method to create hESC reporter lines, which in turn allows one to test the effect of overexpression of various genes on stem cell growth, proliferation and differentiation.
Shirk, Andrew J; Landguth, Erin L; Cushman, Samuel A
2018-01-01
Anthropogenic migration barriers fragment many populations and limit the ability of species to respond to climate-induced biome shifts. Conservation actions designed to conserve habitat connectivity and mitigate barriers are needed to unite fragmented populations into larger, more viable metapopulations, and to allow species to track their climate envelope over time. Landscape genetic analysis provides an empirical means to infer landscape factors influencing gene flow and thereby inform such conservation actions. However, there are currently many methods available for model selection in landscape genetics, and considerable uncertainty as to which provide the greatest accuracy in identifying the true landscape model influencing gene flow among competing alternative hypotheses. In this study, we used population genetic simulations to evaluate the performance of seven regression-based model selection methods on a broad array of landscapes that varied by the number and type of variables contributing to resistance, the magnitude and cohesion of resistance, as well as the functional relationship between variables and resistance. We also assessed the effect of transformations designed to linearize the relationship between genetic and landscape distances. We found that linear mixed effects models had the highest accuracy in every way we evaluated model performance; however, other methods also performed well in many circumstances, particularly when landscape resistance was high and the correlation among competing hypotheses was limited. Our results provide guidance for which regression-based model selection methods provide the most accurate inferences in landscape genetic analysis and thereby best inform connectivity conservation actions. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Zhang, Cui; Gao, Han; Yang, Zhenke; Jiang, Yuanyuan; Li, Zhenkui; Wang, Xu; Xiao, Bo; Su, Xin-Zhuan; Cui, Huiting; Yuan, Jing
2017-03-01
CRISPR/Cas9 has been successfully adapted for gene editing in malaria parasites including Plasmodium falciparum and Plasmodium yoelii. However, the reported methods were limited to editing one gene at a time. In practice, it is often desired to modify multiple genetic loci in a parasite genome. Here we describe a CRISPR/Cas9 mediated genome editing method that allows successive modification of more than one gene in the genome of P. yoelii using an improved single-vector system (pYCm) we developed previously. Drug resistant genes encoding human dihydrofolate reductase (hDHFR) and a yeast bifunctional protein (yFCU), with cytosine deaminase (CD) and uridyl phosphoribosyl transferase (UPRT) activities in the plasmid, allowed sequential positive (pyrimethamine, Pyr) and negative (5-fluorocytosine, 5FC) selections and generation of transgenic parasites free of the episomal plasmid after genetic modification. Using this system, we were able to efficiently tag a gene of interest (Pyp28) and subsequently disrupted two genes (Pyctrp and Pycdpk3) that are individually critical for ookinete motility. Disruption of the genes either eliminated (Pyctrp) or greatly reduced (Pycdpk3) ookinete forward motility in matrigel in vitro and completely blocked oocyst development in mosquito midgut. The method will greatly facilitate studies of parasite gene function, development, and disease pathogenesis. Copyright © 2016 Elsevier B.V. All rights reserved.
Moon, Myungjin; Nakai, Kenta
2018-04-01
Currently, cancer biomarker discovery is one of the important research topics worldwide. In particular, detecting significant genes related to cancer is an important task for early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extractions to identify candidate biomarkers of cancer using renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box-Cox transformation and integrated into a one-dimensional dataset that retains the major characteristics of the original datasets by unsupervised feature extraction methods, and differentially expressed genes are selected from the integrated dataset. Use of the integrated dataset demonstrated improved performance as compared with conventional approaches that utilize gene expression or DNA methylation datasets alone. Validation based on the literature showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be acquired from the proposed analysis method. Furthermore, we expect that the proposed method can be expanded for applications involving various types of multi-omics datasets.
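The Box-Cox normalization step mentioned above has a simple closed form: y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0. The sketch below applies it with a fixed λ; how the authors chose λ is not stated in the abstract, so the parameter here is purely an assumption supplied by the caller.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values with parameter `lam`."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Square-root-like transform (lam = 0.5) of a hypothetical expression value.
print(box_cox([4.0], 0.5))  # [2.0]
```

In practice λ is usually estimated from the data (for example by maximum likelihood), which this sketch deliberately leaves out.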
Identification of essential genes in Streptococcus pneumoniae by allelic replacement mutagenesis.
Song, Jae-Hoon; Ko, Kwan Soo; Lee, Ji-Young; Baek, Jin Yang; Oh, Won Sup; Yoon, Ha Sik; Jeong, Jin-Yong; Chun, Jongsik
2005-06-30
To find potential targets of novel antimicrobial agents, we identified essential genes of Streptococcus pneumoniae using comparative genomics and allelic replacement mutagenesis. We compared the genome of S. pneumoniae R6 with those of Bacillus subtilis, Enterococcus faecalis, Escherichia coli, and Staphylococcus aureus, and selected 693 candidate target genes with > 40% amino acid sequence identity to the corresponding genes in at least two of the other species. The 693 genes were disrupted and 133 were found to be essential for growth. Of these, 32 encoded proteins of unknown function, and we were able to identify orthologues of 22 of these genes by genomic comparisons. The experimental method used in this study is easy to perform, rapid and efficient for identifying essential genes of bacterial pathogens.
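The candidate-selection rule above (more than 40% amino acid identity in at least two of the four comparison species) amounts to a simple filter. The following sketch assumes the identities have already been computed and stored per gene and per species; the data layout, gene names, and function name are hypothetical.

```python
def select_candidates(identity, threshold=40.0, min_species=2):
    """Keep genes whose identity exceeds `threshold` in >= `min_species` species.

    identity: {gene: {species: percent amino acid identity}} (assumed layout).
    """
    return [
        gene
        for gene, per_species in identity.items()
        if sum(1 for pct in per_species.values() if pct > threshold) >= min_species
    ]


# Hypothetical identities against the four comparison genomes.
example = {
    "geneA": {"B. subtilis": 55.0, "E. faecalis": 62.0, "E. coli": 30.0, "S. aureus": 38.0},
    "geneB": {"B. subtilis": 41.0, "E. faecalis": 25.0, "E. coli": 20.0, "S. aureus": 33.0},
}
print(select_candidates(example))  # ['geneA']
```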
Species Tree Inference Using a Mixture Model.
Ullah, Ikram; Parviainen, Pekka; Lagergren, Jens
2015-09-01
Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either of these cases, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by coestimating gene trees and the species tree but this approach poses a scalability problem for larger data sets. We present MixTreEM-DLRS: A two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural expectation maximization algorithm based on a mixture model is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg O, Sennblad B, Arvestad L, Lagergren J. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A. 106(14):5714-5719), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad L, Lagergren J, Sennblad B. 2009. The gene evolution model and computing its associated probabilities. J ACM. 56(2):1-44). We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance with a recent genome-scale species tree reconstruction method PHYLDOG (Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V. 2013. Genome-scale coestimation of species and gene trees. Genome Res. 23(2):323-330) as well as with a fast parsimony-based algorithm Duptree (Wehe A, Bansal MS, Burleigh JG, Eulenstein O. 2008. Duptree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13):1540-1541). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster and our method outperforms Duptree in accuracy. The analysis constituted by MixTreEM without DLRS may also be used for selecting the target species tree, yielding a fast and yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem/. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David
2001-01-01
Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
Fernández, Maria V.; Budde, John; Del-Aguila, Jorge L.; Ibañez, Laura; Deming, Yuetiva; Harari, Oscar; Norton, Joanne; Morris, John C.; Goate, Alison M.; Cruchaga, Carlos
2018-01-01
Gene-based tests to study the combined effect of rare variants on a particular phenotype have been widely developed for case-control studies, but their evolution and adaptation for family-based studies, especially studies of complex incomplete families, has been slower. In this study, we have performed a practical examination of all the latest gene-based methods available for family-based study designs using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight different software packages and 22 models utilizing a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). After a thorough examination of each of these tests, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype and we provide recommendations to select the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD. PMID:29670507
Cloning and study of the pectate lyase gene of Erwinia carotovora
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bukanov, N.O.; Fonshtein, M.Yu.; Evtushenkov, A.N.
1986-04-01
The cloning of the gene of a secretable protein of Erwinia carotovora, pectate lyase, in Escherichia coli was described. Primary cloning was conducted using the phage vector lambda 47.1. In the gene library of E. carotovora obtained, eight phages carrying the gene sought were identified according to the appearance of enzymatic activity of the gene product, pectate lyase, in situ. The BamHI fragment of DNA, common to all these phages, was recloned on the plasmid pUC19. It was shown that the cloned pectate lyase gene is represented on the E. carotovora chromosome in one copy. Methods of production of representative gene libraries on phage vectors from no less than 1 μg of cloned DNA even for the genomes of eukaryotes have now been developed. Vectors have been created, for example, lambda 47.1, permitting the selection only of hybrid molecules. A number of methods have been developed for the search for a required gene in the library, depending on whether the cloned gene can be expressed or not, and if it can, what properties it will impart to the hybrid clone containing it.
Steiper, Michael E; Wolfe, Nathan D; Karesh, William B; Kilbourn, Annelisa M; Bosi, Edwin J; Ruvolo, Maryellen
2006-07-01
The alpha-globin genes are implicated in human resistance to malaria, a disease caused by Plasmodium parasites. This study is the first to analyze DNA sequences from a novel alpha-globin-type gene in orangutans, a species affected by Plasmodium. Phylogenetic methods show that the gene is a duplication of an alpha-globin gene and is located 5' of alpha-2 globin. The alpha-globin-type gene is notable for having four amino acid replacements relative to the orangutan's alpha-1 and alpha-2 globin genes, with no synonymous differences. Pairwise K(a)/K(s) methods and likelihood ratio tests (LRTs) revealed that the evolutionary history of the alpha-globin-type gene has been marked by either neutral or positive evolution, but not purifying selection. A comparative analysis of the amino acid replacements of the alpha-globin-type gene with human hemoglobinopathies and hemoglobin structure showed that two of the four replaced sites are members of the same molecular bond, one that is crucial to the proper functioning of the hemoglobin molecule. This suggested an adaptive evolutionary change. Functionally, this locus may result in a thalassemia-like phenotype in orangutans, possibly as an adaptation to combat Plasmodium.
Missing value imputation in DNA microarrays based on conjugate gradient method.
Dorri, Fatemeh; Azmi, Paeiz; Dorri, Faezeh
2012-02-01
Analysis of gene expression profiles needs a complete matrix of gene array values; consequently, imputation methods have been suggested. In this paper, an algorithm based on the conjugate gradient (CG) method is proposed to estimate missing values. The k-nearest neighbors of the missing entry are first selected based on the absolute values of their Pearson correlation coefficients. Then a subset of genes among the k-nearest neighbors is labeled as the most similar ones. The CG algorithm, with this subset as its input, is then used to estimate the missing values. Our proposed CG-based algorithm (CGimpute) is evaluated on different data sets. The results are compared with sequential local least squares (SLLSimpute), Bayesian principal component analysis (BPCAimpute), local least squares imputation (LLSimpute), iterated local least squares imputation (ILLSimpute) and adaptive k-nearest neighbors imputation (KNNKimpute) methods. The average normalized root mean square error (NRMSE) and relative NRMSE in different data sets with various missing rates show that CGimpute outperforms the other methods. Copyright © 2011 Elsevier Ltd. All rights reserved.
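The two main ingredients named above, neighbor selection by absolute Pearson correlation and a conjugate gradient solve, can be sketched as follows. This is a simplified illustration under stated assumptions, not CGimpute itself: the function names are hypothetical, candidate genes are assumed to have no missing values, the intermediate step of picking a "most similar" subset is skipped, and CG is applied to the least-squares normal equations as one plausible reading of the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def conjugate_gradient(M, rhs, iters=50, tol=1e-12):
    """Solve M w = rhs for a symmetric positive semi-definite matrix M."""
    n = len(rhs)
    w = [0.0] * n
    r = list(rhs)  # residual rhs - M w, with w starting at zero
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(iters):
        if rs < tol:
            break
        Mp = [sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]
        denom = sum(p[i] * Mp[i] for i in range(n))
        if denom <= 0:
            break  # singular direction; stop with the current estimate
        alpha = rs / denom
        w = [w[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Mp[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return w


def impute(target, col, candidates, k=2):
    """Estimate the missing target[col] from the k most correlated candidates."""
    obs = [j for j in range(len(target)) if j != col]
    t = [target[j] for j in obs]
    ranked = sorted(
        candidates,
        key=lambda g: abs(pearson([g[j] for j in obs], t)),
        reverse=True,
    )
    nbrs = ranked[:k]
    # Normal equations (A^T A) w = A^T t; column i of A is neighbor i
    # restricted to the observed columns of the target gene.
    M = [[sum(a[j] * b[j] for j in obs) for b in nbrs] for a in nbrs]
    rhs = [sum(a[j] * tj for j, tj in zip(obs, t)) for a in nbrs]
    w = conjugate_gradient(M, rhs)
    return sum(wi * g[col] for wi, g in zip(w, nbrs))
```

With a target profile `[1, 2, 3, None, 5]` and a candidate `[2, 4, 6, 8, 10]` that is exactly twice the target, `impute(target, 3, candidates, k=1)` recovers a value close to 4.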
New RNAi strategy for selective suppression of a mutant allele in polyglutamine disease.
Kubodera, Takayuki; Yokota, Takanori; Ishikawa, Kinya; Mizusawa, Hidehiro
2005-12-01
In gene therapy of dominantly inherited diseases with small interfering RNA (siRNA), mutant allele specific suppression may be necessary for diseases in which the defective gene normally has an important role. It is difficult, however, to design a mutant allele-specific siRNA for trinucleotide repeat diseases in which the difference of sequences is only repeat length. To overcome this problem, we use a new RNA interference (RNAi) strategy for selective suppression of mutant alleles. Both mutant and wild-type alleles are inhibited by the most effective siRNA, and wild-type protein is restored using the wild-type mRNA modified to be resistant to the siRNA. Here, we applied this method to spinocerebellar ataxia type 6 (SCA6). We discuss its feasibility and problems for future gene therapy.
Arun, Alok; Baumlé, Véronique; Amelot, Gaël; Nieberding, Caroline M.
2015-01-01
Real-time quantitative reverse transcription PCR (qRT-PCR) is a technique widely used to quantify the transcriptional expression level of candidate genes. qRT-PCR requires the selection of one or several suitable reference genes, whose expression profiles remain stable across conditions, to normalize the qRT-PCR expression profiles of candidate genes. Although several butterfly species (Lepidoptera) have become important models in molecular evolutionary ecology, no study has so far aimed to identify reference genes for accurate data normalization in any butterfly. The African bush brown butterfly Bicyclus anynana has drawn considerable attention owing to its suitability as a model for evolutionary ecology, and we here provide the first extensive study to identify suitable reference genes in this species. We monitored the expression profile of twelve reference genes: eEF-1α, FK506, UBQL40, RpS8, RpS18, HSP, GAPDH, VATPase, ACT3, TBP, eIF2 and G6PD. We tested the stability of their expression profiles in three different tissues (wings, brains, antennae), two developmental stages (pupal and adult) and two sexes (male and female), all of which were subjected to two food treatments (food stress and control feeding ad libitum). The expression stability and ranking of the twelve reference genes were assessed using two algorithm-based methods, NormFinder and geNorm. Both methods identified RpS8 as the most suitable reference gene for expression data normalization. We also showed that the use of two reference genes is sufficient to effectively normalize the qRT-PCR data across the tissues and experimental conditions that we used in B. anynana. Finally, we tested the effect of choosing reference genes with different stability on the normalization of the transcript abundance of a candidate gene involved in olfactory communication in B. anynana, the Fatty Acyl Reductase 2, and we confirmed that using an unstable reference gene can drastically alter the expression profile of the target candidate genes. PMID:25793735
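As a rough illustration of how a geNorm-style stability ranking works, the sketch below computes the stability measure M for each candidate gene: the average, over all other candidates, of the standard deviation of the log2 expression ratios across samples. A lower M indicates a more stable gene. The function name and the toy data are hypothetical, and the use of the sample standard deviation is an assumption; the geNorm software itself may differ in detail.

```python
import math
from statistics import stdev

def genorm_m(expr):
    """expr maps gene name -> relative quantities (linear scale, same sample order).

    Returns gene name -> stability measure M (lower means more stable).
    """
    genes = list(expr)
    m = {}
    for j in genes:
        pairwise_variation = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_variation.append(stdev(log_ratios))
        m[j] = sum(pairwise_variation) / len(pairwise_variation)
    return m

# Toy data (hypothetical): genes A and B co-vary perfectly, C does not.
toy = {"A": [1, 2, 4], "B": [2, 4, 8], "C": [1, 8, 1]}
stability = genorm_m(toy)
most_stable = min(stability, key=stability.get)
```

Because M depends only on ratios between candidates, two genes that are co-regulated can look stable together, which is one reason studies such as this one cross-check geNorm against NormFinder.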
Literature-based discovery of diabetes- and ROS-related targets
2010-01-01
Background Reactive oxygen species (ROS) are known mediators of cellular damage in multiple diseases including diabetic complications. Despite its importance, no comprehensive database is currently available for the genes associated with ROS. Methods We present ROS- and diabetes-related targets (genes/proteins) collected from the biomedical literature through a text mining technology. A web-based literature mining tool, SciMiner, was applied to 1,154 biomedical papers indexed with diabetes and ROS by PubMed to identify relevant targets. Over-represented targets in the ROS-diabetes literature were obtained through comparisons against randomly selected literature. The expression levels of nine genes, selected from the top ranked ROS-diabetes set, were measured in the dorsal root ganglia (DRG) of diabetic and non-diabetic DBA/2J mice in order to evaluate the biological relevance of literature-derived targets in the pathogenesis of diabetic neuropathy. Results SciMiner identified 1,026 ROS- and diabetes-related targets from the 1,154 biomedical papers (http://jdrf.neurology.med.umich.edu/ROSDiabetes/). Fifty-three targets were significantly over-represented in the ROS-diabetes literature compared to randomly selected literature. These over-represented targets included well-known members of the oxidative stress response including catalase, the NADPH oxidase family, and the superoxide dismutase family of proteins. Eight of the nine selected genes exhibited significant differential expression between diabetic and non-diabetic mice. For six genes, the direction of expression change in diabetes paralleled enhanced oxidative stress in the DRG. Conclusions Literature mining compiled ROS-diabetes related targets from the biomedical literature and led us to evaluate the biological relevance of selected targets in the pathogenesis of diabetic neuropathy. PMID:20979611
Progress in gene targeting and gene therapy for retinitis pigmentosa
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrar, G.J.; Humphries, M.M.; Erven, A.
1994-09-01
Previously, we localized disease genes involved in retinitis pigmentosa (RP), an inherited retinal degeneration, close to the rhodopsin and peripherin genes on 3q and 6p. Subsequently, we and others identified mutations in these genes in RP patients. Currently animal models for human retinopathies are being generated using gene targeting by homologous recombination in embryonic stem (ES) cells. Genomic clones for retinal genes including rhodopsin and peripherin have been obtained from a phage library carrying mouse DNA isogenic with the ES cell line (CC1.2). The peripherin clone has been sequenced to establish the genomic structure of the mouse gene. Targeting vectors for rhodopsin and peripherin including a neomycin cassette for positive selection and thymidine kinase genes enabling selection against random integrants are under construction. Progress in vector construction will be presented. Simultaneously we are developing systems for delivery of gene therapies to retinal tissues utilizing replication-deficient adenovirus (Ad5). Efficacy of infection subsequent to various methods of intraocular injection and with varying viral titers is being assayed using an adenovirus construct containing a CMV promoter LacZ fusion as reporter, and the range of tissues infected and the level and duration of LacZ expression are being monitored. Viral constructs with the LacZ reporter gene under the control of retinal specific promoters such as rhodopsin and IRBP cloned into pXCJL.1 are under construction. An update on developments in photoreceptor cell-directed expression of virally delivered genes will be presented.
Li, Meng-Yao; Song, Xiong; Wang, Feng; Xiong, Ai-Sheng
2016-01-01
Parsley, one of the most important vegetables in the Apiaceae family, is widely used in the food, medicinal, and cosmetic industries. Recent studies on parsley mainly focus on its chemical composition, and further research involving the analysis of the plant's gene functions and expression is required. qPCR is a powerful method for detecting very low quantities of target transcripts and is widely used to study gene expression. To ensure the accuracy of results, a suitable reference gene is necessary for expression normalization. In this study, four software tools, namely geNorm, NormFinder, BestKeeper, and RefFinder, were used to evaluate the expression stabilities of eight candidate reference genes of parsley (GAPDH, ACTIN, eIF-4α, SAND, UBC, TIP41, EF-1α, and TUB) under various conditions, including abiotic stresses (heat, cold, salt, and drought) and hormone stimuli treatments (GA, SA, MeJA, and ABA). Results showed that EF-1α and TUB were the most stable genes for abiotic stresses, whereas EF-1α, GAPDH, and TUB were the top three choices for hormone stimuli treatments. Moreover, EF-1α and TUB were the most stable reference genes among all tested samples, and UBC was the least stable one. Expression analysis of PcDREB1 and PcDREB2 further verified that the selected stable reference genes were suitable for gene expression normalization. This study can guide the selection of suitable reference genes for gene expression studies in parsley. PMID:27746803
Dong, Shan-Shan; Guo, Yan; Zhu, Dong-Li; Chen, Xiao-Feng; Wu, Xiao-Ming; Shen, Hui; Chen, Xiang-Ding; Tan, Li-Jun; Tian, Qing; Deng, Hong-Wen; Yang, Tie-Lin
2016-01-01
OBJECTIVES With ENCODE epigenomic data and results from published genome-wide association studies (GWASs), we aimed to find regulatory signatures of obesity genes and discover novel susceptibility genes. METHODS Obesity genes were obtained from public GWAS databases and their promoters were annotated based on regulatory element information. Significantly enriched or depleted epigenomic elements in the promoters of obesity genes were evaluated, and all human genes were then prioritized according to the presence of the selected elements to predict new candidate genes. Top-ranked genes were subsequently tested for association with obesity-related traits in three independent in-house GWAS samples. RESULTS We identified RAD21 and EZH2 as over-represented and STAT2 and IRF3 as depleted transcription factors. The histone modification H3K9me3 and the chromatin state segmentations “poised promoter” and “repressed” were over-represented. All genes were prioritized and we selected the top five genes for validation at the population level. In the combined results from the three GWAS samples, rs7522101 in ESRRG remained significantly associated with BMI after multiple testing correction (P = 7.25 × 10^-5). It was also associated with β-cell function (P = 1.99 × 10^-3) and fasting glucose level (P < 0.05) in the meta-analyses of glucose and insulin-related traits consortium (MAGIC) dataset. CONCLUSIONS In summary, we identified epigenomic characteristics of obesity genes and suggested ESRRG as a novel obesity susceptibility gene. PMID:27113491
MicroRNA-integrated and network-embedded gene selection with diffusion distance.
Huang, Di; Zhou, Xiaobo; Lyon, Christopher J; Hsueh, Willa A; Wong, Stephen T C
2010-10-29
Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and on the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measures only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this limitation, which is important (critical in certain cases) yet ignored in almost all existing studies, we design a new strategy to integrate microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also handle the challenges posed by gene collaboration mechanisms. To incorporate the biological facts that genes without direct interactions may work closely together due to signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly connected, according to the multiple paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
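To make the idea of diffusion distance concrete, the sketch below uses the standard random-walk formulation: two nodes are close if t-step random walks started from them end up with similar distributions over the network, so every path between them contributes. This is an assumption about the general technique, not the paper's exact model; the function names and the toy network are hypothetical, and every node is assumed to have at least one edge.

```python
import math

def transition_matrix(adj):
    """Row-normalize a symmetric adjacency matrix into random-walk probabilities."""
    return [[w / sum(row) for w in row] for row in adj]

def mat_mul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diffusion_distance(adj, i, j, t):
    """Distance between the t-step walk distributions from i and j,
    weighted by the inverse stationary distribution (degree / total degree)."""
    p = transition_matrix(adj)
    pt = p
    for _ in range(t - 1):
        pt = mat_mul(pt, p)
    total = sum(sum(row) for row in adj)
    stationary = [sum(row) / total for row in adj]
    return math.sqrt(sum((pt[i][k] - pt[j][k]) ** 2 / stationary[k]
                         for k in range(len(adj))))

# Toy network (hypothetical): genes 0, 1, 2 form a triangle; 3 hangs off 2; 4 off 3.
network = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
```

In this toy network, genes 0 and 1 share a neighbor and an edge and so end up much closer than genes 0 and 4, which are linked only through a chain; a larger t smooths the distributions and emphasizes broader network structure.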
Sheng, Zizhang; Schramm, Chaim A; Kong, Rui; Mullikin, James C; Mascola, John R; Kwong, Peter D; Shapiro, Lawrence
2017-01-01
Somatic hypermutation (SHM) plays a critical role in the maturation of antibodies, optimizing recognition initiated by recombination of V(D)J genes. Previous studies have shown that the propensity to mutate is modulated by the context of surrounding nucleotides and that SHM machinery generates biased substitutions. To investigate the intrinsic mutation frequency and substitution bias of SHMs at the amino acid level, we analyzed functional human antibody repertoires and developed mGSSP (method for gene-specific substitution profile), a method to construct amino acid substitution profiles from next-generation sequencing-determined B cell transcripts. We demonstrated that these gene-specific substitution profiles (GSSPs) are unique to each V gene and highly consistent between donors. We also showed that the GSSPs constructed from functional antibody repertoires are highly similar to those constructed from antibody sequences amplified from non-productively rearranged passenger alleles, which do not undergo functional selection. This suggests that the types and frequencies, or mutational space, of a majority of amino acid changes sampled by the SHM machinery are well captured by GSSPs. We further observed the rates of mutational exchange between some amino acids to be both asymmetric and context dependent and to correlate weakly with their biochemical properties. GSSPs provide an improved, position-dependent alternative to standard substitution matrices, and can be used to develop software for accurately modeling the SHM process. GSSPs can also be used for predicting the amino acid mutational space available for antigen-driven selection and for understanding factors modulating the maturation pathways of antibody lineages in a gene-specific context. The mGSSP method can be used to build, compare, and plot GSSPs; we report the GSSPs constructed for 69 common human V genes (DOI: 10.6084/m9.figshare.3511083) and provide high-resolution logo plots for each (DOI: 10.6084/m9.figshare.3511085).
Johnston, Iain G; Williams, Ben P
2016-02-24
Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.
YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.
Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina
2015-01-01
MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of the geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data successfully predicted pre-miRNAs in other organisms, including viruses.
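The geometric mean of sensitivity and specificity used as an evaluation metric above can be computed from a confusion matrix as in this short sketch. The function name and the example counts are hypothetical and are not results from the paper.

```python
import math

def geometric_mean_score(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# Hypothetical counts: 8 of 10 true pre-miRNAs and 9 of 10 negatives classified correctly.
score = geometric_mean_score(tp=8, fn=2, tn=9, fp=1)
```

Unlike plain accuracy, this score drops to zero if either class is entirely misclassified, which is why it is often preferred when positives and negatives are imbalanced.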
Moore, Jason H; Gilbert, Joshua C; Tsai, Chia-Ti; Chiang, Fu-Tien; Holden, Todd; Barney, Nate; White, Bill C
2006-07-21
Detecting, characterizing, and interpreting gene-gene interactions or epistasis in studies of human disease susceptibility is both a mathematical and a computational challenge. To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe a comprehensive and flexible framework for detecting and interpreting gene-gene interactions that utilizes advances in information theory for selecting interesting single-nucleotide polymorphisms (SNPs), MDR for constructive induction, machine learning methods for classification, and finally graphical models for interpretation. We illustrate the usefulness of this strategy using artificial datasets simulated from several different two-locus and three-locus epistasis models. We show that the accuracy, sensitivity, specificity, and precision of a naïve Bayes classifier are significantly improved when SNPs are selected based on their information gain (i.e. class entropy removed) and reduced to a single attribute using MDR. We then apply this strategy to detecting, characterizing, and interpreting epistatic models in a genetic study (n = 500) of atrial fibrillation and show that both classification and model interpretation are significantly improved.
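The SNP-filtering criterion mentioned above, information gain (the class entropy removed by knowing a SNP's genotype), can be sketched as follows. This is a minimal single-SNP illustration with hypothetical function names and toy data; it does not implement MDR or the interaction-aware measures of the full framework.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(genotypes, labels):
    """H(class) - H(class | genotype) for one SNP."""
    n = len(labels)
    conditional = 0.0
    for g in set(genotypes):
        subset = [lab for x, lab in zip(genotypes, labels) if x == g]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Toy data (hypothetical): 1 = case, 0 = control; genotypes coded 0/1/2.
status = [1, 1, 0, 0]
informative_snp = [2, 2, 0, 0]      # genotype fully determines status
uninformative_snp = [0, 1, 0, 1]    # genotype tells us nothing about status
```

SNPs would then be ranked by `information_gain` and the top-scoring ones passed on to the constructive induction step.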
The Use of EST Expression Matrixes for the Quality Control of Gene Expression Data
Milnthorpe, Andrew T.; Soloviev, Mikhail
2012-01-01
EST expression profiling provides an attractive tool for studying differential gene expression, but cDNA libraries' origins and EST data quality are not always known or reported. Libraries may originate from pooled or mixed tissues; EST clustering, EST counts, library annotations and analysis algorithms may contain errors. Traditional data analysis methods, including research into tissue-specific gene expression, assume EST counts to be correct and libraries to be correctly annotated, which is not always the case. Therefore, a method capable of assessing the quality of expression data based on that data alone would be invaluable for assessing the quality of EST data and determining their suitability for mRNA expression analysis. Here we report an approach to the selection of a small generic subset of 244 UniGene clusters suitable for identification of the tissue of origin for EST libraries and quality control of the expression data using EST expression information alone. We created a small expression matrix of UniGene IDs using two rounds of selection followed by two rounds of optimisation. Our selection procedures differ from traditional approaches to finding “tissue-specific” genes, and our matrix yields consistently high positive correlation values for libraries with confirmed tissues of origin and can be applied for tissue typing and quality control of libraries as small as just a few hundred total ESTs. Furthermore, we can pick up tissue correlations between related tissues, e.g. brain and peripheral nervous tissue or heart and muscle tissues, and identify tissue origins for a few libraries of uncharacterised tissue identity. It was possible to confirm tissue identity for some libraries which have been derived from cancer tissues or have been normalised. Tissue matching is affected strongly by cancer progression or library normalisation, and our approach may potentially be applied for elucidating the stage of normalisation in normalised libraries or for cancer staging. PMID:22412959