family sequence analysis: Topics by Science.gov

Sample records for family sequence analysis

Examining inter-family differences in intra-family (parent-adolescent) dynamics using grid-sequence analysis.

PubMed

Brinberg, Miriam; Fosco, Gregory M; Ram, Nilam

2017-12-01

Family systems theorists have forwarded a set of theoretical principles meant to guide family scientists and practitioners in their conceptualization of patterns of family interaction (intra-family dynamics) that, over time, give rise to family and individual dysfunction and/or adaptation. In this article, we present an analytic approach that merges state space grid methods adapted from the dynamic systems literature with sequence analysis methods adapted from molecular biology into a "grid-sequence" method for studying inter-family differences in intra-family dynamics. Using dyadic data from 86 parent-adolescent dyads who provided up to 21 daily reports about connectedness, we illustrate how grid-sequence analysis can be used to identify a typology of intrafamily dynamics and to inform theory about how specific types of intrafamily dynamics contribute to adolescent behavior problems and family members' mental health. Methodologically, grid-sequence analysis extends the toolbox of techniques for analysis of family experience sampling and daily diary data. Substantively, we identify patterns of family level microdynamics that may serve as new markers of risk/protective factors and potential points for intervention in families. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
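
As a rough illustration of the grid-sequence idea described above, the following Python sketch bins each day's pair of parent and adolescent connectedness reports into a cell of a state space grid and records the resulting sequence of cells for one dyad. The 0-100 scale, the 2x2 grid, the cell labels, the function names, and the position-wise mismatch count (a simple stand-in for a sequence-analysis distance) are all assumptions made for illustration; none of them are details taken from the article.

```python
from typing import List, Optional, Sequence, Tuple

def grid_cell(parent: float, teen: float, cut: float = 50.0) -> str:
    """Map one day's (parent, adolescent) scores to a cell of a hypothetical 2x2 grid."""
    p = "H" if parent >= cut else "L"
    t = "H" if teen >= cut else "L"
    return p + t

def grid_sequence(days: List[Optional[Tuple[float, float]]]) -> List[str]:
    """Turn a dyad's daily reports into a sequence of grid cells; missing days become '-'."""
    return ["-" if day is None else grid_cell(*day) for day in days]

def mismatch_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count positions at which two equal-length cell sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Example: two dyads observed over four days (the third day is missing for dyad A).
dyad_a = grid_sequence([(80, 70), (30, 90), None, (10, 20)])
dyad_b = grid_sequence([(75, 65), (60, 85), (55, 40), (15, 25)])
```

Pairwise distances of this kind could then be passed to a clustering routine to group dyads into a typology; the choice of distance and clustering method would need to follow the original article.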
Sensitivity of BRCA1/2 testing in high-risk breast/ovarian/male breast cancer families: little contribution of comprehensive RNA/NGS panel testing.

PubMed

Byers, Helen; Wallis, Yvonne; van Veen, Elke M; Lalloo, Fiona; Reay, Kim; Smith, Philip; Wallace, Andrew J; Bowers, Naomi; Newman, William G; Evans, D Gareth

2016-11-01

The sensitivity of testing BRCA1 and BRCA2 remains unresolved as the frequency of deep intronic splicing variants has not been defined in high-risk familial breast/ovarian cancer families. This variant category is reported at significant frequency in other tumour predisposition genes, including NF1 and MSH2. We carried out comprehensive whole gene RNA analysis on 45 high-risk breast/ovary and male breast cancer families with no identified pathogenic variant on exonic sequencing and copy number analysis of BRCA1/2. In addition, we undertook variant screening of a 10-gene high/moderate risk breast/ovarian cancer panel by next-generation sequencing. DNA testing identified the causative variant in 50/56 (89%) breast/ovarian/male breast cancer families with Manchester scores of ≥50 with two variants being confirmed to affect splicing on RNA analysis. RNA sequencing of BRCA1/BRCA2 on 45 individuals from high-risk families identified no deep intronic variants and did not suggest loss of RNA expression as a cause of lost sensitivity. Panel testing in 42 samples identified a known RAD51D variant, a high-risk ATM variant in another breast ovary family and a truncating CHEK2 mutation. Current exonic sequencing and copy number analysis variant detection methods of BRCA1/2 have high sensitivity in high-risk breast/ovarian cancer families. Sequence analysis of RNA does not identify any variants undetected by current analysis of BRCA1/2. However, RNA analysis clarified the pathogenicity of variants of unknown significance detected by current methods. The low diagnostic uplift achieved through sequence analysis of the other known breast/ovarian cancer susceptibility genes indicates that further high-risk genes remain to be identified.
Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

PubMed

Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

2001-08-15

This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
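
The dipeptide counting step described above can be sketched in a few lines of Python. This is not AAPAIR.TAB itself: the function names are hypothetical, and "bias" is assumed here to mean the ratio of a dipeptide's observed frequency to the frequency expected from the single amino acid composition, which may differ from the definition used in the report.

```python
from collections import Counter
from typing import Dict, Iterable

def dipeptide_counts(seqs: Iterable[str]) -> Counter:
    """Count ordered dipeptides (overlapping windows of two residues) across sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    return counts

def dipeptide_bias(seqs: Iterable[str]) -> Dict[str, float]:
    """Observed/expected ratio per dipeptide (assumed definition of bias)."""
    seqs = list(seqs)
    pair_counts = dipeptide_counts(seqs)
    total_pairs = sum(pair_counts.values())
    aa_counts = Counter("".join(seqs))
    total_aa = sum(aa_counts.values())
    bias = {}
    for pair, n in pair_counts.items():
        # Expected frequency if the two residues occurred independently.
        expected = (aa_counts[pair[0]] / total_aa) * (aa_counts[pair[1]] / total_aa)
        bias[pair] = (n / total_pairs) / expected
    return bias

# Toy input, not real motif data.
counts = dipeptide_counts(["GYGF", "NGY"])
```

A ratio above 1 under this assumed definition would indicate a dipeptide that occurs more often than its component residues alone predict.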
Single-Exome sequencing identified a novel RP2 mutation in a child with X-linked retinitis pigmentosa.

PubMed

Lim, Hassol; Park, Young-Mi; Lee, Jong-Keuk; Taek Lim, Hyun

2016-10-01

To present an efficient and successful application of a single-exome sequencing study in a family clinically diagnosed with X-linked retinitis pigmentosa. Exome sequencing study based on clinical examination data. An 8-year-old proband and his family. The proband and his family members underwent comprehensive ophthalmologic examinations. Exome sequencing was undertaken in the proband using Agilent SureSelect Human All Exon Kit and Illumina HiSeq 2000 platform. Bioinformatic analysis used Illumina pipeline with Burrows-Wheeler Aligner-Genome Analysis Toolkit (BWA-GATK), followed by ANNOVAR to perform variant functional annotation. All variants passing filter criteria were validated by Sanger sequencing to confirm familial segregation. Analysis of exome sequence data identified a novel frameshift mutation in RP2 gene resulting in a premature stop codon (c.665delC, p.Pro222fsTer237). Sanger sequencing revealed this mutation co-segregated with the disease phenotype in the child's family. We identified a novel causative mutation in RP2 from a single proband's exome sequence data analysis. This study highlights the effectiveness of the whole-exome sequencing in the genetic diagnosis of X-linked retinitis pigmentosa, over the conventional sequencing methods. Even using a single exome, exome sequencing technology would be able to pinpoint pathogenic variant(s) for X-linked retinitis pigmentosa, when properly applied with aid of adequate variant filtering strategy. Copyright © 2016 Canadian Ophthalmological Society. Published by Elsevier Inc. All rights reserved.
DWARF – a data warehouse system for analyzing protein families

PubMed Central

Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

2006-01-01

Background: The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus makes it possible to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description: The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion: DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
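
To make the three-section relational model described above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the helper functions are invented for illustration and should not be read as the actual DWARF schema, which the abstract does not specify.

```python
import sqlite3
from typing import Dict

# Hypothetical schema: one table per section (protein, sequence, structure).
SCHEMA = """
CREATE TABLE protein (
    protein_id        INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    source_organism   TEXT,
    superfamily       TEXT,
    homologous_family TEXT
);
CREATE TABLE sequence_annotation (
    annotation_id INTEGER PRIMARY KEY,
    protein_id    INTEGER NOT NULL REFERENCES protein(protein_id),
    position      INTEGER NOT NULL,
    note          TEXT
);
CREATE TABLE structure (
    structure_id        INTEGER PRIMARY KEY,
    protein_id          INTEGER NOT NULL REFERENCES protein(protein_id),
    secondary_structure TEXT
);
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def families_per_superfamily(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count the distinct homologous families assigned to each superfamily."""
    rows = conn.execute(
        "SELECT superfamily, COUNT(DISTINCT homologous_family) "
        "FROM protein GROUP BY superfamily ORDER BY superfamily"
    ).fetchall()
    return dict(rows)
```

Keeping protein-level, sequence-level, and structure-level facts in separate tables linked by a protein identifier is one straightforward way to let analysis queries operate on annotation, sequence, or structure independently.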
SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets.

PubMed

Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade

2018-01-01

Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences, particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed that it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43, as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM.
Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families are promising approaches for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.
SeqHBase: a big data toolset for family based sequencing data analysis.

PubMed

He, Min; Person, Thomas N; Hebbring, Scott J; Heinzen, Ethan; Ye, Zhan; Schrodi, Steven J; McPherson, Elizabeth W; Lin, Simon M; Peissig, Peggy L; Brilliant, Murray H; O'Rawe, Jason; Robison, Reid J; Lyon, Gholson J; Wang, Kai

2015-04-01

Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis. Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data. These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders. Published by the BMJ Publishing Group Limited.
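
The three inheritance patterns named above (de novo, inherited homozygous, compound heterozygous) can be illustrated with a small trio-based sketch in Python. This is not SeqHBase code and does not use Hadoop or HBase; the genotype encoding, the dictionary keys, and the function names are assumptions, and the sketch ignores coverage, genotype quality, and phasing, all of which a real analysis would need to consider.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # assumed encoding: 0 = reference allele, 1 = alternate allele

def is_de_novo(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Child carries an alternate allele that neither parent carries."""
    return 1 in child and 1 not in father and 1 not in mother

def is_inherited_homozygous(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Child is homozygous alternate and each parent carries the alternate allele."""
    return child == (1, 1) and 1 in father and 1 in mother

def compound_het_genes(variants: List[Dict]) -> List[str]:
    """Genes in which the child has one heterozygous variant seen only in the
    father and another heterozygous variant seen only in the mother."""
    from_father, from_mother = set(), set()
    for v in variants:
        if sorted(v["child"]) != [0, 1]:
            continue  # only heterozygous child genotypes are relevant here
        in_father = 1 in v["father"]
        in_mother = 1 in v["mother"]
        if in_father and not in_mother:
            from_father.add(v["gene"])
        elif in_mother and not in_father:
            from_mother.add(v["gene"])
    return sorted(from_father & from_mother)

# Toy trio data; the gene names are placeholders.
trio_variants = [
    {"gene": "GENE_A", "child": (0, 1), "father": (0, 1), "mother": (0, 0)},
    {"gene": "GENE_A", "child": (0, 1), "father": (0, 0), "mother": (0, 1)},
    {"gene": "GENE_B", "child": (0, 1), "father": (0, 1), "mother": (0, 0)},
]
```

In this toy data only GENE_A has one paternal and one maternal heterozygous variant, so it is the only compound heterozygous candidate.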
Identification of the sequence motif of glycoside hydrolase 13 family members

PubMed Central

Kumar, Vikash

2011-01-01

A bioinformatics analysis of the sequences of glycoside hydrolase (GH) 13 family members, such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase, was carried out using hidden Markov model (HMM) profiles in order to find the sequence motifs that govern the reaction specificities of these enzymes. This analysis suggests the existence of such sequence motifs and that residues of these motifs constitute the −1 to +3 catalytic subsites of the enzyme. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general, it was observed that the α-amylase sequence motifs have lower sequence conservation than the rest of the motifs of the GH13 family members. PMID:21544166
Study of mitochondria D-loop gene to detect the heterogeneity of gemak in Turnicidae family

NASA Astrophysics Data System (ADS)

Setiati, N.; Partaya

2018-03-01

As part of biodiversity, birds in the Turnicidae family should be protected from extinction and from a decline in the heterogeneity of their types. One way to provide a strategic basis for germplasm (plasma nutfah) conservation is to study genetic heterogeneity. The aim of this research is to analyze, at the molecular level, the D-loop gene from the mitochondrial DNA of gemak birds in the Turnicidae family. The results of the analysis may reveal the genetic heterogeneity of gemak birds based on the D-loop gene sequence. Both types of gemak in the Turnicidae family can still be collected, since they are found in rice fields after harvest; this is particularly true for gemak loreng (Turnix sylvatica), whereas gemak tegalan (Turnix suscitator) is becoming difficult to find. Based on the DNA quantification standard, most of the gemak blood samples in this research were classified as pure (values ranging from 1.63 to 1.90) and were suitable for PCR analysis. The sequencing analysis did not recover the complete nucleotide sequence. However, it indicated base sequence polymorphism in the D-loop gene. The D-loop gene may identify the genetic heterogeneity of gemak birds in the Turnicidae family, but further sequencing analysis with the PCR-RFLP technique is needed. The complete nucleotide sequence can be obtained, and is easy to detect, after being cut by a restriction enzyme.
Molecular evolution of miraculin-like proteins in soybean Kunitz super-family.

PubMed

Selvakumar, Purushotham; Gahloth, Deepankar; Tomar, Prabhat Pratap Singh; Sharma, Nidhi; Sharma, Ashwani Kumar

2011-12-01

Miraculin-like proteins (MLPs) belong to soybean Kunitz super-family and have been characterized from many plant families like Rutaceae, Solanaceae, Rubiaceae, etc. Many of them possess trypsin inhibitory activity and are involved in plant defense. MLPs exhibit significant sequence identity (~30-95%) to native miraculin protein, also belonging to Kunitz super-family compared with a typical Kunitz family member (~30%). The sequence and structure-function comparison of MLPs with that of a classical Kunitz inhibitor have demonstrated that MLPs have evolved to form a distinct group within Kunitz super-family. Sequence analysis of new genes along with available MLP sequences in the literature revealed three major groups for these proteins. A significant feature of Rutaceae MLP type 2 sequences is the presence of phosphorylation motif. Subtle changes are seen in putative reactive loop residues among different MLPs suggesting altered specificities to specific proteases. In phylogenetic analysis, Rutaceae MLP type 1 and type 2 proteins clustered together on separate branches, whereas native miraculin along with other MLPs formed distinct clusters. Site-specific positive Darwinian selection was observed at many sites in both the groups of Rutaceae MLP sequences with most of the residues undergoing positive selection located in loop regions. The results demonstrate the sequence and thereby the structure-function divergence of MLPs as a distinct group within soybean Kunitz super-family due to biotic and abiotic stresses of local environment.
Acyl carrier protein structural classification and normal mode analysis

PubMed Central

Cantu, David C; Forrester, Michael J; Charov, Katherine; Reilly, Peter J

2012-01-01

All acyl carrier protein primary and tertiary structures were gathered into the ThYme database. They are classified into 16 families by amino acid sequence similarity, with members of the different families having sequences with statistically highly significant differences. These classifications are supported by tertiary structure superposition analysis. Tertiary structures from a number of families are very similar, suggesting that these families may come from a single distant ancestor. Normal vibrational mode analysis was conducted on experimentally determined freestanding structures, showing greater fluctuations at chain termini and loops than in most helices. Their modes overlap more so within families than between different families. The tertiary structures of three acyl carrier protein families that lacked any known structures were predicted as well. PMID:22374859
Microsatellite analysis in the genome of Acanthaceae: An in silico approach

PubMed Central

Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar

2015-01-01

Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites present in complete genome sequences may play a direct role in genome organization, recombination, gene regulation, quantitative genetic variation, and the evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from the National Center for Biotechnology Information database and screened for the presence of SSRs. The SSR Locator tool was used to predict the microsatellites, and its inbuilt Primer3 module was used for primer design. Results: In total, 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and dinucleotide repeats were found to be abundant in the genome sequences. The essential amino acid isoleucine was found to be abundant in all the sequences. We also designed SSR-based primers/markers for 59 sequences of this family that contain microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful in the future for breeding and genetic studies of plants that belong to the Acanthaceae family. PMID:25709226
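
Screening a nucleotide sequence for SSRs, as described above, can be approximated with a regular expression that looks for a short motif repeated in tandem. The sketch below is not the SSR Locator algorithm: the function name, the minimum repeat count of 5, and the maximum motif length of 6 are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are reported.

```python
import re
from typing import List, Tuple

def find_ssrs(seq: str, min_repeats: int = 5, max_unit: int = 6) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A motif of unit_len bases followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are themselves repeats of a shorter motif (e.g. 'ATAT').
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

# Toy sequence with a single (AT)6 dinucleotide repeat starting at index 3.
example_hits = find_ssrs("GGC" + "AT" * 6 + "CCG")
```

Counting the returned hits by motif length would give the kind of mono-, di-, and trinucleotide frequency summary the abstract reports, subject to the thresholds chosen.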
Linkage Study Revealed Complex Haplotypes in a Multifamily due to Different Mutations in CAPN3 Gene in an Iranian Ethnic Group.

PubMed

Mojbafan, Marzieh; Tonekaboni, Seyed Hassan; Abiri, Maryam; Kianfar, Soudeh; Sarhadi, Ameneh; Nilipour, Yalda; Tavakkoly-Bazzaz, Javad; Zeinali, Sirous

2016-07-01

Calpainopathy is an autosomal recessive form of limb girdle muscular dystrophy that is caused by mutations in the CAPN3 gene. In the present study, co-segregation of this disorder was analyzed with four short tandem repeat markers linked to the CAPN3 gene. Three apparently unrelated Iranian families of the same ethnicity were investigated. Haplotype analysis and sequencing of the CAPN3 gene were performed. A DNA sample from one of the patients was simultaneously sent for next-generation sequencing. DNA sequencing identified two mutations: a homozygous c.2105C>T in exon 19 in one family, a novel homozygous mutation c.380G>A in exon 3 in another family, and a compound heterozygous form of these two mutations in the third family. Next-generation sequencing also confirmed our results. It was expected that, due to the rare nature of limb girdle muscular dystrophies, affected individuals from the same ethnic group would share similar mutations. Haplotype analysis showed two different homozygous patterns in two families, and a compound heterozygous pattern in the third family, as seen in the mutation analysis. This study shows that haplotype analysis can help in determining the presence of different founders.
Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

PubMed

Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

2007-06-01

The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding a 239-amino-acid precursor of approximately 27 kDa. Analysis of the amino acid sequence revealed the conserved cysteine, histidine and asparagine residues, the occluding loop pattern, the hemoglobinase motif and the glutamine of the oxyanion hole that are characteristic of cathepsin B-like proteases (CBL). Comparison of the predicted amino acid sequences showed that the protein shared 33.5-58.7% identity with cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA is a member of the papain family.
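
The relationship between the 717 bp open reading frame, the 239 amino acids, and the approximately 27 kDa protein reported above can be checked with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the abstract, and uses hypothetical function names; the exact mass depends on the actual amino acid composition.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average mass per amino acid residue

def codons_in_orf(orf_bp: int) -> int:
    """Number of codons in an open reading frame of the given length."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3

def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass estimate in kDa from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0

n_codons = codons_in_orf(717)          # 717 / 3 = 239
mass_kda = approx_mass_kda(n_codons)   # 239 * 110 Da = 26.29 kDa
```

Under this assumption the estimate of roughly 26 kDa is in line with the approximately 27 kDa given in the abstract.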
Molecular characterization and distribution of a 145-bp tandem repeat family in the genus Populus.

PubMed

Rajagopal, J; Das, S; Khurana, D K; Srivastava, P S; Lakshmikumaran, M

1999-10-01

This report aims to describe the identification and molecular characterization of a 145-bp tandem repeat family that accounts for nearly 1.5% of the Populus genome. Three members of this repeat family were cloned and sequenced from Populus deltoides and P. ciliata. The dimers of the repeat were sequenced in order to confirm the head-to-tail organization of the repeat. Hybridization-based analysis using the 145-bp tandem repeat as a probe on genomic DNA gave rise to ladder patterns which were identified to be a result of methylation and (or) sequence heterogeneity. Analysis of the methylation pattern of the repeat family using methylation-sensitive isoschizomers revealed variable methylation of the C residues and lack of methylation of the A residues. Sequence comparisons between the monomers revealed a high degree of sequence divergence that ranged between 6% and 11% in P. deltoides and between 4.2% and 8.3% in P. ciliata. This indicated the presence of sub-families within the 145-bp tandem family of repeats. Divergence was mainly due to the accumulation of point mutations and was concentrated in the central region of the repeat. The 145-bp tandem repeat family did not show significant homology to known tandem repeats from plants. A short stretch of 36 bp was found to show homology of 66.7% to a centromeric repeat from Chironomus plumosus. Dot-blot analysis and Southern hybridization data revealed the presence of the repeat family in 13 of the 14 Populus species examined. The absence of the 145-bp repeat from P. euphratica suggested that this species is relatively distant from other members of the genus, which correlates with taxonomic classifications. The widespread occurrence of the tandem family in the genus indicated that this family may be of ancient origin.
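
The sequence divergence figures quoted above are percentages of differing positions between aligned monomers. A minimal sketch of that calculation is given below; the function name is hypothetical, the inputs are assumed to be already aligned to equal length, and gap characters are simply counted as differences, which may not match the scoring used in the report.

```python
def percent_divergence(a: str, b: str) -> float:
    """Percentage of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return 100.0 * mismatches / len(a)

# Toy example: one difference in ten aligned positions.
toy_divergence = percent_divergence("ACGTACGTAC", "ACGTTCGTAC")
```

For a 145-bp monomer, the reported range of 6% to 11% corresponds to roughly 9 to 16 differing positions.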
Motor sequencing deficit as an endophenotype of speech sound disorder: a genome-wide linkage analysis in a multigenerational family.

PubMed

Peter, Beate; Matsushita, Mark; Raskind, Wendy H

2012-10-01

The aim of this pilot study was to investigate a measure of motor sequencing deficit as a potential endophenotype of speech sound disorder (SSD) in a multigenerational family with evidence of familial SSD. In a multigenerational family with evidence of a familial motor-based SSD, affectation status and a measure of motor sequencing during oral motor testing were obtained. To further investigate the role of motor sequencing as an endophenotype for genetic studies, parametric and nonparametric linkage analyses were carried out using a genome-wide panel of 404 microsatellites. In seven of the 10 family members with available data, SSD affectation status and motor sequencing status coincided. Linkage analysis revealed four regions of interest, 6p21, 7q32, 7q36, and 8q24, primarily identified with the measure of motor sequencing ability. The 6p21 region overlaps with a locus implicated in rapid alternating naming in a recent genome-wide dyslexia linkage study. The 7q32 locus contains a locus implicated in dyslexia. The 7q36 locus borders on a gene known to affect the component traits of language impairment. The results are consistent with a motor-based endophenotype of SSD that would be informative for genetic studies. The linkage results in this first genome-wide study in a multigenerational family with SSD warrant follow-up in additional families and with fine mapping or next-generation approaches to gene identification.
Motor sequencing deficit as an endophenotype of speech sound disorder: A genome-wide linkage analysis in a multigenerational family

PubMed Central

Peter, Beate; Matsushita, Mark; Raskind, Wendy H.

2012-01-01

Objectives The purpose of this pilot study was to investigate a measure of motor sequencing deficit as a potential endophenotype of speech sound disorder (SSD) in a multigenerational family with evidence of familial SSD. Methods In a multigenerational family with evidence of a familial motor-based SSD, affectation status and a measure of motor sequencing during oral motor testing were obtained. To further investigate the role of motor sequencing as an endophenotype for genetic studies, parametric and nonparametric linkage analyses were conducted using a genome-wide panel of 404 microsatellites. Results In seven of the ten family members with available data, SSD affectation status and motor sequencing status coincided. Linkage analysis revealed four regions of interest, 6p21, 7q32, 7q36, and 8q24, primarily identified with the measure of motor sequencing ability. The 6p21 region overlaps with a locus implicated in rapid alternating naming in a recent genome-wide dyslexia linkage study. The 7q32 locus contains a locus implicated in dyslexia. The 7q36 locus borders on a gene known to affect component traits of language impairment. Conclusions Results are consistent with a motor-based endophenotype of SSD that would be informative for genetic studies. The linkage results in this first genome-wide study in a multigenerational family with SSD warrant follow-up in additional families and with fine mapping or next-generation approaches to gene identification. PMID:22517379
The Application of Next-Generation Sequencing for Mutation Detection in Autosomal-Dominant Hereditary Hearing Impairment.

PubMed

Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K

2017-07-01

To identify the causative mutation using next-generation sequencing in autosomal-dominant hereditary hearing impairment, because mutation analysis in hereditary hearing impairment by classic genetic methods is hindered by the high heterogeneity of the disease. Two Swiss families with autosomal-dominant hereditary hearing impairment. Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument, 300 cycles and paired-end sequencing. Technical data analysis was performed with SeqMonk, variant analysis with GeneTalk or VariantStudio. The detection of mutations in genes related to hearing loss by next-generation sequencing was subsequently confirmed using specific polymerase-chain-reaction and Sanger sequencing. Mutation detection in hearing-loss-related genes. The first family harbored the mutation c.5383+5delGTGA in the TECTA-gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1-gene and a second mutation TCOF1-c.1028G>A were identified. Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great future potential in the diagnostics of hereditary hearing impairment, even in smaller labs.
PNMA family: Protein interaction network and cell signalling pathways implicated in cancer and apoptosis.

PubMed

Pang, Siew Wai; Lahiri, Chandrajit; Poh, Chit Laa; Tan, Kuan Onn

2018-05-01

Paraneoplastic Ma Family (PNMA) comprises a growing number of family members which share relatively conserved protein sequences encoded by the human genome and is localized to several human chromosomes, including the X-chromosome. Based on sequence analysis, PNMA family members share sequence homology to the Gag protein of LTR retrotransposon, and several family members with aberrant protein expressions have been reported to be closely associated with the human Paraneoplastic Disorder (PND). In addition, gene mutations of specific members of PNMA family are known to be associated with human mental retardation or 3-M syndrome consisting of restrictive post-natal growth or dwarfism, and development of skeletal abnormalities. Other than sequence homology, the physiological function of many members in this family remains unclear. However, several members of this family have been characterized, including cell signalling events mediated by these proteins that are associated with apoptosis, and cancer in different cell types. Furthermore, while certain PNMA family members show restricted gene expression in the human brain and testis, other PNMA family members exhibit broader gene expression or preferential and selective protein interaction profiles, suggesting functional divergence within the family. Functional analysis of some members of this family has identified protein domains that are required for subcellular localization, protein-protein interactions, and cell signalling events, which are the focus of this review paper. Copyright © 2018 Elsevier Inc. All rights reserved.

Genetic Diagnosis in Consanguineous Families With Kidney Disease by Homozygosity Mapping Coupled With Whole-Exome Sequencing

PubMed Central

Al-Romaih, Khaldoun I.; Genovese, Giulio; Al-Mojalli, Hamad; Al-Othman, Saleh; Al-Manea, Hadeel; Al-Suleiman, Mohammed; Al-Jondubi, Mohammed; Atallah, Nourah; Al-Rodhyan, Maha; Weins, Astrid; Pollak, Martin R.; Adra, Chaker N.

2011-01-01

Background Accurate diagnosis of the primary cause of an individual’s kidney disease can be essential for proper management. Some kidney diseases have overlapping histopathological features despite being caused by defects in different genes. In this report we describe two consanguineous Saudi Arabian families in which individuals presented with kidney failure and mixed clinical and histological features initially thought consistent with focal segmental glomerulosclerosis. Study Design Case series. Setting and participants We studied members of two apparently unrelated families from Saudi Arabia with kidney disease. Measurements Whole-genome single-nucleotide polymorphism analysis followed by targeted isolation and sequencing of exons using genomic DNA samples from affected members of these families, followed by additional focused genotyping and sequence analysis. Results The two apparently unrelated families shared a region of homozygosity on chromosome 2q13. Exome sequence from the affected individuals lacked any sequence reads from the NPHP1 gene, which is located within this homozygous region. Additional PCR based genotyping confirmed that affected individuals had NPHP1 deletions, rather than defects in a known FSGS-associated gene. Limitations The methods used here may not result in a clear genetic diagnosis in many cases of apparent familial kidney disease. Conclusions This analysis demonstrates the power of new high-throughput genotyping and sequencing technologies to aid in the rapid genetic diagnosis of individuals with an inherited form of kidney disease. We believe it is likely that such tools may become useful clinical genetic tools and alter the manner in which diagnoses are made in nephrology. PMID:21658830
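The homozygosity-mapping step described in this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the two-letter genotype encoding, and the `min_len` cutoff are all assumptions made for illustration. It scans ordered markers and reports stretches where every affected individual is homozygous for the same allele.

```python
from typing import List, Tuple


def shared_homozygous_runs(
    genotypes: List[List[str]], min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) marker index ranges, inclusive, where all
    individuals carry the same homozygous genotype.

    genotypes: one list per affected individual, each holding two-letter
    calls such as "AA" or "AG" in the same marker order (hypothetical
    encoding chosen for this sketch).
    """
    n_markers = len(genotypes[0])
    runs: List[Tuple[int, int]] = []
    start = None
    for i in range(n_markers):
        calls = {individual[i] for individual in genotypes}
        call = next(iter(calls))
        # Shared homozygosity: a single distinct call with two equal alleles.
        shared = len(calls) == 1 and call[0] == call[1]
        if shared:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n_markers - start >= min_len:
        runs.append((start, n_markers - 1))
    return runs


# Toy data, not taken from the study.
person_a = ["AA", "AG", "GG", "CC", "TT", "AG"]
person_b = ["AA", "AA", "GG", "CC", "TT", "AA"]
print(shared_homozygous_runs([person_a, person_b], min_len=3))
```

In practice the candidate interval would then be checked against exome data, as the abstract describes for the 2q13 region containing NPHP1.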
Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase).

PubMed

Odronitz, Florian; Kollmar, Martin

2006-11-29

Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.
Exome sequencing and genome-wide linkage analysis in 17 families illustrate the complex contribution of TTN truncating variants to dilated cardiomyopathy.

PubMed

Norton, Nadine; Li, Duanxiang; Rampersaud, Evadnie; Morales, Ana; Martin, Eden R; Zuchner, Stephan; Guo, Shengru; Gonzalez, Michael; Hedges, Dale J; Robertson, Peggy D; Krumm, Niklas; Nickerson, Deborah A; Hershberger, Ray E

2013-04-01

BACKGROUND- Familial dilated cardiomyopathy (DCM) is a genetically heterogeneous disease with >30 known genes. TTN truncating variants were recently implicated in a candidate gene study to cause 25% of familial and 18% of sporadic DCM cases. METHODS AND RESULTS- We used an unbiased genome-wide approach using both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify genetic cause. Linkage analysis ranked the TTN region as falling under the second highest genome-wide multipoint linkage peak, multipoint logarithm of odds, 1.59. We identified 6 TTN truncating variants carried by individuals affected with DCM in 7 of 17 DCM families (logarithm of odds, 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable with 5 other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or the ≈5400 cases from the Exome Sequencing Project was ≈23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity logarithm of odds score of 1.74. CONCLUSIONS- These data suggest that TTN truncating variants contribute to DCM cause. However, the lack of segregation of all identified TTN truncating variants illustrates the challenge of determining variant pathogenicity even with full exome sequencing.
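The logarithm of odds (LOD) values quoted in this record compare the likelihood of the data under linkage with the likelihood under free recombination. The sketch below assumes the simplest textbook case, a two-point LOD with phase-known meioses. The study itself reports multipoint and heterogeneity scores, which this does not reproduce, and the function name is hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses.

    LOD(theta) = log10( theta^R * (1 - theta)^(N - R) / 0.5^N )
    where R is the number of recombinant meioses out of N.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_linked = (
        recombinants * math.log10(theta)
        + non_recombinants * math.log10(1.0 - theta)
    )
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Toy example: 1 recombinant in 10 informative meioses, evaluated at theta = 0.1.
print(round(two_point_lod(1, 10, 0.1), 2))

# A LOD score converts to an odds ratio as 10 ** LOD.
print(round(10 ** 2.99))
```

Read this way, the LOD of 2.99 reported for TTN truncating variants corresponds to odds of a little under 1000:1 in favour of linkage.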
A novel mutation in PRPF31, causative of autosomal dominant retinitis pigmentosa, using the BGISEQ-500 sequencer.

PubMed

Zheng, Yu; Wang, Hai-Lin; Li, Jian-Kang; Xu, Li; Tellier, Laurent; Li, Xiao-Lin; Huang, Xiao-Yan; Li, Wei; Niu, Tong-Tong; Yang, Huan-Ming; Zhang, Jian-Guo; Liu, Dong-Ning

2018-01-01

To study the genes responsible for retinitis pigmentosa. A total of 15 Chinese families with retinitis pigmentosa, containing 94 sporadically afflicted cases, were recruited. The targeted sequences were captured using the Target_Eye_365_V3 chip and sequenced using the BGISEQ-500 sequencer, according to the manufacturer's instructions. Data were aligned to UCSC Genome Browser build hg19, using the Burrows-Wheeler Aligner MEM algorithm. Local realignment was performed with the Genome Analysis Toolkit (GATK v.3.3.0) IndelRealigner, and variants were called with the Genome Analysis Toolkit HaplotypeCaller, without any use of imputation. Variants were filtered against a panel derived from 1000 Genomes Project, 1000G_ASN, ESP6500, ExAC and dbSNP138. In all members of Family ONE and Family TWO with available DNA samples, the genetic variant was validated using Sanger sequencing. A novel, pathogenic variant of retinitis pigmentosa, c.357_358delAA (p.Ser119SerfsX5) was identified in PRPF31 in 2 of 15 autosomal-dominant retinitis pigmentosa (ADRP) families, as well as in one sporadic case. Sanger sequencing was performed upon probands, as well as upon other family members. This novel, pathogenic genotype co-segregated with retinitis pigmentosa phenotype in these two families. ADRP is a subtype of retinitis pigmentosa, defined by its genotype, which accounts for 20%-40% of the retinitis pigmentosa patients. Our study thus expands the spectrum of PRPF31 mutations known to occur in ADRP, and provides further demonstration of the applicability of the BGISEQ-500 sequencer for genomics research.
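The step of filtering variants against population panels can be sketched as a frequency filter. This is an illustration only: the abstract does not state the cutoff or data layout, so the `max_af` threshold, the dictionary fields, and the variant names below are all hypothetical.

```python
from typing import Dict, List


def filter_rare_variants(
    variants: List[Dict], max_af: float = 0.01
) -> List[Dict]:
    """Keep variants whose allele frequency is at or below max_af in every
    population panel that reports them. Variants absent from all panels
    (empty "population_af") are kept as potentially novel.
    """
    kept = []
    for variant in variants:
        frequencies = variant.get("population_af", {})
        if all(af <= max_af for af in frequencies.values()):
            kept.append(variant)
    return kept


# Toy records; panel names mirror those in the abstract, values are invented.
calls = [
    {"id": "variant_a", "population_af": {}},
    {"id": "variant_b", "population_af": {"ExAC": 0.15, "ESP6500": 0.12}},
    {"id": "variant_c", "population_af": {"ExAC": 0.0004}},
]
print([v["id"] for v in filter_rare_variants(calls)])
```

Variants that survive such a filter would still need segregation analysis and Sanger confirmation, as the record describes.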
Whole-exome sequencing identifies USH2A mutations in a pseudo-dominant Usher syndrome family.

PubMed

Zheng, Sui-Lian; Zhang, Hong-Liang; Lin, Zhen-Lang; Kang, Qian-Yan

2015-10-01

Usher syndrome (USH) is an autosomal recessive (AR) multi-sensory degenerative disorder leading to deaf-blindness. USH is clinically subdivided into three subclasses, and 10 genes have been identified thus far. Clinical and genetic heterogeneities in USH make a precise diagnosis difficult. A dominant‑like USH family in successive generations was identified, and the present study aimed to determine the genetic predisposition of this family. Whole‑exome sequencing was performed in two affected patients and an unaffected relative. Systematic data were analyzed by bioinformatic analysis to remove the candidate mutations via step‑wise filtering. Direct Sanger sequencing and co‑segregation analysis were performed in the pedigree. One novel and two known mutations in the USH2A gene were identified, and were further confirmed by direct sequencing and co‑segregation analysis. The affected mother carried compound mutations in the USH2A gene, while the unaffected father carried a heterozygous mutation. The present study demonstrates that whole‑exome sequencing is a robust approach for the molecular diagnosis of disorders with high levels of genetic heterogeneity.
Mouse Vk gene classification by nucleic acid sequence similarity.

PubMed

Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

1989-01-01

Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
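Grouping genes into families by a sequence-similarity threshold, as this record describes for Vk genes at greater than 80% similarity, can be sketched as follows. The sketch assumes pre-aligned sequences of equal length and single-linkage grouping; the abstract does not specify either, and the function names are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal length, non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def group_into_families(
    seqs: Dict[str, str], threshold: float = 80.0
) -> List[List[str]]:
    """Single-linkage grouping: two genes join the same family when their
    identity exceeds the threshold, directly or through intermediates."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find lookup with path compression.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if percent_identity(seqs[first], seqs[second]) > threshold:
                parent[find(first)] = find(second)

    families: Dict[str, List[str]] = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Toy aligned sequences, not real Vk genes.
toy = {"v1": "ACGTACGTAC", "v2": "ACGTACGTAA", "v3": "TTTTTTTTTT"}
print(group_into_families(toy))
```

Single linkage can chain closely related families together, which is consistent with the abstract's remark that many Vk families are not cleanly separated.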
Isolation of anonymous DNA sequences from within a submicroscopic X chromosomal deletion in a patient with choroideremia, deafness, and mental retardation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nussbaum, R.L.; Lesko, J.G.; Lewis, R.A.

1987-09-01

Choroideremia, an X-chromosome linked retinal dystrophy of unknown pathogenesis, causes progressive nightblindness and eventual central blindness in affected males by the third to fourth decade of life. Choroideremia has been mapped to Xq13-21 by tight linkage to restriction fragment length polymorphism loci. The authors have recently identified two families in which choroideremia is inherited with mental retardation and deafness. In family XL-62, an interstitial deletion Xq21 is visible by cytogenetic analysis and two linked anonymous DNA markers, DXYS1 and DXS72, are deleted. In the second family, XL-45, an interstitial deletion was suspected on phenotypic grounds but could not be confirmed by high-resolution cytogenetic analysis. They used phenol-enhanced reassociation of 48,XXXX DNA in competition with excess XL-45 DNA to generate a library of cloned DNA enriched for sequences that might be deleted in XL-45. Two of the first 83 sequences characterized from the library were found to be deleted in probands from family XL-45 as well as from family XL-62. Isolation of these sequences proves that XL-45 does contain a submicroscopic deletion and provides a starting point for identifying overlapping genomic sequences that span the XL-45 deletion. Each overlapping sequence will be studied to identify exons from the choroideremia locus.
Phylogenetic relationships in the family Streptomycetaceae using multi-locus sequence analysis

USDA-ARS's Scientific Manuscript database

The family Streptomycetaceae, notably species in the genus Streptomyces, have long been the subject of investigation due to their well-known ability to produce secondary metabolites. The emergence of drug resistant pathogens and the relative ease of producing genome sequences has renewed the importa...
Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

PubMed Central

Odronitz, Florian; Kollmar, Martin

2006-01-01

Background Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497
Easy-to-use phylogenetic analysis system for hepatitis B virus infection.

PubMed

Sugiyama, Masaya; Inui, Ayano; Shin-I, Tadasu; Komatsu, Haruki; Mukaide, Motokazu; Masaki, Naohiko; Murata, Kazumoto; Ito, Kiyoaki; Nakanishi, Makoto; Fujisawa, Tomoo; Mizokami, Masashi

2011-10-01

Molecular phylogenetic analysis has been broadly applied in clinical and virological studies. However, choosing appropriate settings and calculation parameters is difficult for non-specialists in molecular genetics. In the present study, a phylogenetic analysis tool was developed for the easy determination of genotypes and transmission routes. A total of 23 patients from 10 families infected with hepatitis B virus (HBV) were enrolled on suspicion of intrafamilial transmission. The extracted HBV DNA was amplified and sequenced in a region of the S gene. Software to automatically classify query sequences was constructed and installed on the Hepatitis Virus Database (HVDB). Reference sequences were retrieved from HVDB, which contained major genotypes from A to H. Multiple alignments using CLUSTAL W were performed before the genetic distance matrix was calculated with the six-parameter method. The phylogenetic tree was output by the neighbor-joining method. A web-browser user interface was also developed for intuitive control. The system was named the easy-to-use phylogenetic analysis system (E-PAS). Twenty-three sera from 10 families were analyzed to evaluate E-PAS. The queries obtained from nine families were genotype C and were located in one cluster per family. However, one patient was classified into a cluster different from that of her family, suggesting that E-PAS detected a sample whose transmission route was distinct from that of her family. E-PAS was developed to output a phylogenetic tree from sequence data alone. E-PAS could be extended to determine HBV genotypes as well as transmission routes. © 2011 The Japan Society of Hepatology.
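The distance-based classification idea in this record can be illustrated with a deliberately simplified Python sketch. It substitutes the plain p-distance (proportion of differing sites) for the six-parameter method and a nearest-reference lookup for the neighbor-joining tree, so it is not the E-PAS algorithm. The function names and toy sequences are hypothetical.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping any column where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def nearest_genotype(query: str, references: Dict[str, str]) -> str:
    """Return the label of the reference sequence closest to the query."""
    return min(references, key=lambda label: p_distance(query, references[label]))


# Toy aligned fragments standing in for genotype reference sequences.
references = {"A": "ACGTACGT", "C": "ACGAACTT"}
print(nearest_genotype("ACGAAC-T", references))
```

A tree-based method additionally shows how samples cluster relative to one another, which is what allowed the system in this record to flag a family member falling outside her family's cluster.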
A metagenomic viral discovery approach identifies potential zoonotic and novel mammalian viruses in Neoromicia bats within South Africa.

PubMed

Geldenhuys, Marike; Mortlock, Marinda; Weyer, Jacqueline; Bezuidt, Oliver; Seamark, Ernest C J; Kearney, Teresa; Gleasner, Cheryl; Erkkila, Tracy H; Cui, Helen; Markotter, Wanda

2018-01-01

Species within the Neoromicia bat genus are abundant and widely distributed in Africa. It is common for these insectivorous bats to roost in anthropogenic structures in urban regions. Additionally, Neoromicia capensis have previously been identified as potential hosts for Middle East respiratory syndrome (MERS)-related coronaviruses. This study aimed to ascertain the gastrointestinal virome of these bats, as viruses excreted in fecal material or which may be replicating in rectal or intestinal tissues have the greatest opportunities of coming into contact with other hosts. Samples were collected in five regions of South Africa over eight years. Initial virome composition was determined by viral metagenomic sequencing by pooling samples and enriching for viral particles. Libraries were sequenced on the Illumina MiSeq and NextSeq500 platforms, producing a combined 37 million reads. Bioinformatics analysis of the high throughput sequencing data detected the full genome of a novel species of the Circoviridae family, and also identified sequence data from the Adenoviridae, Coronaviridae, Herpesviridae, Parvoviridae, Papillomaviridae, Phenuiviridae, and Picornaviridae families. Metagenomic sequencing data was insufficient to determine the viral diversity of certain families due to the fragmented coverage of genomes and lack of suitable sequencing depth, as some viruses were detected from the analysis of reads-data only. Follow up conventional PCR assays targeting conserved gene regions for the Adenoviridae, Coronaviridae, and Herpesviridae families were used to confirm metagenomic data and generate additional sequences to determine genetic diversity. The complete coding genome of a MERS-related coronavirus was recovered with additional amplicon sequencing on the MiSeq platform. The new genome shared 97.2% overall nucleotide identity to a previous Neoromicia-associated MERS-related virus, also from South Africa. 
Conventional PCR analysis detected diverse adenovirus and herpesvirus sequences that were widespread throughout Neoromicia populations in South Africa. Furthermore, similar adenovirus sequences were detected within these populations throughout several years. With the exception of the coronaviruses, the study represents the first report of sequence data from several viral families within a Southern African insectivorous bat genus; highlighting the need for continued investigations in this regard.
A metagenomic viral discovery approach identifies potential zoonotic and novel mammalian viruses in Neoromicia bats within South Africa

PubMed Central

Geldenhuys, Marike; Mortlock, Marinda; Weyer, Jacqueline; Bezuidt, Oliver; Seamark, Ernest C. J.; Kearney, Teresa; Gleasner, Cheryl; Erkkila, Tracy H.; Cui, Helen; Markotter, Wanda

2018-01-01

Species within the Neoromicia bat genus are abundant and widely distributed in Africa. It is common for these insectivorous bats to roost in anthropogenic structures in urban regions. Additionally, Neoromicia capensis have previously been identified as potential hosts for Middle East respiratory syndrome (MERS)-related coronaviruses. This study aimed to ascertain the gastrointestinal virome of these bats, as viruses excreted in fecal material or which may be replicating in rectal or intestinal tissues have the greatest opportunities of coming into contact with other hosts. Samples were collected in five regions of South Africa over eight years. Initial virome composition was determined by viral metagenomic sequencing by pooling samples and enriching for viral particles. Libraries were sequenced on the Illumina MiSeq and NextSeq500 platforms, producing a combined 37 million reads. Bioinformatics analysis of the high throughput sequencing data detected the full genome of a novel species of the Circoviridae family, and also identified sequence data from the Adenoviridae, Coronaviridae, Herpesviridae, Parvoviridae, Papillomaviridae, Phenuiviridae, and Picornaviridae families. Metagenomic sequencing data was insufficient to determine the viral diversity of certain families due to the fragmented coverage of genomes and lack of suitable sequencing depth, as some viruses were detected from the analysis of reads-data only. Follow up conventional PCR assays targeting conserved gene regions for the Adenoviridae, Coronaviridae, and Herpesviridae families were used to confirm metagenomic data and generate additional sequences to determine genetic diversity. The complete coding genome of a MERS-related coronavirus was recovered with additional amplicon sequencing on the MiSeq platform. The new genome shared 97.2% overall nucleotide identity to a previous Neoromicia-associated MERS-related virus, also from South Africa. 
Conventional PCR analysis detected diverse adenovirus and herpesvirus sequences that were widespread throughout Neoromicia populations in South Africa. Furthermore, similar adenovirus sequences were detected within these populations throughout several years. With the exception of the coronaviruses, the study represents the first report of sequence data from several viral families within a Southern African insectivorous bat genus; highlighting the need for continued investigations in this regard. PMID:29579103
Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.

PubMed

Carretero-Paulet, Lorenzo; Albert, Victor A

2016-01-01

The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities. Here, we use the online comparative genomics platform CoGe (http://www.genomevolution.org/coge/) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.
Polymorphism and selection in the major histocompatibility complex DRA and DQA genes in the family Equidae.

PubMed

Janova, Eva; Matiasovic, Jan; Vahala, Jiri; Vodicka, Roman; Van Dyk, Enette; Horin, Petr

2009-07-01

The major histocompatibility complex genes coding for antigen binding and presenting molecules are the most polymorphic genes in the vertebrate genome. We studied the DRA and DQA gene polymorphism of the family Equidae. In addition to 11 previously reported DRA and 24 DQA alleles, six new DRA sequences and 13 new DQA alleles were identified in the genus Equus. Phylogenetic analysis of both DRA and DQA sequences provided evidence for trans-species polymorphism in the family Equidae. The phylogenetic trees differed from species relationships defined by standard taxonomy of Equidae and from trees based on mitochondrial or neutral gene sequence data. Analysis of selection showed differences between the less variable DRA and more variable DQA genes. DRA alleles were more often shared by more species. The DQA sequences analysed showed strong amongst-species positive selection; the selected amino acid positions mostly corresponded to selected positions in rodent and human DQA genes.
A novel mutation in PRPF31, causative of autosomal dominant retinitis pigmentosa, using the BGISEQ-500 sequencer

PubMed Central

Zheng, Yu; Wang, Hai-Lin; Li, Jian-Kang; Xu, Li; Tellier, Laurent; Li, Xiao-Lin; Huang, Xiao-Yan; Li, Wei; Niu, Tong-Tong; Yang, Huan-Ming; Zhang, Jian-Guo; Liu, Dong-Ning

2018-01-01

AIM To study the genes responsible for retinitis pigmentosa. METHODS A total of 15 Chinese families with retinitis pigmentosa, containing 94 sporadically afflicted cases, were recruited. The targeted sequences were captured using the Target_Eye_365_V3 chip and sequenced using the BGISEQ-500 sequencer, according to the manufacturer's instructions. Data were aligned to UCSC Genome Browser build hg19, using the Burrows-Wheeler Aligner MEM algorithm. Local realignment was performed with the Genome Analysis Toolkit (GATK v.3.3.0) IndelRealigner, and variants were called with the Genome Analysis Toolkit HaplotypeCaller, without any use of imputation. Variants were filtered against a panel derived from 1000 Genomes Project, 1000G_ASN, ESP6500, ExAC and dbSNP138. In all members of Family ONE and Family TWO with available DNA samples, the genetic variant was validated using Sanger sequencing. RESULTS A novel, pathogenic variant of retinitis pigmentosa, c.357_358delAA (p.Ser119SerfsX5) was identified in PRPF31 in 2 of 15 autosomal-dominant retinitis pigmentosa (ADRP) families, as well as in one sporadic case. Sanger sequencing was performed upon probands, as well as upon other family members. This novel, pathogenic genotype co-segregated with retinitis pigmentosa phenotype in these two families. CONCLUSION ADRP is a subtype of retinitis pigmentosa, defined by its genotype, which accounts for 20%-40% of the retinitis pigmentosa patients. Our study thus expands the spectrum of PRPF31 mutations known to occur in ADRP, and provides further demonstration of the applicability of the BGISEQ-500 sequencer for genomics research. PMID:29375987
Mutation analysis in 129 genes associated with other forms of retinal dystrophy in 157 families with retinitis pigmentosa based on exome sequencing.

PubMed

Xu, Yan; Guan, Liping; Xiao, Xueshan; Zhang, Jianguo; Li, Shiqiang; Jiang, Hui; Jia, Xiaoyun; Yang, Jianhua; Guo, Xiangming; Yin, Ye; Wang, Jun; Zhang, Qingjiong

2015-01-01

Mutations in 60 known genes were previously identified by exome sequencing in 79 of 157 families with retinitis pigmentosa (RP). This study analyzed variants in 129 genes associated with other forms of hereditary retinal dystrophy in the same cohort. Apart from the 73 genes previously analyzed, a further 129 genes responsible for other forms of hereditary retinal dystrophy were selected based on RetNet. Variants in the 129 genes determined by whole exome sequencing were selected and filtered by bioinformatics analysis. Candidate variants were confirmed by Sanger sequencing and validated by analysis of available family members and controls. A total of 90 candidate variants were present in the 129 genes. Sanger sequencing confirmed 83 of the 90 variants. Analysis of family members and controls excluded 76 of these 83 variants. The remaining seven variants were considered to be potential pathogenic mutations; these were c.899A>G, c.1814C>G, and c.2107C>T in BBS2; c.1073C>T and c.1669C>T in INPP5E; and c.3582C>G and c.5704-5C>G in CACNA1F. Six of these seven mutations were novel. The mutations were detected in five unrelated patients without a family history, including three patients with homozygous or compound heterozygous mutations in BBS2 and INPP5E, and two patients with hemizygous mutations in CACNA1F. None of the patients had mutations in the genes associated with autosomal dominant retinal dystrophy. Only a small portion of patients with RP, about 3% (5/157), had causative mutations in the 129 genes associated with other forms of hereditary retinal dystrophy.
Birt-Hogg-Dubé syndrome in two Chinese families with mutations in the FLCN gene.

PubMed

Hou, Xiaocan; Zhou, Yuan; Peng, Yun; Qiu, Rong; Xia, Kun; Tang, Beisha; Zhuang, Wei; Jiang, Hong

2018-01-22

Birt-Hogg-Dubé syndrome is an autosomal dominant hereditary condition caused by mutations in the folliculin-encoding gene FLCN (NM_144997). It is associated with skin lesions such as fibrofolliculoma, acrochordon and trichodiscoma; pulmonary lesions including spontaneous pneumothorax and pulmonary cysts; and renal cancer. Genomic DNA was extracted from peripheral venous blood samples of the propositi and their family members. Genetic analysis was performed by whole exome sequencing and Sanger sequencing of the corresponding exons of the FLCN gene to explore the genetic mutations in two Chinese families. Patients from family 1 mostly suffered from pneumothorax and pulmonary cysts, and several also reported skin lesions or kidney lesions. In family 2, only thoracic lesions were found in the patients, without any other clinical manifestations. Two FLCN mutations were identified: one is an insertion mutation (c.1579_1580insA/p.R527Xfs on exon 14) previously reported in three Asian families (one mainland family and two Taiwanese families); the other (c.649C > T / p.Gln217X on exon 7), previously detected in a French family, is reported here for the first time in an Asian population. Overall, the detection of these two mutations expands the spectrum of FLCN mutations and will provide insight into genetic diagnosis and counseling of Birt-Hogg-Dubé syndrome.
The evolution of vertebrate Toll-like receptors

USGS Publications Warehouse

Roach, J.C.; Glusman, G.; Rowen, L.; Kaur, A.; Purcell, M.K.; Smith, K.D.; Hood, L.E.; Aderem, A.

2005-01-01

The complete sequences of Takifugu Toll-like receptor (TLR) loci and gene predictions from many draft genomes enable comprehensive molecular phylogenetic analysis. Strong selective pressure for recognition of and response to pathogen-associated molecular patterns has maintained a largely unchanging TLR recognition in all vertebrates. There are six major families of vertebrate TLRs. This repertoire is distinct from that of invertebrates. TLRs within a family recognize a general class of pathogen-associated molecular patterns. Most vertebrates have exactly one gene ortholog for each TLR family. The family including TLR1 has more species-specific adaptations than other families. A major family including TLR11 is represented in humans only by a pseudogene. Coincidental evolution plays a minor role in TLR evolution. The sequencing phase of this study produced finished genomic sequences for the 12 Takifugu rubripes TLRs. In addition, we have produced > 70 gene models, including sequences from the opossum, chicken, frog, dog, sea urchin, and sea squirt. © 2005 by The National Academy of Sciences of the USA.
SINEBase: a database and tool for SINE analysis

PubMed Central

Vassetzky, Nikita S.; Kramerov, Dmitri A.

2013-01-01

SINEBase (http://sines.eimb.ru) integrates the revisited body of knowledge about short interspersed elements (SINEs). A set of formal definitions concerning SINEs was introduced. All available sequence data were screened through these definitions and the genetic elements misidentified as SINEs were discarded. As a result, 175 SINE families have been recognized in animals, flowering plants and green algae. These families were classified by the modular structure of their nucleotide sequences and the frequencies of different patterns were evaluated. These data formed the basis for the database of SINEs. The SINEBase website can be used in two ways: first, to explore the database of SINE families, and second, to analyse candidate SINE sequences using specifically developed tools. This article presents an overview of the database and the process of SINE identification and analysis. PMID:23203982

Identification of novel mutations in the α-galactosidase A gene in patients with Fabry disease: pitfalls of mutation analyses in patients with low α-galactosidase A activity.

PubMed

Yoshimitsu, Makoto; Higuchi, Koji; Miyata, Masaaki; Devine, Sean; Mattman, Andre; Sirrs, Sandra; Medin, Jeffrey A; Tei, Chuwa; Takenaka, Toshihiro

2011-05-01

Fabry disease is an X-linked lysosomal storage disorder caused by mutations of the α-galactosidase A (GLA) gene, and the disease is a relatively prevalent cause of left ventricular hypertrophy followed by conduction abnormalities and arrhythmias. Mutation analysis of the GLA gene is a valuable tool for accurate diagnosis of affected families. In this study, we carried out molecular studies of 10 unrelated families diagnosed with Fabry disease. Genetic analysis of the GLA gene using conventional genomic sequencing was performed in 9 hemizygous males and 6 heterozygous females. In patients with no mutations in the coding DNA sequence, multiplex ligation-dependent probe amplification (MLPA) and/or cDNA sequencing were performed. We identified a novel exon 2 deletion (IVS1_IVS2) in a heterozygous female by MLPA, which was undetectable by conventional sequencing methods. In addition, the g.9331G>A mutation that has previously been found only in patients with cardiac Fabry disease was found in 3 unrelated, newly diagnosed cardiac Fabry patients by sequencing GLA genomic DNA and cDNA. Two other novel mutations, g.8319A>G and 832delA, were also found in addition to 4 previously reported mutations (R112C, C142Y, M296I, and G373D) in 6 other families. We could identify GLA gene mutations in all hemizygotes and heterozygotes from 10 families with Fabry disease. Mutations in 4 out of 10 families could not be identified by classical genomic analysis, which focuses on exons and the flanking regions. These data suggest that MLPA analysis and cDNA sequencing should be considered in genetic testing surveys of patients with Fabry disease. Copyright © 2011 Japanese College of Cardiology. Published by Elsevier Ltd. All rights reserved.
WISARD: workbench for integrated superfast association studies for related datasets.

PubMed

Lee, Sungyoung; Choi, Sungkyoung; Qiao, Dandi; Cho, Michael; Silverman, Edwin K; Park, Taesung; Won, Sungho

2018-04-20

Mendelian transmission produces phenotypic and genetic relatedness between family members, giving family-based analytical methods an important role in genetic epidemiological studies, from heritability estimations to genetic association analyses. With the advance in genotyping technologies, whole-genome sequence data can be utilized for genetic epidemiological studies, and family-based samples may become more useful for detecting de novo mutations. However, genetic analyses employing family-based samples usually suffer from the complexity of the computational/statistical algorithms, and certain types of family designs, such as those incorporating data from extended families, have rarely been used. We present a Workbench for Integrated Superfast Association studies for Related Data (WISARD) programmed in C/C++. WISARD enables fast and comprehensive analysis of SNP-chip and next-generation sequencing data on extended families, with applications from designing genetic studies to summarizing analysis results. In addition, WISARD can automatically be run in a fully multithreaded manner, and the integration of R software for visualization makes it more accessible to non-experts. Comparison with existing toolsets showed that WISARD is computationally suitable for integrated analysis of related subjects and that it outperforms existing toolsets. WISARD has also been successfully utilized to analyze a large-scale massive sequencing dataset of chronic obstructive pulmonary disease (COPD), in which we identified multiple genes associated with COPD, demonstrating its practical value.
A diverse family of serine proteinase genes expressed in cotton boll weevil (Anthonomus grandis): implications for the design of pest-resistant transgenic cotton plants.

PubMed

Oliveira-Neto, Osmundo B; Batista, João A N; Rigden, Daniel J; Fragoso, Rodrigo R; Silva, Rodrigo O; Gomes, Eliane A; Franco, Octávio L; Dias, Simoni C; Cordeiro, Célia M T; Monnerat, Rose G; Grossi-De-Sá, Maria F

2004-09-01

Fourteen different cDNA fragments encoding serine proteinases were isolated by reverse transcription-PCR from cotton boll weevil (Anthonomus grandis) larvae. A large diversity between the sequences was observed, with a mean pairwise identity of 22% in the amino acid sequence. The cDNAs encompassed 11 trypsin-like sequences classifiable into three families and three chymotrypsin-like sequences belonging to a single family. Using a combination of 5' and 3' RACE, the full-length sequence was obtained for five of the cDNAs, named Agser2, Agser5, Agser6, Agser10 and Agser21. The encoded proteins included amino acid sequence motifs of serine proteinase active sites, conserved cysteine residues, and both zymogen activation and signal peptides. Southern blotting analysis suggested that one or two copies of these serine proteinase genes exist in the A. grandis genome. Northern blotting analysis of Agser2 and Agser5 showed that for both genes, expression is induced upon feeding and is concentrated in the gut of larvae and adult insects. Reverse northern analysis of the 14 cDNA fragments showed that only two trypsin-like and two chymotrypsin-like sequences were expressed at detectable levels. Under the effect of the serine proteinase inhibitors soybean Kunitz trypsin inhibitor and black-eyed pea trypsin/chymotrypsin inhibitor, expression of one of the trypsin-like sequences was upregulated while expression of the two chymotrypsin-like sequences was downregulated. Copyright 2004 Elsevier Ltd.
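The "mean pairwise identity" statistic quoted in the abstract above can be illustrated with a short sketch. This is not the authors' procedure: the abstract does not say how identity was defined, so the convention used here (identical residues divided by columns in which neither sequence has a gap) is an assumption, as are the function names and the toy alignment.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap.

    Assumption: both strings are rows of the same multiple alignment, with '-' as the gap symbol.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def mean_pairwise_identity(aligned):
    """Average the pairwise identity over every unordered pair of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not data from the study.
toy_alignment = ["IVGG-YTC", "IVGGSYNC", "IIGGQ-EC"]
toy_mean = mean_pairwise_identity(toy_alignment)
print(f"mean pairwise identity: {toy_mean:.3f}")
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give different values, so a figure such as 22% is only comparable across studies that use the same definition.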
Structure of the highly repeated, long interspersed DNA family (LINE or L1Rn) of the rat.

PubMed Central

D'Ambrosio, E; Waitzkin, S D; Witney, F R; Salemme, A; Furano, A V

1986-01-01

We present the DNA sequence of a 6.7-kilobase member of the rat long interspersed repeated DNA family (LINE or L1Rn). This member (LINE 3) is flanked by a perfect 14-base-pair (bp) direct repeat and is a full-length, or close-to-full-length, member of this family. LINE 3 contains an approximately 100-bp A-rich right end, a number of long (greater than 400-bp) open reading frames, and a ca. 200-bp G + C-rich (ca. 60%) cluster near each terminus. Comparison of the LINE 3 sequence with the sequence of about one-half of another member, which we also present, as well as restriction enzyme analysis of the genomic copies of this family, indicates that in length and overall structure LINE 3 is quite typical of the 40,000 or so other genomic members of this family which would account for as much as 10% of the rat genome. Therefore, the rat LINE family is relatively homogeneous, which contrasts with the heterogeneous LINE families in primates and mice. Transcripts corresponding to the entire LINE sequence are abundant in the nuclear RNA of rat liver. The characteristics of the rat LINE family are discussed with respect to the possible function and evolution of this family of DNA sequences. PMID:3023845
Metagenomics and the protein universe

PubMed Central

Godzik, Adam

2011-01-01

Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones. PMID:21497084
The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

PubMed

Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B; Halpern, Aaron L; Williamson, Shannon J; Remington, Karin; Eisen, Jonathan A; Heidelberg, Karla B; Manning, Gerard; Li, Weizhong; Jaroszewski, Lukasz; Cieplak, Piotr; Miller, Christopher S; Li, Huiying; Mashiyama, Susan T; Joachimiak, Marcin P; van Belle, Christopher; Chandonia, John-Marc; Soergel, David A; Zhai, Yufeng; Natarajan, Kannan; Lee, Shaun; Raphael, Benjamin J; Bafna, Vineet; Friedman, Robert; Brenner, Steven E; Godzik, Adam; Eisenberg, David; Dixon, Jack E; Taylor, Susan S; Strausberg, Robert L; Frazier, Marvin; Venter, J Craig

2007-03-01

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

PubMed

Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

2003-01-01

The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
Genetic analysis of a Chinese family with members affected with Usher syndrome type II and Waardenburg syndrome type IV.

PubMed

Wang, Xueling; Lin, Xiao-Jiang; Tang, Xiangrong; Chai, Yong-Chuan; Yu, De-Hong; Chen, Dong-Ye; Wu, Hao

2017-11-01

The purpose of this study was to identify the genetic causes of a family presenting with multiple symptoms overlapping Usher syndrome type II (USH2) and Waardenburg syndrome type IV (WS4). Targeted next-generation sequencing including the exon and flanking intron sequences of 79 deafness genes was performed on the proband. Co-segregation of the disease phenotype and the detected variants was confirmed in all family members by PCR amplification and Sanger sequencing. The affected members of this family had two different recessive disorders, USH2 and WS4. By targeted next-generation sequencing, we identified that USH2 was caused by a novel missense mutation, p.V4907D in GPR98, whereas WS4 was due to p.V185M in EDNRB. This is the first report of a homozygous p.V185M mutation in EDNRB in a patient with WS4. This study reported a Chinese family with multiple independent and overlapping phenotypes. In addition, molecular-level analysis efficiently identified the causative variants p.V4907D in GPR98 and p.V185M in EDNRB, and helped to confirm the clinical diagnosis of USH2 and WS4. Copyright © 2017 Elsevier B.V. All rights reserved.
Using random forests for assistance in the curation of G-protein coupled receptor databases.

PubMed

Shkurin, Aleksei; Vellido, Alfredo

2017-08-18

Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide a tool for quality control to GPCR database curators.
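The idea of gauging misclassification consistency through the votes of the base tree classifiers can be sketched in a few lines. The abstract does not give the exact consistency measure, so the one below (the share of trees that agree on the same wrong sub-family) is an assumption, and the sequence identifiers, sub-family labels, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def vote_consistency(tree_votes, true_label):
    """Return (most frequent wrong label, fraction of all trees voting for it).

    Returns (None, 0.0) when every tree votes for the true label.
    """
    wrong = Counter(v for v in tree_votes if v != true_label)
    if not wrong:
        return None, 0.0
    label, count = wrong.most_common(1)[0]
    return label, count / len(tree_votes)


def shortlist(votes_by_sequence, true_labels, threshold=0.5):
    """List sequences whose trees agree on one wrong label at least `threshold` of the time."""
    flagged = []
    for seq_id, votes in votes_by_sequence.items():
        label, fraction = vote_consistency(votes, true_labels[seq_id])
        if label is not None and fraction >= threshold:
            flagged.append((seq_id, label, fraction))
    # Most consistently misclassified sequences first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Hypothetical per-tree votes for three sequences (10 trees each).
toy_votes = {
    "seqA": ["sub2"] * 8 + ["sub1"] * 2,               # mostly sent to the same wrong class
    "seqB": ["sub3"] * 9 + ["sub2"],                   # almost always correct
    "seqC": ["sub4"] * 4 + ["sub2"] * 3 + ["sub1"] * 3,  # wrong, but inconsistently so
}
toy_truth = {"seqA": "sub1", "seqB": "sub3", "seqC": "sub1"}
toy_shortlist = shortlist(toy_votes, toy_truth)
print(toy_shortlist)
```

In practice the per-tree votes would come from a trained ensemble; the point of the sketch is only that a sequence like the hypothetical seqA, which most trees assign to one particular wrong class, is a better candidate for expert review than one whose errors are scattered.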
[Analysis of genotype and phenotype correlation of MYH7-V878A mutation among ethnic Han Chinese pedigrees affected with hypertrophic cardiomyopathy].

PubMed

Wang, Bo; Guo, Ruiqi; Zuo, Lei; Shao, Hong; Liu, Ying; Wang, Yu; Ju, Yan; Sun, Chao; Wang, Lifeng; Zhang, Yanmin; Liu, Liwen

2017-08-10

To analyze the phenotype-genotype correlation of MYH7-V878A mutation. Exonic amplification and high-throughput sequencing of 96 cardiovascular disease-related genes were carried out on probands from 210 pedigrees affected with hypertrophic cardiomyopathy (HCM). For the probands, their family members, and 300 healthy volunteers, the identified MYH7-V878A mutation was verified by Sanger sequencing. Information on the HCM patients and their family members, including clinical data, physical examination, echocardiography (UCG), electrocardiography (ECG), and conservation of the mutated sequence among various species, was analyzed. A MYH7-V878A mutation was detected in five HCM pedigrees containing 31 family members. Fourteen members carried the mutation, among whom 11 were diagnosed with HCM, while 3 did not meet the diagnostic criteria. Some of the fourteen members also carried other mutations. Family members not carrying the mutation had normal UCG and ECG. No MYH7-V878A mutation was found among the 300 healthy volunteers. Analysis of sequence conservation showed that the amino acid is located in a region that is highly conserved among various species. MYH7-V878A is a hot spot among ethnic Han Chinese with a high penetrance. Functional analysis of the conserved sequences suggested that the mutation may cause significant alteration of the function. MYH7-V878A has a significant value for the early diagnosis of HCM.
STAG3 truncating variant as the cause of primary ovarian insufficiency

PubMed Central

Le Quesne Stabej, Polona; Williams, Hywel J; James, Chela; Tekman, Mehmet; Stanescu, Horia C; Kleta, Robert; Ocaka, Louise; Lescai, Francesco; Storr, Helen L; Bitner-Glindzicz, Maria; Bacchelli, Chiara; Conway, Gerard S

2016-01-01

Primary ovarian insufficiency (POI) is a distressing cause of infertility in young women. POI is heterogeneous with only a few causative genes having been discovered so far. Our objective was to determine the genetic cause of POI in a consanguineous Lebanese family with two affected sisters presenting with primary amenorrhoea and an absence of any pubertal development. Multipoint parametric linkage analysis was performed. Whole-exome sequencing was done on the proband. Linkage analysis identified a locus on chromosome 7 where exome sequencing successfully identified a homozygous two base pair duplication (c.1947_48dupCT), leading to a truncated protein p.(Y650Sfs*22) in the STAG3 gene, confirming it as the cause of POI in this family. Exome sequencing combined with linkage analyses offers a powerful tool to efficiently find novel genetic causes of rare, heterogeneous disorders, even in small single families. This is only the second report of a STAG3 variant; the first STAG3 variant was recently described in a phenotypically similar family with extreme POI. Identification of an additional family highlights the importance of STAG3 in POI pathogenesis and suggests it should be evaluated in families affected with POI. PMID:26059840
On the value of Mendelian laws of segregation in families: data quality control, imputation and beyond

PubMed Central

Blue, Elizabeth Marchani; Sun, Lei; Tintle, Nathan L.; Wijsman, Ellen M.

2014-01-01

When analyzing family data, we dream of perfectly informative data, even whole genome sequences (WGS) for all family members. Reality intervenes, and we find next-generation sequence (NGS) data have error, and are often too expensive or impossible to collect on everyone. Genetic Analysis Workshop 18 groups “Quality Control” and “Dropping WGS through families using GWAS framework” focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single nucleotide polymorphisms, NGS, and imputed data are generally concordant, but that errors are particularly likely at rare variants, homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelateds. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Both genotype and pedigree errors had an adverse effect on subsequent analyses. Computationally fast rules-based imputation was accurate, but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods, and suggest possible future directions. Topics include improving communication between those performing data collection and analysis, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models. PMID:25112184
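The use of Mendelian transmission for error detection, mentioned in the abstract above, can be illustrated with a minimal trio check. This is a sketch under stated assumptions, not a method from the workshop: it handles only autosomal sites with complete genotypes, and the function names and site labels are hypothetical.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be explained by one allele from each parent.

    Genotypes are pairs of alleles, e.g. ("A", "G"). Assumes an autosomal site
    and no missing calls.
    """
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def find_mendelian_errors(trio_genotypes):
    """Return the sites at which the trio's genotypes violate Mendelian transmission."""
    return [
        site
        for site, (child, mother, father) in trio_genotypes.items()
        if not mendelian_consistent(child, mother, father)
    ]


# Hypothetical genotypes: site -> (child, mother, father).
toy_trio = {
    "site1": (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    "site2": (("G", "G"), ("A", "A"), ("A", "G")),  # child cannot inherit G from mother
    "site3": (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
}
toy_errors = find_mendelian_errors(toy_trio)
print(toy_errors)
```

An inconsistent site does not say which explanation applies: it may reflect a genotyping error, a pedigree error, or a genuine de novo mutation, which is why such checks feed into both quality control and de novo rate estimates.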
A new cryptic virus belonging to the family Partitiviridae was found in watermelon co-infected with Melon necrotic spot virus.

PubMed

Sela, Noa; Lachman, Oded; Reingold, Victoria; Dombrovsky, Aviv

2013-10-01

A novel virus was detected in watermelon plants (Citrullus lanatus Thunb.) infected with Melon necrotic spot virus (MNSV) using SOLiD next-generation sequence analysis. In addition to the expected MNSV genome, two double-stranded RNA (dsRNA) segments of 1,312 and 1,118 bp were also identified and sequenced from the purified virus preparations. These two dsRNA segments encode two putative partitivirus-related proteins, an RNA-dependent RNA polymerase (RdRP) and a capsid protein, which were sequenced. Genomic-sequence analysis and analysis of phylogenetic relationships indicate that these two dsRNAs together make up the genome of a novel Partitivirus. This virus was found to be closely related to the Pepper cryptic virus 1 and Raphanus sativus cryptic virus. It is suggested that this novel virus, putatively named Citrullus lanatus cryptic virus, be considered a new member of the family Partitiviridae.
Rare variants in RTEL1 are associated with familial interstitial pneumonia.

PubMed

Cogan, Joy D; Kropski, Jonathan A; Zhao, Min; Mitchell, Daphne B; Rives, Lynette; Markin, Cheryl; Garnett, Errine T; Montgomery, Keri H; Mason, Wendi R; McKean, David F; Powers, Julia; Murphy, Elissa; Olson, Lana M; Choi, Leena; Cheng, Dong-Sheng; Blue, Elizabeth Marchani; Young, Lisa R; Lancaster, Lisa H; Steele, Mark P; Brown, Kevin K; Schwarz, Marvin I; Fingerlin, Tasha E; Schwartz, David A; Lawson, William E; Loyd, James E; Zhao, Zhongming; Phillips, John A; Blackwell, Timothy S

2015-03-15

Up to 20% of cases of idiopathic interstitial pneumonia cluster in families, comprising the syndrome of familial interstitial pneumonia (FIP); however, the genetic basis of FIP remains uncertain in most families. To determine if new disease-causing rare genetic variants could be identified using whole-exome sequencing of affected members from FIP families, providing additional insights into disease pathogenesis. Affected subjects from 25 kindreds were selected from an ongoing FIP registry for whole-exome sequencing from genomic DNA. Candidate rare variants were confirmed by Sanger sequencing, and cosegregation analysis was performed in families, followed by additional sequencing of affected individuals from another 163 kindreds. We identified a potentially damaging rare variant in the gene encoding for regulator of telomere elongation helicase 1 (RTEL1) that segregated with disease and was associated with very short telomeres in peripheral blood mononuclear cells in 1 of 25 families in our original whole-exome sequencing cohort. Evaluation of affected individuals in 163 additional kindreds revealed another eight families (4.7%) with heterozygous rare variants in RTEL1 that segregated with clinical FIP. Probands and unaffected carriers of these rare variants had short telomeres (<10% for age) in peripheral blood mononuclear cells and increased T-circle formation, suggesting impaired RTEL1 function. Rare loss-of-function variants in RTEL1 represent a newly defined genetic predisposition for FIP, supporting the importance of telomere-related pathways in pulmonary fibrosis.
[Analysis of gene mutation in a Chinese family with Norrie disease].

PubMed

Zhang, Tian-xiao; Zhao, Xiu-li; Hua, Rui; Zhang, Jin-song; Zhang, Xue

2012-09-01

To detect the pathogenic mutation in a Chinese family with Norrie disease. Clinical diagnosis was based on family history, clinical signs and B-mode ultrasound examination. Peripheral blood samples were obtained from all available members of a Chinese family with Norrie disease. Genomic DNA was extracted from lymphocytes by the standard SDS-proteinase K-phenol/chloroform method. Two coding exons and all intron-exon boundaries of the NDP gene were PCR amplified using three pairs of primers and subjected to automated DNA sequencing. The causative mutation was confirmed by restriction enzyme analysis and genotyping analysis in all members. Sequence analysis of the NDP gene revealed a missense mutation c.220C > T (p.Arg74Cys) in the proband and his mother. Further mutation identification by restriction enzyme analysis and genotyping analysis showed that the proband was a homozygote for this mutation. His mother and four other unaffected members (III3, IV4, III5 and II2) were carriers of this mutation. The mutated amino acid is located in the C-terminal cystine knot-like domain, a critical motif for the structure and function of NDP. An NDP missense mutation was identified in a Chinese family with Norrie disease.
Basic Helix-Loop-Helix Transcription Factor Gene Family Phylogenetics and Nomenclature

PubMed Central

Skinner, Michael K.; Rawls, Alan; Wilson-Rawls, Jeanne; Roalson, Eric H.

2010-01-01

A phylogenetic analysis of the basic helix-loop-helix (bHLH) gene superfamily was performed using seven different species (human, mouse, rat, worm, fly, yeast, and plant Arabidopsis) and involving over 600 bHLH genes [1]. All bHLH genes were identified in the genomes of the various species, including expressed sequence tags, and the entire coding sequence was used in the analysis. Nearly 15% of the gene family has been updated or added since the original publication. A super-tree involving six clades and all structural relationships was established and is now presented for four of the species. The wealth of functional data available for members of the bHLH gene superfamily provides us with the opportunity to use this exhaustive phylogenetic tree to predict potential functions of uncharacterized members of the family. This phylogenetic and genomic analysis of the bHLH gene family has revealed unique elements of the evolution and functional relationships of the different genes in the bHLH gene family. PMID:20219281
A novel mutation in PAX3 associated with Waardenburg syndrome type I in a Chinese family.

PubMed

Xiao, Yun; Luo, Jianfen; Zhang, Fengguo; Li, Jianfeng; Han, Yuechen; Zhang, Daogong; Wang, Mingming; Ma, Yalin; Xu, Lei; Bai, Xiaohui; Wang, Haibo

2016-01-01

The novel compound heterozygous mutation in PAX3 was the key genetic reason for WS1 in this family, which was useful to the molecular diagnosis of WS1. Screening the pathogenic mutations in a four generation Chinese family with Waardenburg syndrome type I (WS1). WS1 was diagnosed in a 4-year-old boy according to the Waardenburg syndrome Consortium criteria. The detailed family history revealed four affected members in the family. Routine clinical, audiological examination, and ophthalmologic evaluation were performed on four affected and 10 healthy members in this family. The genetic analysis was conducted, including the targeted next-generation sequencing of 127 known deafness genes combined with Sanger sequencing, TA clone and bioinformatic analysis. A novel compound heterozygous mutation c.[169_170insC;172_174delAAG] (p.His57ProfsX55) was identified in PAX3, which was co-segregated with WS1 in the Chinese family. This mutation was absent in the unaffected family members and 200 ethnicity-matched controls. The phylogenetic analysis and three-dimensional (3D) modeling of Pax3 protein further confirmed that the novel compound heterozygous mutation was pathogenic.
Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5).

PubMed

Aspeborg, Henrik; Coutinho, Pedro M; Wang, Yang; Brumer, Harry; Henrissat, Bernard

2012-09-20

The large Glycoside Hydrolase family 5 (GH5) groups together a wide range of enzymes acting on β-linked oligo- and polysaccharides, and glycoconjugates from a large spectrum of organisms. The long and complex evolution of this family of enzymes and its broad sequence diversity limits functional prediction. With the objective of improving the differentiation of enzyme specificities in a knowledge-based context, and to obtain new evolutionary insights, we present here a new, robust subfamily classification of family GH5. About 80% of the current sequences were assigned into 51 subfamilies in a global analysis of all publicly available GH5 sequences and associated biochemical data. Examination of subfamilies with catalytically-active members revealed that one third are monospecific (containing a single enzyme activity), although new functions may be discovered with biochemical characterization in the future. Furthermore, twenty subfamilies presently have no characterization whatsoever and many others have only limited structural and biochemical data. Mapping of functional knowledge onto the GH5 phylogenetic tree revealed that the sequence space of this historical and industrially important family is far from well dispersed, highlighting targets in need of further study. The analysis also uncovered a number of GH5 proteins which have lost their catalytic machinery, indicating evolution towards novel functions. Overall, the subfamily division of GH5 provides an actively curated resource for large-scale protein sequence annotation for glycogenomics; the subfamily assignments are openly accessible via the Carbohydrate-Active Enzyme database at http://www.cazy.org/GH5.html.
Molecular Diet Analysis of Two African Free-Tailed Bats (Molossidae) Using High Throughput Sequencing

PubMed Central

Bohmann, Kristine; Monadjem, Ara; Lehmkuhl Noer, Christina; Rasmussen, Morten; Zeale, Matt R. K.; Clare, Elizabeth; Jones, Gareth; Willerslev, Eske; Gilbert, M. Thomas P.

2011-01-01

Given the diversity of prey consumed by insectivorous bats, it is difficult to discern the composition of their diet using morphological or conventional PCR-based analyses of their faeces. We demonstrate the use of a powerful alternate tool, the use of the Roche FLX sequencing platform to deep-sequence uniquely 5′ tagged insect-generic barcode cytochrome c oxidase I (COI) fragments, that were PCR amplified from faecal pellets of two free-tailed bat species Chaerephon pumilus and Mops condylurus (family: Molossidae). Although the analyses were challenged by the paucity of southern African insect COI sequences in the GenBank and BOLD databases, similarity to existing collections allowed the preliminary identification of 25 prey families from six orders of insects within the diet of C. pumilus, and 24 families from seven orders within the diet of M. condylurus. Insects identified to families within the orders Lepidoptera and Diptera were widely present among the faecal samples analysed. The two families that were observed most frequently were Noctuidae and Nymphalidae (Lepidoptera). Species-level analysis of the data was accomplished using novel bioinformatics techniques for the identification of molecular operational taxonomic units (MOTU). Based on these analyses, our data provide little evidence of resource partitioning between sympatric M. condylurus and C. pumilus in the Simunye region of Swaziland at the time of year when the samples were collected, although as more complete databases against which to compare the sequences are generated this may have to be re-evaluated. PMID:21731749
Hydrophobic cluster analysis of G protein-coupled receptors: a powerful tool to derive structural and functional information from 2D-representation of protein sequences.

PubMed

Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A

1993-01-01

Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once the amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574, 1990). This consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.

A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma

PubMed Central

Yokoyama, Satoru; Woods, Susan L.; Boyle, Glen M.; Aoude, Lauren G.; MacGregor, Stuart; Zismann, Victoria; Gartside, Michael; Cust, Anne E.; Haq, Rizwan; Harland, Mark; Taylor, John C.; Duffy, David L.; Holohan, Kelly; Dutton-Regester, Ken; Palmer, Jane M.; Bonazzi, Vanessa; Stark, Mitchell S.; Symmons, Judith; Law, Matthew H.; Schmidt, Christopher; Lanagan, Cathy; O’Connor, Linda; Holland, Elizabeth A.; Schmid, Helen; Maskiell, Judith A.; Jetann, Jodie; Ferguson, Megan; Jenkins, Mark A.; Kefford, Richard F.; Giles, Graham G.; Armstrong, Bruce K.; Aitken, Joanne F.; Hopper, John L.; Whiteman, David C.; Pharoah, Paul D.; Easton, Douglas F.; Dunning, Alison M.; Newton-Bishop, Julia A.; Montgomery, Grant W.; Martin, Nicholas G.; Mann, Graham J.; Bishop, D. Timothy; Tsao, Hensin; Trent, Jeffrey M.; Fisher, David E.; Hayward, Nicholas K.; Brown, Kevin M.

2012-01-01

So far, two familial melanoma genes have been identified, accounting for a minority of genetic risk in families. Mutations in CDKN2A account for approximately 40% of familial cases [1], and predisposing mutations in CDK4 have been reported in a very small number of melanoma kindreds [2]. To identify other familial melanoma genes, here we conducted whole-genome sequencing of probands from several melanoma families, identifying one individual carrying a novel germline variant (coding DNA sequence c.G1075A; protein sequence p.E318K; rs149617956) in the melanoma-lineage-specific oncogene microphthalmia-associated transcription factor (MITF). Although the variant co-segregated with melanoma in some but not all cases in the family, linkage analysis of 31 families subsequently identified to carry the variant generated a log of odds (lod) score of 2.7 under a dominant model, indicating E318K as a possible intermediate risk variant. Consistent with this, the E318K variant was significantly associated with melanoma in a large Australian case–control sample. Likewise, it was similarly associated in an independent case–control sample from the United Kingdom. In the Australian sample, the variant allele was significantly over-represented in cases with a family history of melanoma, multiple primary melanomas, or both. The variant allele was also associated with increased naevus count and non-blue eye colour. Functional analysis of E318K showed that MITF encoded by the variant allele had impaired sumoylation and differentially regulated several MITF targets. These data indicate that MITF is a melanoma-predisposition gene and highlight the utility of whole-genome sequencing to identify novel rare variants associated with disease susceptibility. PMID:22080950
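For readers unfamiliar with lod scores, the following is a minimal sketch of the textbook two-point calculation for phase-known, fully informative meioses. It is not the dominant-model pedigree analysis used in the study, and the function name `lod_score` and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score: log10 of L(theta) / L(theta = 0.5).

    Assumes every meiosis is phase-known and fully informative, which is
    a simplification and is not taken from the abstract above.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant is impossible under complete linkage.
        return float("-inf")
    non_recombinants = meioses - recombinants
    log_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


if __name__ == "__main__":
    # Hypothetical counts: 10 informative meioses, no recombinants.
    print(round(lod_score(0, 10, 0.0), 2))
```

Under these assumptions each non-recombinant meiosis at theta = 0 adds log10(2), about 0.30, to the score, which is why a lod score near 3 requires on the order of ten informative meioses.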
A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data.

PubMed

Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip

2012-01-06

Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
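The Mendelian-violation signal mentioned in the abstract can be illustrated with a hard-call check on a single trio. This is only a sketch: the published method is probabilistic and models sequencing error, whereas the hypothetical helper `is_mendelian_consistent` below assumes error-free diploid genotype calls.

```python
from itertools import product
from typing import Tuple

Genotype = Tuple[str, str]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if the child could inherit one allele from each parent.

    Genotypes are unordered pairs of alleles, e.g. ("A", "G").
    Assumes error-free calls at an autosomal site (illustrative only).
    """
    target = tuple(sorted(child))
    for maternal, paternal in product(mother, father):
        if tuple(sorted((maternal, paternal))) == target:
            return True
    return False


if __name__ == "__main__":
    # Both parents homozygous A/A, child A/G: a candidate de novo site.
    print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("A", "A")))
```

A site that fails this check is only a candidate: with real short-read data a genotyping error in any of the three individuals produces the same pattern, which is the motivation for a posterior probability rather than a yes/no call.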
Two Paralogous Families of a Two-Gene Subtilisin Operon Are Widely Distributed in Oral Treponemes

PubMed Central

Correia, Frederick F.; Plummer, Alvin R.; Ellen, Richard P.; Wyss, Chris; Boches, Susan K.; Galvin, Jamie L.; Paster, Bruce J.; Dewhirst, Floyd E.

2003-01-01

Certain oral treponemes express a highly proteolytic phenotype and have been associated with periodontal diseases. The periodontal pathogen Treponema denticola produces dentilisin, a serine protease of the subtilisin family. The two-gene operon prcA-prtP is required for expression of active dentilisin (PrtP), a putative lipoprotein attached to the treponeme's outer membrane or sheath. The purpose of this study was to examine the diversity and structure of treponemal subtilisin-like proteases in order to better understand their distribution and function. The complete sequences of five prcA-prtP operons were determined for Treponema lecithinolyticum, “Treponema vincentii,” and two canine species. Partial operon sequences were obtained for T. socranskii subsp. 04 as well as 450- to 1,000-base fragments of prtP genes from four additional treponeme strains. Phylogenetic analysis demonstrated that the sequences fall into two paralogous families. The first family includes the sequence from T. denticola. Treponemes possessing this operon family express chymotrypsin-like protease activity and can cleave the substrate N-succinyl-alanyl-alanyl-prolyl-phenylalanine-p-nitroanilide (SAAPFNA). Treponemes possessing the second paralog family do not possess chymotrypsin-like activity or cleave SAAPFNA. Despite examination of a range of protein and peptide substrates, the specificity of the second protease family remains unknown. Each of the fully sequenced prcA and prtP genes contains a 5′ hydrophobic leader sequence with a treponeme lipobox. The two paralogous families of treponeme subtilisins represent a new subgroup within the subtilisin family of proteases and are the only subtilisin lipoprotein family. The present study demonstrated that the subtilisin paralogs comprising a two-gene operon are widely distributed among treponemes. PMID:14617650
Identification of a novel mutation in a Chinese family with Nance-Horan syndrome by whole exome sequencing.

PubMed

Hong, Nan; Chen, Yan-hua; Xie, Chen; Xu, Bai-sheng; Huang, Hui; Li, Xin; Yang, Yue-qing; Huang, Ying-ping; Deng, Jian-lian; Qi, Ming; Gu, Yang-shun

2014-08-01

Nance-Horan syndrome (NHS) is a rare X-linked disorder characterized by congenital nuclear cataracts, dental anomalies, and craniofacial dysmorphisms. Mental retardation was present in about 30% of the reported cases. The purpose of this study was to investigate the genetic and clinical features of NHS in a Chinese family. Whole exome sequencing analysis was performed on DNA from an affected male to scan for candidate mutations on the X-chromosome. Sanger sequencing was used to verify these candidate mutations in the whole family. Clinical and ophthalmological examinations were performed on all members of the family. A combination of exome sequencing and Sanger sequencing revealed a nonsense mutation c.322G>T (E108X) in exon 1 of NHS gene, co-segregating with the disease in the family. The nonsense mutation led to the conversion of glutamic acid to a stop codon (E108X), resulting in truncation of the NHS protein. Multiple sequence alignments showed that codon 108, where the mutation (c.322G>T) occurred, was located within a phylogenetically conserved region. The clinical features in all affected males and female carriers are described in detail. We report a nonsense mutation c.322G>T (E108X) in a Chinese family with NHS. Our findings broaden the spectrum of NHS mutations and provide molecular insight into future NHS clinical genetic diagnosis.
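The correspondence between c.322G>T and E108X can be checked with simple arithmetic. The sketch below assumes coding-sequence numbering that starts at 1 on the first base of the start codon; the reference codon is not given in the abstract, so both glutamic acid codons are tried. The function names are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
GLU_CODONS = ("GAA", "GAG")  # the two codons for glutamic acid (E)


def codon_of_cds_position(cds_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, base within codon)."""
    if cds_pos < 1:
        raise ValueError("cds_pos must be >= 1")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitute(codon: str, base_in_codon: int, new_base: str) -> str:
    """Replace the base at a 1-based position within a codon."""
    index = base_in_codon - 1
    return codon[:index] + new_base + codon[index + 1:]


if __name__ == "__main__":
    codon_number, base_in_codon = codon_of_cds_position(322)
    print(codon_number, base_in_codon)
    for codon in GLU_CODONS:
        mutated = substitute(codon, base_in_codon, "T")
        print(codon, "->", mutated, mutated in STOP_CODONS)
```

Position 322 falls on the first base of codon 108, and a G>T change at the first base of either glutamic acid codon yields TAA or TAG, both stop codons, consistent with the reported truncation.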
DNA sequence-level analyses reveal potential phenotypic modifiers in a large family with psychiatric disorders.

PubMed

Ryan, Niamh M; Lihm, Jayon; Kramer, Melissa; McCarthy, Shane; Morris, Stewart W; Arnau-Soler, Aleix; Davies, Gail; Duff, Barbara; Ghiban, Elena; Hayward, Caroline; Deary, Ian J; Blackwood, Douglas H R; Lawrie, Stephen M; McIntosh, Andrew M; Evans, Kathryn L; Porteous, David J; McCombie, W Richard; Thomson, Pippa A

2018-06-07

Psychiatric disorders are a group of genetically related diseases with highly polygenic architectures. Genome-wide association analyses have made substantial progress towards understanding the genetic architecture of these disorders. More recently, exome- and whole-genome sequencing of cases and families have identified rare, high penetrant variants that provide direct functional insight. There remains, however, a gap in the heritability explained by these complementary approaches. To understand how multiple genetic variants combine to modify both severity and penetrance of a highly penetrant variant, we sequenced 48 whole genomes from a family with a high loading of psychiatric disorder linked to a balanced chromosomal translocation. The (1;11)(q42;q14.3) translocation directly disrupts three genes: DISC1, DISC2, DISC1FP and has been linked to multiple brain imaging and neurocognitive outcomes in the family. Using DNA sequence-level linkage analysis, functional annotation and population-based association, we identified common and rare variants in GRM5 (minor allele frequency (MAF) > 0.05), PDE4D (MAF > 0.2) and CNTN5 (MAF < 0.01) that may help explain the individual differences in phenotypic expression in the family. We suggest that whole-genome sequencing in large families will improve the understanding of the combined effects of the rare and common sequence variation underlying psychiatric phenotypes.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

PubMed

Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

2013-12-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

PubMed Central

Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

2013-01-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704
Cold Urticaria, Immunodeficiency, and Autoimmunity Related to PLCG2 Deletions

PubMed Central

Ombrello, Michael J.; Remmers, Elaine F.; Sun, Guangping; Freeman, Alexandra F.; Datta, Shrimati; Torabi-Parizi, Parizad; Subramanian, Naeha; Bunney, Tom D.; Baxendale, Rhona W.; Martins, Marta S.; Romberg, Neil; Komarow, Hirsh; Aksentijevich, Ivona; Kim, Hun Sik; Ho, Jason; Cruse, Glenn; Jung, Mi-Yeon; Gilfillan, Alasdair M.; Metcalfe, Dean D.; Nelson, Celeste; O'Brien, Michelle; Wisch, Laura; Stone, Kelly; Douek, Daniel C.; Gandhi, Chhavi; Wanderer, Alan A.; Lee, Hane; Nelson, Stanley F.; Shianna, Kevin V.; Cirulli, Elizabeth T.; Goldstein, David B.; Long, Eric O.; Moir, Susan; Meffre, Eric; Holland, Steven M.; Kastner, Daniel L.; Katan, Matilda; Hoffman, Hal M.; Milner, Joshua D.

2012-01-01

Background: Mendelian analysis of disorders of immune regulation can provide insight into molecular pathways associated with host defense and immune tolerance. Methods: We identified three families with a dominantly inherited complex of cold-induced urticaria, antibody deficiency, and susceptibility to infection and autoimmunity. Immunophenotyping methods included flow cytometry, analysis of serum immunoglobulins and autoantibodies, lymphocyte stimulation, and enzymatic assays. Genetic studies included linkage analysis, targeted Sanger sequencing, and next-generation whole-genome sequencing. Results: Cold urticaria occurred in all affected subjects. Other, variable manifestations included atopy, granulomatous rash, autoimmune thyroiditis, the presence of antinuclear antibodies, sinopulmonary infections, and common variable immunodeficiency. Levels of serum IgM and IgA and circulating natural killer cells and class-switched memory B cells were reduced. Linkage analysis showed a 7-Mb candidate interval on chromosome 16q in one family, overlapping by 3.5 Mb a disease-associated haplotype in a smaller family. This interval includes PLCG2, encoding phospholipase Cγ2 (PLCγ2), a signaling molecule expressed in B cells, natural killer cells, and mast cells. Sequencing of complementary DNA revealed heterozygous transcripts lacking exon 19 in two families and lacking exons 20 through 22 in a third family. Genomic sequencing identified three distinct in-frame deletions that cosegregated with disease. These deletions, located within a region encoding an autoinhibitory domain, result in protein products with constitutive phospholipase activity. PLCG2-expressing cells had diminished cellular signaling at 37°C but enhanced signaling at subphysiologic temperatures. Conclusions: Genomic deletions in PLCG2 cause gain of PLCγ2 function, leading to signaling abnormalities in multiple leukocyte subsets and a phenotype encompassing both excessive and deficient immune function. (Funded by the National Institutes of Health Intramural Research Programs and others.) PMID:22236196
A novel species-specific tandem repeat DNA family from Sinapis arvensis: detection of telomere-like sequences.

PubMed

Kapila, R; Das, S; Srivastava, P S; Lakshmikumaran, M

1996-08-01

DNA sequences representing a tandemly repeated DNA family of the Sinapis arvensis genome were cloned and characterized. The 700-bp tandem repeat family is represented by two clones, pSA35 and pSA52, which are 697 and 709 bp in length, respectively. Dot matrix analysis of the sequences indicates the presence of repeated elements within each monomeric unit. Sequence analysis of the repetitive region of clones pSA35 and pSA52 shows that there are several copies of a 7-bp repeat element organized in tandem. The consensus sequence of this repeat element is 5'-TTTAGGG-3'. These elements are highly mutated and the difference in length between the two clones is due to different copy numbers of these elements. The repetitive region of clone pSA35 has 26 copies of the element TTTAGGG, whereas clone pSA52 has 28 copies. The repetitive region in both clones is flanked on either side by inverted repeats that may be footprints of a transposition event. Sequence comparison indicates that the element TTTAGGG is identical to telomeric repeats present in Arabidopsis, maize, tomato, and other plants. However, Bal31 digestion kinetics indicates non-telomeric localization of the 700-bp tandem repeats. The clones represent a novel repeat family as (i) they contain telomere-like motifs as subrepeats within each unit; and (ii) they do not hybridize to related crucifers and are species-specific in nature.
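Counting copies of the 7-bp element within a monomer can be sketched with a regular expression. The abstract notes that the elements are highly mutated, so the exact-match search below is a simplification that would undercount diverged copies; the function names and the test sequence are hypothetical.

```python
import re


def count_motif(seq: str, motif: str = "TTTAGGG") -> int:
    """Count non-overlapping exact occurrences of the motif (case-insensitive)."""
    return len(re.findall(re.escape(motif.upper()), seq.upper()))


def longest_tandem_run(seq: str, motif: str = "TTTAGGG") -> int:
    """Return the largest number of motif copies found back to back."""
    pattern = "(?:" + re.escape(motif.upper()) + ")+"
    runs = re.findall(pattern, seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


if __name__ == "__main__":
    # Hypothetical sequence: three tandem copies, a spacer, then one more copy.
    example = "ACGA" + "TTTAGGG" * 3 + "CC" + "tttaggg" + "A"
    print(count_motif(example), longest_tandem_run(example))
```

Tolerating mismatches (for example by scoring each 7-bp window against the consensus) would be closer to how diverged subrepeats are usually counted, but the scoring threshold would be an additional assumption.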
Targeted exome sequencing reveals novel USH2A mutations in Chinese patients with simplex Usher syndrome.

PubMed

Shu, Hai-Rong; Bi, Huai; Pan, Yang-Chun; Xu, Hang-Yu; Song, Jian-Xin; Hu, Jie

2015-09-16

Usher syndrome (USH) is an autosomal recessive disorder characterized by hearing impairment and vision dysfunction due to retinitis pigmentosa. Phenotypic and genetic heterogeneities of this disease make it impractical to obtain a genetic diagnosis by conventional Sanger sequencing. In this study, we applied a next-generation sequencing approach to detect genetic abnormalities in patients with USH. Two unrelated Chinese families were recruited, consisting of two USH-afflicted patients and four unaffected relatives. We selected 199 genes related to inherited retinal diseases as targets for deep exome sequencing. Through systematic data analysis using an established bioinformatics pipeline, all variants that passed filter criteria were validated by Sanger sequencing and co-segregation analysis. A homozygous frameshift mutation (c.4382delA, p.T1462Lfs*2) was revealed in exon 20 of the USH2A gene in the F1 family. Two compound heterozygous mutations, IVS47+1G>A and c.13156A>T (p.I4386F), located in intron 48 and exon 63, respectively, of USH2A, were identified as causative mutations for the F2 family. Of note, the missense mutation c.13156A>T has not been reported so far. In conclusion, targeted exome sequencing precisely and rapidly identified the genetic defects in two Chinese USH families and this technique can be applied as a routine examination for these disorders with significant clinical and genetic heterogeneity.
Comprehensive comparative analysis of kinesins in photosynthetic eukaryotes

PubMed Central

Richardson, Dale N; Simmons, Mark P; Reddy, Anireddy SN

2006-01-01

Background: Kinesins, a superfamily of molecular motors, use microtubules as tracks and transport diverse cellular cargoes. All kinesins contain a highly conserved ~350 amino acid motor domain. Previous analysis of the completed genome sequence of one flowering plant (Arabidopsis) has resulted in identification of 61 kinesins. The recent completion of genome sequencing of several photosynthetic and non-photosynthetic eukaryotes that belong to divergent lineages offers a unique opportunity to conduct a comprehensive comparative analysis of kinesins in plant and non-plant systems and infer their evolutionary relationships. Results: We used the kinesin motor domain to identify kinesins in the completed genome sequences of 19 species, including 13 newly sequenced genomes. Among the newly analyzed genomes, six represent photosynthetic eukaryotes. A total of 529 kinesins was used to perform comprehensive analysis of kinesins and to construct gene trees using the Bayesian and parsimony approaches. The previously recognized 14 families of kinesins are resolved as distinct lineages in our inferred gene tree. At least three of the 14 kinesin families are not represented in flowering plants. Chlamydomonas, a green alga that is part of the lineage that includes land plants, has at least nine of the 14 known kinesin families. Seven of ten families present in flowering plants are represented in Chlamydomonas, indicating that these families were retained in both the flowering-plant and green algae lineages. Conclusion: The increase in the number of kinesins in flowering plants is due to vast expansion of the Kinesin-14 and Kinesin-7 families. The Kinesin-14 family, which typically contains a C-terminal motor, has many plant kinesins that have the motor domain at the N terminus, in the middle, or the C terminus. Several domains in kinesins are present exclusively either in plant or animal lineages. Addition of novel domains to kinesins in lineage-specific groups contributed to the functional diversification of kinesins. Results from our gene-tree analyses indicate that there was tremendous lineage-specific duplication and diversification of kinesins in eukaryotes. Since the functions of only a few plant kinesins are reported in the literature, this comprehensive comparative analysis will be useful in designing functional studies with photosynthetic eukaryotes. PMID:16448571
Exome Sequence Analysis of 14 Families With High Myopia.

PubMed

Kloss, Bethany A; Tompson, Stuart W; Whisenhunt, Kristina N; Quow, Krystina L; Huang, Samuel J; Pavelec, Derek M; Rosenberg, Thomas; Young, Terri L

2017-04-01

To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.
Capturing Complexities of Relationship-Level Family Planning Trajectories in Malawi.

PubMed

Furnas, Hannah E

2016-09-01

In a transitioning fertility climate, preferences and decisions surrounding family planning are constantly in flux. Malawi provides an ideal case study of family planning complexities as fertility preferences are flexible, the relationship context is unstable, and childbearing begins early. I use intensive longitudinal data from Tsogolo la Thanzi-a research project in Malawi that follows young adults in romantic partnerships through the course of their relationship. I examine two questions: (1) What are the typical patterns of family planning as young adults transition through a relationship? (2) How are family planning trajectories related to individual and relationship-level characteristics? I use sequence analysis to order family planning across time and to contextualize it within each relationship. I generate and cluster the family planning trajectories and find six distinct groups of young adults who engage in family planning in similar ways. I find that family planning is complex, dynamic, and unique to each relationship. I argue that (a) family planning research should use the relationship as the unit of analysis and (b) family planning behaviors and preferences should be sequenced over time for a better understanding of key concepts, such as unmet need. © 2016 The Population Council, Inc.
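In social-science sequence analysis, trajectories are typically compared with an edit-style distance before clustering. The sketch below computes a pairwise distance matrix with unit insertion, deletion, and substitution costs. The abstract does not state which distance or clustering algorithm was used, and the state codes ("N" for no method, "T" for traditional, "M" for modern) and the example trajectories are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit insertion, deletion, and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(sequences: List[Sequence[str]]) -> List[List[int]]:
    """Pairwise distances between all trajectories, as input to a clustering step."""
    return [[edit_distance(s, t) for t in sequences] for s in sequences]


if __name__ == "__main__":
    # Hypothetical trajectories, one state per observation wave.
    trajectories = ["NNNN", "NNMM", "NMMM", "TTMM"]
    for row in distance_matrix(trajectories):
        print(row)
```

The resulting matrix can be passed to any standard hierarchical or partitioning clustering routine; the choice of substitution costs strongly affects which trajectories end up grouped together, so those costs should be justified substantively rather than left at the unit default used here.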
Capturing Complexities of Relationship-Level Family Planning Trajectories in Malawi

PubMed Central

Furnas, Hannah E.

2017-01-01

In a transitioning fertility climate, preferences and decisions surrounding family planning are constantly in flux. Malawi provides an ideal case study of family planning complexities as fertility preferences are flexible, the relationship context is unstable, and childbearing begins early. I use intensive longitudinal data from Tsogolo la Thanzi—a research project in Malawi that follows young adults in romantic partnerships through the course of their relationship and allows me to ask two questions: (1) What are the typical patterns of family planning as young adults transition through a relationship? (2) How are family planning trajectories related to individual and relationship-level characteristics? I use sequence analysis to order family planning across time and to contextualize it within each relationship. I generate and cluster the family planning trajectories and find six distinct groups of young adults who engage in family planning in similar ways. I find that family planning is complex, dynamic, and unique to each relationship. I argue that (a) family planning research should use the relationship as the unit of analysis and (b) family planning behaviors and preferences should be sequenced over time for a better understanding of key concepts, such as unmet need. PMID:27517867
Problems of classification in the family Paramyxoviridae.

PubMed

Rima, Bert; Collins, Peter; Easton, Andrew; Fouchier, Ron; Kurath, Gael; Lamb, Robert A; Lee, Benhur; Maisner, Andrea; Rota, Paul; Wang, Lin-Fa

2018-05-01

A number of unassigned viruses in the family Paramyxoviridae need to be classified either as a new genus or placed into one of the seven genera currently recognized in this family. Furthermore, numerous new paramyxoviruses continue to be discovered. However, attempts at classification have highlighted the difficulties that arise by applying historic criteria or criteria based on sequence alone to the classification of the viruses in this family. While the recent taxonomic change that elevated the previous subfamily Pneumovirinae into a separate family Pneumoviridae is readily justified on the basis of RNA-dependent RNA polymerase (RdRp or L protein) sequence motifs, using RdRp sequence comparisons for assignment to lower level taxa raises problems that would require an overhaul of the current criteria for assignment into genera in the family Paramyxoviridae. Arbitrary cut-off points to delineate genera and species would have to be set if classification was based on the amino acid sequence of the RdRp alone or on pairwise analysis of sequence complementarity (PASC) of all open reading frames (ORFs). While these cut-offs cannot be made consistent with the current classification in this family, resorting to genus-level demarcation criteria with additional input from the biological context may afford a way forward. Such criteria would reflect the increasingly dynamic nature of virus taxonomy even if it would require a complete revision of the current classification.
Novel USH2A compound heterozygous mutations cause RP/USH2 in a Chinese family.

PubMed

Liu, Xiaowen; Tang, Zhaohui; Li, Chang; Yang, Kangjuan; Gan, Guanqi; Zhang, Zibo; Liu, Jingyu; Jiang, Fagang; Wang, Qing; Liu, Mugen

2010-03-17

To identify the disease-causing gene in a four-generation Chinese family affected with retinitis pigmentosa (RP). Linkage analysis was performed with a panel of microsatellite markers flanking the candidate genetic loci of RP. These loci included 38 known RP genes. The complete coding region and exon-intron boundaries of Usher syndrome 2A (USH2A) were sequenced with the proband DNA to screen the disease-causing gene mutation. Restriction fragment length polymorphism (RFLP) analysis and direct DNA sequence analysis were done to demonstrate co-segregation of the USH2A mutations with the family disease. One hundred normal controls were screened for the mutations. The disease-causing gene in this Chinese family was linked to the USH2A locus on chromosome 1q41. Direct DNA sequence analysis of USH2A identified two novel mutations in the patients: one missense mutation, p.G1734R, in exon 26 and a splice site mutation, IVS32+1G>A, which was found in the donor site of intron 32 of USH2A. Neither the p.G1734R nor the IVS32+1G>A mutation was found in the unaffected family members or the 100 normal controls. One patient with a homozygous mutation has so far displayed only RP symptoms, while three patients with compound heterozygous mutations in the family of study showed both RP and hearing impairment. This study identified two novel mutations, p.G1734R and IVS32+1G>A of USH2A, in a four-generation Chinese RP family. In this family, the compound heterozygous mutations and the homozygous mutation in USH2A may cause Usher syndrome type II and RP, respectively. These two mutations expand the mutant spectrum of USH2A.
LGI1 microdeletion in autosomal dominant lateral temporal epilepsy

PubMed Central

Fanciulli, M.; Santulli, L.; Errichiello, L.; Barozzi, C.; Tomasi, L.; Rigon, L.; Cubeddu, T.; de Falco, A.; Rampazzo, A.; Michelucci, R.; Uzzau, S.; Striano, S.; de Falco, F.A.; Striano, P.

2012-01-01

Objectives: To characterize clinically and genetically a family with autosomal dominant lateral temporal epilepsy (ADLTE) negative to LGI1 exon sequencing test. Methods: All participants were personally interviewed and underwent neurologic examination. Most affected subjects underwent EEG and neuroradiologic examinations (CT/MRI). Available family members were genotyped with the HumanOmni1-Quad v1.0 single nucleotide polymorphism (SNP) array beadchip and copy number variations (CNVs) were analyzed in each subject. LGI1 gene dosage was performed by real-time quantitative PCR (qPCR). Results: The family had 8 affected members (2 deceased) over 3 generations. All of them showed GTC seizures, with focal onset in 6 and unknown onset in 2. Four patients had focal seizures with auditory features. EEG showed only minor sharp abnormalities in 3 patients and MRI was unremarkable in all the patients examined. Three family members presented major depression and anxiety symptoms. Routine LGI1 exon sequencing revealed no point mutation. High-density SNP array CNV analysis identified a genomic microdeletion about 81 kb in size encompassing the first 4 exons of LGI1 in all available affected members and in 2 nonaffected carriers, which was confirmed by qPCR analysis. Conclusions: This is the first microdeletion affecting LGI1 identified in ADLTE. Families with ADLTE in which no point mutations are revealed by direct exon sequencing should be screened for possible genomic deletion mutations by CNV analysis or other appropriate methods. Overall, CNV analysis of multiplex families may be useful for identifying microdeletions in novel disease genes. PMID:22496201
Clinical germline diagnostic exome sequencing for hereditary cancer: Findings within novel candidate genes are prevalent.

PubMed

Powis, Zöe; Espenschied, Carin R; LaDuca, Holly; Hagman, Kelly D; Paudyal, Tripti; Li, Shuwei; Inaba, Hiroto; Mauer, Ann; Nathanson, Katherine L; Knost, James; Chao, Elizabeth C; Tang, Sha

2018-08-01

Clinical diagnostic exome sequencing (DES) has been effective in diagnosing individuals with suspected genetic conditions; nevertheless, little has been described regarding its clinical utility in individuals with a personal and family history of cancer. This study aimed to assess diagnostic yield and clinical characteristics of pediatric and adult patients undergoing germline DES for hereditary cancer. We retrospectively reviewed 2171 patients referred for DES; cases with a personal and/or family history of cancer were further studied. Of 39 cancer patients, relevant alterations were found in eight individuals (21%), including one (3%) positive pathogenic alteration within a characterized gene, two (5%) uncertain findings in characterized genes, and five (13%) alterations in novel candidate genes. Two of the 5 pediatric patients undergoing testing (40%) had findings in novel candidate genes, with the remainder being negative. We include brief case studies to illustrate the variety of challenging issues related to these patients. Our observations demonstrate the utility of family-based exome sequencing in patients with suspected hereditary cancer, including familial co-segregation analysis and comprehensive medical review. DES may be particularly useful when traditional approaches do not result in a diagnosis or in families with unique phenotypes. This work also highlights the importance and complexity of analysis of uncharacterized genes in exome sequencing for hereditary cancer. Copyright © 2018 Elsevier Inc. All rights reserved.
Use of life course work-family profiles to predict mortality risk among US women.

PubMed

Sabbath, Erika L; Guevara, Ivan Mejía; Glymour, M Maria; Berkman, Lisa F

2015-04-01

We examined relationships between US women's exposure to midlife work-family demands and subsequent mortality risk. We used data from women born 1935 to 1956 in the Health and Retirement Study to calculate employment, marital, and parenthood statuses for each age between 16 and 50 years. We used sequence analysis to identify 7 prototypical work-family trajectories. We calculated age-standardized mortality rates and hazard ratios (HRs) for mortality associated with work-family sequences, with adjustment for covariates and potentially explanatory later-life factors. Married women staying home with children briefly before reentering the workforce had the lowest mortality rates. In comparison, after adjustment for age, race/ethnicity, and education, HRs for mortality were 2.14 (95% confidence interval [CI] = 1.58, 2.90) among single nonworking mothers, 1.48 (95% CI = 1.06, 1.98) among single working mothers, and 1.36 (95% CI = 1.02, 1.80) among married nonworking mothers. Adjustment for later-life behavioral and economic factors partially attenuated risks. Sequence analysis is a promising exposure assessment tool for life course research. This method permitted identification of certain lifetime work-family profiles associated with mortality risk before age 75 years.
Genetic and molecular characterization of the maize rp3 rust resistance locus.

PubMed Central

Webb, Craig A; Richter, Todd E; Collins, Nicholas C; Nicolas, Marie; Trick, Harold N; Pryor, Tony; Hulbert, Scot H

2002-01-01

In maize, the Rp3 gene confers resistance to common rust caused by Puccinia sorghi. Flanking marker analysis of rust-susceptible rp3 variants suggested that most of them arose via unequal crossing over, indicating that rp3 is a complex locus like rp1. The PIC13 probe identifies a nucleotide binding site-leucine-rich repeat (NBS-LRR) gene family that maps to the complex. Rp3 variants show losses of PIC13 family members relative to the resistant parents when probed with PIC13, indicating that the Rp3 gene is a member of this family. Gel blots and sequence analysis suggest that at least 9 family members are at the locus in most Rp3-carrying lines and that at least 5 of these are transcribed in the Rp3-A haplotype. The coding regions of 14 family members, isolated from three different Rp3-carrying haplotypes, had DNA sequence identities from 93 to 99%. Partial sequencing of clones of a BAC contig spanning the rp3 locus in the maize inbred line B73 identified five different PIC13 paralogues in a region of approximately 140 kb. PMID:12242248

Rare Variants in RTEL1 Are Associated with Familial Interstitial Pneumonia

PubMed Central

Cogan, Joy D.; Zhao, Min; Mitchell, Daphne B.; Rives, Lynette; Markin, Cheryl; Garnett, Errine T.; Montgomery, Keri H.; Mason, Wendi R.; McKean, David F.; Powers, Julia; Murphy, Elissa; Olson, Lana M.; Choi, Leena; Cheng, Dong-Sheng; Blue, Elizabeth Marchani; Young, Lisa R.; Lancaster, Lisa H.; Steele, Mark P.; Brown, Kevin K.; Schwarz, Marvin I.; Fingerlin, Tasha E.; Schwartz, David A.; Lawson, William E.; Loyd, James E.; Zhao, Zhongming; Phillips, John A.; Blackwell, Timothy S.

2015-01-01

Rationale: Up to 20% of cases of idiopathic interstitial pneumonia cluster in families, comprising the syndrome of familial interstitial pneumonia (FIP); however, the genetic basis of FIP remains uncertain in most families. Objectives: To determine if new disease-causing rare genetic variants could be identified using whole-exome sequencing of affected members from FIP families, providing additional insights into disease pathogenesis. Methods: Affected subjects from 25 kindreds were selected from an ongoing FIP registry for whole-exome sequencing from genomic DNA. Candidate rare variants were confirmed by Sanger sequencing, and cosegregation analysis was performed in families, followed by additional sequencing of affected individuals from another 163 kindreds. Measurements and Main Results: We identified a potentially damaging rare variant in the gene encoding for regulator of telomere elongation helicase 1 (RTEL1) that segregated with disease and was associated with very short telomeres in peripheral blood mononuclear cells in 1 of 25 families in our original whole-exome sequencing cohort. Evaluation of affected individuals in 163 additional kindreds revealed another eight families (4.7%) with heterozygous rare variants in RTEL1 that segregated with clinical FIP. Probands and unaffected carriers of these rare variants had short telomeres (<10% for age) in peripheral blood mononuclear cells and increased T-circle formation, suggesting impaired RTEL1 function. Conclusions: Rare loss-of-function variants in RTEL1 represent a newly defined genetic predisposition for FIP, supporting the importance of telomere-related pathways in pulmonary fibrosis. PMID:25607374
Cazymes Analysis Toolkit (CAT): Webservice for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H

2010-01-01

The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
CAZymes Analysis Toolkit (CAT): web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database.

PubMed

Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C

2010-12-01

The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
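The second annotation approach in this abstract, linking CAZy families to protein family domains by association rule learning, can be sketched in Python as a support and confidence calculation for rules of the form "domain implies family". The domain labels (`dom_A`, `dom_B`, `dom_C`), the toy records, and the two thresholds are hypothetical; the abstract does not give the rule-mining parameters or the algorithm variant used by CAT.

```python
from collections import Counter

# Hypothetical training records: (CAZy family, domains found in the sequence).
records = [
    ("GH5", {"dom_A"}),
    ("GH5", {"dom_A", "dom_C"}),
    ("GH9", {"dom_B"}),
    ("GH9", {"dom_B", "dom_C"}),
    ("CBM2", {"dom_C"}),
]

def domain_to_family_rules(records, min_support=0.2, min_confidence=0.8):
    """Return {(domain, family): (support, confidence)} for rules passing both thresholds."""
    n = len(records)
    domain_counts = Counter()
    pair_counts = Counter()
    for family, domains in records:
        for dom in domains:
            domain_counts[dom] += 1
            pair_counts[(dom, family)] += 1
    rules = {}
    for (dom, family), both in pair_counts.items():
        support = both / n                     # share of all records with domain and family
        confidence = both / domain_counts[dom] # share of domain-bearing records in the family
        if support >= min_support and confidence >= min_confidence:
            rules[(dom, family)] = (support, confidence)
    return rules

rules = domain_to_family_rules(records)
```

In this toy data `dom_C` occurs in three different families, so its confidence for any single family is too low and no rule is kept for it, while `dom_A` and `dom_B` each map cleanly to one family.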
Next-generation sequencing reveals a novel NDP gene mutation in a Chinese family with Norrie disease.

PubMed

Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo

2017-11-01

Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. In this study, the aim is to investigate the genetic defect in a five-generation family with typical symptoms of ND. To identify the causative gene, next-generation sequencing based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members using Sanger sequencing. We identified a novel missense variant (c.314C>A) located within the NDP gene. The mutation cosegregated within all affected individuals in the family and was not found in unaffected members. By happenstance, in this family, we also detected a known pathogenic variant of retinitis pigmentosa in a healthy individual. c.314C>A mutation of NDP gene is a novel mutation and broadens the genetic spectrum of ND.
Next-generation sequencing reveals a novel NDP gene mutation in a Chinese family with Norrie disease

PubMed Central

Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo

2017-01-01

Purpose: Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. In this study, the aim is to investigate the genetic defect in a five-generation family with typical symptoms of ND. Methods: To identify the causative gene, next-generation sequencing based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members using Sanger sequencing. Results: We identified a novel missense variant (c.314C>A) located within the NDP gene. The mutation cosegregated within all affected individuals in the family and was not found in unaffected members. By happenstance, in this family, we also detected a known pathogenic variant of retinitis pigmentosa in a healthy individual. Conclusion: c.314C>A mutation of NDP gene is a novel mutation and broadens the genetic spectrum of ND. PMID:29133643
Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant.

PubMed

Wu, Pingzhi; Zhou, Changpin; Cheng, Shifeng; Wu, Zhenying; Lu, Wenjia; Han, Jinli; Chen, Yanbo; Chen, Yan; Ni, Peixiang; Wang, Ying; Xu, Xun; Huang, Ying; Song, Chi; Wang, Zhiwen; Shi, Nan; Zhang, Xudong; Fang, Xiaohua; Yang, Qing; Jiang, Huawu; Chen, Yaping; Li, Meiru; Wang, Ying; Chen, Fan; Wang, Jun; Wu, Guojiang

2015-03-01

The family Euphorbiaceae includes some of the most efficient biomass accumulators. Whole genome sequencing and the development of genetic maps of these species are important components in molecular breeding and genetic improvement. Here we report the draft genome of physic nut (Jatropha curcas L.), a biodiesel plant. The assembled genome has a total length of 320.5 Mbp and contains 27,172 putative protein-coding genes. We established a linkage map containing 1208 markers and anchored the genome assembly (81.7%) to this map to produce 11 pseudochromosomes. After gene family clustering, 15,268 families were identified, of which 13,887 existed in the castor bean genome. Analysis of the genome highlighted specific expansion and contraction of a number of gene families during the evolution of this species, including the ribosome-inactivating proteins and oil biosynthesis pathway enzymes. The genomic sequence and linkage map provide a valuable resource not only for fundamental and applied research on physic nut but also for evolutionary and comparative genomics analysis, particularly in the Euphorbiaceae. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
Comprehensive analysis of genetic variations in strictly-defined Leber congenital amaurosis with whole-exome sequencing in Chinese.

PubMed

Wang, Shi-Yuan; Zhang, Qi; Zhang, Xiang; Zhao, Pei-Quan

2016-01-01

To make a comprehensive analysis of the potential pathogenic genes related to Leber congenital amaurosis (LCA) in Chinese. LCA subjects and their families were retrospectively collected from 2013 to 2015. Firstly, whole-exome sequencing was performed in patients who had undergone gene mutation screening with nothing found, and then homozygous sites were selected, candidate sites were annotated, and pathogenic analysis was conducted using software including Sorting Tolerant from Intolerant (SIFT), Polyphen-2, Mutation assessor, Condel, and Functional Analysis through Hidden Markov Models (FATHMM). Furthermore, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of pathogenic genes were performed, followed by co-segregation analysis using the Fisher exact test. Sanger sequencing was used to validate single-nucleotide variations (SNVs). Expanded verification was performed in the remaining patients. In total, 51 LCA families with 53 patients and 24 family members were recruited. A total of 104 SNVs (66 LCA-related genes and 15 co-segregated genes) were submitted for expanded verification. Homozygous mutations of KRT12 and CYP1A1 were observed together in 3 families. Enrichment analysis showed that the potential pathogenic genes were mainly enriched in functions related to cell adhesion, biological adhesion, retinoid metabolic process, and eye development. Additionally, WFS1 and STAU2 had the highest homozygous frequencies. LCA is a highly heterogeneous disease. Mutations in KRT12, CYP1A1, WFS1, and STAU2 may be involved in the development of LCA.
Mutation analysis of 272 Spanish families affected by autosomal recessive retinitis pigmentosa using a genotyping microarray.

PubMed

Ávila-Fernández, Almudena; Cantalapiedra, Diego; Aller, Elena; Vallespín, Elena; Aguirre-Lambán, Jana; Blanco-Kelly, Fiona; Corton, M; Riveiro-Álvarez, Rosa; Allikmets, Rando; Trujillo-Tiebas, María José; Millán, José M; Cremers, Frans P M; Ayuso, Carmen

2010-12-03

Retinitis pigmentosa (RP) is a genetically heterogeneous disorder characterized by progressive loss of vision. The aim of this study was to identify the causative mutations in 272 Spanish families using a genotyping microarray. 272 unrelated Spanish families, 107 with autosomal recessive RP (arRP) and 165 with sporadic RP (sRP), were studied using the APEX genotyping microarray. The families were also classified by clinical criteria: 86 juvenile and 186 typical RP families. Haplotype and sequence analysis were performed to identify the second mutated allele. At least one gene variant was found in 14% and 16% of the juvenile and typical RP groups, respectively. Further study identified four new mutations, providing both causative changes in 11% of the families. Retinol Dehydrogenase 12 (RDH12) was the most frequently mutated gene in the juvenile RP group, and Usher Syndrome 2A (USH2A) and Ceramide Kinase-Like (CERKL) were the most frequently mutated genes in the typical RP group. The only variant found in CERKL was p.Arg257Stop, the most frequent mutation. The genotyping microarray combined with segregation and sequence analysis allowed us to identify the causative mutations in 11% of the families. Due to the low number of characterized families, this approach should be used in tandem with other techniques.
[The mutation analysis of PAH gene and prenatal diagnosis in classical phenylketonuria family].

PubMed

Yan, Yousheng; Hao, Shengju; Yao, Fengxia; Sun, Qingmei; Zheng, Lei; Zhang, Qinghua; Zhang, Chuan; Yang, Tao; Huang, Shangzhi

2014-12-01

To characterize the mutation spectrum of the phenylalanine hydroxylase (PAH) gene and perform prenatal diagnosis for families with classical phenylketonuria. By stratified sequencing, mutations were detected in the exons and flanking introns of the PAH gene of 44 families with classical phenylketonuria. 47 fetuses were diagnosed by combined sequencing with linkage analysis of three common short tandem repeats (STR) (PAH-STR, PAH-26 and PAH-32) in the PAH gene. Thirty-one types of mutations were identified. A total of 84 mutations were identified in 88 alleles (95.45%), of which the most common were R243Q (21.59%), EX6-96A>G (6.82%), IVS4-1G>A (5.86%) and IVS7+2T>A (5.86%). Most mutations were found in exons 3, 5, 6, 7, 11 and 12. The polymorphism information content (PIC) of these three STR markers was 0.71 (PAH-STR), 0.48 (PAH-26) and 0.40 (PAH-32), respectively. Prenatal diagnosis was performed successfully with the combined method in 47 fetuses of 44 classical phenylketonuria families. Among them, 11 (23.4%) were diagnosed as affected, 24 (51.1%) as carriers, and 12 (25.5%) as unaffected. Prenatal diagnosis can be achieved efficiently and accurately by stratified sequencing of the PAH gene and linkage analysis of STR for classical phenylketonuria families.
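The polymorphism information content (PIC) values quoted for the three STR markers are conventionally computed from allele frequencies with the formula PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ². The Python sketch below applies that standard formula; the allele frequencies used are hypothetical, since the abstract reports only the resulting PIC values and not the frequencies behind them.

```python
from itertools import combinations

def pic(freqs):
    """Polymorphism information content for one marker from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross

# Hypothetical markers: two equally frequent alleles versus four.
pic_two_alleles = pic([0.5, 0.5])
pic_four_alleles = pic([0.25, 0.25, 0.25, 0.25])
```

More alleles at more even frequencies give a higher PIC, which is why a multi-allelic STR can be considerably more informative for linkage-based prenatal diagnosis than a marker dominated by one or two alleles.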
Management of familial cancer: sequencing, surveillance and society.

PubMed

Samuel, Nardin; Villani, Anita; Fernandez, Conrad V; Malkin, David

2014-12-01

The clinical management of familial cancer begins with recognition of patterns of cancer occurrence suggestive of genetic susceptibility in a proband or pedigree, to enable subsequent investigation of the underlying DNA mutations. In this regard, next-generation sequencing of DNA continues to transform cancer diagnostics, by enabling screening for cancer-susceptibility genes in the context of known and emerging familial cancer syndromes. Increasingly, not only are candidate cancer genes sequenced, but also entire 'healthy' genomes are mapped in children with cancer and their family members. Although large-scale genomic analysis is considered intrinsic to the success of cancer research and discovery, a number of accompanying ethical and technical issues must be addressed before this approach can be adopted widely in personalized therapy. In this Perspectives article, we describe our views on how the emergence of new sequencing technologies and cancer surveillance strategies is altering the framework for the clinical management of hereditary cancer. Genetic counselling and disclosure issues are discussed, and strategies for approaching ethical dilemmas are proposed.
Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

PubMed Central

Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

2012-01-01

A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

PubMed Central

2011-01-01

Background: Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Results: We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion: The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
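Identifying simple sequence repeats (SSRs) in an EST collection, as reported in this abstract, amounts to scanning each sequence for short motifs repeated in tandem. The Python sketch below does this with a regular-expression backreference. The motif length range (2 to 6 bp), the minimum of four repeats, and the example sequence are assumptions for illustration; the abstract does not state the criteria or the software used in the study.

```python
import re

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )

def find_ssrs(seq, min_len=2, max_len=6, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples for tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k in range(min_len, max_len + 1):
        # A k-base motif followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical EST fragment: an (AT)6 repeat and a (GAA)4 repeat.
example = "GGC" + "AT" * 6 + "CCGT" + "GAA" * 4 + "TTC"
ssrs = find_ssrs(example)
```

The `is_primitive` check keeps a dinucleotide run from being reported a second time as a tetranucleotide or hexanucleotide repeat.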
Analysis of whole exome sequencing with cardiometabolic traits using family-based linkage and association in the IRAS Family Study

PubMed Central

Tabb, Keri L.; Hellwege, Jacklyn N.; Palmer, Nicholette D.; Dimitrov, Latchezar; Sajuthi, Satria; Taylor, Kent D.; NG, Maggie C.Y.; Hawkins, Gregory A.; Chen, Yii-Der Ida; Brown, W. Mark; McWilliams, David; Williams, Adrienne; Lorenzo, Carlos; Norris, Jill M.; Long, Jirong; Rotter, Jerome I.; Curran, Joanne E.; Blangero, John; Wagenknecht, Lynne E.; Langefeld, Carl D.; Bowden, Donald W.

2017-01-01

Summary: Family-based methods are a potentially powerful tool to identify trait-defining genetic variants in extended families, particularly when used to complement conventional association analysis. We utilized two-point linkage analysis and single variant association analysis to evaluate whole exome sequencing (WES) data from 1,205 Hispanic Americans (78 families) from the Insulin Resistance Atherosclerosis Family Study. WES identified 211,612 variants above the minor allele frequency threshold of ≥0.005. These variants were tested for linkage and/or association with 50 cardiometabolic traits after quality control checks. Two-point linkage analysis yielded 10,580,600 LOD scores with 1,148 LOD scores ≥3, 183 LOD scores ≥4, and 29 LOD scores ≥5. The maximal novel LOD score was 5.50 for rs2289043:T>C, in UNC5C with subcutaneous adipose tissue volume. Association analysis identified 13 variants attaining genome-wide significance (p<5×10⁻⁸), with the strongest association between rs651821:C>T in APOA5, and triglyceride levels (p=3.67×10⁻¹⁰). Overall, there was a 5.2-fold increase in the number of informative variants detected by WES compared to exome chip analysis in this population, nearly 30% of which were novel variants relative to dbSNP build 138. Thus, integration of results from two-point linkage and single-variant association analysis from WES data enabled identification of novel signals potentially contributing to cardiometabolic traits. PMID:28067407
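The number of LOD scores reported in this abstract is consistent with one two-point test per variant-trait pair (211,612 variants × 50 traits). The Python sketch below checks that arithmetic and shows the textbook LOD score for the simplest case, phase-known meioses with a counted number of recombinants: LOD(θ) = log₁₀[θʳ(1−θ)ⁿ⁻ʳ / 0.5ⁿ]. This simplified likelihood is an assumption for illustration only; the abstract does not describe the pedigree likelihood model or the software used in the study, and the meiosis counts in the example are hypothetical.

```python
import math

# One two-point linkage test per (variant, trait) pair.
n_variants = 211_612
n_traits = 50
n_lod_scores = n_variants * n_traits

def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta (0 < theta <= 0.5)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)  # free recombination, i.e. no linkage
    return log_l_theta - log_l_null

# Hypothetical family data: 1 recombinant among 10 informative meioses.
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
```

A LOD of 3 corresponds to odds of 1000:1 in favor of linkage over free recombination, which is the conventional threshold behind the "LOD scores ≥3" count in the abstract.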
GFam: a platform for automatic annotation of gene families.

PubMed

Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto

2012-10-01

We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
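The abstract does not spell out how the greedy chaining works. As a rough illustration of the general idea only, the sketch below picks non-overlapping domain hits for one sequence, preferring longer hits first. The priority rule, the `Hit` type, and the domain names are all assumptions made for this example and are not taken from GFam.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    name: str   # hypothetical domain identifier
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two closed intervals share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def greedy_architecture(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the longest hits that do not overlap earlier choices."""
    chosen: List[Hit] = []
    # Assumed priority rule: longest hit first.
    for hit in sorted(hits, key=lambda h: h.end - h.start, reverse=True):
        if not any(overlaps(hit, c) for c in chosen):
            chosen.append(hit)
    # Report the architecture in sequence order.
    return sorted(chosen, key=lambda h: h.start)


hits = [Hit("PF_A", 10, 120), Hit("SM_B", 100, 150), Hit("PF_C", 160, 300)]
architecture = [h.name for h in greedy_architecture(hits)]
print(architecture)
```

Here the shorter hit `SM_B` is dropped because it overlaps the longer `PF_A`, leaving a two-domain architecture. Sequences sharing the same architecture could then be grouped into one family.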
Rare mtDNA variants in Leber hereditary optic neuropathy families with recurrence of myoclonus.

PubMed

La Morgia, C; Achilli, A; Iommarini, L; Barboni, P; Pala, M; Olivieri, A; Zanna, C; Vidoni, S; Tonon, C; Lodi, R; Vetrugno, R; Mostacci, B; Liguori, R; Carroccia, R; Montagna, P; Rugolo, M; Torroni, A; Carelli, V

2008-03-04

To investigate the mechanisms underlying myoclonus in Leber hereditary optic neuropathy (LHON). Five patients and one unaffected carrier from two Italian families bearing the homoplasmic 11778/ND4 and 3460/ND1 mutations underwent a uniform investigation including neurophysiologic studies, muscle biopsy, serum lactic acid after exercise, and muscle (³¹P) and cerebral (¹H) magnetic resonance spectroscopy (MRS). Biochemical investigations on fibroblasts and complete mitochondrial DNA (mtDNA) sequences of both families were also performed. All six individuals had myoclonus. In spite of a normal EEG background and the absence of giant SEPs and C reflex, EEG-EMG back-averaging showed a preceding jerk-locked EEG potential, consistent with a cortical generator of the myoclonus. Specific comorbidities in the 11778/ND4 family included muscular cramps and psychiatric disorders, whereas features common to both families were migraine and cardiologic abnormalities. Signs of mitochondrial proliferation were seen in muscle biopsies and lactic acid elevation was observed in four of six patients. ³¹P-MRS was abnormal in five of six patients and ¹H-MRS showed ventricular accumulation of lactic acid in three of six patients. Fibroblast ATP depletion was evident at 48 hours incubation with galactose in LHON/myoclonus patients. Sequence analysis revealed haplogroup T2 (11778/ND4 family) and U4a (3460/ND1 family) mtDNAs. A functional role for the non-synonymous 4136A>G/ND1, 9139G>A/ATPase6, and 15773G>A/cyt b variants was supported by amino acid conservation analysis. Myoclonus and other comorbidities characterized our Leber hereditary optic neuropathy (LHON) families. Functional investigations disclosed a bioenergetic impairment in all individuals. Our sequence analysis suggests that the LHON plus phenotype in our cases may relate to the synergic role of mtDNA variants.
Work Sequences of Women During the Family Life Cycle

ERIC Educational Resources Information Center

Young, Christabel M.

1978-01-01

Identifies main work sequences of women during the first three stages of marriage and considers the influence of level of education, birthplace, and year of marriage on work sequence. An A.I.D. analysis illustrates characteristics of women most likely to adopt a given pattern of work. (Author)
Identification of a novel mutation in a Chinese family with Nance-Horan syndrome by whole exome sequencing*

PubMed Central

Hong, Nan; Chen, Yan-hua; Xie, Chen; Xu, Bai-sheng; Huang, Hui; Li, Xin; Yang, Yue-qing; Huang, Ying-ping; Deng, Jian-lian; Qi, Ming; Gu, Yang-shun

2014-01-01

Objective: Nance-Horan syndrome (NHS) is a rare X-linked disorder characterized by congenital nuclear cataracts, dental anomalies, and craniofacial dysmorphisms. Mental retardation was present in about 30% of the reported cases. The purpose of this study was to investigate the genetic and clinical features of NHS in a Chinese family. Methods: Whole exome sequencing analysis was performed on DNA from an affected male to scan for candidate mutations on the X-chromosome. Sanger sequencing was used to verify these candidate mutations in the whole family. Clinical and ophthalmological examinations were performed on all members of the family. Results: A combination of exome sequencing and Sanger sequencing revealed a nonsense mutation c.322G>T (E108X) in exon 1 of NHS gene, co-segregating with the disease in the family. The nonsense mutation led to the conversion of glutamic acid to a stop codon (E108X), resulting in truncation of the NHS protein. Multiple sequence alignments showed that codon 108, where the mutation (c.322G>T) occurred, was located within a phylogenetically conserved region. The clinical features in all affected males and female carriers are described in detail. Conclusions: We report a nonsense mutation c.322G>T (E108X) in a Chinese family with NHS. Our findings broaden the spectrum of NHS mutations and provide molecular insight into future NHS clinical genetic diagnosis. PMID:25091991
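The relationship between the coding-DNA position c.322 and codon 108 can be verified directly. The sketch below assumes the standard genetic code (GAA and GAG encode glutamic acid; TAA, TAG and TGA are stop codons); the actual reference codon is not given in the abstract, so both glutamic acid codons are tried.

```python
from typing import Tuple

# Standard genetic code facts used here.
GLU_CODONS = {"GAA", "GAG"}          # glutamic acid (E)
STOP_CODONS = {"TAA", "TAG", "TGA"}  # translation stop (X)


def codon_position(cdna_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-DNA position to (codon number, 0-based offset)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


codon_number, offset = codon_position(322)

# Apply the G>T substitution at that offset to each possible Glu codon.
mutated = {c[:offset] + "T" + c[offset + 1:] for c in GLU_CODONS}

print(codon_number, offset, sorted(mutated))
```

Position 322 is the first base of codon 108, and replacing that G with T turns either glutamic acid codon into a stop codon, consistent with the E108X annotation.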
The Solute Carrier Families Have a Remarkably Long Evolutionary History with the Majority of the Human Families Present before Divergence of Bilaterian Species

PubMed Central

Höglund, Pär J.; Nordström, Karl J.V.; Schiöth, Helgi B.; Fredriksson, Robert

2011-01-01

The Solute Carriers (SLCs) are membrane proteins that regulate transport of many types of substances over the cell membrane. The SLCs are found in at least 46 gene families in the human genome. Here, we performed the first evolutionary analysis of the entire SLC family based on whole genome sequences. We systematically mined and analyzed the genomes of 17 species to identify SLC genes. In all, we identified 4,813 SLC sequences in these genomes, and we delineated the evolutionary history of each of the subgroups. Moreover, we also identified ten new human sequences not previously classified as SLCs, which most likely belong to the SLC family. We found that 43 of the 46 SLC families found in Homo sapiens were also found in Caenorhabditis elegans, whereas 42 of them were also found in insects. Mammals have a higher number of SLC genes in most families, perhaps reflecting important roles for these in central nervous system functions. This study provides a systematic analysis of the evolutionary history of the SLC families in Eukaryotes showing that the SLC superfamily is ancient with multiple branches that were present before early divergence of Bilateria. The results provide foundation for overall classification of SLC genes and are valuable for annotation and prediction of substrates for the many SLCs that have not been tested in experimental transport assays. PMID:21186191
Structural analysis of the rDNA intergenic spacer of Brassica nigra: evolutionary divergence of the spacers of the three diploid Brassica species.

PubMed

Bhatia, S; Singh Negi, M; Lakshmikumaran, M

1996-11-01

EcoRI restriction of the B. nigra rDNA recombinants, isolated from a lambda genomic library, showed that the 3.9-kb fragment corresponded to the Intergenic Spacer (IGS), which was sequenced and found to be 3,928 bp in size. Sequence and dot-matrix analyses showed that the organization of the B. nigra rDNA IGS was typical of most rDNA spacers, consisting of a central repetitive region and flanking unique sequences on either side. The repetitive region was composed of two repeat families, RF 'A' and RF 'B'. The B. nigra RF 'A' consisted of a tandem array of three full-length copies of a 106-bp sequence element. RF 'B' was composed of 66 tandemly repeated elements. Each 'B' element was only 21 bp in size and this is the smallest repeat unit identified in plant rDNA to date. The putative transcription initiation site (TIS) was identified as nucleotide position 3,110. Based on the sequence analysis it was suggested that the present organization of the repeat families was generated by successive cycles of deletions and amplifications and was being maintained by homogenization processes such as gene conversion and crossing-over. A detailed comparison of the rDNA IGS sequences of the three diploid Brassica species, namely B. nigra, B. campestris, and B. oleracea, was carried out. First, comparisons revealed that B. campestris and B. oleracea were close to each other, as the repeat families in both showed high sequence homology. Second, the repeat elements in both species were organized in an interspersed manner. Third, a 52-bp sequence, present just downstream of the repeats in B. campestris, was found to be identical to the B. oleracea repeats, thereby suggesting a common progenitor. On the other hand, in B. nigra no interspersion pattern of organization of repeats was observed. Further, the B. nigra RF 'A' was identified as distinct from the repeat families of B. campestris and B. oleracea.
Based on this analysis, it was suggested that during speciation B. campestris and B. oleracea evolved in one lineage whereas B. nigra diverged into a separate lineage. The comparative analysis of the IGS helped in identifying not only conserved ancestral sequence motifs of possible functional significance such as promoters and enhancers, but also sequences which showed variation between the three diploid species and were therefore identified as species-specific sequences.
Spencermartinsiella europaea gen. nov., sp. nov., a new member of the family Trichomonascaceae

USDA-ARS's Scientific Manuscript database

Ten strains of a novel heterothallic yeast species were isolated from rotten wood collected at different locations in Hungary. Analysis of gene sequences for the D1/D2 domain of the large subunit ribosomal RNA, as well as analysis of concatenated gene sequences for the nearly complete nuclear large...

The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam.

PubMed

Goonesekere, Nalin C W; Shipely, Krysten; O'Connor, Kevin

2010-06-01

The Pfam database is an important tool in genome annotation, since it provides a collection of curated protein families. However, a subset of these families, known as domains of unknown function (DUFs), remains poorly characterized. We have related sequences from DUF404, DUF407, DUF482, DUF608, DUF810, DUF853, DUF976 and DUF1111 to homologs in PDB, within the midnight zone (9-20%) of sequence identity. These relationships were extended to provide functional annotation by sequence analysis and model building. Also described are examples of residue plasticity within enzyme active sites, and change of function within homologous sequences of a DUF. Copyright 2010 Elsevier Ltd. All rights reserved.
Complete genome of Cobetia marina JCM 21022T and phylogenomic analysis of the family Halomonadaceae

NASA Astrophysics Data System (ADS)

Tang, Xianghai; Xu, Kuipeng; Han, Xiaojuan; Mo, Zhaolan; Mao, Yunxiang

2018-03-01

Cobetia marina is a model proteobacterium in research on marine biofouling. Its taxonomic nomenclature has been revised many times over the past few decades. To better understand the role of the surface-associated lifestyle of C. marina and the phylogeny of the family Halomonadaceae, we sequenced the entire genome of C. marina JCM 21022T using single molecule real-time sequencing technology (SMRT) and performed comparative genomics and phylogenomics analyses. The circular chromosome was 4 176 300 bp with an average GC content of 62.44% and contained 3 611 predicted coding sequences, 72 tRNA genes, and 21 rRNA genes. The C. marina JCM 21022T genome contained a set of crucial genes involved in surface colonization processes. The comparative genome analysis indicated that the significant differences between C. marina JCM 21022T and Cobetia amphilecti KMM 296 (formerly named C. marina KMM 296) resulted from sequence insertions or deletions and chromosomal recombination. Despite these differences, pan and core genome analysis showed similar gene functions between the two strains. The phylogenomic study of the family Halomonadaceae is reported here for the first time. We found that the relationships were well resolved among all genera tested, including Chromohalobacter, Halomonas, Cobetia, Kushneria, Zymobacter, and Halotalea.
Complete genome of Cobetia marina JCM 21022T and phylogenomic analysis of the family Halomonadaceae

NASA Astrophysics Data System (ADS)

Tang, Xianghai; Xu, Kuipeng; Han, Xiaojuan; Mo, Zhaolan; Mao, Yunxiang

2016-09-01

Cobetia marina is a model proteobacterium in research on marine biofouling. Its taxonomic nomenclature has been revised many times over the past few decades. To better understand the role of the surface-associated lifestyle of C. marina and the phylogeny of the family Halomonadaceae, we sequenced the entire genome of C. marina JCM 21022T using single molecule real-time sequencing technology (SMRT) and performed comparative genomics and phylogenomics analyses. The circular chromosome was 4 176 300 bp with an average GC content of 62.44% and contained 3 611 predicted coding sequences, 72 tRNA genes, and 21 rRNA genes. The C. marina JCM 21022T genome contained a set of crucial genes involved in surface colonization processes. The comparative genome analysis indicated that the significant differences between C. marina JCM 21022T and Cobetia amphilecti KMM 296 (formerly named C. marina KMM 296) resulted from sequence insertions or deletions and chromosomal recombination. Despite these differences, pan and core genome analysis showed similar gene functions between the two strains. The phylogenomic study of the family Halomonadaceae is reported here for the first time. We found that the relationships were well resolved among all genera tested, including Chromohalobacter, Halomonas, Cobetia, Kushneria, Zymobacter, and Halotalea.
Structural characterization of copia-type retrotransposons leads to insights into the marker development in a biofuel crop, Jatropha curcas L.

PubMed Central

2013-01-01

Background Recently, Jatropha curcas L. has attracted worldwide attention for its potential as a source of biodiesel. However, most DNA markers have demonstrated high levels of genetic similarity among and within jatropha populations around the globe. Despite promising features of copia-type retrotransposons as ideal genetic tools for gene tagging, mutagenesis, and marker-assisted selection, they have not been characterized in the jatropha genome yet. Here, we examined the diversity, evolution, and genome-wide organization of copia-type retrotransposons in the Asian, African, and Mesoamerican accessions of jatropha, then introduced a retrotransposon-based marker for this biofuel crop. Results In total, 157 PCR fragments that were amplified using the degenerate primers for the reverse transcriptase (RT) domain of copia-type retroelements were sequenced and aligned to construct the neighbor-joining tree. Phylogenetic analysis demonstrated that isolated copia RT sequences were classified into ten families, which were then grouped into three lineages. An in-depth study of the jatropha genome for the RT sequences of each family led to the characterization of full consensus sequences of the jatropha copia-type families. Estimated copy numbers of target sequences were largely different among families, as was the presence of genes within 5 kb flanking regions for each family. Five copia-type families were appealing candidates for the development of DNA marker systems. A candidate marker from family Jc7 was particularly capable of detecting genetic variation among different jatropha accessions. Fluorescence in situ hybridization (FISH) to metaphase chromosomes revealed that copia-type retrotransposons are scattered across chromosomes, mainly in the distal regions.
Conclusion This is the first report on genome-wide analysis and the cytogenetic mapping of copia-type retrotransposons of jatropha, leading to the discovery of families bearing high potential as DNA markers. Distinct dynamics of individual copia-type families, feasibility of a retrotransposon-based insertion polymorphism marker system in examining genetic variability, and approaches for the development of breeding strategies in jatropha using copia-type retrotransposons are discussed. PMID:24020916
Targeted next-generation sequencing analysis identifies novel mutations in families with severe familial exudative vitreoretinopathy.

PubMed

Huang, Xiao-Yan; Zhuang, Hong; Wu, Ji-Hong; Li, Jian-Kang; Hu, Fang-Yuan; Zheng, Yu; Tellier, Laurent Christian Asker M; Zhang, Sheng-Hai; Gao, Feng-Juan; Zhang, Jian-Guo; Xu, Ge-Zhi

2017-01-01

Familial exudative vitreoretinopathy (FEVR) is a genetically and clinically heterogeneous disease, characterized by failure of vascular development of the peripheral retina. The symptoms of FEVR vary widely among patients in the same family, and even between the two eyes of a given patient. This study was designed to identify the genetic defect in a patient cohort of ten Chinese families with a definitive diagnosis of FEVR. To identify the causative gene, next-generation sequencing (NGS)-based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members by using Sanger sequencing and quantitative real-time PCR (QPCR). Of the cohort of ten FEVR families, six pathogenic variants were identified, including four novel and two known heterozygous mutations. Of the variants identified, four were missense variants, and two were novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del]. The two novel heterozygous deletion mutations were not observed in the control subjects and could give rise to a relatively severe FEVR phenotype, which could be explained by the protein function prediction. We identified two novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del] using targeted NGS as a causative mutation for FEVR. These genetic deletion variations exhibit a severe form of FEVR, with tractional retinal detachments compared with other known point mutations. The data further enrich the mutation spectrum of FEVR and enhance our understanding of genotype-phenotype correlations to provide useful information for disease diagnosis, prognosis, and effective genetic counseling.
Targeted next-generation sequencing analysis identifies novel mutations in families with severe familial exudative vitreoretinopathy

PubMed Central

Huang, Xiao-Yan; Zhuang, Hong; Wu, Ji-Hong; Li, Jian-Kang; Hu, Fang-Yuan; Zheng, Yu; Tellier, Laurent Christian Asker M.; Zhang, Sheng-Hai; Gao, Feng-Juan; Zhang, Jian-Guo

2017-01-01

Purpose Familial exudative vitreoretinopathy (FEVR) is a genetically and clinically heterogeneous disease, characterized by failure of vascular development of the peripheral retina. The symptoms of FEVR vary widely among patients in the same family, and even between the two eyes of a given patient. This study was designed to identify the genetic defect in a patient cohort of ten Chinese families with a definitive diagnosis of FEVR. Methods To identify the causative gene, next-generation sequencing (NGS)-based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members by using Sanger sequencing and quantitative real-time PCR (QPCR). Results Of the cohort of ten FEVR families, six pathogenic variants were identified, including four novel and two known heterozygous mutations. Of the variants identified, four were missense variants, and two were novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del]. The two novel heterozygous deletion mutations were not observed in the control subjects and could give rise to a relatively severe FEVR phenotype, which could be explained by the protein function prediction. Conclusions We identified two novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del] using targeted NGS as a causative mutation for FEVR. These genetic deletion variations exhibit a severe form of FEVR, with tractional retinal detachments compared with other known point mutations. The data further enrich the mutation spectrum of FEVR and enhance our understanding of genotype–phenotype correlations to provide useful information for disease diagnosis, prognosis, and effective genetic counseling. PMID:28867931
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies.

PubMed

Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
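The single linkage clustering step described above can be illustrated with a union-find structure: any two sequences joined by a qualifying pairwise alignment end up in the same family, directly or through intermediates. This is a generic sketch, not CARP's implementation; the sequence identifiers and the input format (a list of aligned pairs) are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def single_linkage(pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Group items into connected components of the 'aligned with' graph."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the components of every aligned pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect members under their final root.
    groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())


# Hypothetical pairwise alignment hits between genomic intervals.
pairs = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
families = sorted(sorted(f) for f in single_linkage(pairs))
print(families)
```

Note that `s1` and `s3` land in the same family without aligning to each other directly. That chaining is the defining property of single linkage, and also why it can merge loosely related elements.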
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

PubMed Central

Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441
Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

PubMed Central

Niskanen, Einari A; Hytönen, Vesa P; Grapputo, Alessandro; Nordlund, Henri R; Kulomaa, Markku S; Laitinen, Olli H

2005-01-01

Background A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. Results Two separate hits showing significant homology for these N-terminal sequences were discovered. For one of these hits, the chromosomal location in the immediate proximity of the avidin gene family was found. Both of these hits encode proteins having high sequence similarity with avidin suggesting that chicken BBPs are paralogous to avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the found DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. Conclusion We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes together with avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins. PMID:15777476
Characterization of species-specific repeated DNA sequences from B. nigra.

PubMed

Gupta, V; Lakshmisita, G; Shaila, M S; Jagannathan, V; Lakshmikumaran, M S

1992-07-01

The construction and characterization of two genome-specific recombinant DNA clones from B. nigra are described. Southern analysis showed that the two clones belong to a dispersed repeat family. They differ from each other in their length, distribution and sequence, though the average GC content is nearly the same (45%). These B genome-specific repeats have been used to analyse the phylogenetic relationships between cultivated and wild species of the family Brassicaceae.
Genome-Wide Identification and Comparative Analysis of Albumin Family in Vertebrates

PubMed Central

Li, Shugang; Cao, Yiping; Geng, Fang

2017-01-01

Albumins are the most well-known globular proteins, and the most typical representatives are the serum albumins. However, less attention was paid to the albumin family, except for the human and bovine serum albumin. To characterize the features of albumin family, we have mined all the putative albumin proteins from the available genome sequences. The results showed that albumin is widely distributed in vertebrates, but not present in the bacteria and archaea. The phylogenetic analysis of vertebrate albumin family implied an evolutionary relationship between members of serum albumin, α-fetoprotein, vitamin D–binding protein, and afamin. Meanwhile, a new member from the albumin family was found, namely, extracellular matrix protein 1. The structural analysis revealed that the motifs for forming the internal disulfide bonds are highly conserved in the albumin family, despite the low overall sequence identity across the family. The domain arrangement of albumin proteins indicated that most of vertebrate albumins contain 3 characteristic domains, arising from 2 evolutionary patterns. And a significant trend has been observed that the albumin proteins in higher vertebrate species tend to possess more characteristic domains. This study has provided the fundamental information required for achieving a better understanding of the albumin distribution, phylogenetic relationship, characteristic motif, structure, and new insights into the evolutionary pattern. PMID:28680266
The partial sequence of RNA 1 of the ophiovirus Ranunculus white mottle virus indicates its relationship to rhabdoviruses and provides candidate primers for an ophiovirus-specific RT-PCR test.

PubMed

Vaira, A M; Accotto, G P; Costantini, A; Milne, R G

2003-06-01

A 4018 nucleotide sequence was obtained for RNA 1 of Ranunculus white mottle virus (RWMV), genus Ophiovirus, representing an incomplete ORF of 1339 aa. Amino acid sequence analysis revealed significant similarities with RNA polymerases of viruses in the family Rhabdoviridae and a conserved domain of 685 aa, corresponding to the RdRp domain of those in the order Mononegavirales. Phylogenetic analysis indicated that the genus Ophiovirus is not related to the genus Tenuivirus or the family Bunyaviridae, with which it has been linked, and probably deserves a special taxonomic position, within a new family. A pair of degenerate primers was designed from a consensus sequence obtained from a relatively conserved region in the RNA 1 of two members of the genus, Citrus psorosis virus (CPsV) and RWMV. The primers, used in RT-PCR experiments, amplified a 136 bp DNA fragment from all the three recognized members of the genus, i.e. CPsV, RWMV and Tulip mild mottle mosaic virus (TMMMV) and from two tentative ophioviruses from lettuce and freesia. The amplified DNAs were sequenced and compared with the corresponding sequences of CPsV and RWMV and phylogenetic relationships were evaluated. Assays using extracts from plants infected by viruses belonging to the genera Tospovirus, Tenuivirus, Rhabdovirus and Varicosavirus indicated that the primers are genus-specific.
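Two quantities in this abstract lend themselves to a quick check: the nucleotide length implied by a 1339 aa open reading frame, and the number of distinct sequences a degenerate primer stands for. The sketch below uses the standard IUPAC nucleotide codes; the primer string is hypothetical and is not one of the primers from the study.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


# 1339 codons account for all but one base of the 4018 nt sequence.
orf_nt = 1339 * 3
leftover_nt = 4018 - orf_nt

# Hypothetical primer for illustration only.
example_primer = "GGNTAYATHCC"
fold = degeneracy(example_primer)

print(orf_nt, leftover_nt, fold)
```

Lower degeneracy generally means a higher effective concentration of each matching primer, which is one reason primers are placed in relatively conserved regions.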
Molecular Genetics of the Usher Syndrome in Lebanon: Identification of 11 Novel Protein Truncating Mutations by Whole Exome Sequencing

PubMed Central

Reddy, Ramesh; Fahiminiya, Somayyeh; El Zir, Elie; Mansour, Ahmad; Megarbane, Andre; Majewski, Jacek; Slim, Rima

2014-01-01

Background Usher syndrome (USH) is a genetically heterogeneous condition with ten disease-causing genes. The spectrum of genes and mutations causing USH in the Lebanese and Middle Eastern populations has not been described. Consequently, diagnostic approaches designed to screen for previously reported mutations were unlikely to identify the mutations in 11 unrelated families, eight of Lebanese and three of Middle Eastern origins. In addition, six of the ten USH genes consist of more than 20 exons, each, which made mutational analysis by Sanger sequencing of PCR-amplified exons from genomic DNA tedious and costly. The study was aimed at the identification of USH causing genes and mutations in 11 unrelated families with USH type I or II. Methods Whole exome sequencing followed by expanded familial validation by Sanger sequencing. Results We identified disease-causing mutations in all the analyzed patients in four USH genes, MYO7A, USH2A, GPR98 and CDH23. Eleven of the mutations were novel and protein truncating, including a complex rearrangement in GPR98. Conclusion Our data highlight the genetic diversity of Usher syndrome in the Lebanese population and the time and cost-effectiveness of whole exome sequencing approach for mutation analysis of genetically heterogeneous conditions caused by large genes. PMID:25211151
Taxonomic evaluation of unidentified Streptomyces isolates in the ARS Culture Collection (NRRL) using multi-locus sequence analysis

USDA-ARS's Scientific Manuscript database

The ARS Culture Collection (NRRL) currently contains 7569 strains within the family Streptomycetaceae but 4368 of them have not been characterized to the species level. A gene sequence database using the Bacterial Isolate Genomic Sequence Database package (BIGSdb) (Jolley & Maiden, 2010) is availabl...
Non-Coding RNA Analysis Using the Rfam Database.

PubMed

Kalvari, Ioanna; Nawrocki, Eric P; Argasinska, Joanna; Quinones-Olvera, Natalia; Finn, Robert D; Bateman, Alex; Petrov, Anton I

2018-06-01

Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc.
Patient perspectives on whole-genome sequencing for undiagnosed diseases.

PubMed

Boeldt, Debra L; Cheung, Cynthia; Ariniello, Lauren; Darst, Burcu F; Topol, Sarah; Schork, Nicholas J; Philis-Tsimikas, Athena; Torkamani, Ali; Fortmann, Addie L; Bloss, Cinnamon S

2017-01-01

This study assessed perspectives on whole-genome sequencing (WGS) for rare disease diagnosis and the process of receiving genetic results. Semistructured interviews were conducted with adult patients and parents of minor patients affected by idiopathic diseases (n = 10 cases). Three main themes were identified through qualitative data analysis and interpretation: perceived benefits of WGS; perceived drawbacks of WGS; and perceptions of the return of results from WGS. Findings suggest that patients and their families have important perspectives on the use of WGS in diagnostic odyssey cases. These perspectives could inform clinical sequencing research study designs as well as the appropriate deployment of patient and family support services in the context of clinical genome sequencing.
Use of Life Course Work–Family Profiles to Predict Mortality Risk Among US Women

PubMed Central

Guevara, Ivan Mejía; Glymour, M. Maria; Berkman, Lisa F.

2015-01-01

Objectives. We examined relationships between US women’s exposure to midlife work–family demands and subsequent mortality risk. Methods. We used data from women born 1935 to 1956 in the Health and Retirement Study to calculate employment, marital, and parenthood statuses for each age between 16 and 50 years. We used sequence analysis to identify 7 prototypical work–family trajectories. We calculated age-standardized mortality rates and hazard ratios (HRs) for mortality associated with work–family sequences, with adjustment for covariates and potentially explanatory later-life factors. Results. Married women staying home with children briefly before reentering the workforce had the lowest mortality rates. In comparison, after adjustment for age, race/ethnicity, and education, HRs for mortality were 2.14 (95% confidence interval [CI] = 1.58, 2.90) among single nonworking mothers, 1.48 (95% CI = 1.06, 1.98) among single working mothers, and 1.36 (95% CI = 1.02, 1.80) among married nonworking mothers. Adjustment for later-life behavioral and economic factors partially attenuated risks. Conclusions. Sequence analysis is a promising exposure assessment tool for life course research. This method permitted identification of certain lifetime work–family profiles associated with mortality risk before age 75 years. PMID:25713976
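As a rough illustration of what "sequence analysis" of life-course states involves, the sketch below computes pairwise dissimilarities between yearly state sequences. It uses the Hamming distance as a simple stand-in; the abstract does not say which dissimilarity measure or clustering method the authors used, and the state codes and trajectories here are hypothetical.

```python
def hamming(a, b):
    """Count positions (years) at which two equal-length state sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same number of years")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(sequences):
    """Pairwise Hamming distances, in the order the sequences are given."""
    return [[hamming(a, b) for b in sequences] for a in sequences]


# Hypothetical coding, one letter per year: W = working, H = at home.
trajectories = {
    "brief_break": "HHWWWW",
    "longer_break": "HHHWWW",
    "always_working": "WWWWWW",
}

matrix = distance_matrix(list(trajectories.values()))
for name, row in zip(trajectories, matrix):
    print(name, row)
```

A matrix like this is the usual input to a clustering step, which is how a small number of prototypical trajectories can be extracted from many individual sequences.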
Epidemiological survey of idiopathic scoliosis and sequence alignment analysis of multiple candidate genes.

PubMed

Yang, Tao; Jia, Quanzhang; Guo, Hong; Xu, Jianzhong; Bai, Yun; Yang, Kai; Luo, Fei; Zhang, Zehua; Hou, Tianyong

2012-06-01

To investigate the effects of genetic factors on idiopathic scoliosis (IS) and its modes of inheritance through a genetic epidemiological survey of IS in Chongqing City, China, and to determine whether SH3GL1, GADD45B, and FGF22 on chromosome 19p13.3 are pathogenic genes for IS through genetic sequence analysis. A total of 214 nuclear families were investigated to analyse the age incidence, familial aggregation, and heritability. SH3GL1, GADD45B, and FGF22 were chosen as candidate genes for mutation screening in 56 IS patients from the 214 families. Sequence alignment analysis was performed to determine mutations and predict the protein structure. The average age of onset of 10.8 years suggests that IS is an early-onset disease. The incidences of IS in first-, second-, and third-degree relatives and the overall incidence in families (5.68%) were significantly higher than that of the general population (1.04%). The U test indicated a significant difference, suggesting that IS shows familial aggregation. The heritability estimates for first-degree relatives (77.68 ± 10.39%), second-degree relatives (69.89 ± 3.14%), and third-degree relatives (62.14 ± 11.92%) indicated that genetic factors play an important role in IS pathogenesis. The incidences in first-degree relatives (10.01%), second-degree relatives (2.55%), and third-degree relatives (1.76%) indicated that IS does not follow simple monogenic Mendelian inheritance but shows the traits of a multifactorial hereditary disease. Sequence alignment of the exons of SH3GL1, GADD45B, and FGF22 showed 17 base mutations, of which 16 do not induce an open reading frame (ORF) shift or amino acid changes, whereas one mutation (C→T) in SH3GL1 creates a termination codon, which alters the protein reading frame. Prediction analysis of the protein sequence showed that the SH3GL1 mutant encodes a truncated protein, thus affecting the protein structure.
IS is a multifactorial genetic disease, and SH3GL1 may be one of the pathogenic genes for IS.
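To make the C→T termination-codon finding concrete, the following sketch enumerates which sense codons become a stop codon after a single C→T substitution. It assumes the standard genetic code and coding-strand notation; it does not identify the codon actually affected in SH3GL1, which the abstract does not give.

```python
from itertools import product

# Stop codons of the standard genetic code, written as coding-strand DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_to_t_nonsense_codons():
    """List (codon, mutant) pairs where one C->T change turns a sense codon into a stop."""
    hits = []
    for codon in ("".join(bases) for bases in product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue
        for position, base in enumerate(codon):
            if base != "C":
                continue
            mutant = codon[:position] + "T" + codon[position + 1:]
            if mutant in STOP_CODONS:
                hits.append((codon, mutant))
    return hits


for codon, mutant in c_to_t_nonsense_codons():
    print(codon, "->", mutant)
```

Under the standard code the only candidates are two glutamine codons (CAA, CAG) and one arginine codon (CGA).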
Mutational analysis of the Wolfram syndrome gene in two families with chromosome 4p-linked bipolar affective disorder.

PubMed

Evans, K L; Lawson, D; Meitinger, T; Blackwood, D H; Porteous, D J

2000-04-03

Bipolar affective disorder (BPAD) is a complex disease with a significant genetic component. Heterozygous carriers of Wolfram syndrome (WFS) are at increased risk of psychiatric illness. A gene for WFS (WFS1) has recently been cloned and mapped to chromosome 4p, in the general region we previously reported as showing linkage to BPAD. Here we present sequence analysis of the WFS1 coding sequence in five affected individuals from two chromosome 4p-linked families. This resulted in the identification of six polymorphisms, two of which are predicted to change the amino acid sequence of the WFS1 protein; however, none of the changes segregated with disease status. Am. J. Med. Genet. (Neuropsychiatr. Genet.) 96:158-160, 2000. Copyright 2000 Wiley-Liss, Inc.

Integrating protein structural dynamics and evolutionary analysis with Bio3D.

PubMed

Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J

2014-12-10

Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/ .
Comprehensive analysis of genetic variations in strictly-defined Leber congenital amaurosis with whole-exome sequencing in Chinese

PubMed Central

Wang, Shi-Yuan; Zhang, Qi; Zhang, Xiang; Zhao, Pei-Quan

2016-01-01

AIM To comprehensively analyze the potential pathogenic genes related to Leber congenital amaurosis (LCA) in Chinese patients. METHODS LCA subjects and their families were retrospectively collected from 2013 to 2015. First, whole-exome sequencing was performed in patients who had undergone gene mutation screening with nothing found; homozygous sites were then selected, candidate sites were annotated, and pathogenicity analysis was conducted using software including Sorting Intolerant from Tolerant (SIFT), Polyphen-2, Mutation assessor, Condel, and Functional Analysis through Hidden Markov Models (FATHMM). Furthermore, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of pathogenic genes were performed, followed by co-segregation analysis using Fisher's exact test. Sanger sequencing was used to validate single-nucleotide variations (SNVs). Expanded verification was performed in the remaining patients. RESULTS A total of 51 LCA families with 53 patients and 24 family members were recruited. A total of 104 SNVs (66 LCA-related genes and 15 co-segregated genes) were submitted for expanded verification. Homozygous mutations of KRT12 and CYP1A1 were observed simultaneously in 3 families. Enrichment analysis showed that the potential pathogenic genes were mainly enriched in functions related to cell adhesion, biological adhesion, retinoid metabolic process, and eye development biological adhesion. Additionally, WFS1 and STAU2 had the highest homozygous frequencies. CONCLUSION LCA is a highly heterogeneous disease. Mutations in KRT12, CYP1A1, WFS1, and STAU2 may be involved in the development of LCA. PMID:27672588
Sequence variants in four genes underlying Bardet-Biedl syndrome in consanguineous families

PubMed Central

Ullah, Asmat; Umair, Muhammad; Yousaf, Maryam; Khan, Sher Alam; Nazim-ud-din, Muhammad; Shah, Khadim; Ahmad, Farooq; Azeem, Zahid; Ali, Ghazanfar; Alhaddad, Bader; Rafique, Afzal; Jan, Abid; Haack, Tobias B.; Strom, Tim M.; Meitinger, Thomas; Ghous, Tahseen

2017-01-01

Purpose To investigate the molecular basis of Bardet-Biedl syndrome (BBS) in five consanguineous families of Pakistani origin. Methods Linkage in two families (A and B) was established to BBS7 on chromosome 4q27, in family C to BBS8 on chromosome 14q32.1, and in family D to BBS10 on chromosome 12q21.2. Family E was investigated directly with exome sequence analysis. Results Sanger sequencing revealed two novel mutations and three previously reported mutations in the BBS genes. These mutations include two deletions (c.580_582delGCA, c.1592_1597delTTCCAG) in the BBS7 gene, a missense mutation (p.Gln449His) in the BBS8 gene, a frameshift mutation (c.271_272insT) in the BBS10 gene, and a nonsense mutation (p.Ser40*) in the MKKS (BBS6) gene. Conclusions Two novel mutations and three previously reported variants, identified in the present study, further extend the body of evidence implicating BBS6, BBS7, BBS8, and BBS10 in causing BBS. PMID:28761321
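The insertion and deletion variants listed above can be checked against a simple reading-frame rule: an indel whose length is a multiple of three leaves the frame intact, while any other length shifts it. The sketch below applies that rule to simple HGVS coding-DNA strings; the parser is a hypothetical helper that handles only plain `c.N_MdelSEQ` and `c.N_MinsSEQ` forms (no intronic offsets or complex variants), and it says nothing about pathogenicity.

```python
import re

# Minimal patterns for simple coding-DNA deletions and insertions (assumed forms only).
_DELETION = re.compile(r"^c\.(\d+)_(\d+)del([ACGT]*)$")
_INSERTION = re.compile(r"^c\.(\d+)_(\d+)ins([ACGT]+)$")


def indel_length(hgvs):
    """Return the number of nucleotides deleted or inserted by a simple HGVS indel."""
    match = _DELETION.match(hgvs)
    if match:
        start, end = int(match.group(1)), int(match.group(2))
        return end - start + 1
    match = _INSERTION.match(hgvs)
    if match:
        return len(match.group(3))
    raise ValueError(f"unsupported variant description: {hgvs}")


def reading_frame_effect(hgvs):
    """Classify an indel as 'in-frame' (length divisible by 3) or 'frameshift'."""
    return "in-frame" if indel_length(hgvs) % 3 == 0 else "frameshift"


for variant in ("c.580_582delGCA", "c.1592_1597delTTCCAG", "c.271_272insT"):
    print(variant, reading_frame_effect(variant))
```

By this rule the two BBS7 deletions (3 and 6 nucleotides) are in-frame, and the single-nucleotide BBS10 insertion is a frameshift, which matches how the abstract describes it.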
Lessons learned from whole exome sequencing in multiplex families affected by a complex genetic disorder, intracranial aneurysm.

PubMed

Farlow, Janice L; Lin, Hai; Sauerbeck, Laura; Lai, Dongbing; Koller, Daniel L; Pugh, Elizabeth; Hetrick, Kurt; Ling, Hua; Kleinloog, Rachel; van der Vlies, Pieter; Deelen, Patrick; Swertz, Morris A; Verweij, Bon H; Regli, Luca; Rinkel, Gabriel J E; Ruigrok, Ynte M; Doheny, Kimberly; Liu, Yunlong; Broderick, Joseph; Foroud, Tatiana

2015-01-01

Genetic risk factors for intracranial aneurysm (IA) are not yet fully understood. Genomewide association studies have been successful at identifying common variants; however, the role of rare variation in IA susceptibility has not been fully explored. In this study, we report the use of whole exome sequencing (WES) in seven densely-affected families (45 individuals) recruited as part of the Familial Intracranial Aneurysm study. WES variants were prioritized by functional prediction, frequency, predicted pathogenicity, and segregation within families. Using these criteria, 68 variants in 68 genes were prioritized across the seven families. Of the genes that were expressed in IA tissue, one gene (TMEM132B) was differentially expressed in aneurysmal samples (n=44) as compared to control samples (n=16) (false discovery rate adjusted p-value=0.023). We demonstrate that sequencing of densely affected families permits exploration of the role of rare variants in a relatively common disease such as IA, although there are important study design considerations for applying sequencing to complex disorders. In this study, we explore methods of WES variant prioritization, including the incorporation of unaffected individuals, multipoint linkage analysis, biological pathway information, and transcriptome profiling. Further studies are needed to validate and characterize the set of variants and genes identified in this study.
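The prioritization criteria named above (functional prediction, frequency, predicted pathogenicity, and segregation within families) can be sketched as a chain of filters. Everything specific in this example is an assumption for illustration: the `Variant` fields, the 1% frequency cutoff, the strict segregation rule, and the gene names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str          # e.g. "missense", "nonsense", "synonymous"
    allele_freq: float        # population allele frequency
    predicted_damaging: bool  # verdict of some in-silico predictor (assumed available)
    carriers: frozenset       # IDs of sequenced family members carrying the variant


# Consequence classes treated as potentially functional (an assumed list).
FUNCTIONAL = {"missense", "nonsense", "frameshift", "splice_site"}


def segregates(variant, affected, unaffected):
    """Strict rule (assumed): every affected member carries it, no unaffected member does."""
    return affected <= variant.carriers and not (variant.carriers & unaffected)


def prioritize(variants, affected, unaffected, max_freq=0.01):
    """Keep rare, functional, predicted-damaging variants that segregate with disease."""
    return [
        v for v in variants
        if v.consequence in FUNCTIONAL
        and v.allele_freq <= max_freq
        and v.predicted_damaging
        and segregates(v, affected, unaffected)
    ]


affected = frozenset({"II-1", "II-3", "III-2"})
unaffected = frozenset({"II-2"})
variants = [
    Variant("GENE_A", "missense", 0.001, True, affected),
    Variant("GENE_B", "missense", 0.20, True, affected),                        # too common
    Variant("GENE_C", "nonsense", 0.0005, True, frozenset({"II-1", "II-2"})),   # fails segregation
    Variant("GENE_D", "synonymous", 0.001, False, affected),                    # not functional
]
print([v.gene for v in prioritize(variants, affected, unaffected)])
```

For a complex disorder, a strict segregation rule like this one may be too harsh, since reduced penetrance and phenocopies are expected; that is one of the study design considerations the abstract alludes to.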
Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.

PubMed

Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa

2014-02-03

Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.
[A family of short retroposons (Squam1) from squamate reptiles (Reptilia: Squamata): structure, evolution and correlation with phylogeny].

PubMed

Kosushkin, S A; Borodulina, O R; Solov'eva, E N; Grechko, V V

2008-01-01

We have isolated and characterised sequences of a SINE family specific to squamate reptiles, which we named Squam1, from the genome of a lacertid lizard. Copies are 360-390 bp in length and share significant similarity with tRNA gene sequences at their 5' end. We also detected this family in the DNA of representatives of varanids, iguanids (anolis), gekkonids, and snakes. No signs of it were found in the DNA of mammals, birds, amphibians, or crocodiles. We carried out a detailed analysis of the primary structure of retroposons obtained from genomic libraries or GenBank sequences. Most taxa possess 2-3 subfamilies of the SINE in their genomes, each with specific diagnostic features in its primary structure. Individual variability of copies in different families is about 85% and is just slightly lower at the genus level. Comparison of consensus sequences at the family level reveals a high degree of structural similarity together with a number of specific apomorphic features, which makes the SINE a useful phylogenetic marker for this group of reptiles. Snakes do not show a specific affinity to varanids when compared with other lizards, contrary to what was suggested earlier.
Molecular Genetic Analysis of Pakistani Families With Autosomal Recessive Congenital Cataracts by Homozygosity Screening

PubMed Central

Chen, Jianjun; Wang, Qiwei; Cabrera, Patricia E.; Zhong, Zilin; Sun, Wenmin; Jiao, Xiaodong; Chen, Yabin; Govindarajan, Gowthaman; Naeem, Muhammad Asif; Khan, Shaheen N.; Ali, Muhammad Hassaan; Assir, Muhammad Zaman; Rahman, Fawad Ur; Qazi, Zaheeruddin A.; Riazuddin, Sheikh; Akram, Javed; Riazuddin, S. Amer; Hejtmancik, J. Fielding

2017-01-01

Purpose To identify the genetic origins of autosomal recessive congenital cataracts (arCC) in the Pakistani population. Methods Based on the hypothesis that most arCC patients in consanguineous families in the Punjab areas of Pakistan should be homozygous for causative mutations, affected individuals were screened for homozygosity of nearby highly informative microsatellite markers and then screened for pathogenic mutations by DNA sequencing. A total of 83 unmapped consanguineous families were screened for mutations in 33 known candidate genes. Results Patients in 32 arCC families were homozygous for markers near at least 1 of the 33 known CC genes. Sequencing the included genes revealed homozygous cosegregating sequence changes in 10 families, 2 of which had the same variation. These included five missense, one nonsense, two frame shift, and one splice site mutations, eight of which were novel, in EPHA2, FOXE3, FYCO1, TDRD7, MIP, GALK1, and CRYBA4. Conclusions The above results confirm the usefulness of homozygosity mapping for identifying genetic defects underlying autosomal recessive disorders in consanguineous families. In our ongoing study of arCC in Pakistan, including 83 arCC families that underwent homozygosity mapping, 3 mapped using genome-wide linkage analysis in unpublished data, and 30 previously reported families, mutations were detected in approximately 37.1% (43/116) of all families studied, suggesting that additional genes might be responsible in the remaining families. The most commonly mutated gene was FYCO1 (14%), followed by CRYBB3 (5.2%), GALK1 (3.5%), and EPHA2 (2.6%). This provides the first comprehensive description of the genetic architecture of arCC in the Pakistani population. PMID:28418495
Novel arrangement and comparative analysis of hsp90 family genes in three thermotolerant species of Stratiomyidae (Diptera).

PubMed

Astakhova, L N; Zatsepina, O G; Przhiboro, A A; Evgen'ev, M B; Garbuz, D G

2013-06-01

The heat shock proteins belonging to the Hsp90 family (Hsp83 in Diptera) play a crucial role in the protection of cells due to their chaperoning functions. We sequenced hsp90 genes from three species of the family Stratiomyidae (Diptera) living in thermally different habitats and characterized by extraordinarily high thermotolerance. The sequence variation and structure of the hsp90 family genes were compared with previously described features of hsp70 copies isolated from the same species. Two functional hsp83 genes were found in the species studied; in at least one species they are arranged in tandem orientation. This organization was not previously described. Stratiomyidae hsp83 genes share a high level of identity with hsp83 of Drosophila, and the deduced protein possesses five conserved amino acid sequence motifs characteristic of the Hsp90 family as well as the C-terminus MEEVD sequence characteristic of the cytosolic isoform. A comparison of the hsp83 promoters of two Stratiomyidae species from thermally contrasting habitats demonstrated that while both species contain canonical heat shock elements in the same position, only one of the species contains functional GAF-binding elements. Our data indicate that in the same species, hsp83 family genes show a higher evolution rate than the hsp70 family. © 2013 Royal Entomological Society.
Novel mutations of MYO7A and USH1G in Israeli Arab families with Usher syndrome type 1.

PubMed

Rizel, Leah; Safieh, Christine; Shalev, Stavit A; Mezer, Eedy; Jabaly-Habib, Haneen; Ben-Neriah, Ziva; Chervinsky, Elena; Briscoe, Daniel; Ben-Yosef, Tamar

2011-01-01

This study investigated the genetic basis for Usher syndrome type 1 (USH1) in four consanguineous Israeli Arab families. Haplotype analysis for all known USH1 loci was performed in each family. In families for which haplotype analysis was inconclusive, we performed genome-wide homozygosity mapping using a single nucleotide polymorphism (SNP) array. For mutation analysis, specific primers were used to PCR amplify the coding exons of the MYO7A, USH1C, and USH1G genes including intron-exon boundaries. Mutation screening was performed with direct sequencing. A combination of haplotype analysis and genome-wide homozygosity mapping indicated linkage to the USH1B locus in two families, USH1C in one family and USH1G in another family. Sequence analysis of the relevant genes (MYO7A, USH1C, and USH1G) led to the identification of pathogenic mutations in all families. Two of the identified mutations are novel (c.1135-1147dup in MYO7A and c.206-207insC in USH1G). USH1 is a genetically heterogenous condition. Of the five USH1 genes identified to date, USH1C and USH1G are the rarest contributors to USH1 etiology worldwide. It is therefore interesting that two of the four Israeli Arab families reported here have mutations in these two genes. This finding further demonstrates the unique genetic structure of the Israeli population in general, and the Israeli Arab population in particular, which due to high rates of consanguinity segregates many rare autosomal recessive genetic conditions.
Genetic analysis of a four generation Indian family with Usher syndrome: a novel insertion mutation in MYO7A.

PubMed

Kumar, Arun; Babu, Mohan; Kimberling, William J; Venkatesh, Conjeevaram P

2004-11-24

Usher syndrome (USH) is a rare autosomal recessive disorder characterized by deafness and retinitis pigmentosa. The purpose of this study was to determine the genetic cause of USH in a four generation Indian family. Peripheral blood samples were collected from individuals for genomic DNA isolation. To determine the linkage of this family to known USH loci, microsatellite markers were selected from the candidate regions of known loci and used to genotype the family. Exon specific intronic primers for the MYO7A gene were used to amplify DNA samples from one affected individual from the family. PCR products were subsequently sequenced to detect mutations. PCR-SSCP analysis was used to determine if the mutation segregated with the disease in the family and was not present in 50 control individuals. All affected individuals had a classic USH type I (USH1) phenotype which included deafness, vestibular dysfunction and retinitis pigmentosa. Pedigree analysis suggested an autosomal recessive mode of inheritance of USH in the family. Haplotype analysis suggested linkage of this family to the USH1B locus on chromosome 11q. DNA sequence analysis of the entire coding region of the MYO7A gene showed a novel insertion mutation c.2663_2664insA in a homozygous state in all affected individuals, resulting in truncation of MYO7A protein. This is the first study from India which reports a novel MYO7A insertion mutation in a four generation USH family. The mutation is predicted to produce a truncated MYO7A protein. With the novel mutation reported here, the total number of USH causing mutations in the MYO7A gene described to date reaches 75.
PstI repeat: a family of short interspersed nucleotide element (SINE)-like sequences in the genomes of cattle, goat, and buffalo.

PubMed

Sheikh, Faruk G; Mukhopadhyay, Sudit S; Gupta, Prabhakar

2002-02-01

The PstI family of elements are short, highly repetitive DNA sequences interspersed throughout the genome of the Bovidae. We have cloned and sequenced some members of the PstI family from cattle, goat, and buffalo. These elements are approximately 500 bp, have a copy number of 2 × 10^5 to 4 × 10^5, and comprise about 4% of the haploid genome. Studies of nucleotide sequence homology indicate that the buffalo and goat PstI repeats (type II) are similar types of short interspersed nucleotide element (SINE) sequences, but the cattle PstI repeat (type I) is considerably more divergent. Additionally, the goat PstI sequence showed significant sequence homology with bovine serine tRNA, and is therefore likely derived from serine tRNA. Interestingly, Southern hybridization suggests that both types of SINEs (I and II) are present in all the species of Bovidae. Dendrogram analysis indicates that cattle PstI SINE is similar to bovine Alu-like SINEs. Goat and buffalo SINEs formed a separate cluster, suggesting that these two types of SINEs evolved separately in the genome of the Bovidae.
A chemogenomic analysis of the human proteome: application to enzyme families.

PubMed

Bernasconi, Paul; Chen, Min; Galasinski, Scott; Popa-Burke, Ioana; Bobasheva, Anna; Coudurier, Louis; Birkos, Steve; Hallam, Rhonda; Janzen, William P

2007-10-01

Sequence-based phylogenies (SBP) are well-established tools for describing relationships between proteins. They have been used extensively to predict the behavior and sensitivity toward inhibitors of enzymes within a family. The utility of this approach diminishes when comparing proteins with little sequence homology. Even within an enzyme family, SBPs must be complemented by an orthogonal method that is independent of sequence to better predict enzymatic behavior. A chemogenomic approach is demonstrated here that uses the inhibition profile of a 130,000 diverse molecule library to uncover relationships within a set of enzymes. The profile is used to construct a semimetric additive distance matrix. This matrix, in turn, defines a sequence-independent phylogeny (SIP). The method was applied to 97 enzymes (kinases, proteases, and phosphatases). SIP does not use structural information from the molecules used for establishing the profile, thus providing a more heuristic method than the current approaches, which require knowledge of the specific inhibitor's structure. Within enzyme families, SIP shows a good overall correlation with SBP. More interestingly, SIP uncovers distances within families that are not recognizable by sequence-based methods. In addition, SIP allows the determination of distance between enzymes with no sequence homology, thus uncovering novel relationships not predicted by SBP. This chemogenomic approach, used in conjunction with SBP, should prove to be a powerful tool for choosing target combinations for drug discovery programs as well as for guiding the selection of profiling and liability targets.
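The core idea above, turning inhibition profiles into a sequence-independent distance matrix, can be sketched in a few lines. This example assumes each enzyme's profile is a vector of percent inhibition across the same compounds and uses 1 minus the Pearson correlation as the dissimilarity; that choice, the helper names, and the toy numbers are assumptions for illustration, since the abstract does not specify how the authors' semimetric distance was computed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length inhibition profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a profile with no variance has no defined correlation")
    return cov / (sd_x * sd_y)


def profile_distances(profiles):
    """Nested dict of pairwise distances, using 1 - Pearson r (an assumed measure)."""
    names = list(profiles)
    return {a: {b: 1.0 - pearson(profiles[a], profiles[b]) for b in names} for a in names}


def nearest(distances, name):
    """Name of the enzyme whose inhibition profile is closest to the given one."""
    others = (other for other in distances[name] if other != name)
    return min(others, key=lambda other: distances[name][other])


# Toy percent-inhibition values against four hypothetical compounds.
profiles = {
    "kinase_1": [90, 10, 80, 5],
    "kinase_2": [85, 15, 70, 10],
    "protease_1": [5, 95, 10, 80],
}
distances = profile_distances(profiles)
print(nearest(distances, "kinase_1"))
```

A matrix like `distances` could then be passed to any standard tree-building or clustering method; because it is built from activity data alone, enzymes with no sequence homology can still be placed relative to one another.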
[Identification of a HPGD mutation in three families affected with primary hypertrophic osteoarthropathy].

PubMed

Zhang, Wanying; Wang, Tao; Huang, Shuaiwu; Zhao, Xiuli

2018-04-10

To detect mutations of the HPGD gene in three pedigrees affected with primary hypertrophic osteoarthropathy (PHO) by DNA sequencing and high-resolution melting (HRM) analysis. Genomic DNA was extracted from peripheral blood samples collected from the pedigrees. PCR and direct sequencing were carried out to identify potential mutations of the HPGD gene. Amplicons containing the mutation site were generated by nested PCR. The products were then subjected to HRM analysis using the HR-1 instrument. Direct sequencing was carried out in family members and healthy individuals to confirm the results of HRM analysis. A homozygous c.310_311delCT mutation was detected in 2 affected probands, while a heterozygous c.310_311delCT mutation was detected in the third proband. HRM analysis of the fragments encompassing HPGD exon 3 showed 3 curve patterns representing three different genotypes, i.e., the wild type, the c.310_311delCT homozygote, and the c.310_311delCT heterozygote. The results of DNA sequencing were consistent with those of the HRM analysis and the phenotypes of the subjects. The c.310_311delCT mutation may be the most prevalent mutation in the Chinese population. HRM analysis provides an optimized method for genetic testing of HPGD mutations owing to its simplicity, rapid turnaround, and high sensitivity.
New MCM8 mutation associated with premature ovarian insufficiency and chromosomal instability in a highly consanguineous Tunisian family.

PubMed

Bouali, Nouha; Francou, Bruno; Bouligand, Jérôme; Imanci, Dilek; Dimassi, Sarra; Tosca, Lucie; Zaouali, Monia; Mougou, Soumaya; Young, Jacques; Saad, Ali; Guiochon-Mantel, Anne

2017-10-01

To identify the gene(s) involved in the etiology of premature ovarian insufficiency in a highly consanguineous Tunisian family. Genetic analysis of a large consanguineous family with several affected siblings. University hospital-based cytogenetics and molecular genetics laboratories. A highly consanguineous Tunisian family with several affected siblings born to healthy second-degree cousins. None. Targeted exome sequencing was performed by next-generation sequencing for affected family members. Mutations were validated by Sanger sequencing. Functional experiments were performed to explore the deleterious effects of the identified mutation. DNA damage was induced by increasing mitomycin C (MMC) concentrations on cultured peripheral lymphocytes. Analysis of the next-generation sequencing data revealed a new homozygous missense mutation in the minichromosome maintenance 8 gene (MCM8). This homozygous mutation (c.482A>C; p.His161Pro) was predicted to be deleterious and segregated with the disease in the family. MCM8 participates in homologous recombination during meiosis and DNA double-stranded break repair by dimerizing with MCM9. Mcm8 knock out results in an early block in follicle development and small gonads. Given this, we tested the chromosomal breakage repair capacity of homozygous and heterozygous MCM8 p.His161Pro mutation on cultured peripheral lymphocytes exposed to increasing MMC concentrations. We found that chromosomal breakage after MMC exposure was significantly higher in cells from homozygously affected individuals than in those from a healthy control. Our findings provide additional support to the view that MCM8 mutations are involved in the primary ovarian insufficiency phenotype. Copyright © 2017 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
Identification of a novel splicing mutation within SLC17A8 in a Korean family with hearing loss by whole-exome sequencing.

PubMed

Ryu, Nari; Lee, Seokwon; Park, Hong-Joon; Lee, Byeonghyeon; Kwon, Tae-Jun; Bok, Jinwoong; Park, Chan Ik; Lee, Kyu-Yup; Baek, Jeong-In; Kim, Un-Kyung

2017-09-05

Hereditary hearing loss (HHL) is a common genetically heterogeneous disorder, which follows Mendelian inheritance in humans. Because of this heterogeneity, the identification of the causative gene of HHL by linkage analysis or Sanger sequencing has shown economic and temporal limitations. With recent advances in next-generation sequencing (NGS) techniques, rapid identification of a causative gene via massively parallel sequencing is now possible. We recruited a Korean family with three generations exhibiting autosomal dominant inheritance of hearing loss (HL), and the clinical information about this family revealed that there are no other symptoms accompanying HL. To identify a causative mutation of HL in this family, we performed whole-exome sequencing of 4 family members, 3 affected and 1 unaffected. As a result, a novel splicing mutation, c.763+1G>T, in the solute carrier family 17, member 8 (SLC17A8) gene was identified in the patients, and the genotypes of the mutation co-segregated with the phenotype of HL. Additionally, this mutation was not detected in 100 Koreans with normal hearing. Via NGS, we detected a novel splicing mutation that might influence the hearing ability within the patients with autosomal dominant non-syndromic HL. Our data suggest that this technique is a powerful tool to discover causative genetic factors of HL and facilitate diagnoses of the primary cause of HHL. Copyright © 2017 Elsevier B.V. All rights reserved.
Genetic analysis of familial non-syndromic primary failure of eruption

PubMed Central

Frazier-Bowers, S.; Simmons, D; Koehler, K; Zhou, J

2009-01-01

Objectives While some eruption disorders occur as part of a medical syndrome, primary failure of eruption (PFE), defined as a localized failure of secondary tooth eruption, exists without systemic involvement. Recent studies support that heredity may play an important role in the pathogenesis of PFE. The objective of our human genetic study is to investigate the genetic contribution to PFE. Materials and Methods Four candidate genes (POSTN, RUNX2, AMELX, and AMBN) were investigated due to their relationship to tooth eruption or putative relationship to each other. Families and individuals were ascertained based on the clinical diagnosis of PFE. Pedigrees were constructed and analyzed by inspection to determine the mode of inheritance in 4 families. The candidate genes were directly sequenced for both unrelated affected individuals and unaffected individuals. A genome wide scan using 500 microsatellite markers followed by linkage analysis was carried out for one family. Results Pedigree analysis of families suggests an autosomal dominant inheritance pattern with complete penetrance and variable expressivity. Sequence analysis revealed 2 non-functional polymorphisms in the POSTN gene and no other sequence variations in the remaining candidate genes. Genotyping and linkage analysis of one family yielded a LOD score of 1.51 for markers D13S272, D15S118 and D17S831 on chromosomes 13, 15 and 17 respectively. Conclusions While LOD scores were not significant evidence of linkage, extension of current pedigrees and novel SNP chip technology holds great promise for identification of a causative locus for PFE. Clinical Relevance When the process of normal tooth eruption fails, it may result in a clinically guarded or hopeless prognosis. Our studies aim to understand the etiological basis of Primary Failure of Eruption (PFE) toward the development of future orthodontic or pharmacologic interventions that will successfully treat this problem. PMID:19419450
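For readers unfamiliar with the statistic, a LOD score is the base-10 logarithm of the likelihood of the data at a recombination fraction θ divided by the likelihood under free recombination (θ = 0.5), and a score of 3 is the conventional threshold for significant linkage. A minimal sketch for the simplest case follows, assuming phase-known meioses that can each be scored as recombinant or non-recombinant. The function name and counts are illustrative and are not derived from the study's pedigree data.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    # Likelihood of the observed meioses at recombination fraction theta.
    linked = theta ** recombinants * (1.0 - theta) ** non_recombinants
    # Likelihood under the null hypothesis of no linkage (theta = 0.5).
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        return float("-inf")
    return math.log10(linked / unlinked)


# Hypothetical counts: each non-recombinant meiosis adds log10(2), about 0.3.
small_family = lod_score(recombinants=0, meioses=5, theta=0.0)
larger_family = lod_score(recombinants=0, meioses=10, theta=0.0)
```

Under these simplifying assumptions a small pedigree cannot reach the threshold of 3 however clean the segregation is, which is consistent with the authors' plan to extend the pedigrees.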
Identification of (R)-selective ω-aminotransferases by exploring evolutionary sequence space.

PubMed

Kim, Eun-Mi; Park, Joon Ho; Kim, Byung-Gee; Seo, Joo-Hyun

2018-03-01

Several (R)-selective ω-aminotransferases (R-ωATs) have been reported. The existence of additional R-ωATs having different sequence characteristics from previous ones is highly expected. In addition, it is generally accepted that R-ωATs are variants of aminotransferase group III. Based on these backgrounds, sequences in RefSeq database were scored using family profiles of branched-chain amino acid aminotransferase (BCAT) and d-alanine aminotransferase (DAT) to predict and identify putative R-ωATs. Sequences with two profile analysis scores were plotted on two-dimensional score space. Candidates with relatively similar scores in both BCAT and DAT profiles (i.e., profile analysis score using BCAT profile was similar to profile analysis score using DAT profile) were selected. Experimental results for selected candidates showed that putative R-ωATs from Saccharopolyspora erythraea (R-ωAT_Sery), Bacillus cellulosilyticus (R-ωAT_Bcel), and Bacillus thuringiensis (R-ωAT_Bthu) had R-ωAT activity. Additional experiments revealed that R-ωAT_Sery also possessed DAT activity while R-ωAT_Bcel and R-ωAT_Bthu had BCAT activity. Selecting putative R-ωATs from regions with similar profile analysis scores identified potential R-ωATs. Therefore, R-ωATs could be efficiently identified by using simple family profile analysis and exploring evolutionary sequence space. Copyright © 2017 Elsevier Inc. All rights reserved.
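The selection step described above (keeping sequences whose BCAT and DAT profile scores are similar) can be sketched as a simple filter on the two-dimensional score space. The relative tolerance, the minimum score, and the sequence identifiers below are assumptions made for illustration. The abstract does not state the thresholds the authors used.

```python
def select_candidates(scores, tolerance=0.1, min_score=0.0):
    """Keep sequences whose two profile scores differ by at most `tolerance`
    relative to the larger score.

    scores: dict mapping sequence id -> (bcat_score, dat_score)
    """
    selected = []
    for name, (bcat, dat) in scores.items():
        top = max(bcat, dat)
        if top <= min_score:
            continue  # matches neither family profile
        if abs(bcat - dat) / top <= tolerance:
            selected.append(name)
    return sorted(selected)


# Hypothetical profile scores for four database sequences.
scores = {
    "seq1": (120.0, 115.0),  # similar in both profiles, kept
    "seq2": (200.0, 80.0),   # clearly BCAT-like, dropped
    "seq3": (90.0, 95.0),    # similar in both profiles, kept
    "seq4": (0.0, 0.0),      # no signal, dropped
}
candidates = select_candidates(scores)
```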
Combined sequence and structure analysis of the fungal laccase family.

PubMed

Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

2003-08-20

Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This has resulted in identification of a set of four ungapped sequence regions, L1-L4, as the overall signature sequences that can be used to identify the laccases, distinguishing them within the broader class of multi-copper oxidases. The 12 amino acid residues in the enzymes serving as the copper ligands are housed within these four identified conserved regions, of which L2 and L4 conform to the earlier reported copper signature sequences of multi-copper oxidases while L1 and L3 are distinctive to the laccases. The mapping of regions L1-L4 on to the three-dimensional structure of the Coprinus cinerius laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4 at both the structure and the sequence levels suggest that the quasi C-2 symmetric active site conformational motif may have arisen from a structural duplication event that neither the sequence homology analysis nor the structure homology analysis alone would have unraveled. Although the sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. 
The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal laccases. The identified motifs, L1-L4, can be useful in searching the newly sequenced genomes for putative laccase enzymes. Copyright 2003 Wiley Periodicals, Inc. Biotechnol Bioeng 83: 386-394, 2003.
Deciphering the molecular and functional basis of Dbl family proteins: a novel systematic approach toward classification of selective activation of the Rho family proteins.

PubMed

Jaiswal, Mamta; Dvorsky, Radovan; Ahmadian, Mohammad Reza

2013-02-08

The diffuse B-cell lymphoma (Dbl) family of the guanine nucleotide exchange factors is a direct activator of the Rho family proteins. The Rho family proteins are involved in almost every cellular process that ranges from fundamental (e.g. the establishment of cell polarity) to highly specialized processes (e.g. the contraction of vascular smooth muscle cells). Abnormal activation of the Rho proteins is known to play a crucial role in cancer, infectious and cognitive disorders, and cardiovascular diseases. However, the existence of 74 Dbl proteins and 25 Rho-related proteins in humans, which are largely uncharacterized, has led to increasing complexity in identifying specific upstream pathways. Thus, we comprehensively investigated sequence-structure-function-property relationships of 21 representatives of the Dbl protein family regarding their specificities and activities toward 12 Rho family proteins. The meta-analysis approach provides an unprecedented opportunity to broadly profile functional properties of Dbl family proteins, including catalytic efficiency, substrate selectivity, and signaling specificity. Our analysis has provided novel insights into the following: (i) understanding of the relative differences of various Rho protein members in nucleotide exchange; (ii) comparing and defining individual and overall guanine nucleotide exchange factor activities of a large representative set of the Dbl proteins toward 12 Rho proteins; (iii) grouping the Dbl family into functionally distinct categories based on both their catalytic efficiencies and their sequence-structural relationships; (iv) identifying conserved amino acids as fingerprints of the Dbl and Rho protein interaction; and (v) defining amino acid sequences conserved within, but not between, Dbl subfamilies. Therefore, the characteristics of such specificity-determining residues identified the regions or clusters conserved within the Dbl subfamilies.
The Rare-Variant Generalized Disequilibrium Test for Association Analysis of Nuclear and Extended Pedigrees with Application to Alzheimer Disease WGS Data.

PubMed

He, Zongxiao; Zhang, Di; Renton, Alan E; Li, Biao; Zhao, Linhai; Wang, Gao T; Goate, Alison M; Mayeux, Richard; Leal, Suzanne M

2017-02-02

Whole-genome and exome sequence data can be cost-effectively generated for the detection of rare-variant (RV) associations in families. Causal variants that aggregate in families usually have larger effect sizes than those found in sporadic cases, so family-based designs can be a more powerful approach than population-based designs. Moreover, some family-based designs are robust to confounding due to population admixture or substructure. We developed a RV extension of the generalized disequilibrium test (GDT) to analyze sequence data obtained from nuclear and extended families. The GDT utilizes genotype differences of all discordant relative pairs to assess associations within a family, and the RV extension combines the single-variant GDT statistic over a genomic region of interest. The RV-GDT has increased power by efficiently incorporating information beyond first-degree relatives and allows for the inclusion of covariates. Using simulated genetic data, we demonstrated that the RV-GDT method has well-controlled type I error rates, even when applied to admixed populations and populations with substructure. It is more powerful than existing family-based RV association methods, particularly for the analysis of extended pedigrees and pedigrees with missing data. We analyzed whole-genome sequence data from families affected by Alzheimer disease to illustrate the application of the RV-GDT. Given the capability of the RV-GDT to adequately control for population admixture or substructure and analyze pedigrees with missing genotype data and its superior power over other family-based methods, it is an effective tool for elucidating the involvement of RVs in the etiology of complex traits. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

The Targeted Sequencing of Alpha Satellite DNA in Cercopithecus pogonias Provides New Insight into the Diversity and Dynamics of Centromeric Repeats in Old World monkeys.

PubMed

Cacheux, Lauriane; Ponger, Loïc; Gerbault-Seureau, Michèle; Loll, François; Gey, Delphine; Richard, Florence Anne; Escudé, Christophe

2018-06-01

Alpha satellite is the major repeated DNA element of primate centromeres. Specific evolutionary mechanisms have led to a great diversity of sequence families with peculiar genomic organization and distribution, which have till now been studied mostly in great apes. Using high throughput sequencing of alpha satellite monomers obtained by enzymatic digestion followed by computational and cytogenetic analysis, we compare here the diversity and genomic distribution of alpha satellite DNA in two related Old World monkey species, Cercopithecus pogonias and Cercopithecus solatus, which are known to have diverged about seven million years ago. Two main families of monomers, called C1 and C2, are found in both species. A detailed analysis of our datasets revealed the existence of numerous subfamilies within the centromeric C1 family. Although the most abundant subfamily is conserved between both species, our FISH experiments clearly show that some subfamilies are specific for each species and that their distribution is restricted to a subset of chromosomes, thereby pointing to the existence of recurrent amplification/homogenization events. The pericentromeric C2 family is very abundant on the short arm of all acrocentric chromosomes in both species, pointing to specific mechanisms that lead to this distribution. Results obtained using two different restriction enzymes are fully consistent with a predominant monomeric organization of alpha satellite DNA which coexists with higher order organization patterns in the Cercopithecus pogonias genome. Our study suggests a high dynamics of alpha satellite DNA in Cercopithecini, with recurrent apparition of new sequence variants and interchromosomal sequence transfer.
First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

PubMed

Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

2016-05-10

Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of the arid-land tree Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina based sequence analysis reveals an approximate genome size of 31.4Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, a total of 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. Copyright © 2016 Elsevier B.V. All rights reserved.
An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

PubMed Central

Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

2004-01-01

Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051
Novel splice mutation in microthalmia-associated transcription factor in Waardenburg Syndrome.

PubMed

Brenner, Laura; Burke, Kelly; Leduc, Charles A; Guha, Saurav; Guo, Jiancheng; Chung, Wendy K

2011-01-01

Waardenburg Syndrome (WS) is a syndromic form of hearing loss associated with mutations in six different genes. We identified a large family with WS that had previously undergone clinical testing, with no reported pathogenic mutation. Using linkage analysis, a region on 3p14.1 with an LOD score of 6.6 was identified. Microthalmia-Associated Transcription Factor, a gene known to cause WS, is located within this region of linkage. Sequencing of Microthalmia-Associated Transcription Factor demonstrated a c.1212 G>A synonymous variant that segregated with the WS in the family and was predicted to cause a novel splicing site that was confirmed with expression analysis of the mRNA. This case illustrates the need to computationally analyze novel synonymous sequence variants for possible effects on splicing to maximize the clinical sensitivity of sequence-based genetic testing.
Complete nucleotide sequence of Sida golden mosaic Florida virus and phylogenetic relationships with other begomoviruses infecting malvaceous weeds in the Caribbean.

PubMed

Fiallo-Olivé, Elvira; Martínez-Zubiaur, Yamila; Moriones, Enrique; Navas-Castillo, Jesús

2010-09-01

The complete genome sequence of two isolates of the bipartite begomovirus (genus Begomovirus, family Geminiviridae) Sida golden mosaic Florida virus (SiGMFV) is presented. We propose that both isolates, found infecting Malvastrum coromandelianum (family Malvaceae) in Cuba, belong to a new strain of SiGMFV. Phylogenetic analysis showed that SiGMFV DNA-A is located in a monophyletic cluster that includes begomoviruses infecting malvaceous weeds from the Caribbean.
Structure and Function of Na+-Symporters with Inverted Repeats

PubMed Central

Abramson, Jeff; Wright, Ernest M.

2009-01-01

Summary Symporters are membrane proteins that couple energy stored in electrochemical potential gradients to drive the cotransport of molecules and ions into cells. Traditionally, proteins are classified into gene families based on sequence homology and functional properties, e.g. the sodium glucose (SLC5 or Sodium Solute Symporter Family, SSS or SSF) and GABA (SLC6 or Neurotransmitter Sodium Symporter Family, NSS or SNF) symporter families [1-4]. Recently, it has been established that four Na+-symporter proteins with unrelated sequences have a common structural core containing an inverted repeat of 5 transmembrane (TM) helices [5-8]. Analysis of these four structures reveals that they reside in different conformations along the transport cycle providing atomic insight into the mechanism of sodium solute cotransport. PMID:19631523
Prolonged and mixed non-O157 Escherichia coli infection in an Australian household.

PubMed

Staples, M; Graham, R M A; Doyle, C J; Smith, H V; Jennison, A V

2012-05-01

An Australian family was identified through a Public Health follow up on a Shiga-toxigenic Escherichia coli (STEC) positive bloody diarrhoea case, with three of the four family members experiencing either symptomatic or asymptomatic STEC shedding. Bacterial isolates were submitted to stx sequence sub-typing, multi-locus variable number tandem repeat analysis (MLVA), multi-locus sequence typing (MLST) and binary typing. The analysis revealed that there were multiple strains of STEC being shed by the family members, with similar virulence gene profiles and the same serogroup but differing in their MLVA and MLST profiles. This study illustrates the potentially complicated nature of non-O157 STEC infections and the importance of molecular epidemiology in understanding disease clusters. © 2012 QUEENSLAND HEALTH. Clinical Microbiology and Infection © 2012 European Society of Clinical Microbiology and Infectious Diseases.
The complete chloroplast genome of Cinnamomum camphora and its comparison with related Lauraceae species.

PubMed

Chen, Caihui; Zheng, Yongjie; Liu, Sian; Zhong, Yongda; Wu, Yanfang; Li, Jiang; Xu, Li-An; Xu, Meng

2017-01-01

Cinnamomum camphora, a member of the Lauraceae family, is a valuable aromatic and timber tree that is indigenous to the south of China and Japan. All parts of Cinnamomum camphora have secretory cells containing different volatile chemical compounds that are utilized as herbal medicines and essential oils. Here, we reported the complete sequencing of the chloroplast genome of Cinnamomum camphora using Illumina technology. The chloroplast genome of Cinnamomum camphora is 152,570 bp in length and characterized by a relatively conserved quadripartite structure containing a large single copy region of 93,705 bp, a small single copy region of 19,093 bp and two inverted repeat (IR) regions of 19,886 bp each. Overall, the genome contained 123 coding regions, of which 15 were repeated in the IR regions. An analysis of chloroplast sequence divergence revealed that the small single copy region was highly variable among the different genera in the Lauraceae family. A total of 40 repeat structures and 83 simple sequence repeats were detected in both the coding and non-coding regions. A phylogenetic analysis indicated that Calycanthus is most closely related to Lauraceae, both being members of Laurales, which forms a sister group to Magnoliids. The complete sequence of the chloroplast of Cinnamomum camphora will aid in in-depth taxonomical studies of the Lauraceae family in the future. The genetic sequence information will also have valuable applications for chloroplast genetic engineering.
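The quadripartite structure reported above can be checked with simple arithmetic: the large single copy region, the small single copy region, and two copies of the inverted repeat should add up to the full genome length. The short sketch below uses only the figures given in the abstract. The variable names are illustrative.

```python
# Region lengths in base pairs, as reported in the abstract.
large_single_copy = 93_705
small_single_copy = 19_093
inverted_repeat = 19_886  # the inverted repeat occurs twice
reported_total = 152_570

computed_total = large_single_copy + small_single_copy + 2 * inverted_repeat

# Share of the genome occupied by each part, as a percentage.
shares = {
    "LSC": 100 * large_single_copy / computed_total,
    "SSC": 100 * small_single_copy / computed_total,
    "IR (both copies)": 100 * 2 * inverted_repeat / computed_total,
}

for region, percent in shares.items():
    print(f"{region}: {percent:.1f}%")
```

The computed total matches the reported 152,570 bp, which confirms that the 19,886 bp figure refers to each inverted repeat rather than to both combined.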
High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2.

PubMed

Abécassis, V; Pompon, D; Truan, G

2000-10-15

The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence identity. Construction of highly shuffled libraries of mosaic structures and reduction of parental gene contamination were two major goals. Library characterization involved multiprobe hybridization on DNA macro-arrays. The statistical analysis of randomly selected clones revealed a high proportion of chimeric genes (86%) and a homogeneous representation of the parental contribution among the sequences (55.8 +/- 2.5% for parental sequence 1A2). A microtiter plate screening system was designed to achieve colorimetric detection of polycyclic hydrocarbon hydroxylation by transformed yeast cells. Full sequences of five randomly picked and five functionally selected clones were analyzed. Results confirmed the shuffling efficiency and allowed calculation of the average length of sequence exchange and mutation rates. The efficient and statistically representative generation of mosaic structures by this type of family shuffling in a yeast expression system constitutes a novel and promising tool for structure-function studies and tuning enzymatic activities of multicomponent eucaryote complexes involving non-soluble enzymes.
DNA Barcodes of Asian Houbara Bustard (Chlamydotis undulata macqueenii)

PubMed Central

Arif, Ibrahim A.; Khan, Haseeb A.; Williams, Joseph B.; Shobrak, Mohammad; Arif, Waad I.

2012-01-01

Populations of Houbara Bustards have dramatically declined in recent years. Captive breeding and reintroduction programs have had limited success in reviving population numbers and thus new technological solutions involving molecular methods are essential for the long term survival of this species. In this study, we sequenced the 694 bp segment of COI gene of the four specimens of Asian Houbara Bustard (Chlamydotis undulata macqueenii). We also compared these sequences with earlier published barcodes of 11 individuals comprising different families of the orders Gruiformes, Ciconiiformes, Podicipediformes and Crocodylia (out group). The pair-wise sequence comparison showed a total of 254 variable sites across all the 15 sequences from different taxa. Three of the four specimens of Houbara Bustard had an identical sequence of COI gene and one individual showed a single nucleotide difference (G > A transition at position 83). Within the bustard family (Otididae), comparison among the three species (Asian Houbara Bustard, Great Bustard (Otis tarda) and the Little Bustard (Tetrax tetrax)), representing three different genera, showed 116 variable sites. For another family (Rallidae), the intra-family variable sites among the individuals of four different genera were found to be 146. The COI genetic distances among the 15 individuals varied from 0.000 to 0.431. Phylogenetic analysis using 619 bp nucleotide segment of COI clearly discriminated all the species representing different genera, families and orders. All the four specimens of Houbara Bustard formed a single clade and are clearly separated from other two individuals of the same family (Otis tarda and Tetrax tetrax). The nucleotide sequence of partial segment of COI gene effectively discriminated the closely related species. This is the first study reporting the barcodes of Houbara Bustard and would be helpful in future molecular studies, particularly for the conservation of this threatened bird in Saudi Arabia. 
PMID:22408462
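Counting variable sites and computing pairwise distances, as described in the record above, both start from a set of aligned sequences of equal length. A minimal sketch follows, assuming gap-free alignments and using the uncorrected p-distance (the proportion of differing positions). The abstract does not say which distance model the authors applied, and the short sequences here are invented rather than real COI barcodes.

```python
def variable_sites(aligned):
    """Return the 0-based alignment columns that contain more than one base."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


def p_distance(a, b):
    """Uncorrected proportion of positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical 8 bp alignment of four individuals.
alignment = [
    "ACGTACGT",
    "ACGTACGT",  # identical to the first individual
    "ACGAACGT",  # one substitution
    "ATGAACGC",  # three substitutions
]
sites = variable_sites(alignment)
identical_pair = p_distance(alignment[0], alignment[1])
divergent_pair = p_distance(alignment[0], alignment[3])
```

Identical barcodes give a distance of 0, as with the three Houbara Bustard specimens that shared one COI sequence.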
Family genome browser: visualizing genomes with pedigree information.

PubMed

Juan, Liran; Liu, Yongzhuang; Wang, Yongtian; Teng, Mingxiang; Zang, Tianyi; Wang, Yadong

2015-07-15

Families with inherited diseases are widely used in Mendelian/complex disease studies. Owing to the advances in high-throughput sequencing technologies, family genome sequencing is becoming increasingly prevalent. Visualizing family genomes can greatly facilitate human genetics studies and personalized medicine. However, due to the complex genetic relationships and high similarities among genomes of consanguineous family members, family genomes are difficult to visualize in traditional genome visualization frameworks. How to visualize the family genome variants and their functions with integrated pedigree information remains a critical challenge. We developed the Family Genome Browser (FGB) to provide comprehensive analysis and visualization for family genomes. The FGB can visualize family genomes effectively at both the individual level and the variant level by integrating genome data with pedigree information. Family genome analysis, including determination of parental origin of the variants, detection of de novo mutations, identification of potential recombination events and identical-by-descent segments, etc., can be performed flexibly. Diverse annotations for the family genome variants, such as dbSNP memberships, linkage disequilibrium, genes, variant effects, potential phenotypes, etc., are illustrated as well. Moreover, the FGB can automatically search for de novo mutations and compound heterozygous variants for a selected individual, and guide investigators to find high-risk genes with flexible navigation options. These features enable users to investigate and understand family genomes intuitively and systematically. The FGB is available at http://mlg.hit.edu.cn/FGB/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
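One of the analyses listed above, flagging candidate de novo mutations, reduces in its simplest form to a comparison of trio genotypes: the child carries an allele that neither parent has. The sketch below illustrates that rule only. It is not the FGB's implementation, it ignores genotyping error and sex chromosomes, and the variant identifiers and genotypes are hypothetical.

```python
def is_candidate_de_novo(child, father, mother):
    """True if the child carries an allele absent from both parents.

    Genotypes are tuples of two alleles, e.g. ("A", "G").
    """
    parental_alleles = set(father) | set(mother)
    return any(allele not in parental_alleles for allele in child)


def find_de_novo(trio_genotypes):
    """trio_genotypes: dict variant id -> (child, father, mother) genotypes."""
    return sorted(
        variant
        for variant, (child, father, mother) in trio_genotypes.items()
        if is_candidate_de_novo(child, father, mother)
    )


# Hypothetical trio genotypes at three sites.
trio = {
    "chr1:1000": (("A", "G"), ("A", "A"), ("A", "A")),  # G is new in the child
    "chr2:2000": (("C", "T"), ("C", "T"), ("C", "C")),  # T inherited from father
    "chr3:3000": (("G", "G"), ("G", "G"), ("G", "G")),  # no variation
}
de_novo_sites = find_de_novo(trio)
```

In practice such candidates would still need read-level confirmation, since a missed heterozygous call in a parent produces the same pattern.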
Structural analysis of two length variants of the rDNA intergenic spacer from Eruca sativa.

PubMed

Lakshmikumaran, M; Negi, M S

1994-03-01

Restriction enzyme analysis of the rRNA genes of Eruca sativa indicated the presence of many length variants within a single plant and also between different cultivars which is unusual for most crucifers studied so far. Two length variants of the rDNA intergenic spacer (IGS) from a single individual E. sativa (cv. Itsa) plant were cloned and characterized. The complete nucleotide sequences of both the variants (3 kb and 4 kb) were determined. The intergenic spacer contains three families of tandemly repeated DNA sequences denoted as A, B and C. However, the long (4 kb) variant shows the presence of an additional repeat, denoted as D, which is a duplication of a 224 bp sequence just upstream of the putative transcription initiation site. Repeat units belonging to the three different families (A, B and C) were in the size range of 22 to 30 bp. Such short repeat elements are present in the IGS of most of the crucifers analysed so far. Sequence analysis of the variants (3 kb and 4 kb) revealed that the length heterogeneity of the spacer is located at three different regions and is due to the varying copy numbers of repeat units belonging to families A and B. Length variation of the spacer is also due to the presence of a large duplication (D repeats) in the 4 kb variant which is absent in the 3 kb variant. The putative transcription initiation site was identified by comparisons with the rDNA sequences from other plant species.
Characterization of microRNAs Expressed during Secondary Wall Biosynthesis in Acacia mangium

PubMed Central

Ong, Seong Siang; Wickneswari, Ratnam

2012-01-01

MicroRNAs (miRNAs) play critical regulatory roles by acting as sequence-specific guides during secondary wall formation in woody and non-woody species. Although thousands of plant miRNAs have been sequenced, there is no comprehensive view of the miRNA-mediated gene regulatory network to provide profound biological insights into the regulation of xylem development. Herein, we report the involvement of six highly conserved amg-miRNA families (amg-miR166, amg-miR172, amg-miR168, amg-miR159, amg-miR394, and amg-miR156) as the potential regulatory sequences of secondary cell wall biosynthesis. Within this highly conserved amg-miRNA family, only amg-miR166 exhibited strong differences in expression between phloem and xylem tissue. The functional characterization of amg-miR166 targets in various tissues revealed three groups of HD-ZIP III: ATHB8, ATHB15, and REVOLUTA which play pivotal roles in xylem development. Although these three groups vary in their functions, psRNA target analysis indicated that miRNA target sequences of the nine different members of HD-ZIP III are always conserved. We found that precursor structures of amg-miR166 undergo exhaustive sequence variation even within members of the same family. Gene expression analysis showed three key lignin pathway genes: C4H, CAD, and CCoAOMT were upregulated in compression wood where a cascade of miRNAs was downregulated. This study offers a comprehensive analysis of the involvement of highly conserved miRNAs implicated in the secondary wall formation of woody plants. PMID:23251324
Phylogenetic analysis of members of the Phycodnaviridae virus family, using amplified fragments of the major capsid protein gene.

PubMed

Larsen, J B; Larsen, A; Bratbak, G; Sandaa, R-A

2008-05-01

Algal viruses are considered ecologically important by affecting host population dynamics and nutrient flow in aquatic food webs. Members of the family Phycodnaviridae are also interesting due to their extraordinary genome size. Few algal viruses in the Phycodnaviridae family have been sequenced, and those that have been have few genes in common and low gene homology. It has hence been difficult to design general PCR primers that allow further studies of their ecology and diversity. In this study, we screened the nine type I core genes of the nucleocytoplasmic large DNA viruses for sequences suitable for designing a general set of primers. Sequence comparison between members of the Phycodnaviridae family, including three partly sequenced viruses infecting the prasinophyte Pyramimonas orientalis and the haptophytes Phaeocystis pouchetii and Chrysochromulina ericina (Pyramimonas orientalis virus 01B [PoV-01B], Phaeocystis pouchetii virus 01 [PpV-01], and Chrysochromulina ericina virus 01B [CeV-01B], respectively), revealed eight conserved regions in the major capsid protein (MCP). Two of these regions also showed conservation at the nucleotide level, and this allowed us to design degenerate PCR primers. The primers produced 347- to 518-bp amplicons when applied to lysates from algal viruses kept in culture and from natural viral communities. The aim of this work was to use the MCP as a proxy to infer phylogenetic relationships and genetic diversity among members of the Phycodnaviridae family and to determine the occurrence and diversity of this gene in natural viral communities. The results support the current legitimate genera in the Phycodnaviridae based on alga host species. However, while placing the mimivirus in close proximity to the type species, PBCV-1, of Phycodnaviridae along with the three new viruses assigned to the family (PoV-01B, PpV-01, and CeV-01B), the results also indicate that the coccolithoviruses and phaeoviruses are more diverged from this group. Phylogenetic analysis of amplicons from virus assemblages from Norwegian coastal waters as well as from isolated algal viruses revealed a cluster of viruses infecting members of the prymnesiophyte and prasinophyte alga divisions. Other distinct clusters were also identified, containing amplicons from this study as well as sequences retrieved from the Sargasso Sea metagenome. This shows that closely related sequences of this family are present at geographically distant locations within the marine environment.
Interspecific and intraspecific gene variability in a 1-Mb region containing the highest density of NBS-LRR genes found in the melon genome.

PubMed

González, Víctor M; Aventín, Núria; Centeno, Emilio; Puigdomènech, Pere

2014-12-17

Plant NBS-LRR resistance genes tend to be found in clusters, which have been shown to be hot spots of genome variability. In melon, half of the 81 predicted NBS-LRR genes group in nine clusters, and a 1 Mb region on linkage group V contains the highest density of R-genes and presence/absence gene polymorphisms found in the melon genome. This region is known to contain the locus of Vat, an agronomically important gene that confers resistance to aphids. However, the presence of duplications makes the sequencing and annotation of R-gene clusters difficult, usually resulting in multi-gapped sequences with higher than average errors. A 1-Mb sequence that contains the largest NBS-LRR gene cluster found in melon was improved using a strategy that combines Illumina paired-end mapping and PCR-based gap closing. Unknown sequence was decreased by 70% while about 3,000 SNPs and small indels were corrected. As a result, the annotations of 18 of a total of 23 NBS-LRR genes found in this region were modified, including additional coding sequences, amino acid changes, correction of splicing boundaries, or fusion of ORFs in common transcription units. A phylogeny analysis of the R-genes and their comparison with syntenic sequences in other cucurbits point to a pattern of local gene amplifications since the diversification of cucurbits from other families, and through speciation within the family. A candidate Vat gene is proposed based on the sequence similarity between a reported Vat gene from a Korean melon cultivar and a sequence fragment previously absent in the unrefined sequence. A sequence refinement strategy allowed substantial improvement of a 1 Mb fragment of the melon genome and the re-annotation of the largest cluster of NBS-LRR gene homologues found in melon. Analysis of the cluster revealed that resistance genes have been produced by sequence duplication in adjacent genome locations since the divergence of cucurbits from other close families, and through the process of speciation within the family. A candidate Vat gene was also identified using sequence previously unavailable, which demonstrates the advantages of genome assembly refinements when analyzing complex regions such as those containing clusters of highly similar genes.
ACLAME: a CLAssification of Mobile genetic Elements, update 2010.

PubMed

Leplae, Raphaël; Lima-Mendez, Gipsi; Toussaint, Ariane

2010-01-01

The ACLAME database is dedicated to the collection, analysis and classification of sequenced mobile genetic elements (MGEs, in particular phages and plasmids). In addition to providing information on the MGEs content, classifications are available at various levels of organization. At the gene/protein level, families group similar sequences that are expected to share the same function. Families of four or more proteins are manually assigned with a functional annotation using the GeneOntology and the locally developed ontology MeGO dedicated to MGEs. At the genome level, evolutionary cohesive modules group sets of protein families shared among MGEs. At the population level, networks display the reticulate evolutionary relationships among MGEs. To increase the coverage of the phage sequence space, ACLAME version 0.4 incorporates 760 high-quality predicted prophages selected from the Prophinder database. Most of the data can be downloaded from the freely accessible ACLAME web site (http://aclame.ulb.ac.be). The BLAST interface for querying the database has been extended and numerous tools for in-depth analysis of the results have been added.
Molecular and clinical studies of X-linked deafness among Pakistani families.

PubMed

Waryah, Ali M; Ahmed, Zubair M; Bhinder, Munir A; Binder, Munir A; Choo, Daniel I; Sisk, Robert A; Shahzad, Mohsin; Khan, Shaheen N; Friedman, Thomas B; Riazuddin, Sheikh; Riazuddin, Saima

2011-07-01

There are 68 sex-linked syndromes that include hearing loss as one feature and five sex-linked nonsyndromic deafness loci listed in the OMIM database. The possibility of additional such sex-linked loci was explored by ascertaining three unrelated Pakistani families (PKDF536, PKDF1132 and PKDF740) segregating X-linked recessive deafness. Sequence analysis of POU3F4 (DFN3) in affected members of families PKDF536 and PKDF1132 revealed two novel nonsense mutations, p.Q136X and p.W114X, respectively. Family PKDF740 is segregating congenital blindness, mild-to-profound progressive hearing loss that is characteristic of Norrie disease (MIM#310600). Sequence analysis of NDP among affected members of this family revealed a novel single nucleotide deletion c.49delG causing a frameshift and premature truncation (p.V17fsX1) of the encoded protein. These mutations were not found in 150 normal DNA samples. Identification of pathogenic alleles causing X-linked recessive deafness will improve molecular diagnosis, genetic counseling and molecular epidemiology of hearing loss among Pakistanis.
Molecular and Clinical Studies of X-linked Deafness Among Pakistani Families

PubMed Central

Waryah, Ali M.; Ahmed, Zubair M.; Choo, Daniel I.; Sisk, Robert A.; Binder, Munir A.; Shahzad, Mohsin; Khan, Shaheen N.; Friedman, Thomas B.; Riazuddin, Sheikh; Riazuddin, Saima

2011-01-01

There are 68 sex-linked syndromes that include hearing loss as one feature and five sex-linked nonsyndromic deafness loci listed in the OMIM database. The possibility of additional such sex-linked loci was explored by ascertaining three unrelated Pakistani families (PKDF536, PKDF1132, PKDF740) segregating X-linked recessive deafness. Sequence analysis of POU3F4 (DFN3) in affected members of families PKDF536 and PKDF1132 revealed two novel nonsense mutations, p.Q136X and p.W114X, respectively. Family PKDF740 is segregating congenital blindness, mild to profound progressive hearing loss that is characteristic of Norrie disease (MIM#310600). Sequence analysis of NDP among affected members of this family revealed a novel single nucleotide deletion c.49delG causing a frameshift and premature truncation (p.V17fsX1) of the encoded protein. These mutations were not found in 150 normal DNA samples. Identification of pathogenic alleles causing X-linked recessive deafness will improve molecular diagnosis, genetic counseling, and molecular epidemiology of hearing loss among Pakistanis. PMID:21633365
Whole-exome sequencing, without prior linkage, identifies a mutation in LAMB3 as a cause of dominant hypoplastic amelogenesis imperfecta.

PubMed

Poulter, James A; El-Sayed, Walid; Shore, Roger C; Kirkham, Jennifer; Inglehearn, Chris F; Mighell, Alan J

2014-01-01

The conventional approach to identifying the defective gene in a family with an inherited disease is to find the disease locus through family studies. However, the rapid development and decreasing cost of next generation sequencing facilitates a more direct approach. Here, we report the identification of a frameshift mutation in LAMB3 as a cause of dominant hypoplastic amelogenesis imperfecta (AI). Whole-exome sequencing of three affected family members and subsequent filtering of shared variants, without prior genetic linkage, sufficed to identify the pathogenic variant. Simultaneous analysis of multiple family members confirms segregation, enhancing the power to filter the genetic variation found and leading to rapid identification of the pathogenic variant. LAMB3 encodes a subunit of Laminin-5, one of a family of basement membrane proteins with essential functions in cell growth, movement and adhesion. Homozygous LAMB3 mutations cause junctional epidermolysis bullosa (JEB) and enamel defects are seen in JEB cases. However, to our knowledge, this is the first report of dominant AI due to a LAMB3 mutation in the absence of JEB.
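The filtering step described above (keeping only variants shared by all sequenced affected relatives) can be pictured as a set intersection. The sketch below is a hypothetical illustration, not the authors' pipeline: the function name, the member labels, the variant identifiers and the idea of subtracting a set of known common variants are all assumptions made for the example.

```python
from typing import Dict, Set


def shared_candidate_variants(
    affected: Dict[str, Set[str]], known_common: Set[str]
) -> Set[str]:
    """Variants found in every affected member, minus known common variants."""
    if not affected:
        return set()
    # Intersection keeps only variants carried by all affected relatives.
    shared = set.intersection(*affected.values())
    # Assumption: common population variants are unlikely to be causal.
    return shared - known_common


# Hypothetical variant calls for three affected relatives.
example_family = {
    "II-1": {"v1", "v2", "v3"},
    "II-3": {"v2", "v3", "v4"},
    "III-2": {"v2", "v3", "v5"},
}
print(shared_candidate_variants(example_family, known_common={"v3"}))
```

Each additional affected relative can only shrink the intersection, which is why sequencing several family members together narrows the candidate list quickly.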
Enhancing genomic laboratory reports from the patients' view: A qualitative analysis.

PubMed

Stuckey, Heather; Williams, Janet L; Fan, Audrey L; Rahm, Alanna Kulchak; Green, Jamie; Feldman, Lynn; Bonhag, Michele; Zallen, Doris T; Segal, Michael M; Williams, Marc S

2015-10-01

The purpose of this study was to develop a family genomic laboratory report designed to communicate genome sequencing results to parents of children who were participating in a whole genome sequencing clinical research study. Semi-structured interviews were conducted with parents of children who participated in a whole genome sequencing clinical research study to address the elements, language and format of a sample family-directed genome laboratory report. The qualitative interviews were followed by two focus groups aimed at evaluating example presentations of information about prognosis and next steps related to the whole genome sequencing result. Three themes emerged from the qualitative data: (i) Parents described a continual search for valid information and resources regarding their child's condition, a need that prior reports did not meet for parents; (ii) Parents believed that the Family Report would help facilitate communication with physicians and family members; and (iii) Parents identified specific items they appreciated in a genomics Family Report: simplicity of language, logical flow, visual appeal, information on what to expect in the future and recommended next steps. Parents affirmed their desire for a family genomic results report designed for their use and reference. They articulated the need for clear, easy to understand language that provided information with temporal detail and specific recommendations regarding relevant findings consistent with that available to clinicians. © 2015 Wiley Periodicals, Inc.

Phylogenetic analysis of family Neisseriaceae based on genome sequences and description of Populibacter corticis gen. nov., sp. nov., a member of the family Neisseriaceae, isolated from symptomatic bark of Populus × euramericana canker.

PubMed

Li, Yong; Xue, Han; Sang, Sheng-Qi; Lin, Cai-Li; Wang, Xi-Zhuo

2017-01-01

Two Gram-stain negative aerobic bacterial strains were isolated from the bark tissue of Populus × euramericana. The novel isolates were investigated using a polyphasic approach including 16S rRNA gene sequencing, genome sequencing, average nucleotide identity (ANI) and both phenotypic and chemotaxonomic assays. The genome core gene sequence and 16S rRNA gene phylogenies suggest that the novel isolates are different from the genera Snodgrassella and Stenoxybacter. Additionally, the ANI, G+C content, main fatty acids and phospholipid profile data supported the distinctiveness of the novel strain from genus Snodgrassella. Therefore, based on the data presented, the strains constitute a novel species of a novel genus within the family Neisseriaceae, for which the name Populibacter corticis gen. nov., sp. nov. is proposed. The type strain is 15-3-5T (= CFCC 13594T = KCTC 42251T).
A novel 5-bp deletion in Clarin 1 in a family with Usher syndrome.

PubMed

Akoury, Elie; El Zir, Elie; Mansour, Ahmad; Mégarbané, André; Majewski, Jacek; Slim, Rima

2011-11-01

To identify the genetic defect in a Lebanese family with two sibs diagnosed with Usher Syndrome. Exome capture and sequencing were performed on DNA from one affected member using Agilent in solution bead capture, followed by Illumina sequencing. This analysis revealed the presence of a novel homozygous 5-bp deletion, in Clarin 1 (CLRN1), a known gene responsible for Usher syndrome type III. The deletion is inherited from both parents and segregates with the disease phenotype in the family. The 5-bp deletion, c.301_305delGTCAT, p.Val101SerfsX27, is predicted to result in a frameshift and protein truncation after 27 amino acids. Sequencing all the coding regions of the CLRN1 gene in the proband did not reveal any other mutation or variant. Here we describe a novel deletion in CLRN1. Our data support previously reported intra familial variability in the clinical features of Usher syndrome type I and III.
Prospecting Metagenomic Enzyme Subfamily Genes for DNA Family Shuffling by a Novel PCR-based Approach

PubMed Central

Wang, Qiuyan; Wu, Huili; Wang, Anming; Du, Pengfei; Pei, Xiaolin; Li, Haifeng; Yin, Xiaopu; Huang, Lifeng; Xiong, Xiaolong

2010-01-01

DNA family shuffling is a powerful method for enzyme engineering, which utilizes recombination of naturally occurring functional diversity to accelerate laboratory-directed evolution. However, the use of this technique has been hindered by the scarcity of family genes with the required level of sequence identity in the genome database. We describe here a strategy for collecting metagenomic homologous genes for DNA shuffling from environmental samples by truncated metagenomic gene-specific PCR (TMGS-PCR). Using identified metagenomic gene-specific primers, twenty-three 921-bp truncated lipase gene fragments, which shared 64–99% identity with each other and formed a distinct subfamily of lipases, were retrieved from 60 metagenomic samples. These lipase genes were shuffled, and selected active clones were characterized. The chimeric clones show extensive functional and genetic diversity, as demonstrated by functional characterization and sequence analysis. Our results indicate that homologous sequences of genes captured by TMGS-PCR can be used as suitable genetic material for DNA family shuffling with broad applications in enzyme engineering. PMID:20962349
Mitochondrial C4375T mutation might be a molecular risk factor in a maternal Chinese hypertensive family under haplotype C.

PubMed

Chen, Hong; Sun, Min; Fan, Zhen; Tong, Maoqing; Chen, Guodong; Li, Danhui; Ye, Jihui; Yang, Yumin; Zhu, Yongding; Zhu, Jianhua

2017-12-04

Here, we report a Han Chinese essential hypertensive pedigree based on clinical, hereditary and molecular data. To determine the molecular basis of hypertension in this family, the mitochondrial genome of one proband from the family was analyzed by direct sequencing. The age of onset and the severity of the disease differ among patients in this family. Matrilineal family members carry the C4375T mutation and belong to East Asian haplogroup C. Phylogenetic analysis shows that 4375C is highly conserved in 17 species. It is suggested that these mutations might participate in the development of hypertension in this family, and that haplogroup C might play a modifying role on the phenotype in this Chinese hypertensive family.
Process of labeling specific chromosomes using recombinant repetitive DNA

DOEpatents

Moyzis, R.K.; Meyne, J.

1988-02-12

Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.
Proteins with an Euonymus lectin-like domain are ubiquitous in Embryophyta

PubMed Central

2009-01-01

Background: Cloning of the Euonymus lectin led to the discovery of a novel domain that also occurs in some stress-induced plant proteins. The distribution and the diversity of proteins with an Euonymus lectin (EUL) domain were investigated using detailed analysis of sequences in publicly accessible genome and transcriptome databases. Results: Comprehensive in silico analyses indicate that the recently identified Euonymus europaeus lectin domain represents a conserved structural unit of a novel family of putative carbohydrate-binding proteins, which will further be referred to as the Euonymus lectin (EUL) family. The EUL domain is widespread among plants. Analysis of retrieved sequences revealed that some sequences consist of a single EUL domain linked to an unrelated N-terminal domain whereas others comprise two in tandem arrayed EUL domains. A new classification system for these lectins is proposed based on the overall domain architecture. Evolutionary relationships among the sequences with EUL domains are discussed. Conclusion: The identification of the EUL family provides the first evidence for the occurrence in terrestrial plants of a highly conserved plant specific domain. The widespread distribution of the EUL domain contrasts strikingly with the more limited or even narrow distribution of most other lectin domains found in plants. The apparent omnipresence of the EUL domain is indicative of a universal role of this lectin domain in plants. Although there is unambiguous evidence that several EUL domains possess carbohydrate-binding activity, further research is required to corroborate the carbohydrate-binding properties of different members of the EUL family. PMID:19930663
Widespread and evolutionary analysis of a MITE family Monkey King in Brassicaceae.

PubMed

Dai, Shutao; Hou, Jinna; Long, Yan; Wang, Jing; Li, Cong; Xiao, Qinqin; Jiang, Xiaoxue; Zou, Xiaoxiao; Zou, Jun; Meng, Jinling

2015-06-19

Miniature inverted repeat transposable elements (MITEs) are important components of eukaryotic genomes, with hundreds of families and many copies, which may play important roles in gene regulation and genome evolution. However, few studies have investigated the molecular mechanisms involved. In our previous study, a Tourist-like MITE, Monkey King, was identified from the promoter region of a flowering time gene, BnFLC.A10, in Brassica napus. Based on this MITE, the characteristics and potential roles in gene regulation of the MITE family were analyzed in Brassicaceae. The characteristics of the Tourist-like MITE family Monkey King in Brassicaceae, including its distribution, copies and insertion sites in the genomes of major Brassicaceae species were analyzed in this study. Monkey King was actively amplified in Brassica after divergence from Arabidopsis, which was indicated by the prompt increase in copy number and by phylogenetic analysis. The genomic variations caused by Monkey King insertions, both intra- and inter-species in Brassica, were traced by PCR amplification. Genomic sequence analysis showed that most complete Monkey King elements are located in gene-rich regions, less than 3 kb from genes, in both the B. rapa and A. thaliana genomes. Sixty-seven Brassica expressed sequence tags carrying Monkey King fragments were also identified from the NCBI database. Bisulfite sequencing identified specific DNA methylation of cytosine residues in the Monkey King sequence. A fragment containing putative TATA-box motifs in the MITE sequence could bind with nuclear protein(s) extracted from leaves of B. napus plants. A Monkey King-related microRNA, bna-miR6031, was identified in the microRNA database. In transgenic A. thaliana, when the Monkey King element was inserted upstream of the 35S promoter, the promoter activity was weakened. Monkey King, a Brassicaceae Tourist-like MITE family, has amplified relatively recently and has induced intra- and inter-species genomic variations in Brassica. Monkey King elements are most abundant in the vicinity of genes and may have a substantial effect on genome-wide gene regulation in Brassicaceae. Monkey King insertions potentially regulate gene expression and genome evolution through epigenetic modification and new regulatory motif production.
The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification.

PubMed

Aiewsakun, Pakorn; Simmonds, Peter

2018-02-20

The International Committee on Taxonomy of Viruses (ICTV) classifies viruses into families, genera and species and provides a regulated system for their nomenclature that is universally used in virus descriptions. Virus taxonomic assignments have traditionally been based upon virus phenotypic properties such as host range, virion morphology and replication mechanisms, particularly at family level. However, gene sequence comparisons provide a clearer guide to their evolutionary relationships and provide the only information that may guide the incorporation of viruses detected in environmental (metagenomic) studies that lack any phenotypic data. The current study sought to determine whether the existing virus taxonomy could be reproduced by examination of genetic relationships through the extraction of protein-coding gene signatures and genome organisational features. We found large-scale consistency between genetic relationships and taxonomic assignments for viruses of all genome configurations and genome sizes. The analysis pipeline that we have called 'Genome Relationships Applied to Virus Taxonomy' (GRAViTy) was highly effective at reproducing the current assignments of viruses at family level as well as inter-family groupings into orders. Its ability to correctly differentiate assigned viruses from unassigned viruses, and classify them into the correct taxonomic group, was evaluated by a threefold cross-validation technique. This predicted family membership of eukaryotic viruses with close to 100% accuracy and specificity potentially enabling the algorithm to predict assignments for the vast corpus of metagenomic sequences consistently with ICTV taxonomy rules. In an evaluation run of GRAViTy, over one half (460/921) of (near)-complete genome sequences from several large published metagenomic eukaryotic virus datasets were assigned to 127 novel family-level groupings. If corroborated by other analysis methods, these would potentially more than double the number of eukaryotic virus families in the ICTV taxonomy. A rapid and objective means to explore metagenomic viral diversity and make informed recommendations for their assignments at each taxonomic layer is essential. GRAViTy provides one means to make rule-based assignments at family and order levels in a manner that preserves the integrity and underlying organisational principles of the current ICTV taxonomy framework. Such methods are increasingly required as the vast virosphere is explored.
Phylogeny of the family Moraxellaceae by 16S rDNA sequence analysis, with special emphasis on differentiation of Moraxella species.

PubMed

Pettersson, B; Kodjo, A; Ronaghi, M; Uhlén, M; Tønjum, T

1998-01-01

Thirty-three strains previously classified into 11 species in the bacterial family Moraxellaceae were subjected to phylogenetic analysis based on 16S rRNA sequences. The family Moraxellaceae formed a distinct clade consisting of four phylogenetic groups as judged from branch lengths, bootstrap values and signature nucleotides. Group I contained the classical moraxellae and strains of the coccal moraxellae, previously known as Branhamella, with 16S rRNA similarity of ≥95%. A further division of group I into five tentative clusters is discussed. Group II consisted of two strains representing Moraxella atlantae and Moraxella osloensis. These strains were only distantly related to each other (93.4%) and also to the other members of the Moraxellaceae (≤93%). Therefore, reasons for reclassification of these species into separate and new genera are discussed. Group III harboured strains of the genus Psychrobacter and strain 752/52 of [Moraxella] phenylpyruvica. This strain of [M.] phenylpyruvica formed an early branch from the group III line of descent. Interestingly, a distant relationship was found between Psychrobacter phenylpyruvicus strain ATCC 23333T (formerly classified as [M.] phenylpyruvica) and [M.] phenylpyruvica strain 752/52, exhibiting less than 96% nucleotide similarity between their 16S rRNA sequences. The establishment of a new genus for [M.] phenylpyruvica strain 752/52 is therefore suggested. Group IV contained only two strains of the genus Acinetobacter. Strategies for the development of diagnostic probes and distinctive sequences for 16S rRNA-based species-specific assays within group I are suggested. Although these findings add to the classificatory placements within the Moraxellaceae, analysis of a more comprehensive selection of strains is still needed to obtain a complete classification system within this family.
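Pairwise similarity values such as the 93.4% quoted above are typically computed from aligned sequences. The abstract does not state how gaps were treated, so the following is only a sketch under the assumption that alignment columns containing a gap in either sequence are ignored; the function name and the short example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of matching positions between two pre-aligned sequences.

    Assumption: columns with a gap ("-") in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up 10-column alignment: 9 ungapped columns, 8 of them identical.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 1))
```

Other conventions (for example counting gaps as mismatches) give lower values for the same alignment, so thresholds are only comparable when the same convention is used.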
Systematic Analysis of Primary Sequence Domain Segments for the Discrimination Between Class C GPCR Subtypes.

PubMed

König, Caroline; Alquézar, René; Vellido, Alfredo; Giraldo, Jesús

2018-03-01

G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results from previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.
Genome-wide identification and characterization of SnRK2 gene family in cotton (Gossypium hirsutum L.).

PubMed

Liu, Zhao; Ge, Xiaoyang; Yang, Zuoren; Zhang, Chaojun; Zhao, Ge; Chen, Eryong; Liu, Ji; Zhang, Xueyan; Li, Fuguang

2017-06-12

Sucrose non-fermenting-1-related protein kinase 2 (SnRK2) is a plant-specific serine/threonine kinase family involved in the abscisic acid (ABA) signaling pathway and responds to osmotic stress. A genome-wide analysis of this protein family has been conducted previously in some plant species, but little is known about SnRK2 genes in upland cotton (Gossypium hirsutum L.). The recent release of the G. hirsutum genome sequence provides an opportunity to identify and characterize the SnRK2 kinase family in upland cotton. We identified 20 putative SnRK2 sequences in the G. hirsutum genome, designated as GhSnRK2.1 to GhSnRK2.20. All of the sequences encoded hydrophilic proteins. Phylogenetic analysis showed that the GhSnRK2 genes were classifiable into three groups. The chromosomal location and phylogenetic analysis of the cotton SnRK2 genes indicated that segmental duplication likely contributed to the diversification and evolution of the genes. The gene structure and motif composition of the cotton SnRK2 genes were analyzed. Nine exons were conserved in length among all members of the GhSnRK2 family. Although the C-terminus was divergent, seven conserved motifs were present. All GhSnRK2 genes showed expression patterns under abiotic stress based on transcriptome data. The expression profiles of five selected genes were verified in various tissues by quantitative real-time RT-PCR (qRT-PCR). Transcript levels of some family members were up-regulated in response to drought, salinity or ABA treatments, consistent with potential roles in response to abiotic stress. This study is the first comprehensive analysis of SnRK2 genes in upland cotton. Our results provide fundamental information for the functional dissection of GhSnRK2s and a valuable resource for improving plant stress tolerance using GhSnRK2s.
Structure-sequence based analysis for identification of conserved regions in proteins

DOEpatents

Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth

2013-05-28

Disclosed are computational methods, and associated hardware and software products, for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures is also disclosed herein.
Minke whale genome and aquatic adaptation in cetaceans

PubMed Central

Yim, Hyung-Soon; Cho, Yun Sung; Guang, Xuanmin; Kang, Sung Gyun; Jeong, Jae-Yeon; Cha, Sun-Shin; Oh, Hyun-Myung; Lee, Jae-Hak; Yang, Eun Chan; Kwon, Kae Kyoung; Kim, Yun Jae; Kim, Tae Wan; Kim, Wonduck; Jeon, Jeong Ho; Kim, Sang-Jin; Choi, Dong Han; Jho, Sungwoong; Kim, Hak-Min; Ko, Junsu; Kim, Hyunmin; Shin, Young-Ah; Jung, Hyun-Ju; Zheng, Yuan; Wang, Zhuo; Chen, Yan

2014-01-01

The shift from terrestrial to aquatic life by whales was a substantial evolutionary event. Here we report the whole-genome sequencing and de novo assembly of the minke whale genome, as well as the whole-genome sequences of three minke whales, a fin whale, a bottlenose dolphin and a finless porpoise. Our comparative genomic analysis identified an expansion in the whale lineage of gene families associated with stress-responsive proteins and anaerobic metabolism, whereas gene families related to body hair and sensory receptors were contracted. Our analysis also identified whale-specific mutations in genes encoding antioxidants and enzymes controlling blood pressure and salt concentration. Overall the whale-genome sequences exhibited distinct features that are associated with the physiological and morphological changes needed for life in an aquatic environment, marked by resistance to physiological stresses caused by a lack of oxygen, increased amounts of reactive oxygen species and high salt levels. PMID:24270359
Minke whale genome and aquatic adaptation in cetaceans.

PubMed

Yim, Hyung-Soon; Cho, Yun Sung; Guang, Xuanmin; Kang, Sung Gyun; Jeong, Jae-Yeon; Cha, Sun-Shin; Oh, Hyun-Myung; Lee, Jae-Hak; Yang, Eun Chan; Kwon, Kae Kyoung; Kim, Yun Jae; Kim, Tae Wan; Kim, Wonduck; Jeon, Jeong Ho; Kim, Sang-Jin; Choi, Dong Han; Jho, Sungwoong; Kim, Hak-Min; Ko, Junsu; Kim, Hyunmin; Shin, Young-Ah; Jung, Hyun-Ju; Zheng, Yuan; Wang, Zhuo; Chen, Yan; Chen, Ming; Jiang, Awei; Li, Erli; Zhang, Shu; Hou, Haolong; Kim, Tae Hyung; Yu, Lili; Liu, Sha; Ahn, Kung; Cooper, Jesse; Park, Sin-Gi; Hong, Chang Pyo; Jin, Wook; Kim, Heui-Soo; Park, Chankyu; Lee, Kyooyeol; Chun, Sung; Morin, Phillip A; O'Brien, Stephen J; Lee, Hang; Kimura, Jumpei; Moon, Dae Yeon; Manica, Andrea; Edwards, Jeremy; Kim, Byung Chul; Kim, Sangsoo; Wang, Jun; Bhak, Jong; Lee, Hyun Sook; Lee, Jung-Hyun

2014-01-01

The shift from terrestrial to aquatic life by whales was a substantial evolutionary event. Here we report the whole-genome sequencing and de novo assembly of the minke whale genome, as well as the whole-genome sequences of three minke whales, a fin whale, a bottlenose dolphin and a finless porpoise. Our comparative genomic analysis identified an expansion in the whale lineage of gene families associated with stress-responsive proteins and anaerobic metabolism, whereas gene families related to body hair and sensory receptors were contracted. Our analysis also identified whale-specific mutations in genes encoding antioxidants and enzymes controlling blood pressure and salt concentration. Overall the whale-genome sequences exhibited distinct features that are associated with the physiological and morphological changes needed for life in an aquatic environment, marked by resistance to physiological stresses caused by a lack of oxygen, increased amounts of reactive oxygen species and high salt levels.
Genome-wide analysis of the R2R3-MYB transcription factor gene family in sweet orange (Citrus sinensis).

PubMed

Liu, Chaoyang; Wang, Xia; Xu, Yuantao; Deng, Xiuxin; Xu, Qiang

2014-10-01

MYB transcription factors represent one of the largest gene families in plant genomes. Sweet orange (Citrus sinensis) is one of the most important fruit crops worldwide, and its genome has recently been sequenced. This provides an opportunity to investigate the organization and evolutionary characteristics of sweet orange MYB genes from a whole-genome view. In the present study, we identified 100 R2R3-MYB genes in the sweet orange genome. A comprehensive analysis of this gene family was performed, including phylogeny, gene structure, chromosomal localization and expression pattern analyses. The 100 genes were divided into 29 subfamilies based on sequence similarity and phylogeny, and the classification was also well supported by the highly conserved exon/intron structures and motif composition. The phylogenomic comparison of the MYB gene family among sweet orange and the related plant species Arabidopsis, cacao and papaya suggested the existence of functional divergence during evolution. Expression profiling indicated that sweet orange R2R3-MYB genes exhibited distinct temporal and spatial expression patterns. Our analysis suggested that the sweet orange MYB genes may play important roles in different plant biological processes, some of which may be involved in citrus fruit quality. These results will be useful for future functional analysis of the MYB gene family in sweet orange.
Bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses

USGS Publications Warehouse

Li, Linlin; Victoria, Joseph G.; Wang, Chunlin; Jones, Morris; Fellers, Gary M.; Kunz, Thomas H.; Delwart, Eric

2010-01-01

Bats are hosts to a variety of viruses capable of zoonotic transmissions. Because of increased contact between bats, humans, and other animal species, the possibility exists for further cross-species transmissions and ensuing disease outbreaks. We describe here full and partial viral genomes identified using metagenomics in the guano of bats from California and Texas. A total of 34% and 58% of 390,000 sequence reads from bat guano in California and Texas, respectively, were related to eukaryotic viruses, and the largest proportion of those infect insects, reflecting the diet of these insectivorous bats, including members of the viral families Dicistroviridae, Iflaviridae, Tetraviridae, and Nodaviridae and the subfamily Densovirinae. The second largest proportion of virus-related sequences infects plants and fungi, likely reflecting the diet of ingested insects, including members of the viral families Luteoviridae, Secoviridae, Tymoviridae, and Partitiviridae and the genus Sobemovirus. Bat guano viruses related to those infecting mammals comprised the third largest group, including members of the viral families Parvoviridae, Circoviridae, Picornaviridae, Adenoviridae, Poxviridae, Astroviridae, and Coronaviridae. No close relative of known human viral pathogens was identified in these bat populations. Phylogenetic analysis was used to clarify the relationship to known viral taxa of novel sequences detected in bat guano samples, showing that some guano viral sequences fall outside existing taxonomic groups. This initial characterization of the bat guano virome, the first metagenomic analysis of viruses in wild mammals using second-generation sequencing, therefore showed the presence of previously unidentified viral species, genera, and possibly families. Viral metagenomics is a useful tool for genetically characterizing viruses present in animals with the known capability of direct or indirect viral zoonosis to humans.
Bat Guano Virome: Predominance of Dietary Viruses from Insects and Plants plus Novel Mammalian Viruses

PubMed Central

Li, Linlin; Victoria, Joseph G.; Wang, Chunlin; Jones, Morris; Fellers, Gary M.; Kunz, Thomas H.; Delwart, Eric

2010-01-01

Bats are hosts to a variety of viruses capable of zoonotic transmissions. Because of increased contact between bats, humans, and other animal species, the possibility exists for further cross-species transmissions and ensuing disease outbreaks. We describe here full and partial viral genomes identified using metagenomics in the guano of bats from California and Texas. A total of 34% and 58% of 390,000 sequence reads from bat guano in California and Texas, respectively, were related to eukaryotic viruses, and the largest proportion of those infect insects, reflecting the diet of these insectivorous bats, including members of the viral families Dicistroviridae, Iflaviridae, Tetraviridae, and Nodaviridae and the subfamily Densovirinae. The second largest proportion of virus-related sequences infects plants and fungi, likely reflecting the diet of ingested insects, including members of the viral families Luteoviridae, Secoviridae, Tymoviridae, and Partitiviridae and the genus Sobemovirus. Bat guano viruses related to those infecting mammals comprised the third largest group, including members of the viral families Parvoviridae, Circoviridae, Picornaviridae, Adenoviridae, Poxviridae, Astroviridae, and Coronaviridae. No close relative of known human viral pathogens was identified in these bat populations. Phylogenetic analysis was used to clarify the relationship to known viral taxa of novel sequences detected in bat guano samples, showing that some guano viral sequences fall outside existing taxonomic groups. This initial characterization of the bat guano virome, the first metagenomic analysis of viruses in wild mammals using second-generation sequencing, therefore showed the presence of previously unidentified viral species, genera, and possibly families. Viral metagenomics is a useful tool for genetically characterizing viruses present in animals with the known capability of direct or indirect viral zoonosis to humans. PMID:20463061
Bat guano virome: predominance of dietary viruses from insects and plants plus novel mammalian viruses.

PubMed

Li, Linlin; Victoria, Joseph G; Wang, Chunlin; Jones, Morris; Fellers, Gary M; Kunz, Thomas H; Delwart, Eric

2010-07-01

Bats are hosts to a variety of viruses capable of zoonotic transmissions. Because of increased contact between bats, humans, and other animal species, the possibility exists for further cross-species transmissions and ensuing disease outbreaks. We describe here full and partial viral genomes identified using metagenomics in the guano of bats from California and Texas. A total of 34% and 58% of 390,000 sequence reads from bat guano in California and Texas, respectively, were related to eukaryotic viruses, and the largest proportion of those infect insects, reflecting the diet of these insectivorous bats, including members of the viral families Dicistroviridae, Iflaviridae, Tetraviridae, and Nodaviridae and the subfamily Densovirinae. The second largest proportion of virus-related sequences infects plants and fungi, likely reflecting the diet of ingested insects, including members of the viral families Luteoviridae, Secoviridae, Tymoviridae, and Partitiviridae and the genus Sobemovirus. Bat guano viruses related to those infecting mammals comprised the third largest group, including members of the viral families Parvoviridae, Circoviridae, Picornaviridae, Adenoviridae, Poxviridae, Astroviridae, and Coronaviridae. No close relative of known human viral pathogens was identified in these bat populations. Phylogenetic analysis was used to clarify the relationship to known viral taxa of novel sequences detected in bat guano samples, showing that some guano viral sequences fall outside existing taxonomic groups. This initial characterization of the bat guano virome, the first metagenomic analysis of viruses in wild mammals using second-generation sequencing, therefore showed the presence of previously unidentified viral species, genera, and possibly families. Viral metagenomics is a useful tool for genetically characterizing viruses present in animals with the known capability of direct or indirect viral zoonosis to humans.
Evidence for Widespread Reticulate Evolution within Human Duplicons

PubMed Central

Jackson, Michael S.; Oliver, Karen; Loveland, Jane; Humphray, Sean; Dunham, Ian; Rocchi, Mariano; Viggiano, Luigi; Park, Jonathan P.; Hurles, Matthew E.; Santibanez-Koref, Mauro

2005-01-01

Approximately 5% of the human genome consists of segmental duplications that can cause genomic mutations and may play a role in gene innovation. Reticulate evolutionary processes, such as unequal crossing-over and gene conversion, are known to occur within specific duplicon families, but the broader contribution of these processes to the evolution of human duplications remains poorly characterized. Here, we use phylogenetic profiling to analyze multiple alignments of 24 human duplicon families that span >8 Mb of DNA. Our results indicate that none of them are evolving independently, with all alignments showing sharp discontinuities in phylogenetic signal consistent with reticulation. To analyze these results in more detail, we have developed a quartet method that estimates the relative contribution of nucleotide substitution and reticulate processes to sequence evolution. Our data indicate that most of the duplications show a highly significant excess of sites consistent with reticulate evolution, compared with the number expected by nucleotide substitution alone, with 15 of 30 alignments showing a >20-fold excess over that expected. Using permutation tests, we also show that at least 5% of the total sequence shares 100% sequence identity because of reticulation, a figure that includes 74 independent tracts of perfect identity >2 kb in length. Furthermore, analysis of a subset of alignments indicates that the density of reticulation events is as high as 1 every 4 kb. These results indicate that phylogenetic relationships within recently duplicated human DNA can be rapidly disrupted by reticulate evolution. This finding has important implications for efforts to finish the human genome sequence, complicates comparative sequence analysis of duplicon families, and could profoundly influence the tempo of gene-family evolution. PMID:16252241
North Carolina macular dystrophy (MCDR1) caused by a novel tandem duplication of the PRDM13 gene

PubMed Central

Sullivan, Lori S.; Wheaton, Dianna K.; Locke, Kirsten G.; Jones, Kaylie D.; Koboldt, Daniel C.; Fulton, Robert S.; Wilson, Richard K.; Blanton, Susan H.; Birch, David G.; Daiger, Stephen P.

2016-01-01

Purpose: To identify the underlying cause of disease in a large family with North Carolina macular dystrophy (NCMD). Methods: A large four-generation family (RFS355) with an autosomal dominant form of NCMD was ascertained. Family members underwent comprehensive visual function evaluations. Blood or saliva from six affected family members and three unaffected spouses was collected and DNA tested for linkage to the MCDR1 locus on chromosome 6q12. Three affected family members and two unaffected spouses underwent whole exome sequencing (WES) and subsequently, custom capture of the linkage region followed by next-generation sequencing (NGS). Standard PCR and dideoxy sequencing were used to further characterize the mutation. Results: Of the 12 eyes examined in six affected individuals, all but two had Gass grade 3 macular degeneration features. Large central excavation of the retinal and choroid layers, referred to as a macular caldera, was seen in an age-independent manner in the grade 3 eyes. The calderas are unique to affected individuals with MCDR1. Genome-wide linkage mapping and haplotype analysis of markers from the chromosome 6q region were consistent with linkage to the MCDR1 locus. Whole exome sequencing and custom-capture NGS failed to reveal any rare coding variants segregating with the phenotype. Analysis of the custom-capture NGS sequencing data for copy number variants uncovered a tandem duplication of approximately 60 kb on chromosome 6q. This region contains two genes, CCNC and PRDM13. The duplication creates a partial copy of CCNC and a complete copy of PRDM13. The duplication was found in all affected members of the family and is not present in any unaffected members. The duplication was not seen in 200 ethnically matched normal chromosomes. Conclusions: The cause of disease in the original family with MCDR1 and several others has been recently reported to be dysregulation of the PRDM13 gene, caused by either single base substitutions in a DNase 1 hypersensitive site upstream of the CCNC and PRDM13 genes or a tandem duplication of the PRDM13 gene. The duplication found in the RFS355 family is distinct from the previously reported duplication and provides additional support that dysregulation of PRDM13, not CCNC, is the cause of NCMD mapped to the MCDR1 locus. PMID:27777503
North Carolina macular dystrophy (MCDR1) caused by a novel tandem duplication of the PRDM13 gene.

PubMed

Bowne, Sara J; Sullivan, Lori S; Wheaton, Dianna K; Locke, Kirsten G; Jones, Kaylie D; Koboldt, Daniel C; Fulton, Robert S; Wilson, Richard K; Blanton, Susan H; Birch, David G; Daiger, Stephen P

2016-01-01

To identify the underlying cause of disease in a large family with North Carolina macular dystrophy (NCMD). A large four-generation family (RFS355) with an autosomal dominant form of NCMD was ascertained. Family members underwent comprehensive visual function evaluations. Blood or saliva from six affected family members and three unaffected spouses was collected and DNA tested for linkage to the MCDR1 locus on chromosome 6q12. Three affected family members and two unaffected spouses underwent whole exome sequencing (WES) and subsequently, custom capture of the linkage region followed by next-generation sequencing (NGS). Standard PCR and dideoxy sequencing were used to further characterize the mutation. Of the 12 eyes examined in six affected individuals, all but two had Gass grade 3 macular degeneration features. Large central excavation of the retinal and choroid layers, referred to as a macular caldera, was seen in an age-independent manner in the grade 3 eyes. The calderas are unique to affected individuals with MCDR1. Genome-wide linkage mapping and haplotype analysis of markers from the chromosome 6q region were consistent with linkage to the MCDR1 locus. Whole exome sequencing and custom-capture NGS failed to reveal any rare coding variants segregating with the phenotype. Analysis of the custom-capture NGS sequencing data for copy number variants uncovered a tandem duplication of approximately 60 kb on chromosome 6q. This region contains two genes, CCNC and PRDM13. The duplication creates a partial copy of CCNC and a complete copy of PRDM13. The duplication was found in all affected members of the family and is not present in any unaffected members. The duplication was not seen in 200 ethnically matched normal chromosomes. The cause of disease in the original family with MCDR1 and several others has been recently reported to be dysregulation of the PRDM13 gene, caused by either single base substitutions in a DNase 1 hypersensitive site upstream of the CCNC and PRDM13 genes or a tandem duplication of the PRDM13 gene. The duplication found in the RFS355 family is distinct from the previously reported duplication and provides additional support that dysregulation of PRDM13, not CCNC, is the cause of NCMD mapped to the MCDR1 locus.
Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

PubMed Central

Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

2007-01-01

Background: Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results: Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion: We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571
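The two figures quoted in the abstract above (33.3 Mb of sequence, 0.77% coverage) imply a haploid genome size, which the following back-of-envelope sketch works out. It assumes that "coverage" means sequenced bases divided by haploid (1C) genome size; the variable names are illustrative and are not taken from the paper.

```python
# Back-of-envelope check, assuming coverage = sequenced bases / haploid genome size.
# Both input values are quoted from the abstract; the names are hypothetical.
sequenced_mb = 33.3          # total 454 sequence analyzed, in Mb
coverage_fraction = 0.0077   # 0.77% genome coverage from a single run

# Rearranging coverage = sequenced / genome gives genome = sequenced / coverage.
implied_genome_mb = sequenced_mb / coverage_fraction

print(f"Implied haploid genome size: {implied_genome_mb:.0f} Mb")
```

Under that assumption the implied genome size is on the order of 4.3 Gb, which is why a single run samples well under 1% of the pea genome.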
Genome analysis of the platypus reveals unique signatures of evolution.

PubMed

Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

2008-05-08

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Genome analysis of the platypus reveals unique signatures of evolution

PubMed Central

Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

2009-01-01

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734
Exome sequence analysis suggests genetic burden contributes to phenotypic variability and complex neuropathy

PubMed Central

Gonzaga-Jauregui, Claudia; Harel, Tamar; Gambin, Tomasz; Kousi, Maria; Griffin, Laurie B.; Francescatto, Ludmila; Ozes, Burcak; Karaca, Ender; Jhangiani, Shalini; Bainbridge, Matthew N.; Lawson, Kim S.; Pehlivan, Davut; Okamoto, Yuji; Withers, Marjorie; Mancias, Pedro; Slavotinek, Anne; Reitnauer, Pamela J; Goksungur, Meryem T.; Shy, Michael; Crawford, Thomas O.; Koenig, Michel; Willer, Jason; Flores, Brittany N.; Pediaditrakis, Igor; Us, Onder; Wiszniewski, Wojciech; Parman, Yesim; Antonellis, Anthony; Muzny, Donna M.; Katsanis, Nicholas; Battaloglu, Esra; Boerwinkle, Eric; Gibbs, Richard A.; Lupski, James R.

2015-01-01

Charcot-Marie-Tooth (CMT) disease is a clinically and genetically heterogeneous distal symmetric polyneuropathy. Whole-exome sequencing (WES) of 40 individuals from 37 unrelated families with CMT-like peripheral neuropathy refractory to molecular diagnosis identified apparent causal mutations in ~45% (17/37) of families. Three candidate disease genes are proposed, supported by a combination of genetic and in vivo studies. Aggregate analysis of mutation data revealed a significantly increased number of rare variants across 58 neuropathy associated genes in subjects versus controls; confirmed in a second ethnically discrete neuropathy cohort, suggesting mutation burden potentially contributes to phenotypic variability. Neuropathy genes shown to have highly penetrant Mendelizing variants (HMPVs) and implicated by burden in families were shown to interact genetically in a zebrafish assay exacerbating the phenotype established by the suppression of single genes. Our findings suggest that the combinatorial effect of rare variants contributes to disease burden and variable expressivity. PMID:26257172
Application of Whole Exome Sequencing in Six Families with an Initial Diagnosis of Autosomal Dominant Retinitis Pigmentosa: Lessons Learned

PubMed Central

Fernandez-San Jose, Patricia; Liu, Yichuan; March, Michael; Pellegrino, Renata; Golhar, Ryan; Corton, Marta; Blanco-Kelly, Fiona; López-Molina, Maria Isabel; García-Sandoval, Blanca; Guo, Yiran; Tian, Lifeng; Liu, Xuanzhu; Guan, Liping; Zhang, Jianguo; Keating, Brendan; Xu, Xun

2015-01-01

This study aimed to identify the genetics underlying dominant forms of inherited retinal dystrophies using whole exome sequencing (WES) in six families extensively screened for known mutations or genes. Thirty-eight individuals were subjected to WES. Causative variants were searched among single nucleotide variants (SNVs) and insertion/deletion variants (indels) and whenever no potential candidate emerged, copy number variant (CNV) analysis was performed. Variants or regions harboring a candidate variant were prioritized and segregation of the variant with the disease was further assessed using Sanger sequencing in the case of SNVs and indels, and quantitative PCR (qPCR) for CNVs. SNV and indel analysis led to the identification of a previously reported mutation in PRPH2. Two additional mutations linked to different forms of retinal dystrophies were identified in two families: a known frameshift deletion in RPGR, a gene responsible for X-linked retinitis pigmentosa, and p.Ser163Arg in C1QTNF5, associated with Late-Onset Retinal Degeneration. A novel heterozygous deletion spanning the entire region of PRPF31 was also identified in the affected members of a fourth family, which was confirmed with qPCR. This study allowed the identification of the genetic cause of the retinal dystrophy and the establishment of a correct diagnosis in four families, including a large heterozygous deletion in PRPF31, typically considered one of the pitfalls of this method. Since all findings in this study are restricted to known genes, we propose that targeted sequencing using a gene panel is an optimal first approach for genetic screening and that once known genetic causes are ruled out, WES might be used to uncover new genes involved in inherited retinal dystrophies. PMID:26197217
Increased Probability of Co-Occurrence of Two Rare Diseases in Consanguineous Families and Resolution of a Complex Phenotype by Next Generation Sequencing

PubMed Central

Lal, Dennis; Neubauer, Bernd A.; Toliat, Mohammad R.; Altmüller, Janine; Thiele, Holger; Nürnberg, Peter; Kamrath, Clemens; Schänzer, Anne; Sander, Thomas; Hahn, Andreas; Nothnagel, Michael

2016-01-01

Massively parallel sequencing of whole genomes and exomes has facilitated a direct assessment of causative genetic variation, now enabling the identification of genetic factors involved in rare diseases (RD) with Mendelian inheritance patterns on an almost routine basis. Here, we describe the illustrative case of a single consanguineous family where this strategy was complicated by the difficulty of distinguishing between two etiologically distinct disorders, namely the co-occurrence of hereditary hypophosphatemic rickets (HRR) and congenital myopathies (CM), by their phenotypic manifestation alone. We used parametric linkage analysis, homozygosity mapping and whole-exome sequencing to identify mutations underlying HRR and CM. We also present an approximate approach for assessing the probability of co-occurrence of two unlinked recessive RD in a single family as a function of the degree of consanguinity and the frequency of the disease-causing alleles. Linkage analysis and homozygosity mapping yielded inconclusive results when assuming a single RD, but whole-exome sequencing helped to identify two mutations in two genes, namely SLC34A3 and SEPN1, that segregated independently in this family and that have previously been linked to two etiologically different diseases. We assess the increase in chance co-occurrence of rare diseases due to consanguinity, i.e. under circumstances that generally favor linkage mapping of recessive disease, and show that this probability can increase by several orders of magnitude. We conclude that such potential co-occurrence represents an underestimated risk when analyzing rare or undefined diseases in consanguineous families and should be given more consideration in the clinical and genetic evaluation. PMID:26789268
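To give a feel for the size of the effect described in the record above, the following minimal sketch uses the textbook single-locus result that an individual with inbreeding coefficient F is homozygous for a recessive allele of frequency q with probability q^2 + F*q*(1 - q). Multiplying the two per-locus probabilities (treating the unlinked loci as independent for a given F) is a simplifying assumption made here and is not necessarily the approximation used by the authors; the function names and the allele frequencies are hypothetical.

```python
def p_affected(q: float, F: float) -> float:
    """Probability of being homozygous for a recessive allele of frequency q,
    given inbreeding coefficient F (textbook single-locus formula)."""
    return q * q + F * q * (1.0 - q)


def p_cooccurrence(q1: float, q2: float, F: float) -> float:
    """Approximate probability that one child is affected by two unlinked
    recessive diseases (assumption: loci treated as independent given F)."""
    return p_affected(q1, F) * p_affected(q2, F)


if __name__ == "__main__":
    q1 = q2 = 0.001  # hypothetical disease-allele frequencies
    baseline = p_cooccurrence(q1, q2, 0.0)
    # F = 1/64 for offspring of second cousins, 1/16 for offspring of first cousins
    for label, F in [("unrelated", 0.0), ("second cousins", 1 / 64), ("first cousins", 1 / 16)]:
        p = p_cooccurrence(q1, q2, F)
        print(f"{label:15s} F={F:.4f}  P(both)={p:.3e}  fold increase={p / baseline:.0f}")
```

With these illustrative frequencies, the fold increase for offspring of first cousins exceeds three orders of magnitude, which is in line with the qualitative statement in the abstract.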
RHO Mutations (p.W126L and p.A346P) in Two Japanese Families with Autosomal Dominant Retinitis Pigmentosa

PubMed Central

Akahori, Masakazu; Itabashi, Takeshi; Nishino, Jo; Yoshitake, Kazutoshi; Ikeo, Kazuho; Tsuneoka, Hiroshi

2014-01-01

Purpose. To investigate genetic and clinical features of patients with rhodopsin (RHO) mutations in two Japanese families with autosomal dominant retinitis pigmentosa (adRP). Methods. Whole-exome sequence analysis was performed in ten adRP families. Identified RHO mutations for the cosegregation analysis were confirmed by Sanger sequencing. Ophthalmic examinations were performed to evaluate the RP phenotypes. The impact of the RHO mutation on the rhodopsin conformation was examined by molecular modeling analysis. Results. In two adRP families, we identified two RHO mutations (c.377G>T (p.W126L) and c.1036G>C (p.A346P)), one of which was novel. Complete cosegregation was confirmed for each mutation exhibiting the RP phenotype in both families. Molecular modeling predicted that the novel mutation (p.W126L) might impair rhodopsin function by affecting its conformational transition in the light-adapted form. Clinical phenotypes showed that patients with p.W126L exhibited sector RP, whereas patients with p.A346P exhibited classic RP. Conclusions. Our findings demonstrated that the novel mutation (p.W126L) may be associated with the phenotype of sector RP. Identification of RHO mutations is a very useful tool for predicting disease severity and providing precise genetic counseling. PMID:25485142
The complete nucleotide sequence and genome organization of a novel betaflexivirus infecting Citrullus lanatus.

PubMed

Xin, Min; Zhang, Peipei; Liu, Wenwen; Ren, Yingdang; Cao, Mengji; Wang, Xifeng

2017-10-01

The complete nucleotide sequence of a novel positive single-stranded (+ss) RNA virus, tentatively named watermelon virus A (WVA), was determined using a combination of three methods: RNA sequencing, small RNA sequencing, and Sanger sequencing. The full genome of WVA is comprised of 8,372 nucleotides (nt), excluding the poly (A) tail, and contains four open reading frames (ORFs). The largest ORF, ORF1 encodes a putative replication-associated polyprotein (RP) with three conserved domains. ORF2 and ORF4 encode a movement protein (MP) and coat protein (CP), respectively. The putative product encoded by ORF3, of an estimated molecular mass of 25 kDa, has no significant similarity with other proteins. Identity and phylogenetic analysis indicate that WVA is a new virus, closely related to members of the family Betaflexiviridae. However, the final taxonomic allocation of WVA within the family is yet to be determined.
An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

PubMed

Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

2016-02-18

The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. 
Finally, we provide a general motif-HMM scanner which is easily accessible through the graphical user interface ( http://compbio.math.hr/ ). Our results show that scanning with a carefully parameterized motif-HMM is an effective approach for annotation of protein families with low sequence similarity and conserved motifs. The results of this study expand current knowledge and provide new insights into the evolution of the large GDSL-lipase family in land plants.
Novel mutations of ABCB6 associated with autosomal dominant dyschromatosis universalis hereditaria.

PubMed

Cui, Ying-Xia; Xia, Xin-Yi; Zhou, Yang; Gao, Lin; Shang, Xue-Jun; Ni, Tong; Wang, Wei-Ping; Fan, Xiao-Buo; Yin, Hong-Lin; Jiang, Shao-Jun; Yao, Bing; Hu, Yu-An; Wang, Gang; Li, Xiao-Jun

2013-01-01

Dyschromatosis universalis hereditaria (DUH) is a rare heterogeneous pigmentary genodermatosis, which was first described in 1933. The genetic cause was recently identified as mutations in ABCB6. Here we investigated a Chinese family with typical features of autosomal dominant DUH and 3 unrelated patients with sporadic DUH. Skin tissues were obtained from the proband of this family and the 3 sporadic patients. Histopathological examination and immunohistochemical analysis of ABCB6 were performed. Peripheral blood DNA samples were obtained from 21 affected members, 14 unaffected members and 11 spouses in the family, and from the 3 sporadic patients. A genome-wide linkage scan for the family was carried out to localize the causative gene. Exome sequencing was performed on 3 affected and 1 unaffected family members. Sanger sequencing of ABCB6 was further used to identify the causative gene for all samples obtained from available family members, the 3 sporadic patients and a panel of 455 ethnically-matched normal Chinese individuals. Histopathological analysis showed that melanocytes in normal control skin tissue and in the hyperpigmented areas contained more melanized, mature melanosomes than those within the hypopigmented areas. Empty immature melanosomes were found in the hypopigmented melanocytes. Parametric multipoint linkage analysis produced a HLOD score of 4.68, with markers on chromosome 2q35-q37.2. A missense mutation (c.1663 C>A, p.Gln555Lys) in ABCB6 was identified in this family by exome and Sanger sequencing. The mutation perfectly cosegregated with the skin phenotype. An additional mutation (g.776 delC, c.459 delC) in ABCB6 was found in an unrelated sporadic patient. No mutation in ABCB6 was discovered in the other two sporadic patients. Neither of the two mutations was present in the 455 controls. Melanocytes showed positive immunoreactivity to ABCB6. Our data add new variants to the repertoire of ABCB6 mutations with DUH.
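For readers unfamiliar with linkage scores such as the HLOD reported above, the sketch below computes the much simpler phase-known two-point LOD score, log10 of the likelihood at recombination fraction theta divided by the likelihood at theta = 0.5. It is only an illustration of the scale of such scores: the study used parametric multipoint analysis allowing for heterogeneity, which this code does not implement, and the meiosis counts are hypothetical.

```python
import math


def two_point_lod(n_meioses: int, n_recomb: int, theta: float) -> float:
    """Phase-known two-point LOD score for n_recomb recombinants observed
    among n_meioses informative meioses at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= n_recomb <= n_meioses:
        raise ValueError("n_recomb must lie in [0, n_meioses]")
    likelihood = theta ** n_recomb * (1.0 - theta) ** (n_meioses - n_recomb)
    if likelihood == 0.0:
        # e.g. theta = 0 but a recombinant was observed: linkage at theta excluded
        return float("-inf")
    # Subtract the log-likelihood under free recombination (theta = 0.5)
    return math.log10(likelihood) - n_meioses * math.log10(0.5)


if __name__ == "__main__":
    # Hypothetical counts: 12 informative meioses, 1 recombinant
    for theta in (0.01, 0.05, 0.1, 0.2, 0.5):
        print(f"theta={theta:.2f}  LOD={two_point_lod(12, 1, theta):.2f}")
```

By convention, a LOD score above 3 is usually taken as significant evidence for linkage, which is why a score of 4.68 supports the 2q35-q37.2 region.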
De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube.

PubMed

Iaria, Domenico; Chiappetta, Adriana; Muzzalupo, Innocenzo

2016-01-01

In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear and the molecular basis underlying this process is still not fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing its own pistil and in combination pollen/pistil from self-sterile and self-fertile cultivars, have a distinct gene expression profile and many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members of the actin, actin depolymerization factor and fibrin gene families and members of the Ca(2+) binding gene family related to the development and polarization of the pollen apical tip. The whole transcriptomic analysis, through the identification of the set of differentially expressed transcripts and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.
Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck

PubMed Central

2014-01-01

Background Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). Results We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. Conclusions This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar. PMID:24490620
Identification a novel MYOC gene mutation in a Chinese family with juvenile-onset open angle glaucoma.

PubMed

Zhao, Xin; Yang, Chaoshan; Tong, Yi; Zhang, Xiaohui; Xu, Liang; Li, Yang

2010-08-25

To describe the clinical and genetic findings in one Chinese family with juvenile-onset open angle glaucoma (JOAG). One family was examined clinically and a follow-up took place 5 years later. After informed consent was obtained, genomic DNA was extracted from the venous blood of all participants. Linkage analysis was performed with three microsatellite markers around the MYOC gene (D1S196, D1S2815, and D1S218) in the family. Mutation screening of all coding exons of MYOC was performed by direct sequencing of PCR-amplified DNA fragments and restriction fragment length polymorphism (RFLP) analysis. Bioinformatics analysis by the Garnier-Osguthorpe-Robson (GOR) method predicted the effects of the detected variants on secondary structures of the MYOC protein. Clinical examination and pedigree analysis revealed a three-generation family with seven members diagnosed with JOAG, three with ocular hypertension, and five normal individuals. Through genotyping, the pedigree showed linkage to the MYOC locus on chromosome 1q24-25. Mutation screening of MYOC in this family revealed an A-to-T transversion at position 1348 (p.N450Y) of the cDNA sequence. This missense mutation co-segregated with the disease phenotype of the family, but was not found in 100 normal controls. Secondary structure prediction of the p.N450Y by the GOR method revealed the replacement of a coil with a beta sheet at amino acid 447. Early onset JOAG, with incomplete penetrance, is consistent with a novel mutation in MYOC. The finding provides pre-symptomatic molecular diagnosis for the members of this family and is useful for further genetic consultation.
A Statistical Framework for the Functional Analysis of Metagenomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharon, Itai; Pati, Amrita; Markowitz, Victor

2008-10-01

Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements. They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
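As background to the Lander-Waterman theory mentioned above, the sketch below evaluates its basic quantities for an idealized shotgun project: coverage c = N*L/G, the expected fraction of bases covered 1 - exp(-c), and the expected number of contigs N*exp(-c) when the minimum detectable overlap is ignored. This is only the classical single-genome model, not the authors' gene-family framework, and the read counts and genome size are hypothetical.

```python
import math


def coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Mean sequencing depth c = N * L / G."""
    return n_reads * read_len / genome_len


def fraction_covered(c: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-c)


def expected_contigs(n_reads: int, c: float) -> float:
    """Expected number of contigs, ignoring the minimum overlap needed
    to join two reads (simplifying assumption)."""
    return n_reads * math.exp(-c)


if __name__ == "__main__":
    genome_len = 5_000_000  # hypothetical 5 Mb genome
    read_len = 500          # hypothetical read length in bp
    for n_reads in (10_000, 50_000, 100_000):
        c = coverage(n_reads, read_len, genome_len)
        print(f"N={n_reads:7d}  c={c:5.1f}  covered={fraction_covered(c):.4f}  "
              f"contigs={expected_contigs(n_reads, c):.1f}")
```

In a metagenome the members of the community are sampled at very different depths, which is one reason naive read counts give biased gene family frequencies and a dedicated statistical framework is needed.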
Evolution of the chalcone synthase gene family in the genus Ipomoea.

PubMed Central

Durbin, M L; Learn, G H; Huttley, G A; Clegg, M T

1995-01-01

The evolution of the chalcone synthase [CHS; malonyl-CoA:4-coumaroyl-CoA malonyltransferase (cyclizing), EC 2.3.1.74] multigene family in the genus Ipomoea is explored. Thirteen CHS genes from seven Ipomoea species (family Convolvulaceae) were sequenced--three from genomic clones and the remainder from PCR amplification with primers designed from the 5' flanking region and the end of the 3' coding region of Ipomoea purpurea Roth. Analysis of the data indicates a duplication of CHS that predates the divergence of the Ipomoea species in this study. The Ipomoea CHS genes are among the most rapidly evolving of the CHS genes sequenced to date. The CHS genes in this study are most closely related to the Petunia CHS-B gene, which is also rapidly evolving and highly divergent from the rest of the Petunia CHS sequences. PMID:7724563
Phylogenomic analysis of UDP glycosyltransferase 1 multigene family in Linum usitatissimum identified genes with varied expression patterns.

PubMed

Barvkar, Vitthal T; Pardeshi, Varsha C; Kale, Sandip M; Kadoo, Narendra Y; Gupta, Vidya S

2012-05-08

The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave way for the development of functional genomics approaches. Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also coherent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. 
Comparison of the flax UGTs with orthologs from four other sequenced dicot genomes indicated that seven UGTs were flax diverged. Flax has a large number of UGT genes including few flax diverged ones. Phylogenetic analysis and expression profiles of these genes identified tissue and condition specific repertoire of UGT genes from this crop. This study would facilitate precise selection of candidate genes and their further characterization of substrate specificities and in planta functions.
Phylogenomic analysis of UDP glycosyltransferase 1 multigene family in Linum usitatissimum identified genes with varied expression patterns

PubMed Central

2012-01-01

Background The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave way for the development of functional genomics approaches. Results Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also coherent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. 
Comparison of the flax UGTs with orthologs from four other sequenced dicot genomes indicated that seven UGTs were flax diverged. Conclusions Flax has a large number of UGT genes including few flax diverged ones. Phylogenetic analysis and expression profiles of these genes identified tissue and condition specific repertoire of UGT genes from this crop. This study would facilitate precise selection of candidate genes and their further characterization of substrate specificities and in planta functions. PMID:22568875
Identification of a fourth locus (EVR4) for familial exudative vitreoretinopathy (FEVR).

PubMed

Toomes, Carmel; Downey, Louise M; Bottomley, Helen M; Scott, Sheila; Woodruff, Geoffrey; Trembath, Richard C; Inglehearn, Chris F

2004-01-15

Familial exudative vitreoretinopathy (FEVR) is a genetically heterogeneous inherited blinding disorder of the retinal vascular system. To date three loci have been mapped: EVR1 on chromosome 11q, EVR2 on chromosome Xp, and EVR3 on chromosome 11p. The gene underlying EVR3 remains unidentified whilst the EVR2 gene, which encodes the Norrie disease protein (NDP), was identified over a decade ago. More recently, FZD4, the gene that encodes the Wnt receptor Frizzled-4, was identified as the mutated gene at the EVR1 locus. The purpose of this study was to screen FZD4 in a large family previously proven to be linked to the EVR1 locus. PCR products were generated using genomic DNA from affected family members with primers designed to amplify the coding sequence of FZD4. The PCR products were screened for mutations by direct sequencing. Genotyping was performed in all available family members using fluorescently labeled microsatellite markers from chromosome 11q. Sequencing of the EVR1 gene, FZD4, in this family identified no mutation. To investigate this family further we performed high-resolution genotyping with markers spanning chromosome 11q. Haplotype analysis excluded FZD4 as the mutated gene in this family and identified a candidate region approximately 10 cM centromeric to EVR1. This new FEVR locus is flanked by markers D11S1368 (centromeric) and D11S937 (telomeric) and spans approximately 15 cM. High-resolution genotyping and haplotype analysis excluded FZD4 as the defective gene in a family previously linked to the EVR1 locus. The results indicate that the gene mutated in this family lies centromeric to the EVR1 gene, FZD4, and is also genetically distinct from the EVR3 locus. This new locus has been designated EVR4 and is the fourth FEVR locus to be described.
A variant of Leber hereditary optic neuropathy characterized by recovery of vision and by an unusual mitochondrial genetic etiology.

PubMed Central

Mackey, D; Howell, N

1992-01-01

The Tas2 and Vic2 Australian families are affected with a variant of Leber hereditary optic neuropathy (LHON). The risk of developing the optic neuropathy shows strict maternal inheritance, and the ophthalmological changes in affected family members are characteristic of LHON. However, in contrast to the common form of the disease, members of these two families show a high frequency of vision recovery. To ascertain the mitochondrial genetic etiology of the LHON in these families, both (a) the nucleotide sequences of the seven mitochondrial genes encoding subunits of respiratory-chain complex I and (b) the mitochondrial cytochrome b gene were determined for representatives of both families. Neither family carries any of the previously identified primary mitochondrial LHON mutations: ND4/11778, ND1/3460, or ND1/4160. Instead, both LHON families carry multiple nucleotide changes in the mitochondrial complex I genes, which produce conservative amino acid changes. From the available sequence data, it is inferred that the Vic2 and Tas2 LHON families are phylogenetically related to each other and to a cluster of LHON families in which mutations in the mitochondrial cytochrome b gene have been hypothesized to play a primary etiological role. However, sequencing analysis establishes that the Vic2 and Tas2 LHON families do not carry these cytochrome b mutations. There are two hypotheses to account for the unusual mitochondrial genetic etiology of the LHON in the Tas2 and Vic2 LHON families. One possibility is that there is a primary LHON mutation within the mitochondrial genome but that it is at a site that was not included in the sequencing analyses. Alternatively, the disease in these families may result from the cumulative effects of multiple secondary LHON mutations that have less severe phenotypic consequences. PMID:1463007

A variant of Leber hereditary optic neuropathy characterized by recovery of vision and by an unusual mitochondrial genetic etiology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mackey, D.; Howell, N.

1992-12-01

The Tas2 and Vic2 Australian families are affected with a variant of Leber hereditary optic neuropathy (LHON). The risk of developing the optic neuropathy shows strict maternal inheritance, and the ophthalmological changes in affected family members are characteristic of LHON. However, in contrast to the common form of the disease, members of these two families show a high frequency of vision recovery. To ascertain the mitochondrial genetic etiology of the LHON in these families, both (a) the nucleotide sequences of the seven mitochondrial genes encoding subunits of respiratory-chain complex I and (b) the mitochondrial cytochrome b gene were determined for representatives of both families. Neither family carries any of the previously identified primary mitochondrial LHON mutations: ND4/11778, ND1/3460, or ND1/4160. Instead, both LHON families carry multiple nucleotide changes in the mitochondrial complex I genes, which produce conservative amino acid changes. From the available sequence data, it is inferred that the Vic2 and Tas2 LHON families are phylogenetically related to each other and to a cluster of LHON families in which mutations in the mitochondrial cytochrome b gene have been hypothesized to play a primary etiological role. However, sequencing analysis establishes that the Vic2 and Tas2 LHON families do not carry these cytochrome b mutations. There are two hypotheses to account for the unusual mitochondrial genetic etiology of the LHON in the Tas2 and Vic2 LHON families. One possibility is that there is a primary LHON mutation within the mitochondrial genome but that it is at a site that was not included in the sequencing analyses. Alternatively, the disease in these families may result from the cumulative effects of multiple secondary LHON mutations that have less severe phenotypic consequences. 29 refs., 3 figs., 3 tabs.
Genome-wide analysis of the Solanum tuberosum (potato) trehalose-6-phosphate synthase (TPS) gene family: evolution and differential expression during development and stress.

PubMed

Xu, Yingchun; Wang, Yanjie; Mattson, Neil; Yang, Liu; Jin, Qijiang

2017-12-01

Trehalose-6-phosphate synthase (TPS) serves important functions in plant desiccation tolerance and response to environmental stimuli. At present, a comprehensive analysis of this gene family (functional classification, molecular evolution, and expression patterns) is still lacking in Solanum tuberosum (potato). In this study, a comprehensive analysis of the TPS gene family was conducted in potato. A total of eight putative potato TPS genes (StTPSs) were identified by searching the latest potato genome sequence. The amino acid identity among the eight StTPSs varied from 59.91 to 89.54%. Analysis of dN/dS ratios suggested that regions in the TPP (trehalose-6-phosphate phosphatase) domains evolved faster than the TPS domains. Although the sequences of the eight StTPSs showed high similarity (2571-2796 bp), their gene lengths vary widely (3189-8406 bp). Many regulatory elements possibly related to phytohormones, abiotic stress and development were identified in different TPS genes. Based on the phylogenetic tree constructed using TPS genes of potato and four other Solanaceae plants, TPS genes could be categorized into 6 distinct groups. Analysis revealed that purifying selection most likely played a major role during the evolution of this family. Amino acid changes detected in specific branches of the phylogenetic tree suggest that relaxed constraints might have contributed to functional divergence among groups. Moreover, StTPSs were found to exhibit tissue- and treatment-specific expression patterns in analyses of transcriptome data and qRT-PCR. This study provides a reference for genome-wide identification of the potato TPS gene family and sets a framework for further functional studies of this important gene family in development and stress response.
Common founder mutation in the LDL receptor gene causing familial hypercholesterolaemia in the Icelandic population.

PubMed

Gudnason, V; Sigurdsson, G; Nissen, H; Humphries, S E

1997-01-01

Haplotype analysis in 18 apparently unrelated families with familial hypercholesterolaemia (FH) in Iceland has identified at least five different chromosomes cosegregating with hypercholesterolaemia. The most common haplotype was identified in 11 of the 18 families, indicating a founder mutation responsible for FH in the Icelandic population. By using single-strand conformation polymorphism (SSCP) and direct sequencing of amplified DNA, we identified a novel mutation (a T to a C) in the second nucleotide in the 5' part of intron 4 in the LDL receptor gene. This mutation was present in approximately 60% of the FH families (10/18), supporting the prediction of a common founder. These families could be traced to a common ancestor in half of the cases by going back no further than the eighteenth century. The mutation was predicted to affect correct splicing of exon 4, and analysis at the cellular level demonstrated an abnormal mRNA containing intron 4 sequence in lymphoblastoid cells from a patient carrying this mutation. Translation of the mRNA would lead to a premature stop codon and a truncated nonfunctional protein of 285 amino acids. The novel sequence change created a new restriction site for the restriction endonuclease NlaIII, and using this assay, 29 unrelated individuals with possible FH attending a lipid clinic for treatment were examined for this mutation. Two individuals in this group of patients were found to be carriers of this mutation, supporting the suggestion of a founder mutation. Using this assay for the detection of FH in the Icelandic population should identify > 60% of these individuals.
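The restriction-site assay described above rests on a simple idea: a single base change can create a new recognition site, so digestion yields different fragment sizes in carriers. The sketch below illustrates this for NlaIII, whose recognition sequence is CATG. The two short sequences are invented toy examples, not the real LDL receptor intron 4 sequence, and the function names are hypothetical.

```python
NLAIII_SITE = "CATG"  # NlaIII recognition sequence


def find_sites(seq: str, site: str = NLAIII_SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def fragment_lengths(seq: str, site: str = NLAIII_SITE) -> list[int]:
    """Fragment lengths after complete digestion of a linear sequence.
    The top-strand cut is placed directly after the site (CATG^)."""
    cuts = [i + len(site) for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # Toy amplicons (hypothetical): a T-to-C change turns TATG into CATG
    wild_type = "AAGGTTATGCCGGTTAACC"
    mutant = "AAGGTCATGCCGGTTAACC"
    for name, seq in [("wild type", wild_type), ("mutant", mutant)]:
        print(f"{name:10s} sites={find_sites(seq)} fragments={fragment_lengths(seq)}")
```

In a heterozygous carrier, a real digest would show both the uncut amplicon and the two smaller fragments, because only the mutant allele carries the new site.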
Uronic polysaccharide degrading enzymes.

PubMed

Garron, Marie-Line; Cygler, Miroslaw

2014-10-01

In the past several years progress has been made in the field of structure and function of polysaccharide lyases (PLs). The number of classified polysaccharide lyase families has increased to 23 and more detailed analysis has allowed the identification of more closely related subfamilies, leading to stronger correlation between each subfamily and a unique substrate. The number of as yet unclassified polysaccharide lyases has also increased and we expect that sequencing projects will allow many of these unclassified sequences to emerge as new families. The progress in structural analysis of PLs has led to having at least one representative structure for each of the families and for two unclassified enzymes. The newly determined structures have folds observed previously in other PL families and their catalytic mechanisms follow either metal-assisted or Tyr/His mechanisms characteristic for other PL enzymes. Comparison of PLs with glycoside hydrolases (GHs) shows several folds common to both classes but only for the β-helix fold is there strong indication of divergent evolution from a common ancestor. Analysis of bacterial genomes identified gene clusters containing multiple polysaccharide cleaving enzymes, the Polysaccharides Utilization Loci (PULs), and their gene complement suggests that they are organized to process completely a specific polysaccharide. Copyright © 2014 Elsevier Ltd. All rights reserved.
Initial sequencing and comparative analysis of the mouse genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan

2002-12-15

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Molecular Characterization of “Candidatus Parilichlamydia carangidicola,” a Novel Chlamydia-Like Epitheliocystis Agent in Yellowtail Kingfish, Seriola lalandi (Valenciennes), and the Proposal of a New Family, “Candidatus Parilichlamydiaceae” fam. nov. (Order Chlamydiales)

PubMed Central

Polkinghorne, A.; Miller, T. L.; Groff, J. M.; LaPatra, S. E.; Nowak, B. F.

2013-01-01

Three cohorts of farmed yellowtail kingfish (Seriola lalandi) from South Australia were examined for Chlamydia-like organisms associated with epitheliocystis. To characterize the bacteria, 38 gill samples were processed for histopathology, electron microscopy, and 16S rRNA amplification, sequencing, and phylogenetic analysis. Microscopically, the presence of membrane-enclosed cysts was observed within the gill lamellae. Also observed was hyperplasia of the epithelial cells with cytoplasmic vacuolization and fusion of the gill lamellae. Transmission electron microscopy revealed morphological features of the reticulate and intermediate bodies typical of members of the order Chlamydiales. A novel 1,393-bp 16S chlamydial rRNA sequence was amplified from gill DNA extracted from fish in all cohorts over a 3-year period that corresponded to the 16S rRNA sequence amplified directly from laser-dissected cysts. This sequence was only 87% similar to the reported “Candidatus Piscichlamydia salmonis” (AY462244) from Atlantic salmon and Arctic charr. Phylogenetic analysis of this sequence against 35 Chlamydia and Chlamydia-like bacteria revealed that this novel bacterium belongs to an undescribed family lineage in the order Chlamydiales. Based on these observations, we propose this bacterium of yellowtail kingfish be known as “Candidatus Parilichlamydia carangidicola” and that the new family be known as “Candidatus Parilichlamydiaceae.” PMID:23275507
Structural analysis of the HLA-A/HLA-F subregion: Precise localization of two new multigene families closely associated with the HLA class I sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pichon, L.; Carn, G.; Bouric, P.

1996-03-01

Positional cloning strategies for the hemochromatosis gene have previously concentrated on a target area restricted to a maximum genomic expanse of 400 kb around the HLA-A and HLA-F loci. Recently, the candidate region has been extended to 2-3 Mb on the distal side of the MHC. In this study, 10 coding sequences [hemochromatosis candidate genes (HCG) I to X] were isolated by cDNA selection using YACs covering the HLA-A/HLA-F subregion. Two of these (HCG II and HCG IV) belong to multigene families, as well as other sequences already described in this region, i.e., P5, pMC 6.7, and HLA class I. Fingerprinting of the four YACs overlapping the region was performed and allowed partial localization of the different multigene family sequences on each YAC without defining their exact positions. Fingerprinting on cosmids isolated from the ICRF chromosome 6-specific cosmid library allowed more precise localization of the redundant sequences in all of the multigene families and revealed their apparent organization in clusters. Further examination of these intertwined sequences demonstrated that this structural organization resulted from a succession of complex phenomena, including duplications and contractions. This study presents a precise description of the structural organization of the HLA-A/HLA-F region and a determination of the sequences involved in the megabase size polymorphism observed among the A3, A24, and A31 haplotypes. 29 refs., 2 figs., 2 tabs.
Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae).

PubMed

Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

2013-07-01

The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100-500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S-5·8S-25S rRNA genes, which enable unequivocal chromosome discrimination in both jute species. The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species.
Mariniradius saccharolyticus gen. nov., sp. nov., a member of the family Cyclobacteriaceae isolated from marine aquaculture pond water, and emended descriptions of the genus Aquiflexum and Aquiflexum balticum.

PubMed

Bhumika, V; Srinivas, T N R; Ravinder, K; Kumar, P Anil

2013-06-01

A novel marine, Gram-stain-negative, oxidase- and catalase- positive, rod-shaped bacterium, designated strain AK6(T), was isolated from marine aquaculture pond water collected in Andhra Pradesh, India. The fatty acids were dominated by iso-C15:0, iso-C17:1ω9c, iso-C15:1 G, iso-C17:0 3-OH and anteiso-C15:0. Strain AK6(T) contained MK-7 as the sole respiratory quinone and phosphatidylethanolamine, one unidentified aminophospholipid, one unidentified phospholipid and seven unidentified lipids as polar lipids. The DNA G+C content of strain AK6(T) was 45.6 mol%. Phylogenetic analysis showed that strain AK6(T) formed a distinct branch within the family Cyclobacteriaceae and clustered with Aquiflexum balticum DSM 16537(T) and other members of the family Cyclobacteriaceae. 16S rRNA gene sequence analysis confirmed that Aquiflexum balticum DSM 16537(T) was the nearest neighbour, with pairwise sequence similarity of 90.1%, while sequence similarity with the other members of the family was <88.5%. Based on differentiating phenotypic characteristics and phylogenetic inference, strain AK6(T) is proposed as a representative of a new genus and species of the family Cyclobacteriaceae, as Mariniradius saccharolyticus gen. nov., sp. nov. The type strain of Mariniradius saccharolyticus is AK6(T) (=MTCC 11279(T)=JCM 17389(T)). Emended descriptions of the genus Aquiflexum and Aquiflexum balticum are also proposed.
Resistance gene candidates identified by PCR with degenerate oligonucleotide primers map to clusters of resistance genes in lettuce.

PubMed

Shen, K A; Meyers, B C; Islam-Faridi, M N; Chin, D B; Stelly, D M; Michelmore, R W

1998-08-01

The recent cloning of genes for resistance against diverse pathogens from a variety of plants has revealed that many share conserved sequence motifs. This provides the possibility of isolating numerous additional resistance genes by polymerase chain reaction (PCR) with degenerate oligonucleotide primers. We amplified resistance gene candidates (RGCs) from lettuce with multiple combinations of primers with low degeneracy designed from motifs in the nucleotide binding sites (NBSs) of RPS2 of Arabidopsis thaliana and N of tobacco. Genomic DNA, cDNA, and bacterial artificial chromosome (BAC) clones were successfully used as templates. Four families of sequences were identified that had the same similarity to each other as to resistance genes from other species. The relationship of the amplified products to resistance genes was evaluated by several sequence and genetic criteria. The amplified products contained open reading frames with additional sequences characteristic of NBSs. Hybridization of RGCs to genomic DNA and to BAC clones revealed large numbers of related sequences. Genetic analysis demonstrated the existence of clustered multigene families for each of the four RGC sequences. This parallels classical genetic data on clustering of disease resistance genes. Two of the four families mapped to known clusters of resistance genes; these two families were therefore studied in greater detail. Additional evidence that these RGCs could be resistance genes was gained by the identification of leucine-rich repeat (LRR) regions in sequences adjoining the NBS similar to those in RPM1 and RPS2 of A. thaliana. Fluorescent in situ hybridization confirmed the clustered genomic distribution of these sequences. The use of PCR with degenerate oligonucleotide primers is therefore an efficient method to identify numerous RGCs in plants.
Identification and analysis of mutational hotspots in oncogenes and tumour suppressors.

PubMed

Baeissa, Hanadi; Benstead-Hume, Graeme; Richardson, Christopher J; Pearl, Frances M G

2017-03-28

The key to interpreting the contribution of a disease-associated mutation in the development and progression of cancer is an understanding of the consequences of that mutation both on the function of the affected protein and on the pathways in which that protein is involved. Protein domains encapsulate function, and position-specific domain-based analysis of mutations has been shown to help elucidate their phenotypes. In this paper we examine the domain biases in oncogenes and tumour suppressors, and find that their domain compositions substantially differ. Using data from over 30 different cancers from whole-exome sequencing cancer genomic projects, we mapped over one million mutations to their respective Pfam domains to identify which domains are enriched in any of three different classes of mutation: missense, indels or truncations. Next, we identified the mutational hotspots within domain families by mapping small mutations to equivalent positions in multiple sequence alignments of protein domains. We find that gain of function mutations from oncogenes and loss of function mutations from tumour suppressors are normally found in different domain families and, when observed in the same domain families, hotspot mutations are located at different positions within the multiple sequence alignment of the domain. By considering hotspots in tumour suppressors and oncogenes independently, we find that there are different specific positions within domain families that are particularly suited to accommodate either a loss or a gain of function mutation. The position is also dependent on the class of mutation. We find rare mutations co-located with well-known functional mutation hotspots, in members of homologous domain superfamilies, and we detect novel mutation hotspots in domain families previously unconnected with cancer. The results of this analysis can be accessed through the MOKCa database (http://strubiol.icr.ac.uk/extra/MOKCa).
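Mapping mutations to equivalent positions in a multiple sequence alignment, as described above, can be sketched as follows. This is an illustrative outline only, not the MOKCa pipeline: the alignment, protein names, mutation list and the `min_count` threshold are all hypothetical, and real hotspot calling would need a statistical test rather than a raw count.

```python
from collections import Counter


def residue_to_column(aligned_seq):
    """Map 1-based residue positions to 0-based alignment columns."""
    mapping = {}
    position = 0
    for column, char in enumerate(aligned_seq):
        if char not in "-.":  # gap characters carry no residue
            position += 1
            mapping[position] = column
    return mapping


def hotspot_columns(alignment, mutations, min_count=2):
    """Count mutations per alignment column and keep columns at or above min_count.

    alignment: dict of protein name -> aligned domain sequence
    mutations: iterable of (protein name, 1-based residue position)
    """
    maps = {name: residue_to_column(seq) for name, seq in alignment.items()}
    counts = Counter()
    for name, position in mutations:
        column = maps[name].get(position)
        if column is not None:  # ignore positions outside the aligned domain
            counts[column] += 1
    return {col: n for col, n in counts.items() if n >= min_count}


# Hypothetical two-sequence domain alignment and mutation list.
alignment = {"protA": "MK-LV", "protB": "MKTL-"}
mutations = [("protA", 3), ("protB", 4), ("protB", 1)]
hotspots = hotspot_columns(alignment, mutations)
```

Residue 3 of `protA` and residue 4 of `protB` both fall in alignment column 3, so that column is the only one reported.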
Molecular cloning of the potato Gro1-4 gene conferring resistance to pathotype Ro1 of the root cyst nematode Globodera rostochiensis, based on a candidate gene approach.

PubMed

Paal, Jürgen; Henselewski, Heike; Muth, Jost; Meksem, Khalid; Menéndez, Cristina M; Salamini, Francesco; Ballvora, Agim; Gebhardt, Christiane

2004-04-01

The endoparasitic root cyst nematode Globodera rostochiensis causes considerable damage in potato cultivation. In the past, major genes for nematode resistance have been introgressed from related potato species into cultivars. Elucidating the molecular basis of resistance will contribute to the understanding of nematode-plant interactions and assist in breeding nematode-resistant cultivars. The Gro1 resistance locus to G. rostochiensis on potato chromosome VII co-localized with a resistance-gene-like (RGL) DNA marker. This marker was used to isolate from genomic libraries 15 members of a closely related candidate gene family. Analysis of inheritance, linkage mapping, and sequencing reduced the number of candidate genes to three. Complementation analysis by stable potato transformation showed that the gene Gro1-4 conferred resistance to G. rostochiensis pathotype Ro1. Gro1-4 encodes a protein of 1136 amino acids that contains Toll-interleukin 1 receptor (TIR), nucleotide-binding (NB), leucine-rich repeat (LRR) homology domains and a C-terminal domain with unknown function. The deduced Gro1-4 protein differed by 29 amino acid changes from susceptible members of the Gro1 gene family. Sequence characterization of 13 members of the Gro1 gene family revealed putative regulatory elements and a variable microsatellite in the promoter region, insertion of a retrotransposon-like element in the first intron, and a stop codon in the NB coding region of some genes. Sequence analysis of RT-PCR products showed that Gro1-4 is expressed, among other members of the family including putative pseudogenes, in non-infected roots of nematode-resistant plants. RT-PCR also demonstrated that members of the Gro1 gene family are expressed in most potato tissues.
Analysis of Cytoskeletal and Motility Proteins in the Sea Urchin Genome Assembly

PubMed Central

Morris, RL; Hoffman, MP; Obar, RA; McCafferty, SS; Gibbons, IR; Leone, AD; Cool, J; Allgood, EL; Musante, AM; Judkins, KM; Rossetti, BJ; Rawson, AP; Burgess, DR

2007-01-01

The sea urchin embryo is a classical model system for studying the role of the cytoskeleton in such events as fertilization, mitosis, cleavage, cell migration and gastrulation. We have conducted an analysis of gene models derived from the Strongylocentrotus purpuratus genome assembly and have gathered strong evidence for the existence of multiple gene families encoding cytoskeletal proteins and their regulators in sea urchin. While many cytoskeletal genes have been cloned from sea urchin with sequences already existing in public databases, genome analysis reveals a significantly higher degree of diversity within certain gene families. Furthermore, genes are described corresponding to homologs of cytoskeletal proteins not previously documented in sea urchins. To illustrate the varying degree of sequence diversity that exists within cytoskeletal gene families, we conducted an analysis of genes encoding actins, specific actin-binding proteins, myosins, tubulins, kinesins, dyneins, specific microtubule-associated proteins, and intermediate filaments. We conducted ontological analysis of select genes to better understand the relatedness of urchin cytoskeletal genes to those of other deuterostomes. We analyzed developmental expression (EST) data to confirm the existence of select gene models and to understand their differential expression during various stages of early development. PMID:17027957
ITEP: an integrated toolkit for exploration of microbial pan-genomes.

PubMed

Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D

2014-01-03

Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution.
ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
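The notion of core and variable genes in a pan-genome can be illustrated with a short sketch. It does not use ITEP's own Python libraries or reproduce its API; the function name, genome labels and family identifiers are hypothetical. It assumes each genome has already been reduced to the set of protein families it contains.

```python
def split_core_variable(presence):
    """Split protein families into core and variable sets.

    presence: dict of genome name -> set of protein family IDs present.
    Core families occur in every genome; variable families occur in
    at least one genome but not all.
    """
    if not presence:
        return set(), set()
    family_sets = [set(families) for families in presence.values()]
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan - core


# Hypothetical presence/absence data for three genomes.
presence = {
    "genomeA": {"fam1", "fam2", "fam3"},
    "genomeB": {"fam1", "fam3"},
    "genomeC": {"fam1", "fam3", "fam4"},
}
core, variable = split_core_variable(presence)
```

With this input, `fam1` and `fam3` are core, while `fam2` and `fam4` are variable; together they make up the pan-genome of the group.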
Whole-Exome Sequencing to Decipher the Genetic Heterogeneity of Hearing Loss in a Chinese Family with Deaf by Deaf Mating

PubMed Central

Qing, Jie; Yan, Denise; Zhou, Yuan; Liu, Qiong; Wu, Weijing; Xiao, Zian; Liu, Yuyuan; Liu, Jia; Du, Lilin; Xie, Dinghua; Liu, Xue Zhong

2014-01-01

Inherited deafness has been shown to have high genetic heterogeneity. For many decades, linkage analysis and candidate gene approaches have been the main tools to elucidate the genetics of hearing loss. However, this associated study design is costly, time-consuming, and unsuitable for small families. This is mainly due to the inadequate numbers of available affected individuals, locus heterogeneity, and assortative mating. Exome sequencing has now become technically feasible and a cost-effective method for detection of disease variants underlying Mendelian disorders due to the recent advances in next-generation sequencing (NGS) technologies. In the present study, we have combined both the Deafness Gene Mutation Detection Array and exome sequencing to identify deafness causative variants in a large Chinese composite family with deaf by deaf mating. The simultaneous screening of the 9 common deafness mutations using the allele-specific PCR based universal array, resulted in the identification of the 1555A>G in the mitochondrial DNA (mtDNA) 12S rRNA in affected individuals in one branch of the family. We then subjected the mutation-negative cases to exome sequencing and identified novel causative variants in the MYH14 and WFS1 genes. This report confirms the effective use of a NGS technique to detect pathogenic mutations in affected individuals who were not candidates for classical genetic studies. PMID:25289672
Evolutionary genomics of miniature inverted-repeat transposable elements (MITEs) in Brassica.

PubMed

Nouroz, Faisal; Noreen, Shumaila; Heslop-Harrison, J S

2015-12-01

Miniature inverted-repeat transposable elements (MITEs) are truncated derivatives of autonomous DNA transposons, and are dispersed abundantly in most eukaryotic genomes. We aimed to characterize various MITE families in Brassica in terms of their presence, sequence characteristics and evolutionary activity. Dot plot analyses involving comparison of homoeologous bacterial artificial chromosome (BAC) sequences allowed identification of 15 novel families of mobile MITEs. Of these, 5 were Stowaway-like with TA Target Site Duplications (TSDs), 4 Tourist-like with TAA/TTA TSDs, 5 Mutator-like with 9-10 bp TSDs and 1 novel MITE (BoXMITE1) flanked by 3 bp TSDs. Our data suggested that there are about 30,000 MITE-related sequences in Brassica rapa and B. oleracea genomes. In situ hybridization showed one abundant family was dispersed in the A-genome, while another was located near 45S rDNA sites. PCR analysis using primers flanking sequences of MITE elements detected MITE insertion polymorphisms between and within the three Brassica (AA, BB, CC) genomes, with many insertions being specific to single genomes and others showing evidence of more recent evolutionary insertions. Our BAC sequence comparison strategy enables identification of evolutionarily active MITEs with no prior knowledge of MITE sequences. The details of MITE families reported in Brassica enable their identification, characterization and annotation. Insertion polymorphisms of MITEs and their transposition activity indicate an important mechanism of genome evolution and diversification. MITE families derived from known Mariner, Harbinger and Mutator DNA transposons were discovered, as well as some novel structures. The identification of Brassica MITEs will have broad applications in Brassica genomics, breeding, hybridization and phylogeny through their use as DNA markers.
Enhancing genomic laboratory reports from the patients' view: A qualitative analysis

PubMed Central

Stuckey, Heather; Fan, Audrey L.; Rahm, Alanna Kulchak; Green, Jamie; Feldman, Lynn; Bonhag, Michele; Zallen, Doris T.; Segal, Michael M.; Williams, Marc S.

2015-01-01

The purpose of this study was to develop a family genomic laboratory report designed to communicate genome sequencing results to parents of children who were participating in a whole genome sequencing clinical research study. Semi‐structured interviews were conducted with parents of children who participated in a whole genome sequencing clinical research study to address the elements, language and format of a sample family‐directed genome laboratory report. The qualitative interviews were followed by two focus groups aimed at evaluating example presentations of information about prognosis and next steps related to the whole genome sequencing result. Three themes emerged from the qualitative data: (i) Parents described a continual search for valid information and resources regarding their child's condition, a need that prior reports did not meet for parents; (ii) Parents believed that the Family Report would help facilitate communication with physicians and family members; and (iii) Parents identified specific items they appreciated in a genomics Family Report: simplicity of language, logical flow, visual appeal, information on what to expect in the future and recommended next steps. Parents affirmed their desire for a family genomic results report designed for their use and reference. They articulated the need for clear, easy to understand language that provided information with temporal detail and specific recommendations regarding relevant findings consistent with that available to clinicians. PMID:26086630
Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.

PubMed

Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng

2017-07-01

The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity, in Anhui Province, China. The genome sequence of TV2 comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best of our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.
MIPS: a database for genomes and protein sequences.

PubMed Central

Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

1999-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
Characterization of a gene family abundantly expressed in Oenothera organensis pollen that shows sequence similarity to polygalacturonase.

PubMed Central

Brown, S M; Crouch, M L

1990-01-01

We have isolated and characterized cDNA clones of a gene family (P2) expressed in Oenothera organensis pollen. This family contains approximately six to eight family members and is expressed at high levels only in pollen. The predicted protein sequence from a near full-length cDNA clone shows that the protein products of these genes are at least 38,000 daltons. We identified the protein encoded by one of the cDNAs in this family by using antibodies to beta-galactosidase/pollen cDNA fusion proteins. Immunoblot analysis using these antibodies identifies a family of proteins of approximately 40 kilodaltons that is present in mature pollen, indicating that these mRNAs are not stored solely for translation after pollen germination. These proteins accumulate late in pollen development and are not detectable in other parts of the plant. Although not present in unpollinated or self-pollinated styles, the 40-kilodalton to 45-kilodalton antigens are detectable in extracts from cross-pollinated styles, suggesting that the proteins are present in pollen tubes growing through the style during pollination. The proteins are also present in pollen tubes growing in vitro. Both nucleotide and amino acid sequences are similar to the published sequences for cDNAs encoding the enzyme polygalacturonase, which suggests that the P2 gene family may function in depolymerizing pectin during pollen development, germination, and tube growth. Cross-hybridizing RNAs and immunoreactive proteins were detected in pollen from a wide variety of plant species, which indicates that the P2 family of polygalacturonase-like genes are conserved and may be expressed in the pollen from many angiosperms. PMID:2152116

Mutation analysis of pre-mRNA splicing genes in Chinese families with retinitis pigmentosa

PubMed Central

Pan, Xinyuan; Chen, Xue; Liu, Xiaoxing; Gao, Xiang; Kang, Xiaoli; Xu, Qihua; Chen, Xuejuan; Zhao, Kanxing; Zhang, Xiumei; Chu, Qiaomei; Wang, Xiuying

2014-01-01

Purpose: Seven genes involved in precursor mRNA (pre-mRNA) splicing have been implicated in autosomal dominant retinitis pigmentosa (adRP). We sought to detect mutations in all seven genes in Chinese families with RP, to characterize the relevant phenotypes, and to evaluate the prevalence of mutations in splicing genes in patients with adRP. Methods: Six unrelated families from our adRP cohort (42 families) and two additional families with RP with uncertain inheritance mode were clinically characterized in the present study. Targeted sequence capture with next-generation massively parallel sequencing (NGS) was performed to screen mutations in 189 genes including all seven pre-mRNA splicing genes associated with adRP. Variants detected with NGS were filtered with bioinformatics analyses, validated with Sanger sequencing, and prioritized with pathogenicity analysis. Results: Mutations in pre-mRNA splicing genes were identified in three individual families including one novel frameshift mutation in PRPF31 (p.Leu366fs*1) and two known mutations in SNRNP200 (p.Arg681His and p.Ser1087Leu). The patients carrying SNRNP200 p.R681H showed rapid disease progression, and the family carrying p.S1087L presented earlier onset ages and more severe phenotypes compared to another previously reported family with p.S1087L. In five other families, we identified mutations in other RP-related genes, including RP1 p.Ser781* (novel), RP2 p.Gln65* (novel) and p.Ile137del (novel), IMPDH1 p.Asp311Asn (recurrent), and RHO p.Pro347Leu (recurrent). Conclusions: Mutations in splicing genes identified in the present and our previous study account for 9.5% of our adRP cohort, indicating the important role of pre-mRNA splicing deficiency in the etiology of adRP. Mutations in the same splicing gene, or even the same mutation, could correlate with different phenotypic severities, complicating the genotype–phenotype correlation and clinical prognosis. PMID:24940031
Discovering human germ cell mutagens with whole genome sequencing: Insights from power calculations reveal the importance of controlling for between-family variability.

PubMed

Webster, R J; Williams, A; Marchetti, F; Yauk, C L

2018-07-01

Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
A genomewide survey of basic helix–loop–helix factors in Drosophila

PubMed Central

Moore, Adrian W.; Barbel, Sandra; Jan, Lily Yeh; Jan, Yuh Nung

2000-01-01

The basic helix–loop–helix (bHLH) transcription factors play important roles in the specification of tissue type during the development of animals. We have used the information contained in the recently published genomic sequence of Drosophila melanogaster to identify 12 additional bHLH proteins. By sequence analysis we have assigned these proteins to families defined by Atonal, Hairy-Enhancer of Split, Hand, p48, Mesp, MYC/USF, and the bHLH-Per, Arnt, Sim (PAS) domain. In addition, one single protein represents a unique family of bHLH proteins. mRNA in situ analysis demonstrates that the genes encoding these proteins are expressed in several tissue types but are particularly concentrated in the developing nervous system and mesoderm. PMID:10973473
Phylogenetic analysis of Rutaceous plants based on single nucleotide polymorphism in chloroplast and nuclear gene sequences

USDA-ARS's Scientific Manuscript database

The family Rutaceae encompasses several genera including the economically important genus Citrus. In this study, we selected 22 citrus relatives belonging to the various sub groups of Rutaceae and compared the sequences of three gene fragments. The accessions selected belong to the subfamily Rutoide...
Expanding sialidosis spectrum by genome-wide screening: NEU1 mutations in adult-onset myoclonus.

PubMed

Canafoglia, Laura; Robbiano, Angela; Pareyson, Davide; Panzica, Ferruccio; Nanetti, Lorenzo; Giovagnoli, Anna Rita; Venerando, Anna; Gellera, Cinzia; Franceschetti, Silvana; Zara, Federico

2014-06-03

To identify the genetic cause of a familial form of late-onset action myoclonus in 2 unrelated patients. Both probands had 2 siblings displaying a similar disorder. Extensive laboratory examinations, including biochemical assessment for urine sialic acid in the 2 probands, were negative. Exome sequencing was performed in the probands using an Illumina platform. Segregation analysis of putative mutations was performed in all family members by standard Sanger sequencing protocols. NEU1 mutations were detected in 3 siblings of each family with prominent cortical myoclonus presenting in the third decade of life and having a mild and slowly progressive course. They did not have macular cherry-red spot and their urinary sialic acid excretion was within normal values. Genetic analysis demonstrated a homozygous mutation in family 1 (c.200G>T, p.S67I) and 2 compound heterozygous mutations in family 2 (c.679G>A, p.G227R; c.913C>T, p.R305C). Our observation indicates that sialidosis should be suspected and the NEU1 gene analyzed in patients with isolated action myoclonus presenting in adulthood in the absence of other typical clinical and laboratory findings. © 2014 American Academy of Neurology.
Seven novel mutations in the long isoform of the USH2A gene in Chinese families with nonsyndromic retinitis pigmentosa and Usher syndrome Type II.

PubMed

Xu, Wenjun; Dai, Hanjun; Lu, Tingting; Zhang, Xiaohui; Dong, Bing; Li, Yang

2011-01-01

To describe the clinical and genetic findings in one Chinese family with autosomal recessive retinitis pigmentosa (arRP) and in three unrelated Chinese families with Usher syndrome type II (USH2). One family (FR1) with arRP and three unrelated families (F6, F7, and F8) with Usher syndrome (USH), including eight affected members and seven unaffected family individuals were examined clinically. The study included 100 normal Chinese individuals as normal controls. After obtaining informed consent, peripheral blood samples from all participants were collected and genomic DNA was extracted. Genotyping and haplotyping analyses were performed on the known genetic loci for arRP with a panel of polymorphic markers in family FR1. In all four families, the coding region (exons 2-72), including the intron-exon boundary of the USH2A (Usher syndrome type -2A protein) gene, was screened by PCR and direct DNA sequencing. Whenever substitutions were identified in a patient, a restriction fragment length polymorphism (RFLP) analysis, single strand conformation polymorphism (SSCP) analysis, or high resolution melt curve analysis (HRM) was performed on all available family members and on the 100 normal controls. The affected individuals presented with typical fundus features of retinitis pigmentosa (RP), including narrowing of the vessels, bone-spicule pigmentation, and waxy optic discs. The electroretinogram (ERG) wave amplitudes of the available probands were undetectable. Audiometric tests in the affected individuals in family FR1 were normal, while indicating moderate to severe sensorineural hearing impairment in the affected individuals in families F6, F7, and F8. Vestibular function was normal in all patients from all four families. The disease-causing gene in family FR1 was mapped to the USH2A locus on chromosome 1q41. 
Seven novel mutations (two missense mutations, one 7-bp deletion, two small deletions, and two nonsense mutations) were detected in the four families after sequencing analysis of USH2A. The results further support that mutations of USH2A are also responsible for non-syndromic RP. The mutation spectrum among Chinese patients might differ from that among European Caucasians.
Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.

PubMed

Rogan, P K; Schneider, T D

1995-01-01

Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
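The per-position information content that underlies a sequence logo can be illustrated with a minimal sketch. It assumes a set of aligned, equal-length DNA sites and computes 2 bits minus the Shannon entropy of each column; the small-sample correction used in practice is omitted, and the function name and toy sites are hypothetical, not taken from the study.

```python
import math
from collections import Counter

def position_information(sites):
    """Return the information content (bits) of each column of aligned DNA sites.

    Assumption: uncorrected estimate, 2 - H(column), for a 4-letter alphabet.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies in this column
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - entropy)  # 2 bits is the maximum for four bases
    return info

# Hypothetical toy sites for illustration only (not real splice acceptors)
toy_sites = ["CAGG", "CAGG", "TAGA", "CAGT"]
info = position_information(toy_sites)
```

Columns where every site carries the same base reach the full 2 bits, while variable columns score lower, which is why a substitution at a weakly conserved position may cost little information.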
Exome Sequencing and Linkage Analysis Identified Novel Candidate Genes in Recessive Intellectual Disability Associated with Ataxia.

PubMed

Jazayeri, Roshanak; Hu, Hao; Fattahi, Zohreh; Musante, Luciana; Abedini, Seyedeh Sedigheh; Hosseini, Masoumeh; Wienker, Thomas F; Ropers, Hans Hilger; Najmabadi, Hossein; Kahrizi, Kimia

2015-10-01

Intellectual disability (ID) is a neuro-developmental disorder which causes considerable socio-economic problems. Some ID individuals are also affected by ataxia, and the condition includes different mutations affecting several genes. We used whole exome sequencing (WES) in combination with homozygosity mapping (HM) to identify the genetic defects in five consanguineous families among our cohort study, with two affected children with ID and ataxia as major clinical symptoms. We identified three novel candidate genes, RIPPLY1, MRPL10, SNX14, and a new mutation in known gene SURF1. All are autosomal genes, except RIPPLY1, which is located on the X chromosome. Two are housekeeping genes, implicated in transcription and translation regulation and intracellular trafficking, and two encode mitochondrial proteins. The pathogenesis of these variants was evaluated by mutation classification, bioinformatic methods, review of medical and biological relevance, co-segregation studies in the particular family, and a normal population study. Linkage analysis and exome sequencing of a small number of affected family members is a powerful new technique which can be used to decrease the number of candidate genes in heterogenic disorders such as ID, and may even identify the responsible gene(s).
Novel FAM20A mutation causes autosomal recessive amelogenesis imperfecta.

PubMed

Volodarsky, Michael; Zilberman, Uri; Birk, Ohad S

2015-06-01

To relate the peculiar phenotype of amelogenesis imperfecta in a large Bedouin family to the genotype determined by whole genome linkage analysis. Amelogenesis imperfecta (AI) is a broad group of inherited pathologies affecting enamel formation, characterized by variability in phenotypes, causing mutations and modes of inheritance. Autosomal recessive or compound heterozygous mutations in FAM20A, encoding sequence similarity 20, member A, have been shown to cause several AI phenotypes. Five members from a large consanguineous Bedouin family presented with hypoplastic amelogenesis imperfecta with unerupted and resorbed permanent molars. Following Soroka Medical Center IRB approval and informed consent, blood samples were obtained from six affected offspring, five obligatory carriers and two unaffected siblings. Whole genome linkage analysis was performed followed by Sanger sequencing of FAM20A. The sequencing unravelled a novel homozygous deletion mutation in exon 11 (c.1523delC), predicted to insert a premature stop codon (p.Thr508Lysfs*6). We provide an interesting case of novel mutation in this rare disorder, in which the affected kindred is unique in the large number of family members sharing a similar phenotype. Copyright © 2015 Elsevier Ltd. All rights reserved.
Real-Time PCR for the Detection and Quantification of Geodermatophilaceae from Stone Samples and Identification of New Members of the Genus Blastococcus†

PubMed Central

Salazar, Oscar; Valverde, Aranzazu; Genilloud, Olga

2006-01-01

Real-time PCR (RT-PCR) technology was used for the specific detection and quantification of members of the family Geodermatophilaceae in stone samples. Differences in the nucleotide sequences of the 16S rRNA gene region were used to design a pair of family-specific primers that were used to detect and quantify by RT-PCR DNA from members of this family in stone samples from different geographical origins in Spain. These primers were applied later to identify by PCR-specific amplification new members of the family Geodermatophilaceae isolated from the same stone samples. The diversity and taxonomic position of the wild-type strains identified from ribosomal sequence analysis suggest the presence of a new lineage within the genus Blastococcus. PMID:16391063
The membrane skeleton in Paramecium: Molecular characterization of a novel epiplasmin family and preliminary GFP expression results.

PubMed

Pomel, Sébastien; Diogon, Marie; Bouchard, Philippe; Pradel, Lydie; Ravet, Viviane; Coffe, Gérard; Viguès, Bernard

2006-02-01

Previous attempts to identify the membrane skeleton of Paramecium cells have revealed a protein pattern that is both complex and specific. The most prominent structural elements, epiplasmic scales, are centered around ciliary units and are closely apposed to the cytoplasmic side of the inner alveolar membrane. We sought to characterize epiplasmic scale proteins (epiplasmins) at the molecular level. PCR approaches enabled the cloning and sequencing of two closely related genes by amplifications of sequences from a macronuclear genomic library. Using these two genes (EPI-1 and EPI-2), we have contributed to the annotation of the Paramecium tetraurelia macronuclear genome and identified 39 additional (paralogous) sequences. Two orthologous sequences were found in the Tetrahymena thermophila genome. Structural analysis of the 43 sequences indicates that the hallmark of this new multigenic family is a 79 aa domain flanked by two Q-, P- and V-rich stretches of sequence that are much more variable in amino-acid composition. Such features clearly distinguish members of the multigenic family from epiplasmic proteins previously sequenced in other ciliates. The expression of Green Fluorescent Protein (GFP)-tagged epiplasmin showed significant labeling of epiplasmic scales as well as oral structures. We expect that the GFP construct described herein will prove to be a useful tool for comparative subcellular localization of different putative epiplasmins in Paramecium.
Development of PCR primers specific for the amplification and direct sequencing of gyrB genes from microbacteria, order Actinomycetales.

PubMed

Richert, Kathrin; Brambilla, Evelyne; Stackebrandt, Erko

2005-01-01

PCR primer sets were developed for the specific amplification and sequence analyses encoding the gyrase subunit B (gyrB) of members of the family Microbacteriaceae, class Actinobacteria. The family contains species highly related by 16S rRNA gene sequence analyses. In order to test if the gene sequence analysis of gyrB is appropriate to discriminate between closely related species, we evaluate the 16S rRNA gene phylogeny of its members. As the published universal primer set for gyrB failed to amplify the corresponding gene of the majority of the 80 type strains of the family, three new primer sets were identified that generated fragments with a composite sequence length of about 900 nt. However, the amplification of all three fragments was successful only in 25% of the 80 type strains. In this study, the substitution frequencies in genes encoding gyrase and 16S rDNA were compared for 10 strains of nine genera. The frequency of gyrB nucleotide substitution is significantly higher than that of the 16S rDNA, and no linear correlation exists between the similarities of both molecules among members of the Microbacteriaceae. The phylogenetic analyses using the gyrB sequences provide higher resolution than using 16S rDNA sequences and seem able to discriminate between closely related species.
Prediction of a common beta-propeller catalytic domain for fructosyltransferases of different origin and substrate specificity.

PubMed

Pons, T; Hernández, L; Batista, F R; Chinea, G

2000-11-01

The three-dimensional (3D) structure of fructan biosynthetic enzymes is still unknown. Here, we have explored folding similarities between reported microbial and plant enzymes that catalyze transfructosylation reactions. A sequence-structure compatibility search using TOPITS, SDP, 3D-PSSM, and SAM-T98 programs identified a beta-propeller fold with scores above the confidence threshold that indicate a structurally conserved catalytic domain in fructosyltransferases (FTFs) of diverse origin and substrate specificity. The predicted fold appeared related to that of neuraminidase and sialidase, of glycoside hydrolase families 33 and 34, respectively. The most reliable structural model was obtained using the crystal structure of neuraminidase (Protein Data Bank file: 5nn9) as template, and it is consistent with the location of previously identified functional residues of bacterial levansucrases (Batista et al., 1999; Song & Jacques, 1999). The sequence-sequence analysis presented here reinforces the recent inclusion of fungal and plant FTFs into glycoside hydrolase family 32, and suggests a modified sequence pattern H-x (2)-[PTV]-x (4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC][GA]3 for this family.
Prediction of a common beta-propeller catalytic domain for fructosyltransferases of different origin and substrate specificity.

PubMed Central

Pons, T.; Hernández, L.; Batista, F. R.; Chinea, G.

2000-01-01

The three-dimensional (3D) structure of fructan biosynthetic enzymes is still unknown. Here, we have explored folding similarities between reported microbial and plant enzymes that catalyze transfructosylation reactions. A sequence-structure compatibility search using TOPITS, SDP, 3D-PSSM, and SAM-T98 programs identified a beta-propeller fold with scores above the confidence threshold that indicate a structurally conserved catalytic domain in fructosyltransferases (FTFs) of diverse origin and substrate specificity. The predicted fold appeared related to that of neuraminidase and sialidase, of glycoside hydrolase families 33 and 34, respectively. The most reliable structural model was obtained using the crystal structure of neuraminidase (Protein Data Bank file: 5nn9) as template, and it is consistent with the location of previously identified functional residues of bacterial levansucrases (Batista et al., 1999; Song & Jacques, 1999). The sequence-sequence analysis presented here reinforces the recent inclusion of fungal and plant FTFs into glycoside hydrolase family 32, and suggests a modified sequence pattern H-x (2)-[PTV]-x (4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC][GA]3 for this family. PMID:11305239
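A sequence pattern written in this PROSITE-like notation can be turned into a regular expression for scanning protein sequences. The sketch below is a hedged illustration: it handles only single residues, bracketed residue sets, `x`, and `(n)` repeat counts, and it is applied to the unambiguous prefix of the pattern quoted above. The trailing `[GA]3` is deliberately left out because its notation is not clear from the record, and the helper name and test peptide are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string.

    Supported elements (assumption): A, [ABC], x, each optionally followed by (n).
    """
    parts = []
    # Spaces such as "x (2)" are treated as extraction noise and removed
    for element in pattern.replace(" ", "").split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        token, repeat = m.groups()
        token = "." if token == "x" else token  # x means any residue
        parts.append(token + (f"{{{repeat}}}" if repeat else ""))
    return "".join(parts)

# Prefix of the pattern quoted in the abstract; the ambiguous tail is omitted
PREFIX = "H-x (2)-[PTV]-x (4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC]"
regex = prosite_to_regex(PREFIX)

# Hypothetical peptide constructed to contain a match
hit = re.search(regex, "MKHAAPWWWWLNDPNQ")
```

Because only a prefix is used, any hits are candidates at best and would need to be checked against the full published pattern.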
Within-genome evolution of REPINs: a new family of miniature mobile DNA in bacteria.

PubMed

Bertels, Frederic; Rainey, Paul B

2011-06-01

Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA.
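The first step described above, finding over-represented short oligonucleotides, can be sketched as a simple k-mer count compared against an expectation. This is only an illustration under a stated assumption: the expected count comes from independent base frequencies, which is not necessarily the null model the authors used. Function names, the threshold parameter, and the toy sequence are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def overrepresented(seq, k, min_ratio):
    """Return k-mers whose observed/expected ratio is at least min_ratio.

    Assumption: expectation = number of windows x product of base frequencies.
    """
    seq = seq.upper()
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    n_windows = len(seq) - k + 1
    hits = {}
    for kmer, observed in kmer_counts(seq, k).items():
        expected = float(n_windows)
        for base in kmer:
            expected *= base_freq[base]
        if observed / expected >= min_ratio:
            hits[kmer] = (observed, expected)
    return hits

# Hypothetical toy sequence: a perfectly repetitive stretch
toy = "ACGT" * 10
toy_hits = overrepresented(toy, 4, 2.0)
```

On a real genome one would restrict the search to extragenic space and use a more careful background model before grouping the resulting k-mers into families.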
[Study of a family with epidermolysis bullosa simplex resulting from a novel mutation of KRT14 gene].

PubMed

Meng, Lanlan; Du, Juan; Li, Wen; Lu, Guangxiu; Tan, Yueqiu

2017-08-10

To determine the molecular etiology for a Chinese pedigree affected with epidermolysis bullosa simplex (EBS). Target region sequencing using a hereditary epidermolysis bullosa capture array combined with Sanger sequencing and bioinformatics analysis were used. Mutation taster, PolyPhen-2, Provean, and SIFT software and NCBI online were employed to assess the pathogenicity and conservation of detected mutations. One hundred healthy unrelated individuals were used as controls. Target region sequencing showed that the proband carried a previously unreported heterozygous c.1234A>G (p.Ile412Val) mutation of the KRT14 gene, which was confirmed by Sanger sequencing in 8 other affected individuals but not among healthy members of the pedigree. Bioinformatics analysis indicated that the mutation is highly pathogenic. Remarkably, 3 members of the family (2 affected and 1 unaffected) carried a heterozygous c.1237G>A (p.Ala413Thr) mutation of the KRT14 gene, which is recorded in the Human Gene Mutation Database (HGMD). Bioinformatics analysis indicated that the mutation may not be pathogenic. Neither mutation was detected among the 100 healthy controls. The novel c.1234A>G (p.Ile412Val) mutation of the KRT14 gene is probably responsible for the disease, while the c.1237G>A (p.Ala413Thr) mutation of the KRT14 gene may be a polymorphism. Compared with Sanger sequencing, target region capture sequencing is more efficient and can significantly reduce the cost of genetic testing for EBS.
Multivariate sequence analysis reveals additional function impacting residues in the SDR superfamily.

PubMed

Tiwari, Pratibha; Singh, Noopur; Dixit, Aparna; Choudhury, Devapriya

2014-10-01

The "extended" type of short chain dehydrogenases/reductases (SDR) share a remarkable similarity in their tertiary structures in spite of being highly divergent in their functions and sequences. We have carried out principal component analysis (PCA) on structurally equivalent residue positions of 10 SDR families using information theoretic measures like Jensen-Shannon divergence and average Shannon entropy as variables. The results classify residue positions in the SDR fold into six groups, one of which is characterized by low Shannon entropies but high Jensen-Shannon divergence against the reference family SDR1E, suggesting that these positions are responsible for the specific functional identities of individual SDR families, distinguishing them from the reference family SDR1E. Site directed mutagenesis of three residues from this group in the enzyme UDP-Galactose 4-epimerase belonging to SDR1E shows that the mutants promote the formation of NADH containing abortive complexes. Finally, molecular dynamics simulations have been used to suggest a mechanism by which the mutants interfere with the re-oxidation of NADH leading to the formation of abortive complexes. © 2014 Wiley Periodicals, Inc.
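The two information-theoretic measures named above can be illustrated for a single alignment column. The sketch assumes each family's column is given as a string of residues at one structurally equivalent position and uses base-2 logarithms; the helper names and toy columns are hypothetical, and the study's exact estimators (for example any pseudocounts or weighting) are not reproduced here.

```python
import math
from collections import Counter

def distribution(column):
    """Residue frequencies observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def shannon_entropy(p):
    """Shannon entropy in bits of a frequency dictionary."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits between two frequency dictionaries."""
    keys = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return shannon_entropy(mixture) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

# Hypothetical columns: each family is fully conserved, but on different residues
family_col = distribution("DDDD")
reference_col = distribution("NNNN")
jsd = jensen_shannon(family_col, reference_col)
```

A position like the toy one, with zero entropy inside each family but maximal divergence between them, is the kind of signature the abstract associates with family-specific function.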
Novel mutations in the genes TGM1 and ALOXE3 underlying autosomal recessive congenital ichthyosis

PubMed Central

Ullah, Rahim; Ansar, Muhammad; Durrani, Zaka Ullah; Lee, Kwanghyuk; Santos-Cortez, Regie Lyn P.; Muhammad, Dost; Ali, Mahboob; Zia, Muhammad; Ayub, Muhammad; Khan, Suliman; Smith, Josh D.; Nickerson, Deborah A.; Shendure, Jay; Bamshad, Michael; Leal, Suzanne M.; Ahmad, Wasim

2016-01-01

Background Ichthyoses are clinically characterized by scaling or hyperkeratosis of the skin or both. It can be an isolated condition limited to the skin or appear secondarily with involvement of other cutaneous or systemic abnormalities. Methods The present study investigated clinical and molecular characterization of three consanguineous families (A, B, C) segregating two different forms of autosomal recessive congenital ichthyosis (ARCI). Linkage in these families was searched for by typing microsatellite and single nucleotide polymorphism markers. Sequencing of the two genes TGM1 and ALOXE3 was performed by the dideoxy chain termination method. Results Genome-wide linkage analysis established linkage in family A to TGM1 gene on chromosome 14q11 and in families B and C to ALOXE3 gene on chromosome 17p13. Subsequently, sequencing of these genes using samples from affected family members led to the identification of three novel mutations: a missense variant p.Trp455Arg in TGM1 (family A); a nonsense variant p.Arg140* in ALOXE3 (family B); and a complex rearrangement in ALOXE3 (family C). Conclusion The present study further extends the spectrum of mutations in the two genes involved in causing ARCI. Characterizing the clinical spectrum resulting from mutations in the TGM1 and ALOXE3 genes will improve diagnosis and may direct clinical care of the family members. PMID:26578203
[Molecular cloning of the DNA sequence of activin beta A subunit gene mature peptides from panda and related species and its application in the research of phylogeny and taxonomy].

PubMed

Wang, Xiao-Jing; Wang, Xiao-Xing; Wang, Ya-Jun; Wang, Xi-Zhong; He, Guang-Xin; Chen, Hong-Wei; Fei, Li-Song

2002-09-01

Activin, which is included in the transforming growth factor-beta (TGF beta) superfamily of proteins and receptors, is known to have broad-ranging effects in organisms. The mature peptide of the beta A subunit of this gene, one of the most highly conserved sequences, can elevate the basal secretion of follicle-stimulating hormone (FSH) in the pituitary, and FSH is pivotal to an organism's reproduction. Impaired reproduction is one of the main factors driving the giant panda toward extinction. The sequence of the Activin beta A subunit gene mature peptide was successfully amplified from giant panda, red panda and Malayan sun bear genomic DNA by polymerase chain reaction (PCR) with a pair of degenerate primers. The PCR products were cloned into the vector pBlueScript+ of Escherichia coli. Sequence analysis of the Activin beta A subunit gene mature peptide shows that the length of this gene segment is the same (359 bp) in all three species and that it contains no intron. The sequence encodes a peptide of 119 amino acid residues. The homology comparison demonstrates 93.9% DNA homology and 99% amino acid homology among these three species. Both the GenBank BLAST search result and the restriction enzyme map reveal that the sequences of Activin beta A subunit gene mature peptides of different species have been highly conserved during evolution. Phylogenetic analysis was performed with the PHYLIP software package. A consistent phylogenetic tree was obtained with three different methods. The result accords with the academic view that the giant panda is more closely related to the Malayan sun bear than to the red panda. The giant panda should be grouped into the bear family (Ursidae) with the Malayan sun bear. The red panda would be better placed in a family of its own (red panda family) because of the great difference between the red panda and the bears (Ursidae).
Complete sequence determination of a novel reptile iridovirus isolated from soft-shelled turtle and evolutionary analysis of Iridoviridae

PubMed Central

Huang, Youhua; Huang, Xiaohong; Liu, Hong; Gong, Jie; Ouyang, Zhengliang; Cui, Huachun; Cao, Jianhao; Zhao, Yingtao; Wang, Xiujie; Jiang, Yulin; Qin, Qiwei

2009-01-01

Background Soft-shelled turtle iridovirus (STIV) is the causative agent of severe systemic diseases in cultured soft-shelled turtles (Trionyx sinensis). To our knowledge, the only molecular information available on STIV mainly concerns the highly conserved STIV major capsid protein. The complete sequence of the STIV genome is not yet available. Therefore, determining the genome sequence of STIV and providing a detailed bioinformatic analysis of its genome content and evolution status will facilitate further understanding of the taxonomic elements of STIV and the molecular mechanisms of reptile iridovirus pathogenesis. Results We determined the complete nucleotide sequence of the STIV genome using 454 Life Science sequencing technology. The STIV genome is 105 890 bp in length with a base composition of 55.1% G+C. Computer assisted analysis revealed that the STIV genome contains 105 potential open reading frames (ORFs), which encode polypeptides ranging from 40 to 1,294 amino acids and 20 microRNA candidates. Among the putative proteins, 20 share homology with the ancestral proteins of the nuclear and cytoplasmic large DNA viruses (NCLDVs). Comparative genomic analysis showed that STIV has the highest degree of sequence conservation and a colinear arrangement of genes with frog virus 3 (FV3), followed by Tiger frog virus (TFV), Ambystoma tigrinum virus (ATV), Singapore grouper iridovirus (SGIV), Grouper iridovirus (GIV) and other iridovirus isolates. Phylogenetic analysis based on conserved core genes and complete genome sequence of STIV with other virus genomes was performed. Moreover, analysis of the gene gain-and-loss events in the family Iridoviridae suggested that the genes encoded by iridoviruses have evolved for favoring adaptation to different natural host species. Conclusion This study has provided the complete genome sequence of STIV. 
Phylogenetic analysis suggested that STIV and FV3 are strains of the same viral species belonging to the Ranavirus genus in the Iridoviridae family. Given virus-host co-evolution and the phylogenetic relationship among vertebrates from fish to reptiles, we propose that iridovirus might transmit between reptiles and amphibians and that STIV and FV3 are strains of the same viral species in the Ranavirus genus. PMID:19439104

Identification of novel mutations of the CHST6 gene in Vietnamese families affected with macular corneal dystrophy in two generations.

PubMed

Ha, Nguyen Thanh; Chau, Hoang Minh; Cung, Le Xuan; Thanh, Ton Kim; Fujiki, Keiko; Murakami, Akira; Hiratsuka, Yoshimune; Hasegawa, Nobuko; Kanai, Atsushi

2003-08-01

To report the clinical and genetic findings of Vietnamese families affected with macular corneal dystrophy (MCD) in 2 generations. Two families, including 7 patients and 3 unaffected members, were examined clinically. Blood samples were collected. Fifty normal Vietnamese individuals were used as controls. Genomic DNA was extracted from leukocytes. Analysis of the carbohydrate sulfotransferase (CHST6) gene was performed using polymerase chain reaction and direct sequencing. The typical form of MCD was recognized in family B, in which sequencing of CHST6 gene revealed an nt 1067-1068ins(GGCCGTG) mutation (frameshift after 125V) homozygously in MCD patients and heterozygously in the unaffected members. Family N also showed clinical features of MCD, moderate in the mother but severe in the affected son. Sequencing revealed a single heterozygous Arg211Gln in the mother, compound heterozygous Arg211Gln + Gln82Stop in the affected son, and heterozygous Arg211Gln mutation in the unaffected members. The identified mutations in these pedigrees were excluded from normal controls. The novel frameshift and compound heterozygous mutations might be responsible for MCD in the families studied. The phenotypic variation between affected parents and offspring was unclear. In family N, the severe MCD phenotype seen in the affected son may be due to the fact that he had an early stop codon mutation (Gln82Stop).
Next-generation sequencing to solve complex inherited retinal dystrophy: A case series of multiple genes contributing to disease in extended families.

PubMed

Jones, Kaylie D; Wheaton, Dianna K; Bowne, Sara J; Sullivan, Lori S; Birch, David G; Chen, Rui; Daiger, Stephen P

2017-01-01

With recent availability of next-generation sequencing (NGS), it is becoming more common to pursue disease-targeted panel testing rather than traditional sequential gene-by-gene dideoxy sequencing. In this report, we describe using NGS to identify multiple disease-causing mutations that contribute concurrently or independently to retinal dystrophy in three relatively small families. Family members underwent comprehensive visual function evaluations, and genetic counseling including a detailed family history. A preliminary genetic inheritance pattern was assigned and updated as additional family members were tested. Family 1 (FAM1) and Family 2 (FAM2) were clinically diagnosed with retinitis pigmentosa (RP) and had a suspected autosomal dominant pedigree with non-penetrance (n.p.). Family 3 (FAM3) consisted of a large family with a diagnosis of RP and an overall dominant pedigree, but the proband had phenotypically cone-rod dystrophy. Initial genetic analysis was performed on one family member with traditional Sanger single gene sequencing and/or panel-based testing, and ultimately, retinal gene-targeted NGS was required to identify the underlying cause of disease for individuals within the three families. Results obtained in these families necessitated further genetic and clinical testing of additional family members to determine the complex genetic and phenotypic etiology of each family. Genetic testing of FAM1 (n = 4 affected; 1 n.p.) identified a dominant mutation in RP1 (p.Arg677Ter) that was present for two of the four affected individuals but absent in the proband and the presumed non-penetrant individual. Retinal gene-targeted NGS in the fourth affected family member revealed compound heterozygous mutations in USH2A (p.Cys419Phe, p.Glu767Serfs*21). Genetic testing of FAM2 (n = 3 affected; 1 n.p.) identified three retinal dystrophy genes (PRPH2, PRPF8, and USH2A) with disease-causing mutations in varying combinations among the affected family members.
Genetic testing of FAM3 (n = 7 affected) identified a mutation in PRPH2 (p.Pro216Leu) tracking with disease in six of the seven affected individuals. Additional retinal gene-targeted NGS testing determined that the proband also harbored a multiple exon deletion in the CRX gene likely accounting for her cone-rod phenotype; her son harbored only the mutation in CRX, not the familial mutation in PRPH2. Multiple genes contributing to the retinal dystrophy genotypes within a family were discovered using retinal gene-targeted NGS. Families with noted examples of phenotypic variation or apparent non-penetrant individuals may offer a clue to suspect complex inheritance. Furthermore, this finding underscores that caution should be taken when attributing a single gene disease-causing mutation (or inheritance pattern) to a family as a whole. Identification of a disease-causing mutation in a proband, even with a clear inheritance pattern in hand, may not be sufficient for targeted, known mutation analysis in other family members.
Gene Structures, Evolution and Transcriptional Profiling of the WRKY Gene Family in Castor Bean (Ricinus communis L.).

PubMed

Zou, Zhi; Yang, Lifu; Wang, Danhua; Huang, Qixing; Mo, Yeyong; Xie, Guishui

2016-01-01

WRKY proteins comprise one of the largest transcription factor families in plants and form key regulators of many plant processes. This study presents the characterization of 58 WRKY genes from the castor bean (Ricinus communis L., Euphorbiaceae) genome. Compared with the automatic genome annotation, one more WRKY-encoding locus was identified and 20 out of the 57 predicted gene models were manually corrected. All RcWRKY genes were shown to contain at least one intron in their coding sequences. According to the structural features of the present WRKY domains, the identified RcWRKY genes were assigned to three previously defined groups (I-III). Although castor bean underwent no recent whole-genome duplication event like physic nut (Jatropha curcas L., Euphorbiaceae), comparative genomics analysis indicated that one gene loss, one intron loss and one recent proximal duplication occurred in the RcWRKY gene family. The expression of all 58 RcWRKY genes was supported by ESTs and/or RNA sequencing reads derived from roots, leaves, flowers, seeds and endosperms. Further global expression profiles with RNA sequencing data revealed diverse expression patterns among various tissues. Results obtained from this study not only provide valuable information for future functional analysis and utilization of the castor bean WRKY genes, but also provide a useful reference to investigate the gene family expansion and evolution in Euphorbiaceous plants.
Differences in glycosyltransferase family 61 accompany variation in seed coat mucilage composition in Plantago spp.

PubMed

Phan, Jana L; Tucker, Matthew R; Khor, Shi Fang; Shirley, Neil; Lahnstein, Jelle; Beahan, Cherie; Bacic, Antony; Burton, Rachel A

2016-12-01

Xylans are the most abundant non-cellulosic polysaccharide found in plant cell walls. A diverse range of xylan structures influence tissue function during growth and development. Despite the abundance of xylans in nature, details of the genes and biochemical pathways controlling their biosynthesis are lacking. In this study we have utilized natural variation within the Plantago genus to examine variation in heteroxylan composition and structure in seed coat mucilage. Compositional assays were combined with analysis of the glycosyltransferase family 61 (GT61) family during seed coat development, with the aim of identifying GT61 sequences participating in xylan backbone substitution. The results reveal natural variation in heteroxylan content and structure, particularly in P. ovata and P. cunninghamii, species which show a similar amount of heteroxylan but different backbone substitution profiles. Analysis of the GT61 family identified specific sequences co-expressed with IRREGULAR XYLEM 10 genes, which encode putative xylan synthases, revealing a close temporal association between xylan synthesis and substitution. Moreover, in P. ovata, several abundant GT61 sequences appear to lack orthologues in P. cunninghamii. Our results indicate that natural variation in Plantago species can be exploited to reveal novel details of seed coat development and polysaccharide biosynthetic pathways. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Genetic analysis of PAX3 for diagnosis of Waardenburg syndrome type I.

PubMed

Matsunaga, Tatsuo; Mutai, Hideki; Namba, Kazunori; Morita, Noriko; Masuda, Sawako

2013-04-01

PAX3 genetic analysis increased the diagnostic accuracy for Waardenburg syndrome type I (WS1). Analysis of the three-dimensional (3D) structure of PAX3 helped verify the pathogenicity of a missense mutation, and multiplex ligation-dependent probe amplification (MLPA) analysis of PAX3 increased the sensitivity of genetic diagnosis in patients with WS1. Clinical diagnosis of WS1 is often difficult in individual patients with isolated, mild, or non-specific symptoms. The objective of the present study was to facilitate the accurate diagnosis of WS1 through genetic analysis of PAX3 and to expand the spectrum of known PAX3 mutations. In two Japanese families with WS1, we conducted a clinical evaluation of symptoms and genetic analysis, which involved direct sequencing, MLPA analysis, quantitative PCR of PAX3, and analysis of the predicted 3D structure of PAX3. The normal-hearing control group comprised 92 subjects who had normal hearing according to pure tone audiometry. In one family, direct sequencing of PAX3 identified a heterozygous mutation, p.I59F. Analysis of PAX3 3D structures indicated that this mutation distorted the DNA-binding site of PAX3. In the other family, MLPA analysis and subsequent quantitative PCR detected a large, heterozygous deletion spanning 1759-2554 kb that eliminated 12-18 genes including the whole PAX3 gene.
Multi-Harmony: detecting functional specificity from sequence alignment

PubMed Central

Brandt, Bernd W.; Feenstra, K. Anton; Heringa, Jaap

2010-01-01

Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help find targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww. PMID:20525785
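As a rough illustration of what scoring sub-type-specific sites from a multiple sequence alignment can look like, the sketch below compares residue frequencies between two sub-groups column by column. It uses the Jensen-Shannon divergence as a stand-in measure; it is not the Sequence Harmony or multi-Relief algorithm, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def column_frequencies(column):
    """Residue frequencies in one alignment column, ignoring gaps ('-')."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {r: n / len(residues) for r, n in counts.items()}


def entropy(freqs):
    """Shannon entropy in bits of a frequency dictionary."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = same composition, 1 = disjoint."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * p.get(k, 0.0) + 0.5 * q.get(k, 0.0) for k in keys}
    return entropy(mix) - 0.5 * entropy(p) - 0.5 * entropy(q)


def subgroup_divergence(group_a, group_b):
    """Per-column divergence between two aligned sub-groups.

    Columns that are all gaps in either group get None instead of a score.
    """
    scores = []
    for i in range(len(group_a[0])):
        p = column_frequencies([s[i] for s in group_a])
        q = column_frequencies([s[i] for s in group_b])
        scores.append(js_divergence(p, q) if p and q else None)
    return scores


# Hypothetical toy alignment: column 2 separates the groups (K vs E).
group_a = ["MKV", "MKI"]
group_b = ["MEV", "MEI"]
scores = subgroup_divergence(group_a, group_b)
```

Columns with a score near 1 have non-overlapping residue compositions in the two sub-groups and would be candidate specificity sites under this simplified measure.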
The crystal structure of Erwinia amylovora AmyR, a member of the YbjN protein family, shows similarity to type III secretion chaperones but suggests different cellular functions

PubMed Central

Bartho, Joseph D.; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O.; Zhao, Youfu; Walsh, Martin A.

2017-01-01

AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family. PMID:28426806
The crystal structure of Erwinia amylovora AmyR, a member of the YbjN protein family, shows similarity to type III secretion chaperones but suggests different cellular functions.

PubMed

Bartho, Joseph D; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O; Zhao, Youfu; Walsh, Martin A; Benini, Stefano

2017-01-01

AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family.
Viral metagenomic analysis of feces of wild small carnivores

PubMed Central

2014-01-01

Background: Recent studies have clearly demonstrated the enormous virus diversity that exists among wild animals. This exemplifies the required expansion of our knowledge of the virus diversity present in wildlife, as well as the potential transmission of these viruses to domestic animals or humans. Methods: In the present study we evaluated the viral diversity of fecal samples (n = 42) collected from 10 different species of wild small carnivores inhabiting the northern part of Spain using random PCR in combination with next-generation sequencing. Samples were collected from American mink (Neovison vison), European mink (Mustela lutreola), European polecat (Mustela putorius), European pine marten (Martes martes), stone marten (Martes foina), Eurasian otter (Lutra lutra) and Eurasian badger (Meles meles) of the family of Mustelidae; common genet (Genetta genetta) of the family of Viverridae; red fox (Vulpes vulpes) of the family of Canidae and European wild cat (Felis silvestris) of the family of Felidae. Results: A number of sequences of possible novel viruses or virus variants were detected, including a theilovirus, phleboviruses, an amdovirus, a kobuvirus and picobirnaviruses. Conclusions: Using random PCR in combination with next-generation sequencing, sequences of various novel viruses or virus variants were detected in fecal samples collected from Spanish carnivores. Detected novel viruses highlight the viral diversity that is present in fecal material of wild carnivores. PMID:24886057
Whole Genome Sequencing Reveals a De Novo SHANK3 Mutation in Familial Autism Spectrum Disorder

PubMed Central

Nemirovsky, Sergio I.; Córdoba, Marta; Zaiat, Jonathan J.; Completa, Sabrina P.; Vega, Patricia A.; González-Morón, Dolores; Medina, Nancy M.; Fabbro, Mónica; Romero, Soledad; Brun, Bianca; Revale, Santiago; Ogara, María Florencia; Pecci, Adali; Marti, Marcelo; Vazquez, Martin; Turjanski, Adrián; Kauffman, Marcelo A.

2015-01-01

Introduction: Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD where we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. Methods: We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands and used a state-of-the-art comprehensive bioinformatic analysis pipeline and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Results: Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy with negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). Conclusions: We reported an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder. PMID:25646853
Whole genome sequencing reveals a de novo SHANK3 mutation in familial autism spectrum disorder.

PubMed

Nemirovsky, Sergio I; Córdoba, Marta; Zaiat, Jonathan J; Completa, Sabrina P; Vega, Patricia A; González-Morón, Dolores; Medina, Nancy M; Fabbro, Mónica; Romero, Soledad; Brun, Bianca; Revale, Santiago; Ogara, María Florencia; Pecci, Adali; Marti, Marcelo; Vazquez, Martin; Turjanski, Adrián; Kauffman, Marcelo A

2015-01-01

Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD where we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands and used a state-of-the-art comprehensive bioinformatic analysis pipeline and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy with negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). We reported an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder.
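The consequence described in this abstract, a single-base deletion that yields a few missense codons and then a premature stop, can be illustrated with a minimal sketch. The coding sequence below is a hypothetical toy example, not the SHANK3 transcript, and the helper names are assumptions; the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence up to and including the first stop ('*')."""
    protein = []
    for start in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[start:start + 3].upper()]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def delete_base(cds: str, position: int) -> str:
    """Return the sequence with the base at a 1-based position removed."""
    return cds[:position - 1] + cds[position:]


# Hypothetical toy CDS: ATG TCA CCT GGA CTA AAG GTT TGA
wild_type = "ATGTCACCTGGACTAAAGGTTTGA"
mutant = delete_base(wild_type, 7)  # remove the first C of the third codon

wild_protein = translate(wild_type)    # full-length toy peptide
mutant_protein = translate(mutant)     # shifted frame, early stop
```

In this toy case the deletion leaves the first two residues intact, changes the next two, and then introduces a stop codon, which is the same pattern that an HGVS description of the form `p.XnYfs*N` summarizes.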
Novel compound heterozygous mutations in MYO7A in a Chinese family with Usher syndrome type 1.

PubMed

Liu, Fei; Li, Pengcheng; Liu, Ying; Li, Weirong; Wong, Fulton; Du, Rong; Wang, Lei; Li, Chang; Jiang, Fagang; Tang, Zhaohui; Liu, Mugen

2013-01-01

To identify the disease-causing mutation(s) in a Chinese family with autosomal recessive Usher syndrome type 1 (USH1). An ophthalmic examination and an audiometric test were conducted to ascertain the phenotype of two affected siblings. The microsatellite marker D11S937, which is close to the candidate gene MYO7A (USH1B locus), was selected for genotyping. From the DNA of the proband, all coding exons and exon-intron boundaries of MYO7A were sequenced to identify the disease-causing mutation(s). Restriction fragment length polymorphism (RFLP) analysis was performed to exclude the alternative conclusion that the mutations are non-pathogenic rare polymorphisms. Based on severe hearing impairment, unintelligible speech, and retinitis pigmentosa, a clinical diagnosis of Usher syndrome type 1 was made. The genotyping results did not exclude the USH1B locus, which suggested that the MYO7A gene was likely the gene associated with the disease-causing mutation(s) in the family. With direct DNA sequencing of MYO7A, two novel compound heterozygous mutations (c.3742G>A and c.6051+1G>A) of MYO7A were identified in the proband. DNA sequence analysis and RFLP analysis of other family members showed that the mutations cosegregated with the disease. Unaffected members, including the parents, uncle, and sister of the proband, carry only one of the two mutations. The mutations were not present in the controls (100 normal Chinese subjects=200 chromosomes) according to the RFLP analysis. In this study, we identified two novel mutations, c.3742G>A (p.E1248K) and c.6051+1G>A (donor splice site mutation in intron 44), of MYO7A in a Chinese non-consanguineous family with USH1. The mutations cosegregated with the disease and most likely cause the phenotype in the two affected siblings who carry these mutations compound heterozygously. Our finding expands the mutational spectrum of MYO7A.
Laboratory diagnosis and genetic analysis of a family clustering outbreak of aseptic meningitis due to echovirus 30

PubMed Central

Ye, Hongyan; Yan, Juying; Xie, Guoliang; Cui, Dawei; Yu, Fei; Wang, Yiyin; Yang, Xianzhi; Zhou, Fangman; Zhang, Yanjun; Tian, Xueli; Chen, Yu

2016-01-01

Echovirus 30 (E30) is a major pathogen associated with aseptic meningitis. In the summer of 2014, a family clustering outbreak of aseptic meningitis occurred in the urban–rural fringe of Ningbo city in Zhejiang Province, China. To identify the etiologic agent, specimens were tested by cell culture and reverse transcriptase–polymerase chain reaction. Pathogen testing confirmed that the outbreak was caused by E30. The first case was a 6-year-old child attending a local kindergarten, who suffered from headache and fever. The same symptoms subsequently appeared in his parents, aunts, and six other relatives. Vomiting occurred in the majority of the patients and diarrhea in some of them. White blood cell counts in cerebrospinal fluid (CSF) exceeded the normal range in all patients. Protein levels in CSF were above the normal range in half of the patients. Glucose levels in CSF were within the normal range in all patients. We isolated six E30 strains from the stool specimens of patients and sequenced the VP1 region. The sequences showed 100% identity at both the nucleotide and amino acid levels. Phylogenetic analysis showed that the isolate in this study grouped into sublineage D2 together with sequences isolated from other areas of China in the 2000s and 2010s. Our study reports the first family clustering outbreak of aseptic meningitis caused by E30 in Zhejiang Province, China. It is essential to establish an enterovirus molecular surveillance system in China to prevent mass outbreaks in Zhejiang. PMID:27646838
Laboratory diagnosis and genetic analysis of a family clustering outbreak of aseptic meningitis due to echovirus 30.

PubMed

Zheng, Shufa; Ye, Hongyan; Yan, Juying; Xie, Guoliang; Cui, Dawei; Yu, Fei; Wang, Yiyin; Yang, Xianzhi; Zhou, Fangman; Zhang, Yanjun; Tian, Xueli; Chen, Yu

2016-09-01

Echovirus 30 (E30) is a major pathogen associated with aseptic meningitis. In the summer of 2014, a family clustering outbreak of aseptic meningitis occurred in the urban-rural fringe of Ningbo city in Zhejiang Province, China. To identify the etiologic agent, specimens were tested by cell culture and reverse transcriptase-polymerase chain reaction. Pathogen testing confirmed that the outbreak was caused by E30. The first case was a 6-year-old child attending a local kindergarten, who suffered from headache and fever. The same symptoms subsequently appeared in his parents, aunts, and six other relatives. Vomiting occurred in the majority of the patients and diarrhea in some of them. White blood cell counts in cerebrospinal fluid (CSF) exceeded the normal range in all patients. Protein levels in CSF were above the normal range in half of the patients. Glucose levels in CSF were within the normal range in all patients. We isolated six E30 strains from the stool specimens of patients and sequenced the VP1 region. The sequences showed 100% identity at both the nucleotide and amino acid levels. Phylogenetic analysis showed that the isolate in this study grouped into sublineage D2 together with sequences isolated from other areas of China in the 2000s and 2010s. Our study reports the first family clustering outbreak of aseptic meningitis caused by E30 in Zhejiang Province, China. It is essential to establish an enterovirus molecular surveillance system in China to prevent mass outbreaks in Zhejiang.
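A sequence identity figure such as the one reported for the VP1 region is typically computed over aligned positions. The following is a minimal sketch, assuming the sequences are already aligned to equal length; the isolate names and fragments are hypothetical and are not data from the study.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments for illustration only.
isolates = {
    "isolate_1": "ATGGCTAGCT",
    "isolate_2": "ATGGCTAGCT",
    "reference": "ATGACTAGCT",
}

# Identity for every pair of sequences.
pairwise = {
    (name_a, name_b): percent_identity(isolates[name_a], isolates[name_b])
    for name_a, name_b in combinations(isolates, 2)
}
```

Other conventions exist (for example, dividing by the shorter sequence length or excluding all gapped columns), so reported identity values are only comparable when the same denominator is used.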
Transcriptome Wide Identification and Validation of Calcium Sensor Gene Family in the Developing Spikes of Finger Millet Genotypes for Elucidating Its Role in Grain Calcium Accumulation

PubMed Central

Singh, Uma M.; Chandra, Muktesh; Shankhdhar, Shailesh C.; Kumar, Anil

2014-01-01

Background: In finger millet, calcium is one of the important and abundant mineral elements. The molecular mechanisms involved in calcium accumulation in plants remain poorly understood. Transcriptome sequencing of genetically diverse genotypes of finger millet differing in grain calcium content will help in understanding the trait. Principal Finding: In this study, transcriptome sequencing of spike tissues of two genotypes of finger millet differing in their grain calcium content was performed for the first time. In GP-1 (low Ca genotype), 78 of 109,218 contigs, and in GP-45 (high Ca genotype), 76 of 120,130 contigs, were identified as calcium sensor genes. Through in silico analysis, all 82 unique calcium sensor genes were classified into eight calcium sensor gene families, viz. CaM & CaMLs, CBLs, CIPKs, CRKs, PEPRKs, CDPKs, CaMKs and CCaMK. Out of 82 genes, 12 were found to diverge from their rice orthologs. Differential expression analysis based on FPKM values identified 24 genes highly expressed in GP-45 and 11 genes highly expressed in GP-1. Ten of the 35 differentially expressed genes could be assigned to three documented pathways involved mainly in stress responses. Furthermore, validation of selected calcium sensor responder genes was also performed by qPCR in developing spikes of both genotypes grown on different concentrations of exogenous calcium. Conclusion: Through de novo transcriptome data assembly and analysis, we reported the comprehensive identification and functional characterization of the calcium sensor gene family. The calcium sensor gene family identified and characterized in this study will facilitate understanding the molecular basis of calcium accumulation and the development of calcium-biofortified crops. 
Moreover, this study also supported that identification and characterization of gene family through Illumina paired-end sequencing is a potential tool for generating the genomic information of gene family in non-model species. PMID:25157851
Transcriptome wide identification and validation of calcium sensor gene family in the developing spikes of finger millet genotypes for elucidating its role in grain calcium accumulation.

PubMed

Singh, Uma M; Chandra, Muktesh; Shankhdhar, Shailesh C; Kumar, Anil

2014-01-01

In finger millet, calcium is one of the important and abundant mineral elements. The molecular mechanisms involved in calcium accumulation in plants remain poorly understood. Transcriptome sequencing of genetically diverse genotypes of finger millet differing in grain calcium content will help in understanding the trait. In this study, transcriptome sequencing of spike tissues of two genotypes of finger millet differing in their grain calcium content was performed for the first time. In GP-1 (low Ca genotype), 78 of 109,218 contigs, and in GP-45 (high Ca genotype), 76 of 120,130 contigs, were identified as calcium sensor genes. Through in silico analysis, all 82 unique calcium sensor genes were classified into eight calcium sensor gene families, viz. CaM & CaMLs, CBLs, CIPKs, CRKs, PEPRKs, CDPKs, CaMKs and CCaMK. Out of 82 genes, 12 were found to diverge from their rice orthologs. Differential expression analysis based on FPKM values identified 24 genes highly expressed in GP-45 and 11 genes highly expressed in GP-1. Ten of the 35 differentially expressed genes could be assigned to three documented pathways involved mainly in stress responses. Furthermore, validation of selected calcium sensor responder genes was also performed by qPCR in developing spikes of both genotypes grown on different concentrations of exogenous calcium. Through de novo transcriptome data assembly and analysis, we reported the comprehensive identification and functional characterization of the calcium sensor gene family. The calcium sensor gene family identified and characterized in this study will facilitate understanding the molecular basis of calcium accumulation and the development of calcium-biofortified crops. Moreover, this study also supported that identification and characterization of gene family through Illumina paired-end sequencing is a potential tool for generating the genomic information of gene family in non-model species.
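The FPKM-based comparison mentioned in this abstract rests on a simple normalization: fragments mapped to a contig, scaled by contig length in kilobases and by sequencing depth in millions of mapped fragments. The sketch below shows that formula together with a log2 fold change between two samples. The pseudocount, the counts, and the function names are assumptions for illustration; the abstract does not state the thresholds or software the authors used.

```python
from math import log2


def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped)


def log2_fold_change(fpkm_a: float, fpkm_b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of sample A to sample B.

    The pseudocount (an assumption here) avoids division by zero when a
    gene is not detected in one sample.
    """
    return log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))


# Hypothetical contig: 500 fragments, 2,000 bp long, 10 million mapped fragments.
example_fpkm = fpkm(fragments=500, length_bp=2000, total_mapped=10_000_000)

# Hypothetical FPKM values for one gene in a high-Ca and a low-Ca sample.
example_lfc = log2_fold_change(31.0, 7.0)
```

A positive log2 fold change indicates higher expression in the first sample; with only one library per genotype, such a ratio is descriptive rather than a statistical test.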
Taxonomic distribution and origins of the extended LHC (light-harvesting complex) antenna protein superfamily

PubMed Central

2010-01-01

Background: The extended light-harvesting complex (LHC) protein superfamily is a centerpiece of eukaryotic photosynthesis, comprising the LHC family and several families involved in photoprotection, like the LHC-like and the photosystem II subunit S (PSBS). The evolution of this complex superfamily has long remained elusive, partially due to previously missing families. Results: In this study we present a meticulous search for LHC-like sequences in public genome and expressed sequence tag databases covering twelve representative photosynthetic eukaryotes from the three primary lineages of plants (Plantae): glaucophytes, red algae and green plants (Viridiplantae). By introducing a coherent classification of the different protein families based on both hidden Markov model analyses and structural predictions, numerous new LHC-like sequences were identified and several new families were described, including the red lineage chlorophyll a/b-binding-like protein (RedCAP) family from red algae and diatoms. The test of alternative topologies of sequences of the highly conserved chlorophyll-binding core structure of LHC and PSBS proteins significantly supports the independent origins of LHC and PSBS families via two unrelated internal gene duplication events. This result was confirmed by the application of cluster likelihood mapping. Conclusions: The independent evolution of LHC and PSBS families is supported by strong phylogenetic evidence. In addition, a possible origin of LHC and PSBS families from different homologous members of the stress-enhanced protein subfamily, a diverse and anciently paralogous group of two-helix proteins, seems likely. The new hypothesis for the evolution of the extended LHC protein superfamily proposed here is in agreement with the character evolution analysis that incorporates the distribution of families and subfamilies across taxonomic lineages. 
Intriguingly, stress-enhanced proteins, which are universally found in the genomes of green plants, red algae, glaucophytes and in diatoms with complex plastids, could represent an important and previously missing link in the evolution of the extended LHC protein superfamily. PMID:20673336
Whole-genome analysis of piscine reovirus (PRV) shows PRV represents a new genus in family Reoviridae and its genome segment S1 sequences group it into two separate sub-genotypes.

PubMed

Kibenge, Molly J T; Iwamoto, Tokinori; Wang, Yingwei; Morton, Alexandra; Godoy, Marcos G; Kibenge, Frederick S B

2013-07-11

Piscine reovirus (PRV) is a newly discovered fish reovirus of anadromous and marine fish ubiquitous among fish in Norwegian salmon farms, and likely the causative agent of heart and skeletal muscle inflammation (HSMI). HSMI is an increasingly economically significant disease in Atlantic salmon (Salmo salar) farms. The nucleotide sequence data available for PRV are limited, and there is no genetic information on this virus outside of Norway and none from wild fish. RT-PCR amplification and sequencing were used to obtain the complete viral genome of PRV (10 segments) from western Canada and Chile. The genetic diversity among the PRV strains and their relationship to Norwegian PRV isolates were determined by phylogenetic analyses and sequence identity comparisons. PRV is distantly related to members of the genera Orthoreovirus and Aquareovirus and represents an unambiguous new genus within the family Reoviridae. The Canadian and Norwegian PRV strains are most divergent in the segment S1 and S4 encoded proteins. Phylogenetic analysis of PRV S1 sequences, for which the largest number of complete sequences from different "isolates" is available, grouped Norwegian PRV strains into a single genotype, Genotype I, with sub-genotypes, Ia and Ib. The Canadian PRV strains matched sub-genotype Ia and Chilean PRV strains matched sub-genotype Ib. PRV should be considered as a member of a new genus within the family Reoviridae with two major Norwegian sub-genotypes. The Canadian PRV diverged from Norwegian sub-genotype Ia around 2007 ± 1, whereas the Chilean PRV diverged from Norwegian sub-genotype Ib around 2008 ± 1.
Whole-genome analysis of piscine reovirus (PRV) shows PRV represents a new genus in family Reoviridae and its genome segment S1 sequences group it into two separate sub-genotypes

PubMed Central

2013-01-01

Background Piscine reovirus (PRV) is a newly discovered fish reovirus of anadromous and marine fish ubiquitous among fish in Norwegian salmon farms, and likely the causative agent of heart and skeletal muscle inflammation (HSMI). HSMI is an increasingly economically significant disease in Atlantic salmon (Salmo salar) farms. The nucleotide sequence data available for PRV are limited, and there is no genetic information on this virus outside of Norway and none from wild fish. Methods RT-PCR amplification and sequencing were used to obtain the complete viral genome of PRV (10 segments) from western Canada and Chile. The genetic diversity among the PRV strains and their relationship to Norwegian PRV isolates were determined by phylogenetic analyses and sequence identity comparisons. Results PRV is distantly related to members of the genera Orthoreovirus and Aquareovirus and an unambiguous new genus within the family Reoviridae. The Canadian and Norwegian PRV strains are most divergent in the segment S1 and S4 encoded proteins. Phylogenetic analysis of PRV S1 sequences, for which the largest number of complete sequences from different “isolates” is available, grouped Norwegian PRV strains into a single genotype, Genotype I, with sub-genotypes, Ia and Ib. The Canadian PRV strains matched sub-genotype Ia and Chilean PRV strains matched sub-genotype Ib. Conclusions PRV should be considered as a member of a new genus within the family Reoviridae with two major Norwegian sub-genotypes. The Canadian PRV diverged from Norwegian sub-genotype Ia around 2007 ± 1, whereas the Chilean PRV diverged from Norwegian sub-genotype Ib around 2008 ± 1. PMID:23844948
Genome-wide identification, classification, and expression analysis of the arabinogalactan protein gene family in rice (Oryza sativa L.)

PubMed Central

Zhao, Jie

2010-01-01

Arabinogalactan proteins (AGPs) comprise a family of hydroxyproline-rich glycoproteins that are implicated in plant growth and development. In this study, 69 AGPs are identified from the rice genome, including 13 classical AGPs, 15 arabinogalactan (AG) peptides, three non-classical AGPs, three early nodulin-like AGPs (eNod-like AGPs), eight non-specific lipid transfer protein-like AGPs (nsLTP-like AGPs), and 27 fasciclin-like AGPs (FLAs). The results from expressed sequence tags, microarrays, and massively parallel signature sequencing tags are used to analyse the expression of AGP-encoding genes, which is confirmed by real-time PCR. The results reveal that several rice AGP-encoding genes are predominantly expressed in anthers and display differential expression patterns in response to abscisic acid, gibberellic acid, and abiotic stresses. Based on the results obtained from this analysis, an attempt has been made to link the protein structures and expression patterns of rice AGP-encoding genes to their functions. Taken together, the genome-wide identification and expression analysis of the rice AGP gene family might facilitate further functional studies of rice AGPs. PMID:20423940

Clinical presentation and genetic analysis of a five generation Chinese family with isolated left ventricular noncompaction.

PubMed

Xia, Shudong; Wang, Hongxia; Zhang, Xiaoliang; Zhu, Jianhua; Tang, Xiaoli

2008-01-01

Isolated left ventricular noncompaction (ILVNC) is a rare congenital cardiomyopathy characterized by numerous excessive trabeculations and deep intertrabecular recesses. To date, the clinical features and genetic causes of ILVNC remain unclear. Here, we report the clinical presentation and genetic analysis of a five generation Chinese family with ILVNC. For this study, 21 living family members were recruited. Each individual underwent a detailed clinical examination for ILVNC. Peripheral blood samples were collected for direct gene sequencing to determine any mutations in the known disease-causing genes of ILVNC, which include the genes TAZ, DTNA, LDB3, LMNA and FKBP12. Classic echocardiographic presentation of ILVNC was identified in the proband, who had his first onset of heart failure at age 52. His 28-year-old son and 26-year-old daughter showed heart anomalies similar to their father's. Although they had no symptoms to date, depressed ventricular systolic function was noted in both of them. Pedigree analysis suggested an autosomal dominant mode of inheritance. DNA sequencing found no mutation in the known disease-causing genes of ILVNC. Interestingly, two other members of the family, the proband's wife (also his first cousin) and her sister, had classic echocardiographic presentation of hypertrophic cardiomyopathy (HCM). A single Chinese family with ILVNC associated with HCM is reported; no mutations in TAZ, DTNA, LDB3, LMNA and FKBP12 were found.
Genome-wide identification and expression profiling of the SnRK2 gene family in Malus prunifolia.

PubMed

Shao, Yun; Qin, Yuan; Zou, Yangjun; Ma, Fengwang

2014-11-15

Sucrose non-fermenting-1-related protein kinase 2 (SnRK2) constitutes a small plant-specific serine/threonine kinase family with essential roles in the abscisic acid (ABA) signal pathway and in responses to osmotic stress. Although a genome-wide analysis of this family has been conducted in some species, little is known about SnRK2 genes in apple (Malus domestica). We identified 14 putative sequences encoding 12 deduced SnRK2 proteins within the apple genome. Gene chromosomal location and synteny analysis of the apple SnRK2 genes indicated that tandem and segmental duplications have likely contributed to the expansion and evolution of these genes. All 12 full-length coding sequences were confirmed by cloning from Malus prunifolia. The gene structure and motif compositions of the apple SnRK2 genes were analyzed. Phylogenetic analysis showed that MpSnRK2s could be classified into four groups. Profiling of these genes presented differential patterns of expression in various tissues. Under stress conditions, transcript levels for some family members were up-regulated in the leaves in response to drought, salinity, or ABA treatments. This suggested their possible roles in plant response to abiotic stress. Our findings provide essential information about SnRK2 genes in apple and will contribute to further functional dissection of this gene family. Copyright © 2014 Elsevier B.V. All rights reserved.
Investigation of FANCA gene in Fanconi anaemia patients in Iran

PubMed Central

Saffar Moghadam, Ali Akbar; Mahjoubi, Frouzandeh; Reisi, Nahid; Vosough, Parvaneh

2016-01-01

Background & objectives: Fanconi anaemia (FA) is a syndrome with a predisposition to bone marrow failure, congenital anomalies and malignancies. It is characterized by cellular hypersensitivity to cross-linking agents such as mitomycin C (MMC). In the present study, a new approach was selected to investigate the FANCA (Fanconi anaemia complementation group A) gene in patients clinically diagnosed with cellular hypersensitivity to the DNA cross-linking agent MMC. Methods: Chromosomal breakage analysis was performed to prove the diagnosis of Fanconi anaemia in 318 families. Of these, 70 families had a positive result. Forty families agreed to molecular genetic testing. In total, there were 27 patients with unknown complementary types. Genomic DNA was extracted and total RNA was isolated from fresh whole blood of the patients. The first-strand cDNA was synthesized and the cDNA of each patient was then tested with 21 pairs of overlapping primers. High resolution melting curve analysis was used to screen FANCA, and LinReg software version 1.7 was utilized for analysis of expression. Results: In total, six sequence alterations were identified, which included two stop codons, two frameshift mutations, one large deletion and one amino acid exchange. FANCA expression was downregulated in patients who had sequence alterations. Interpretation & conclusions: The results of the present study show that high resolution melting (HRM) curve analysis may be useful in the detection of sequence alterations. It is simpler and more cost-effective than the multiplex ligation-dependent probe amplification (MLPA) procedure. PMID:27121516
Investigation of FANCA gene in Fanconi anaemia patients in Iran.

PubMed

Moghadam, Ali Akbar Saffar; Mahjoubi, Frouzandeh; Reisi, Nahid; Vosough, Parvaneh

2016-02-01

Fanconi anaemia (FA) is a syndrome with a predisposition to bone marrow failure, congenital anomalies and malignancies. It is characterized by cellular hypersensitivity to cross-linking agents such as mitomycin C (MMC). In the present study, a new approach was selected to investigate the FANCA (Fanconi anaemia complementation group A) gene in patients clinically diagnosed with cellular hypersensitivity to the DNA cross-linking agent MMC. Chromosomal breakage analysis was performed to prove the diagnosis of Fanconi anaemia in 318 families. Of these, 70 families had a positive result. Forty families agreed to molecular genetic testing. In total, there were 27 patients with unknown complementary types. Genomic DNA was extracted and total RNA was isolated from fresh whole blood of the patients. The first-strand cDNA was synthesized and the cDNA of each patient was then tested with 21 pairs of overlapping primers. High resolution melting curve analysis was used to screen FANCA, and LinReg software version 1.7 was utilized for analysis of expression. In total, six sequence alterations were identified, which included two stop codons, two frameshift mutations, one large deletion and one amino acid exchange. FANCA expression was downregulated in patients who had sequence alterations. The results of the present study show that high resolution melting (HRM) curve analysis may be useful in the detection of sequence alterations. It is simpler and more cost-effective than the multiplex ligation-dependent probe amplification (MLPA) procedure.
Chromosome specific repetitive DNA sequences

DOEpatents

Moyzis, Robert K.; Meyne, Julianne

1991-01-01

A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members. This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).
[Analysis of clinical phenotype and genetic mutations of a pedigree of familial hemophagocytic lymphohistiocytosis].

PubMed

Sun, Shuwen; Guo, Xia; Zhu, Yiping; Yang, Xue; Li, Qiang; Gao, Ju

2014-10-01

To analyze mutations in a pedigree of familial hemophagocytic lymphohistiocytosis (FHLH) from Sichuan and provide genetic counseling for the family. Clinical data of a case with FHLH diagnosed at West China Second Hospital were retrospectively analyzed. Genomic DNA was extracted from peripheral blood samples of the proband and his family members. Eight candidate genes for primary HLH were amplified with PCR and analyzed by direct sequencing. The proband was diagnosed with HLH based on clinical manifestations of recurrent fever for 2 months, hepatosplenomegaly, lymphadenopathy, pancytopenia, hyperferritinemia, decreased fibrinogen, and hemophagocytosis in bone marrow. Genetic testing for primary HLH was carried out in view of the relapse of illness after hormone therapy for 8 weeks and the family history. Gene sequencing showed that the proband carried compound heterozygous mutations in the PRF1 gene (c.1349C>T in exon 3 and c.445G>A in exon 2). His father carried a heterozygous mutation (c.445G>A in exon 2) and a nonsense mutation (c.900C>T in exon 3), and his mother carried a heterozygous mutation (c.1349C>T in exon 3). Both c.1349C>T and c.445G>A have been previously reported as pathogenic mutations. The family was diagnosed with familial HLH type 2 based on clinical and laboratory examinations and molecular genetic testing. Gene sequencing indicated that it was a recessive type of familial HLH.
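The segregation reasoning in this record (one PRF1 variant shared with the father, the other with the mother) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name is hypothetical, all variants are taken to lie in the same gene, and genotypes are reduced to plain sets of variant names as reported in the abstract.

```python
def is_compound_het_in_trans(proband, father, mother):
    """Return True if the proband carries at least one variant shared only
    with the father and at least one shared only with the mother.

    Hypothetical helper: assumes every variant is in the same gene and that
    each argument is an iterable of variant names.
    """
    proband, father, mother = set(proband), set(father), set(mother)
    paternal_only = (proband & father) - mother
    maternal_only = (proband & mother) - father
    return bool(paternal_only) and bool(maternal_only)


# PRF1 variants as listed in the abstract above
proband = {"c.1349C>T", "c.445G>A"}
father = {"c.445G>A", "c.900C>T"}
mother = {"c.1349C>T"}
print(is_compound_het_in_trans(proband, father, mother))
```

A pattern in which both variants come from the same parent would return False, which is the case that segregation analysis is meant to exclude before calling a recessive, compound heterozygous genotype.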
Analyses of MMP20 Missense Mutations in Two Families with Hypomaturation Amelogenesis Imperfecta.

PubMed

Kim, Youn Jung; Kang, Jenny; Seymen, Figen; Koruyucu, Mine; Gencay, Koray; Shin, Teo Jeon; Hyun, Hong-Keun; Lee, Zang Hee; Hu, Jan C-C; Simmer, James P; Kim, Jung-Wook

2017-01-01

Amelogenesis imperfecta is a group of rare inherited disorders that affect tooth enamel formation, quantitatively and/or qualitatively. The aim of this study was to identify the genetic etiologies of two families presenting with hypomaturation amelogenesis imperfecta. DNA was isolated from peripheral blood samples obtained from participating family members. Whole exome sequencing was performed using DNA samples from the two probands. Sequencing data were aligned to the NCBI human reference genome (NCBI build 37.2, hg19) and sequence variations were annotated with dbSNP build 138. Mutations in MMP20 were identified in both probands. A homozygous missense mutation (c.678T>A; p.His226Gln) was identified in the consanguineous Family 1. Compound heterozygous MMP20 mutations (c.540T>A, p.Tyr180* and c.389C>T, p.Thr130Ile) were identified in the non-consanguineous Family 2. Affected persons in Family 1 showed hypomaturation AI with dark brown discoloration, which is similar to the clinical phenotype in a previous report with the same mutation. However, the dentition of the Family 2 proband exhibited slight yellowish discoloration with reduced transparency. Functional analysis showed that the p.Thr130Ile mutant protein had reduced activity of MMP20, while there was no functional MMP20 in the Family 1 proband. These results expand the mutational spectrum of MMP20 and broaden our understanding of genotype-phenotype correlations in amelogenesis imperfecta.
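The relationship between the coding-DNA names and the protein-level names of the MMP20 variants above is simple arithmetic, sketched here in Python. The helper name is hypothetical, and the sketch assumes plain substitutions numbered from the A of the ATG start codon, with no intronic offsets.

```python
import re

# Matches simple coding substitutions such as "c.678T>A" (assumption: no
# intronic offsets, no indels, no UTR positions).
HGVS_SUBSTITUTION = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")


def codon_position(hgvs_c):
    """Map a coding substitution to (codon number, position within codon)."""
    match = HGVS_SUBSTITUTION.fullmatch(hgvs_c)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {hgvs_c}")
    index = int(match.group(1)) - 1  # 0-based offset from the A of the ATG
    return index // 3 + 1, index % 3 + 1


for variant in ("c.678T>A", "c.540T>A", "c.389C>T"):
    print(variant, codon_position(variant))
```

The resulting codon numbers (226, 180 and 130) agree with the protein-level names p.His226Gln, p.Tyr180* and p.Thr130Ile given in the abstract.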
Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea)

PubMed Central

Gao, Feng; Song, Weibo; Katz, Laura A.

2014-01-01

In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that: 1) alternative processing is extensive among gene families; and 2) such gene families are likely to be C. uncinata-specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family -- a protein kinase domain containing protein (PKc) -- from two C. uncinata strains. Analysis of the PKc sequences reveals: 1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and 2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. PMID:24749903
Identification and analysis of multigene families by comparison of exon fingerprints.

PubMed

Brown, N P; Whittaker, A J; Newell, W R; Rawlings, C J; Beck, S

1995-06-02

Gene families are often recognised by sequence homology using similarity searching to find relationships; however, genomic sequence data provide gene architectural information not used by conventional search methods. In particular, intron positions and phases are expected to be relatively conserved features, because mis-splicing and reading frame shifts should be selected against. A fast search technique capable of detecting possible weak sequence homologies apparent at the intron/exon level of gene organization is presented for comparing spliceosomal genes and gene fragments. FINEX compares strings of exons delimited by intron/exon boundary positions and intron phases (exon fingerprint) using a global dynamic programming algorithm with a combined intron phase identity and exon size dissimilarity score. Exon fingerprints are typically two orders of magnitude smaller than their nucleic acid sequence counterparts, giving rise to fast search times: a ranked search against a library of 6755 fingerprints for a typical three exon fingerprint completes in under 30 seconds on an ordinary workstation, while a worst case largest fingerprint of 52 exons completes in just over one minute. The short "sequence" length of exon fingerprints in comparisons is compensated for by the large exon alphabet compounded of intron phase types and a wide range of exon sizes, the latter contributing the most information to alignments. FINEX performs better in some searches than conventional methods, finding matches with similar exon organization but low sequence homology. A search using human serum albumin finds all members of the multigene family in the FINEX database at the top of the search ranking, despite very low amino acid percentage identities between family members. The method should complement conventional sequence searching and alignment techniques, offering a means of identifying otherwise hard to detect homologies where genomic data are available.
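The general idea of aligning exon fingerprints by global dynamic programming can be sketched as follows. This is not the FINEX implementation: the scoring function, the gap penalty, the parameter names and the representation of a fingerprint as a list of (exon length, intron phase) pairs are all assumptions chosen for illustration.

```python
def align_fingerprints(fp_a, fp_b, phase_match=1.0, phase_mismatch=-1.0, gap=-1.0):
    """Global (Needleman-Wunsch style) alignment score of two exon fingerprints.

    Each fingerprint is a list of (exon_length, intron_phase) tuples.
    The pair score is a hypothetical combination of intron phase identity
    and exon size dissimilarity, not the published FINEX score.
    """
    def pair_score(x, y):
        (len_x, phase_x), (len_y, phase_y) = x, y
        phase_term = phase_match if phase_x == phase_y else phase_mismatch
        size_term = abs(len_x - len_y) / max(len_x, len_y)  # 0.0 = same size
        return phase_term - size_term

    rows, cols = len(fp_a) + 1, len(fp_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Leading gaps: aligning a prefix against nothing costs one gap per exon.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(fp_a[i - 1], fp_b[j - 1]),
                score[i - 1][j] + gap,   # exon in fp_a left unpaired
                score[i][j - 1] + gap,   # exon in fp_b left unpaired
            )
    return score[-1][-1]


# Made-up three-exon fingerprints: (exon length in bp, phase of following intron)
query = [(120, 0), (85, 1), (200, 2)]
similar = [(118, 0), (90, 1), (210, 2)]
different = [(40, 2), (300, 0), (15, 1)]
print(align_fingerprints(query, similar), align_fingerprints(query, different))
```

Because each fingerprint holds only a handful of exons, the quadratic table stays tiny, which is consistent with the fast search times the abstract reports for fingerprint comparison.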
Germline mutations in SUFU cause Gorlin syndrome-associated childhood medulloblastoma and redefine the risk associated with PTCH1 mutations.

PubMed

Smith, Miriam J; Beetz, Christian; Williams, Simon G; Bhaskar, Sanjeev S; O'Sullivan, James; Anderson, Beverley; Daly, Sarah B; Urquhart, Jill E; Bholah, Zaynab; Oudit, Deemesh; Cheesman, Edmund; Kelsey, Anna; McCabe, Martin G; Newman, William G; Evans, D Gareth R

2014-12-20

Heterozygous germline PTCH1 mutations are causative of Gorlin syndrome (naevoid basal cell carcinoma), but detection rates > 70% have rarely been reported. We aimed to define the causative mutations in individuals with Gorlin syndrome without PTCH1 mutations. We undertook exome sequencing on lymphocyte DNA from four unrelated individuals from families with Gorlin syndrome with no PTCH1 mutations found by Sanger sequencing, multiplex ligation-dependent probe amplification (MLPA), or RNA analysis. A germline heterozygous nonsense mutation in SUFU was identified in one of four exomes. Sanger sequencing of SUFU in 23 additional PTCH1-negative Gorlin syndrome families identified a SUFU mutation in a second family. Copy-number analysis of SUFU by MLPA revealed a large heterozygous deletion in a third family. All three SUFU-positive families fulfilled diagnostic criteria for Gorlin syndrome, although none had odontogenic jaw keratocysts. Each SUFU-positive family included a single case of medulloblastoma, whereas only two (1.7%) of 115 individuals with Gorlin syndrome and a PTCH1 mutation developed medulloblastoma. We demonstrate convincing evidence that SUFU mutations can cause classical Gorlin syndrome. Our study redefines the risk of medulloblastoma in Gorlin syndrome, dependent on the underlying causative gene. Previous reports have found a 5% risk of medulloblastoma in Gorlin syndrome. We found a < 2% risk in PTCH1 mutation-positive individuals, with a risk up to 20× higher in SUFU mutation-positive individuals. Our data suggest childhood brain magnetic resonance imaging surveillance is justified in SUFU-related, but not PTCH1-related, Gorlin syndrome. © 2014 by American Society of Clinical Oncology.
A deep insight into the sialotranscriptome of the mosquito, Psorophora albipes

PubMed Central

2013-01-01

Background Psorophora mosquitoes are exclusively found in the Americas and have been associated with transmission of encephalitis and West Nile fever viruses, among other arboviruses. Mosquito salivary glands represent the final route of differentiation and transmission of many parasites. They also secrete molecules with powerful pharmacologic actions that modulate host hemostasis, inflammation, and immune response. Here, we employed next generation sequencing and proteome approaches to investigate for the first time the salivary composition of a mosquito member of the Psorophora genus. We additionally discuss the evolutionary position of this mosquito genus into the Culicidae family by comparing the identity of its secreted salivary compounds to other mosquito salivary proteins identified so far. Results Illumina sequencing resulted in 13,535,229 sequence reads, which were assembled into 3,247 contigs. All families were classified according to their in silico-predicted function/activity. Annotation of these sequences allowed classification of their products into 83 salivary protein families, twenty (24.39%) of which were confirmed by our subsequent proteome analysis. Two protein families were deorphanized from Aedes and one from Ochlerotatus, while four protein families were described as novel to Psorophora genus because they had no match with any other known mosquito salivary sequence. Several protein families described as exclusive to Culicines were present in Psorophora mosquitoes, while we did not identify any member of the protein families already known as unique to Anophelines. Also, the Psorophora salivary proteins had better identity to homologs in Aedes (69.23%), followed by Ochlerotatus (8.15%), Culex (6.52%), and Anopheles (4.66%), respectively. Conclusions This is the first sialome (from the Greek sialo = saliva) catalog of salivary proteins from a Psorophora mosquito, which may be useful for better understanding the lifecycle of this mosquito and the role of its salivary secretion in arboviral transmission. PMID:24330624
Determination and analysis of the complete genome sequence of Paralichthys olivaceus rhabdovirus (PORV).

PubMed

Zhu, Ruo-Lin; Zhang, Qi-Ya

2014-04-01

Paralichthys olivaceus rhabdovirus (PORV), which is associated with high mortality rates in flounder, was isolated in China in 2005. Here, we provide an annotated sequence record of PORV, the genome of which comprises 11,182 nucleotides and contains six genes in the order 3'-N-P-M-G-NV-L-5'. Phylogenetic analysis based on glycoprotein sequences of PORV and other rhabdoviruses showed that PORV clusters with viral haemorrhagic septicemia virus (VHSV), genus Novirhabdovirus, family Rhabdoviridae. Further phylogenetic analysis of the combined amino acid sequences of six proteins of PORV and VHSV strains showed that PORV clusters with Korean strains and is closely related to Asian strains, all of which were isolated from flounder. In a comparison in which the sequences of the six proteins were combined, PORV shared the highest identity (98.3%) with VHSV strain KJ2008 from Korea.
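A percent-identity figure such as the one quoted above is computed from aligned sequences. The Python sketch below shows one common convention, assumed here rather than taken from the paper: columns containing a gap in either sequence are skipped, and identity is the share of matching residues among the remaining columns. The function name and the toy sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns with a gap ("-") in either sequence are
    excluded from both numerator and denominator. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment, not real PORV or VHSV sequence
print(percent_identity("MKT-LV", "MKSALV"))
```

Reported identities depend on the alignment and on the denominator chosen, so values from different studies are only comparable when the convention is stated.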
Ocular findings associated with a Cys39Arg mutation in the Norrie disease gene.

PubMed

Joos, K M; Kimura, A E; Vandenburgh, K; Bartley, J A; Stone, E M

1994-12-01

To diagnose the carriers and noncarriers in a family affected with Norrie disease based on molecular analysis. Family members from three generations, including one affected patient, two obligate carriers, one carrier identified with linkage analysis, one noncarrier identified with linkage analysis, and one female family member with indeterminate carrier status, were examined clinically and electrophysiologically. Linkage analysis had previously failed to determine the carrier status of one female family member in the third generation. Blood samples were screened for mutations in the Norrie disease gene with single-strand conformation polymorphism analysis. The mutation was characterized by dideoxy-termination sequencing. Ophthalmoscopy and electroretinographic examination failed to detect the carrier state. The affected individuals and carriers in this family were found to have a transition from thymidine to cytosine in the first nucleotide of codon 39 of the Norrie disease gene, causing a cysteine-to-arginine mutation. Single-strand conformation polymorphism analysis identified a patient of indeterminate status (by linkage) to be a noncarrier of Norrie disease. Ophthalmoscopy and electroretinography could not identify carriers of this Norrie disease mutation. Single-strand conformation polymorphism analysis was more sensitive and specific than linkage analysis in identifying carriers in this family.
Genome-wide analysis of the WRKY gene family in physic nut (Jatropha curcas L.).

PubMed

Xiong, Wangdan; Xu, Xueqin; Zhang, Lin; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Jiang, Huawu; Wu, Guojiang

2013-07-25

The WRKY proteins, which contain highly conserved WRKYGQK amino acid sequences and zinc-finger-like motifs, constitute a large family of transcription factors in plants. They participate in diverse physiological and developmental processes. WRKY genes have been identified and characterized in a number of plant species. We identified a total of 58 WRKY genes (JcWRKY) in the genome of the physic nut (Jatropha curcas L.). On the basis of their conserved WRKY domain sequences, all of the JcWRKY proteins could be assigned to one of the previously defined groups, I-III. Phylogenetic analysis of JcWRKY genes with Arabidopsis and rice WRKY genes, and separately with castor bean WRKY genes, revealed no evidence of recent gene duplication in the JcWRKY gene family. Transcript abundance of JcWRKY gene products was analyzed in different tissues under normal growth conditions. In addition, 47 WRKY genes responded to at least one abiotic stress (drought, salinity, phosphate starvation and nitrogen starvation) in individual tissues (leaf, root and/or shoot cortex). Our study provides a useful reference data set as the basis for cloning and functional analysis of physic nut WRKY genes. Copyright © 2013 Elsevier B.V. All rights reserved.
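As a rough illustration of how the conserved WRKYGQK heptapeptide can flag candidate family members, the Python sketch below scans a protein sequence for exact matches. This is a deliberate simplification and not the method of the study: genome-wide surveys typically rely on domain profiles and tolerate motif variants, and the function name and toy sequence here are hypothetical.

```python
import re

# Exact heptapeptide named in the abstract; variants of the motif are
# not matched by this simplified pattern.
WRKY_MOTIF = re.compile("WRKYGQK")


def find_wrky_motifs(protein_seq):
    """Return the 0-based start position of every exact WRKYGQK match."""
    return [m.start() for m in WRKY_MOTIF.finditer(protein_seq.upper())]


toy_protein = "MSDWRKYGQKPIKGSPYPRSYYKCT"  # made-up sequence for illustration
print(find_wrky_motifs(toy_protein))
```

Counting the matches per protein gives a first indication of how many WRKY domains a candidate carries, which would then need confirmation against the zinc-finger-like motif.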
MIPS: a database for protein sequences and complete genomes.

PubMed Central

Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

1998-01-01

The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information was also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795
Hyperdiversity of Genes Encoding Integral Light-Harvesting Proteins in the Dinoflagellate Symbiodinium sp

PubMed Central

Boldt, Lynda; Yellowlees, David; Leggat, William

2012-01-01

The superfamily of light-harvesting complex (LHC) proteins is comprised of proteins with diverse functions in light-harvesting and photoprotection. LHC proteins bind chlorophyll (Chl) and carotenoids and include a family of LHCs that bind Chl a and c. Dinophytes (dinoflagellates) are predominantly Chl c binding algal taxa, bind peridinin or fucoxanthin as the primary carotenoid, and can possess a number of LHC subfamilies. Here we report 11 LHC sequences for the chlorophyll a-chlorophyll c2-peridinin protein complex (acpPC) subfamily isolated from Symbiodinium sp. C3, an ecologically important peridinin binding dinoflagellate taxon. Phylogenetic analysis of these proteins suggests the acpPC subfamily forms at least three clades within the Chl a/c binding LHC family: Clade 1 clusters with rhodophyte, cryptophyte and peridinin binding dinoflagellate sequences, Clade 2 with peridinin binding dinoflagellate sequences only, and Clade 3 with heterokontophyte, fucoxanthin and peridinin binding dinoflagellate sequences. PMID:23112815
Functional metagenomics reveals novel β-galactosidases not predictable from gene sequences.

PubMed

Cheng, Jiujun; Romantsov, Tatyana; Engel, Katja; Doxey, Andrew C; Rose, David R; Neufeld, Josh D; Charles, Trevor C

2017-01-01

The techniques of metagenomics have allowed researchers to access the genomic potential of uncultivated microbes, but there remain significant barriers to determination of gene function based on DNA sequence alone. Functional metagenomics, in which DNA is cloned and expressed in surrogate hosts, can overcome these barriers, and make important contributions to the discovery of novel enzymes. In this study, a soil metagenomic library carried in an IncP cosmid was used for functional complementation for β-galactosidase activity in both Sinorhizobium meliloti (α-Proteobacteria) and Escherichia coli (γ-Proteobacteria) backgrounds. One β-galactosidase, encoded by six overlapping clones that were selected in both hosts, was identified as a member of glycoside hydrolase family 2. We could not identify ORFs obviously encoding possible β-galactosidases in 19 other sequenced clones that were only able to complement S. meliloti. Based on low sequence identity to other known glycoside hydrolases, yet not β-galactosidases, three of these ORFs were examined further. Biochemical analysis confirmed that all three encoded β-galactosidase activity. Lac36W_ORF11 and Lac161_ORF7 had conserved domains, but lacked similarities to known glycoside hydrolases. Lac161_ORF10 had neither conserved domains nor similarity to known glycoside hydrolases. Bioinformatic and structural modeling implied that Lac161_ORF10 protein represented a novel enzyme family with a five-bladed propeller glycoside hydrolase domain. By discovering founding members of three novel β-galactosidase families, we have reinforced the value of functional metagenomics for isolating novel genes that could not have been predicted from DNA sequence analysis alone.
Sequencing, bioinformatic characterization and expression pattern of a putative amino acid transporter from the parasitic cestode Echinococcus granulosus.

PubMed

Camicia, Federico; Paredes, Rodolfo; Chalar, Cora; Galanti, Norbel; Kamenetzky, Laura; Gutierrez, Ariana; Rosenzvit, Mara C

2008-03-31

We have sequenced and partially characterized an Echinococcus granulosus cDNA, termed egat1, from a protoscolex signal sequence trap (SST) cDNA library. The isolated 1627 bp long cDNA contains an ORF of 489 amino acids and shows an amino acid identity of 30% with neutral and excitatory amino acid transporter members of the Dicarboxylate/Amino Acid Na+ and/or H+ Cation Symporter family (DAACS) (TC 2.A.23). Additional bioinformatics analysis of EgAT1 confirmed the results obtained by similarity searches and showed the presence of 9 to 10 transmembrane domains, consensus sequences for N-glycosylation between the third and fourth transmembrane domains, a hydropathy profile highly similar to that of ASCT1 (a known member of the DAACS family), a high score with SDF (Sodium Dicarboxylate Family) and motifs similar to EDTRANSPORT, a fingerprint of excitatory amino acid transporters. The localization of the putative amino acid transporter was analyzed by in situ hybridization and immunofluorescence in protoscoleces and associated germinal layer. The in situ hybridization labelling indicates the distribution of egat1 mRNA throughout the tegument. EgAT1 protein, which showed a molecular mass of approximately 60 kDa in Western blots, is localized in the subtegumental region of the metacestode, particularly around suckers and rostellum of protoscoleces and layers from brood capsules. The sequence and expression analyses of EgAT1 pave the way for functional analysis of amino acid transporters of E. granulosus and their evaluation as new drug targets against cystic echinococcosis.
Conserved Features in the Structure, Mechanism, and Biogenesis of the Inverse Autotransporter Protein Family

PubMed Central

Heinz, Eva; Stubenrauch, Christopher J.; Grinter, Rhys; Croft, Nathan P.; Purcell, Anthony W.; Strugnell, Richard A.; Dougan, Gordon; Lithgow, Trevor

2016-01-01

The bacterial cell surface proteins intimin and invasin are virulence factors that share a common domain structure and bind selectively to host cell receptors in the course of bacterial pathogenesis. The β-barrel domains of intimin and invasin show significant sequence and structural similarities. Conversely, a variety of proteins with sometimes limited sequence similarity have also been annotated as “intimin-like” and “invasin” in genome datasets, while other recent work on apparently unrelated virulence-associated proteins ultimately revealed similarities to intimin and invasin. Here we characterize the sequence and structural relationships across this complex protein family. Surprisingly, intimins and invasins represent a very small minority of the sequence diversity in what has been previously the “intimin/invasin protein family”. Analysis of the assembly pathway for expression of the classic intimin, EaeA, and a characteristic example of the most prevalent members of the group, FdeC, revealed a dependence on the translocation and assembly module as a common feature for both these proteins. While the majority of the sequences in the grouping are most similar to FdeC, a further and widespread group is two-partner secretion systems that use the β-barrel domain as the delivery device for secretion of a variety of virulence factors. This comprehensive analysis supports the adoption of the “inverse autotransporter protein family” as the most accurate nomenclature for the family and, in turn, has important consequences for our overall understanding of the Type V secretion systems of bacterial pathogens. PMID:27190006
Novel compound heterozygous mutations in MYO7A gene associated with autosomal recessive sensorineural hearing loss in a Chinese family.

PubMed

Ma, Yalin; Xiao, Yun; Zhang, Fengguo; Han, Yuechen; Li, Jianfeng; Xu, Lei; Bai, Xiaohui; Wang, Haibo

2016-04-01

Mutations in MYO7A gene have been reported to be associated with Usher Syndrome type 1B (USH1B) and nonsyndromic hearing loss (DFNB2, DFNA11). Most mutations in MYO7A gene caused USH1B, whereas only a few reported mutations led to DFNB2 and DFNA11. The current study was designed to investigate the mutations among a Chinese family with autosomal recessive hearing loss. In this study, we present the clinical, genetic and molecular characteristics of a Chinese family. Targeted capture of 127 known deafness genes and next-generation sequencing were employed to study the genetic causes of two siblings in the Chinese family. Sanger sequencing was employed to examine those variant mutations in the members of this family and other ethnicity-matched controls. We identified the novel compound heterozygous mutant alleles of MYO7A gene: a novel missense mutation c.3671C>A (p.A1224D) and a reported insert mutation c.390_391insC (p.P131PfsX9). Variants were further confirmed by Sanger sequencing. These two compound heterozygous variants were co-segregated with autosomal recessive hearing loss phenotype. The gene mutation analysis and protein sequence alignment further supported that the novel compound heterozygous mutations were pathogenic. The novel compound heterozygous mutations (c.3671C>A and c.390_391insC) in MYO7A gene identified in this study were responsible for the autosomal recessive sensorineural hearing loss of this Chinese family. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

Familial Cortical Myoclonus with a Mutation in NOL3

PubMed Central

Russell, Jonathan F.; Steckley, Jamie L.; Coppola, Giovanni; Hahn, Angelika F.G.; Howard, MacKenzie A.; Kornberg, Zachary; Huang, Alden; Mirsattari, Seyed M.; Merriman, Barry; Klein, Eric; Choi, Murim; Lee, Hsien-Yang; Kirk, Andrew; Nelson-Williams, Carol; Gibson, Gillian; Baraban, Scott C.; Lifton, Richard P.; Geschwind, Daniel H.; Fu, Ying-Hui; Ptáček, Louis J.

2012-01-01

Objective Myoclonus is characterized by sudden, brief involuntary movements and its presence is debilitating. We identified a family suffering from adult-onset, cortical myoclonus without associated seizures. We performed clinical, electrophysiological, and genetic studies to define this phenotype. Methods A large, four-generation family with history of myoclonus underwent careful questioning, examination, and electrophysiological testing. Thirty-five family members donated blood samples for genetic analysis, which included SNP mapping, microsatellite linkage, targeted massively parallel sequencing, and Sanger sequencing. In silico and in vitro experiments were performed to investigate functional significance of the mutation. Results We identified 11 members of a Canadian Mennonite family suffering from adult-onset, slowly progressive, disabling, multifocal myoclonus. Somatosensory evoked potentials indicated a cortical origin of the myoclonus. There were no associated seizures. Some severely affected individuals developed signs of progressive cerebellar ataxia of variable severity late in the course of their illness. The phenotype was inherited in an autosomal dominant fashion. We demonstrated linkage to chromosome 16q21-22.1. We then sequenced all coding sequence in the critical region, identifying only a single co-segregating, novel, nonsynonymous mutation, which resides in the gene NOL3. Furthermore, this mutation was found to alter post-translational modification of NOL3 protein in vitro. Interpretation We propose that Familial Cortical Myoclonus (FCM) is a novel movement disorder that may be caused by mutation in NOL3. Further investigation of the role of NOL3 in neuronal physiology may shed light on neuronal membrane hyperexcitability and pathophysiology of myoclonus and related disorders. PMID:22926851
Genetic analysis of an Indian family with members affected with Waardenburg syndrome and Duchenne muscular dystrophy

PubMed Central

Kapoor, Saketh; Bindu, Parayil Sankaran; Taly, Arun B.; Sinha, Sanjib; Gayathri, Narayanappa; Rani, S. Vasantha; Chandak, Giriraj Ratan

2012-01-01

Purpose Waardenburg syndrome (WS) is characterized by sensorineural hearing loss and pigmentation defects of the eye, skin, and hair. It is caused by mutations in one of the following genes: PAX3 (paired box 3), MITF (microphthalmia-associated transcription factor), EDNRB (endothelin receptor type B), EDN3 (endothelin 3), SNAI2 (snail homolog 2, Drosophila) and SOX10 (SRY-box containing gene 10). Duchenne muscular dystrophy (DMD) is an X-linked recessive disorder caused by mutations in the DMD gene. The purpose of this study was to identify the genetic causes of WS and DMD in an Indian family with two patients: one affected with WS and DMD, and another one affected with only WS. Methods Blood samples were collected from individuals for genomic DNA isolation. To determine the linkage of this family to the eight known WS loci, microsatellite markers were selected from the candidate regions and used to genotype the family. Exon-specific intronic primers for EDN3 were used to amplify and sequence DNA samples from affected individuals to detect mutations. A mutation in DMD was identified by multiplex PCR and multiplex ligation-dependent probe amplification method using exon-specific probes. Results Pedigree analysis suggested segregation of WS as an autosomal recessive trait in the family. Haplotype analysis suggested linkage of the family to the WS4B (EDN3) locus. DNA sequencing identified a novel missense mutation p.T98M in EDN3. A deletion mutation was identified in DMD. Conclusions This study reports a novel missense mutation in EDN3 and a deletion mutation in DMD in the same Indian family. The present study will be helpful in genetic diagnosis of this family and increases the mutation spectrum of EDN3. PMID:22876130
Genetic analysis of an Indian family with members affected with Waardenburg syndrome and Duchenne muscular dystrophy.

PubMed

Kapoor, Saketh; Bindu, Parayil Sankaran; Taly, Arun B; Sinha, Sanjib; Gayathri, Narayanappa; Rani, S Vasantha; Chandak, Giriraj Ratan; Kumar, Arun

2012-01-01

Waardenburg syndrome (WS) is characterized by sensorineural hearing loss and pigmentation defects of the eye, skin, and hair. It is caused by mutations in one of the following genes: PAX3 (paired box 3), MITF (microphthalmia-associated transcription factor), EDNRB (endothelin receptor type B), EDN3 (endothelin 3), SNAI2 (snail homolog 2, Drosophila) and SOX10 (SRY-box containing gene 10). Duchenne muscular dystrophy (DMD) is an X-linked recessive disorder caused by mutations in the DMD gene. The purpose of this study was to identify the genetic causes of WS and DMD in an Indian family with two patients: one affected with WS and DMD, and another one affected with only WS. Blood samples were collected from individuals for genomic DNA isolation. To determine the linkage of this family to the eight known WS loci, microsatellite markers were selected from the candidate regions and used to genotype the family. Exon-specific intronic primers for EDN3 were used to amplify and sequence DNA samples from affected individuals to detect mutations. A mutation in DMD was identified by multiplex PCR and multiplex ligation-dependent probe amplification method using exon-specific probes. Pedigree analysis suggested segregation of WS as an autosomal recessive trait in the family. Haplotype analysis suggested linkage of the family to the WS4B (EDN3) locus. DNA sequencing identified a novel missense mutation p.T98M in EDN3. A deletion mutation was identified in DMD. This study reports a novel missense mutation in EDN3 and a deletion mutation in DMD in the same Indian family. The present study will be helpful in genetic diagnosis of this family and increases the mutation spectrum of EDN3.
[Mutation analysis for a pedigree affected with keratitis-ichthyosis-deafness syndrome].

PubMed

Li, Lulu; Li, Yuan; Lin, Wei; Zhao, Xiuli

2017-10-10

To identify the GJB2 gene mutation in a family affected with keratitis-ichthyosis-deafness (KID) syndrome and to provide genetic counseling. Genomic DNA was extracted from peripheral blood samples with a standard phenol-chloroform method. PCR and Sanger sequencing were used to analyze potential mutations in the proband. The suspected mutation was verified with a PCR-high-resolution melting (PCR-HRM) method. T-clone sequencing was applied to determine the parental origin of the mutation. A heterozygous mutation, c.148G>A (p.Asp50Asn), which is located in exon 1 of the GJB2 gene, was found in the proband. The result was confirmed by HRM analysis. Clone sequencing suggested that the mutation was derived from the father's germline. The hot-spot mutation c.148G>A (p.Asp50Asn) in the GJB2 gene probably underlies the KID syndrome in this Chinese family. A PCR-HRM method has been established to rapidly detect common mutations associated with this disease.
Development of an Analysis Pipeline Characterizing Multiple Hypervariable Regions of 16S rRNA Using Mock Samples.

PubMed

Barb, Jennifer J; Oler, Andrew J; Kim, Hyung-Suk; Chalmers, Natalia; Wallen, Gwenyth R; Cashion, Ann; Munson, Peter J; Ames, Nancy J

2016-01-01

There is much speculation on which hypervariable region provides the highest bacterial specificity in 16S rRNA sequencing. The optimum solution to prevent bias and to obtain a comprehensive view of complex bacterial communities would be to sequence the entire 16S rRNA gene; however, this is not possible with second generation standard library design and short-read next-generation sequencing technology. This paper examines a new process using seven hypervariable or V regions of the 16S rRNA (six amplicons: V2, V3, V4, V6-7, V8, and V9) processed simultaneously on the Ion Torrent Personal Genome Machine (Life Technologies, Grand Island, NY). Four mock samples were amplified using the 16S Ion Metagenomics Kit™ (Life Technologies) and their sequencing data is subjected to a novel analytical pipeline. Results are presented at family and genus level. The Kullback-Leibler divergence (DKL), a measure of the departure of the computed from the nominal bacterial distribution in the mock samples, was used to infer which region performed best at the family and genus levels. Three different hypervariable regions, V2, V4, and V6-7, produced the lowest divergence compared to the known mock sample. The V9 region gave the highest (worst) average DKL while the V4 gave the lowest (best) average DKL. In addition to having a high DKL, the V9 region in both the forward and reverse directions performed the worst finding only 17% and 53% of the known family level and 12% and 47% of the genus level bacteria, while results from the forward and reverse V4 region identified all 17 family level bacteria. The results of our analysis have shown that our sequencing methods using 6 hypervariable regions of the 16S rRNA and subsequent analysis is valid. This method also allowed for the assessment of how well each of the variable regions might perform simultaneously. 
Our findings will provide the basis for future work intended to assess microbial abundance at different time points throughout a clinical protocol.
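The Kullback-Leibler divergence used above to rank the hypervariable regions can be illustrated with a short sketch. The family names and abundances below are hypothetical, and both the direction of the divergence (nominal relative to computed) and the small pseudocount are assumptions, since the abstract does not state them.

```python
import math

def kl_divergence(nominal, computed, pseudocount=1e-9):
    """D_KL(nominal || computed) in bits, taken over the union of taxa.

    A small pseudocount (an assumption of this sketch) avoids log(0) when
    a taxon is missing from one of the two distributions.
    """
    taxa = sorted(set(nominal) | set(computed))
    p = [nominal.get(t, 0.0) + pseudocount for t in taxa]
    q = [computed.get(t, 0.0) + pseudocount for t in taxa]
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]  # renormalise after adding the pseudocount
    q = [x / q_total for x in q]
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

# Hypothetical family-level composition of a mock sample (not data from the study).
nominal = {"Enterobacteriaceae": 0.50, "Staphylococcaceae": 0.25, "Bacillaceae": 0.25}
# Hypothetical profiles recovered from two different regions.
close_region = {"Enterobacteriaceae": 0.48, "Staphylococcaceae": 0.27, "Bacillaceae": 0.25}
poor_region = {"Enterobacteriaceae": 0.90, "Staphylococcaceae": 0.10}

dkl_close = kl_divergence(nominal, close_region)
dkl_poor = kl_divergence(nominal, poor_region)
print(f"close region: {dkl_close:.4f} bits, poor region: {dkl_poor:.4f} bits")
```

A lower value means the computed profile is closer to the known mock composition, which is how the abstract ranks V4 as best and V9 as worst.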
A distinct X-linked syndrome involving joint contractures, keloids, large optic cup-to-disc ratio, and renal stones results from a filamin A (FLNA) mutation.

PubMed

Lah, Melissa; Niranjan, Tejasvi; Srikanth, Sujata; Holloway, Lynda; Schwartz, Charles E; Wang, Tao; Weaver, David D

2016-04-01

We further evaluated a previously reported family with an apparently undescribed X-linked syndrome involving joint contractures, keloids, an increased optic cup-to-disc ratio, and renal stones to elucidate the genetic cause. To do this, we obtained medical histories and performed physical examination on 14 individuals in the family, five of whom are affected males and three are obligate carrier females. Linkage analysis was performed on all but one individual and chromosome X-exome sequencing was done on two affected males. The analysis localized the putative gene to Xq27-qter and chromosome X-exome sequencing revealed a mutation in exon 28 (c.4726G>A) of the filamin A (FLNA) gene, predicting that a conserved glycine had been replaced by arginine at amino acid 1576 (p.G1576R). Segregation analysis demonstrated that all known carrier females tested were heterozygous (G/A), all affected males were hemizygous for the mutation (A allele) and all normal males were hemizygous for the normal G allele. The data and the bioinformatic analysis indicate that the G1576R mutation in the FLNA gene is very likely pathogenic in this family. The syndrome affecting the family shares phenotypic overlap with other syndromes caused by FLNA mutations, but appears to be a distinct phenotype, likely representing a unique genetic syndrome. © 2016 Wiley Periodicals, Inc.
A new polymorphic and multicopy MHC gene family related to nonmammalian class I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leelayuwat, C.; Degli-Esposti, M.A.; Abraham, L.J.

1994-12-31

The authors have used genomic analysis to characterize a region of the central major histocompatibility complex (MHC) spanning approximately 300 kilobases (kb) between TNF and HLA-B. This region has been suggested to carry genetic factors relevant to the development of autoimmune diseases such as myasthenia gravis (MG) and insulin dependent diabetes mellitus (IDDM). Genomic sequence was analyzed for coding potential, using two neural network programs, GRAIL and GeneParser. A genomic probe, JAB, containing putative coding sequences (PERB11) located 60 kb centromeric of HLA-B, was used for northern analysis of human tissues. Multiple transcripts were detected. Southern analysis of genomic DNA and overlapping YAC clones, covering the region from BAT1 to HLA-F, indicated that there are at least five copies of PERB11, four of which are located within this region of the MHC. The partial cDNA sequence of PERB11 was obtained from poly-A RNA derived from skeletal muscle. The putative amino acid sequence of PERB11 shares approximately 30% identity to MHC class I molecules from various species, including reptiles, chickens, and frogs, as well as to other MHC class I-like molecules, such as the IgG FcR of the mouse and rat and the human Zn-α2-glycoprotein. From direct comparison of amino acid sequences, it is concluded that PERB11 is a distinct molecule more closely related to nonmammalian than known mammalian MHC class I molecules. Genomic sequence analysis of PERB11 from five MHC ancestral haplotypes (AH) indicated that the gene is polymorphic at both DNA and protein level. The results suggest that the authors have identified a novel polymorphic gene family with multiple copies within the MHC. 48 refs., 10 figs., 2 tabs.
Rapid birth-and-death evolution of the xenobiotic metabolizing NAT gene family in vertebrates with evidence of adaptive selection

PubMed Central

2013-01-01

Background The arylamine N-acetyltransferases (NATs) are a unique family of enzymes widely distributed in nature that play a crucial role in the detoxification of aromatic amine xenobiotics. Considering the temporal changes in the levels and toxicity of environmentally available chemicals, the metabolic function of NATs is likely to be under adaptive evolution to broaden or change substrate specificity over time, making NATs a promising subject for evolutionary analyses. In this study, we trace the molecular evolutionary history of the NAT gene family during the last ~450 million years of vertebrate evolution and define the likely role of gene duplication, gene conversion and positive selection in the evolutionary dynamics of this family. Results A phylogenetic analysis of 77 NAT sequences from 38 vertebrate species retrieved from public genomic databases shows that NATs are phylogenetically unstable genes, characterized by frequent gene duplications and losses even among closely related species, and that concerted evolution only played a minor role in the patterns of sequence divergence. Local signals of positive selection are detected in several lineages, probably reflecting response to changes in xenobiotic exposure. We then put a special emphasis on the study of the last ~85 million years of primate NAT evolution by determining the NAT homologous sequences in 13 additional primate species. Our phylogenetic analysis supports the view that the three human NAT genes emerged from a first duplication event in the common ancestor of Simiiformes, yielding NAT1 and an ancestral NAT gene which in turn, duplicated in the common ancestor of Catarrhini, giving rise to NAT2 and the NATP pseudogene. Our analysis suggests a main role of purifying selection in NAT1 protein evolution, whereas NAT2 was predicted to mostly evolve under positive selection to change its amino acid sequence over time. 
These findings are consistent with a differential role of the two human isoenzymes and support the involvement of NAT1 in endogenous metabolic pathways. Conclusions This study provides unequivocal evidence that the NAT gene family has evolved under a dynamic process of birth-and-death evolution in vertebrates, consistent with previous observations made in fungi. PMID:23497148
Amino acid sequence analysis of the annexin super-gene family of proteins.

PubMed

Barton, G J; Newman, R H; Freemont, P S; Crumpton, M J

1991-06-15

The annexins are a widespread family of calcium-dependent membrane-binding proteins. No common function has been identified for the family and, until recently, no crystallographic data existed for an annexin. In this paper we draw together 22 available annexin sequences consisting of 88 similar repeat units, and apply the techniques of multiple sequence alignment, pattern matching, secondary structure prediction and conservation analysis to the characterisation of the molecules. The analysis clearly shows that the repeats cluster into four distinct families and that greatest variation occurs within the repeat 3 units. Multiple alignment of the 88 repeats shows amino acids with conserved physicochemical properties at 22 positions, with only Gly at position 23 being absolutely conserved in all repeats. Secondary structure prediction techniques identify five conserved helices in each repeat unit and patterns of conserved hydrophobic amino acids are consistent with one face of a helix packing against the protein core in predicted helices a, c, d, e. Helix b is generally hydrophobic in all repeats, but contains a striking pattern of repeat-specific residue conservation at position 31, with Arg in repeats 4 and Glu in repeats 2, but unconserved amino acids in repeats 1 and 3. This suggests repeats 2 and 4 may interact via a buried saltbridge. The loop between predicted helices a and b of repeat 3 shows features distinct from the equivalent loop in repeats 1, 2 and 4, suggesting an important structural and/or functional role for this region. No compelling evidence emerges from this study for uteroglobin and the annexins sharing similar tertiary structures, or for uteroglobin representing a derivative of a primordial one-repeat structure that underwent duplication to give the present day annexins. The analyses performed in this paper are re-evaluated in the Appendix, in the light of the recently published X-ray structure for human annexin V. 
The structure confirms most of the predictions and shows the power of techniques for the determination of tertiary structural information from the amino acid sequences of an aligned protein family.
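The conservation analysis described above can be sketched on a toy alignment. The four sequences below are hypothetical and are not annexin repeats. The score used here (fraction of the most common residue per column) is a simplifying assumption: the paper also considers conserved physicochemical properties, which this sketch does not model.

```python
from collections import Counter

def column_conservation(alignment):
    """Return (most common residue, its frequency) for each alignment column."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        scores.append((residue, count / len(column)))
    return scores

# Hypothetical aligned repeat units, for illustration only.
toy_alignment = ["KGLGTDE", "RGAGTNE", "KGIGTDD", "RGLGSDE"]

scores = column_conservation(toy_alignment)
# 1-based positions where a single residue occupies every sequence.
invariant = [i + 1 for i, (_, freq) in enumerate(scores) if freq == 1.0]
print("absolutely conserved positions:", invariant)
```

In this toy case only the two glycine columns are invariant, loosely mirroring the abstract's observation that a single Gly is absolutely conserved across all 88 repeats.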
GATA3 mutation in a family with hypoparathyroidism, deafness and renal dysplasia syndrome.

PubMed

Zhu, Zi-Yang; Zhou, Qiao-Li; Ni, Shi-Ning; Gu, Wei

2014-08-01

The hypoparathyroidism, deafness and renal dysplasia (HDR) syndrome is an autosomal dominant disorder primarily caused by GATA3 gene mutation. We report here a Chinese boy and his father who both had HDR syndrome caused by a novel mutation of GATA3. Polymerase chain reaction and DNA sequencing were performed on the exons of the GATA3 gene for mutation analysis. Sequence analysis of GATA3 revealed a heterozygous nonsense mutation in this family: a mutation of GATA3 at exon 2 (c.515C>A) that resulted in a premature stop at codon 172 (p.S172X) with a loss of two zinc finger domains. We identified a novel nonsense mutation that expands the spectrum of HDR-associated GATA3 mutations.
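The relationship between the coding-DNA position and the affected codon in the abstract above can be checked with simple arithmetic. In this sketch the helper names are hypothetical, and the reference serine codon TCA is an assumption, since the abstract does not give the codon sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codon_of(cdna_pos):
    """Map a 1-based coding-DNA position to (codon number, position in codon)."""
    if cdna_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1

def apply_substitution(codon, offset, alt):
    """Replace the base at the 1-based offset within a three-base codon."""
    return codon[:offset - 1] + alt + codon[offset:]

codon_number, offset = codon_of(515)   # c.515 falls in codon 172, second base
reference_codon = "TCA"                # assumed serine codon, not from the source
mutant_codon = apply_substitution(reference_codon, offset, "A")
print(codon_number, offset, mutant_codon, mutant_codon in STOP_CODONS)
```

Under that assumption, a C>A change at the second base of a serine codon produces a stop codon, which is consistent with the reported p.S172X.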
Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

PubMed

Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu

2016-12-01

The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequences used to generate 3D structures of proteins and the template accessions via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3] and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].
Exome sequencing identifies a DNAJB6 mutation in a family with dominantly-inherited limb-girdle muscular dystrophy.

PubMed

Couthouis, Julien; Raphael, Alya R; Siskind, Carly; Findlay, Andrew R; Buenrostro, Jason D; Greenleaf, William J; Vogel, Hannes; Day, John W; Flanigan, Kevin M; Gitler, Aaron D

2014-05-01

Limb-girdle muscular dystrophy primarily affects the muscles of the hips and shoulders (the "limb-girdle" muscles), although it is a heterogeneous disorder that can present with varying symptoms. There is currently no cure. We sought to identify the genetic basis of limb-girdle muscular dystrophy type 1 in an American family of Northern European descent using exome sequencing. Exome sequencing was performed on DNA samples from two affected siblings and one unaffected sibling and resulted in the identification of eleven candidate mutations that co-segregated with the disease. Notably, this list included a previously reported mutation in DNAJB6, p.Phe89Ile, which was recently identified as a cause of limb-girdle muscular dystrophy type 1D. Additional family members were Sanger sequenced and the mutation in DNAJB6 was only found in affected individuals. Subsequent haplotype analysis indicated that this DNAJB6 p.Phe89Ile mutation likely arose independently of the previously reported mutation. Since other published mutations are located close by in the G/F domain of DNAJB6, this suggests that the area may represent a mutational hotspot. Exome sequencing provided an unbiased and effective method for identifying the genetic etiology of limb-girdle muscular dystrophy type 1 in a previously genetically uncharacterized family. This work further confirms the causative role of DNAJB6 mutations in limb-girdle muscular dystrophy type 1D. Copyright © 2014 Elsevier B.V. All rights reserved.
Bioinformatic Analysis Reveals Conservation of Intrinsic Disorder in the Linker Sequences of Prokaryotic Dual-family Immunophilin Chaperones.

PubMed

Barik, Sailen

2018-01-01

The two classical immunophilin families, found in essentially all living cells, are cyclophilin (CYN) and FK506-binding protein (FKBP). We previously reported a novel class of immunophilins that are natural chimeras of these two, which we named dual-family immunophilin (DFI). The DFIs were found in either of two conformations: CYN-linker-FKBP (CFBP) or FKBP-3TPR-CYN (FCBP). While the 3TPR domain can serve as a flexible linker between the FKBP and CYN modules in the FCBP-type DFI, the linker sequences in the CFBP-type DFIs are relatively short, diverse in sequence, and contain no discernible motif or signature. Here, I present several lines of computational evidence that, regardless of their primary structure, these CFBP linkers are intrinsically disordered. This report provides the first molecular foundation for the model that the CFBP linker acts as an unstructured, flexible loop, allowing the two flanking chaperone modules to function independently while linked in cis, likely to assist in the folding of multisubunit client complexes.
Molecular phylogeny of grey mullets (Teleostei: Mugilidae) in Greece: evidence from sequence analysis of mtDNA segments.

PubMed

Papasotiropoulos, Vasilis; Klossa-Kilia, Elena; Alahiotis, Stamatis N; Kilias, George

2007-08-01

Mitochondrial DNA sequence analysis has been used to explore genetic differentiation and phylogenetic relationships among five species of the Mugilidae family, Mugil cephalus, Chelon labrosus, Liza aurata, Liza ramada, and Liza saliens. DNA was isolated from samples originating from the Messolongi Lagoon in Greece. Three mtDNA segments (12s rRNA, 16s rRNA, and CO I) were PCR amplified and sequenced. Sequencing analysis revealed that the greatest genetic differentiation was observed between M. cephalus and all the other species studied, while C. labrosus and L. aurata were the closest taxa. Dendrograms obtained by the neighbor-joining method and Bayesian inference analysis exhibited the same topology. According to this topology, M. cephalus is the most distinct species and the remaining taxa are clustered together, with C. labrosus and L. aurata forming a single group. The latter result brings into question the monophyletic origin of the genus Liza.
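Distance-based tree building such as the neighbor-joining method mentioned above starts from a matrix of pairwise distances. The sketch below computes the uncorrected p-distance as one possible choice. The abstract does not say which distance measure was used, and the taxon labels and sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)

# Hypothetical aligned mtDNA fragments, for illustration only.
toy = {
    "taxon_A": "ACGTTGCAAC",
    "taxon_B": "ACGTAGCTAG",
    "taxon_C": "ACGTAGCTAC",
}

distances = {
    pair: p_distance(toy[pair[0]], toy[pair[1]])
    for pair in combinations(sorted(toy), 2)
}
closest = min(distances, key=distances.get)
print(distances, "closest pair:", closest)
```

A neighbor-joining implementation would join the closest pair first and then update the matrix. Model-corrected distances are often preferred over the raw p-distance for more divergent sequences.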
Harnessing Whole Genome Sequencing in Medical Mycology.

PubMed

Cuomo, Christina A

2017-01-01

Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.
The complete genome sequence of a new polerovirus in strawberry plants from eastern Canada showing strawberry decline symptoms.

PubMed

Xiang, Yu; Bernardy, Mike; Bhagwat, Basdeo; Wiersma, Paul A; DeYoung, Robyn; Bouthillier, Michel

2015-02-01

Strawberry decline disease, probably caused by synergistic reactions of mixed virus infections, threatens the North American strawberry industry. Deep sequencing of strawberry plant samples from eastern Canada resulted in the identification of a new virus genome resembling poleroviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Polerovirus, family Luteoviridae. The virus is tentatively named "strawberry polerovirus 1" (SPV1).
Complete Genome Sequence of a New Ruminococcaceae Bacterium Isolated from Anaerobic Biomass Hydrolysis.

PubMed

Hahnke, Sarah; Abendroth, Christian; Langer, Thomas; Codoñer, Francisco M; Ramm, Patrice; Porcar, Manuel; Luschnig, Olaf; Klocke, Michael

2018-04-05

A new Ruminococcaceae bacterium, strain HV4-5-B5C, participating in the anaerobic digestion of grass, was isolated from a mesophilic two-stage laboratory-scale leach bed biogas system. The draft annotated genome sequence presented in this study and 16S rRNA gene sequence analysis indicated the affiliation of HV4-5-B5C with the family Ruminococcaceae outside recently described genera. Copyright © 2018 Hahnke et al.
Evidence for 5S rDNA horizontal transfer in the toadfish Halobatrachus didactylus (Schneider, 1801) based on the analysis of three multigene families.

PubMed

Merlo, Manuel A; Cross, Ismael; Palazón, José L; Ubeda-Manzanaro, María; Sarasquete, Carmen; Rebordinos, Laureana

2012-10-07

The Batrachoididae family is a group of marine teleosts that includes several species with more complicated physiological characteristics, such as their excretory, reproductive, cardiovascular and respiratory systems. Previous studies of the 5S rDNA gene family carried out in four species from the Western Atlantic showed two types of this gene in two species but only one in the other two, under processes of concerted evolution and birth-and-death evolution with purifying selection. Here we present results of the 5S rDNA and another two gene families in Halobatrachus didactylus, an Eastern Atlantic species, and draw evolutionary inferences regarding the gene families. In addition we have also mapped the genes on the chromosomes by two-colour fluorescence in situ hybridization (FISH). Two types of 5S rDNA were observed, named type α and type β. Molecular analysis of the 5S rDNA indicates that H. didactylus does not share the non-transcribed spacer (NTS) sequences with four other species of the family; therefore, it must have evolved in isolation. Amplification with the type β specific primers amplified a specific band in 9 specimens of H. didactylus and two of Sparus aurata. Both types showed regulatory regions and a secondary structure which mark them as functional genes. However, the U2 snRNA gene and the ITS-1 sequence showed one electrophoretic band and with one type of sequence. The U2 snRNA sequence was the most variable of the three multigene families studied. Results from two-colour FISH showed no co-localization of the gene coding from three multigene families and provided the first map of the chromosomes of the species. A highly significant finding was observed in the analysis of the 5S rDNA, since two such distant species as H. didactylus and Sparus aurata share a 5S rDNA type. This 5S rDNA type has been detected in other species belonging to the Batrachoidiformes and Perciformes orders, but not in the Pleuronectiformes and Clupeiformes orders. 
Two hypotheses have been outlined: one is the possible vertical permanence of the shared type in some fish lineages, and the other is the possibility of a horizontal transference event between ancient species of the Perciformes and Batrachoidiformes orders. This finding opens a new perspective in fish evolution and in the knowledge of the dynamism of the 5S rDNA. Cytogenetic analysis allowed some evolutionary trends to be roughed out, such as the progressive change in the U2 snDNA and the organization of (GATA)n repeats, from dispersed to localized in one locus. The accumulation of (GATA)n repeats in one chromosome pair could be implicated in the evolution of a pair of proto-sex chromosomes. This possibility could situate H. didactylus as the most highly evolved of the Batrachoididae family in terms of sex chromosome biology.
Genome-wide screening of Oryza sativa ssp. japonica and indica reveals a complex family of proteins with ribosome-inactivating protein domains.

PubMed

Wytynck, Pieter; Rougé, Pierre; Van Damme, Els J M

2017-11-01

Ribosome-inactivating proteins (RIPs) are cytotoxic enzymes capable of halting protein synthesis by irreversible modification of ribosomes. Although RIPs are widespread they are not ubiquitous in the plant kingdom. The physiological importance of RIPs is not fully elucidated, but evidence suggests a role in the protection of the plant against biotic and abiotic stresses. Searches in the rice genome revealed a large and highly complex family of proteins with a RIP domain. A comparative analysis retrieved 38 RIP sequences from the genome sequence of Oryza sativa subspecies japonica and 34 sequences from the subspecies indica. The RIP sequences are scattered over different chromosomes but are mostly found on the third chromosome. The phylogenetic tree revealed the pairwise clustering of RIPs from japonica and indica. Molecular modeling and sequence analysis yielded information on the catalytic site of the enzyme, and suggested that a large part of RIP domains probably possess N-glycosidase activity. Several RIPs are differentially expressed in plant tissues and in response to specific abiotic stresses. This study provides an overview of RIP motifs in rice and will help to understand their biological role(s) and evolutionary relationships. Copyright © 2017 Elsevier Ltd. All rights reserved.
A comprehensive analysis of three Asiatic black bear mitochondrial genomes (subspecies ussuricus, formosanus and mupinensis), with emphasis on the complete mtDNA sequence of Ursus thibetanus ussuricus (Ursidae).

PubMed

Hwang, Dae-Sik; Ki, Jang-Seu; Jeong, Dong-Hyuk; Kim, Bo-Hyun; Lee, Bae-Keun; Han, Sang-Hoon; Lee, Jae-Seong

2008-08-01

In the present paper, we describe the mitochondrial genome sequence of the Asiatic black bear (Ursus thibetanus ussuricus), with particular emphasis on the control region (CR), and compare it with other bear mitochondrial genomes to assess molecular relationships among the bears. The mitochondrial genome sequence of U. thibetanus ussuricus was 16,700 bp in size with mostly conserved structures (e.g. 13 protein-coding genes, two rRNA genes, and 22 tRNA genes). The CR consisted of several typical conserved domains such as F, E, D, and C boxes, and a conserved sequence block. Nucleotide sequences and the repeated motifs in the CR differed among the bear species, and their copy numbers also varied among populations, even within F1 generations of U. thibetanus ussuricus. Comparative analyses showed that the CR D1 region was highly informative for discrimination within the bear family. These findings suggest that nucleotide sequences of both the repeated motifs and CR D1 in the bear family are good markers for species discrimination.

Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fields, C.A.

1996-06-01

The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.
[Analysis of SOX10 gene mutation in a family affected with Waardenburg syndrome type II].

PubMed

Zheng, Lei; Yan, Yousheng; Chen, Xue; Zhang, Chuan; Zhang, Qinghua; Feng, Xuan; Hao, Shen

2018-02-10

OBJECTIVE To detect potential mutations of the SOX10 gene in a pedigree affected with Waardenburg syndrome type II. METHODS Genomic DNA was extracted from peripheral blood samples of the proband and his family members. Exons and flanking sequences of the MITF, PAX3, SOX10, SNAI2, EDN3 and EDNRB genes were analyzed by chip capturing and high throughput sequencing. Suspected mutations were verified with Sanger sequencing. RESULTS A c.127C>T (p.R43X) mutation of the SOX10 gene was detected in the proband, for which both parents showed a wild-type genotype. CONCLUSION The c.127C>T (p.R43X) mutation of the SOX10 gene probably underlies the ocular symptoms and hearing loss of the proband.
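Illustrative note (not part of the cited record): the arithmetic linking c.127C>T to p.R43X can be sketched in Python. The sketch assumes that cDNA numbering starts at the A of the ATG start codon and that the standard genetic code applies; the function names `codon_of` and `compatible_codons` are hypothetical helpers introduced only for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_of(cdna_pos):
    """Return (codon number, 1-based position within the codon) for a cDNA position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def compatible_codons(cdna_pos, ref, alt, ref_aa, alt_aa):
    """List reference codons consistent with a substitution and its protein effect."""
    _, offset = codon_of(cdna_pos)
    hits = []
    for codon, aa in CODON_TABLE.items():
        # The reference codon must encode ref_aa and carry ref at the mutated position.
        if aa != ref_aa or codon[offset - 1] != ref:
            continue
        mutated = codon[:offset - 1] + alt + codon[offset:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append(codon)
    return hits


print(codon_of(127))                               # (43, 1)
print(compatible_codons(127, "C", "T", "R", "*"))  # ['CGA']
```

Under these assumptions, position 127 is the first base of codon 43, and the only arginine codon that a C>T change at that position turns into a stop codon is CGA (giving TGA), which is consistent with the reported p.R43X.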
[Analysis of the NDP gene in a Chinese family with X-linked recessive Norrie disease].

PubMed

Mei, Libin; Huang, Yanru; Pan, Qian; Liang, Desheng; Wu, Lingqian

2015-05-01

The purpose of the current research was to investigate the NDP (Norrie disease protein) gene in one Chinese family with Norrie disease (ND) and to characterize the related clinical features. Clinical data of the proband and his family members were collected. Complete ophthalmic examinations were carried out on the proband. Genomic DNA was extracted from peripheral blood leukocytes of 35 family members. Molecular analysis of the NDP gene was performed by polymerase chain reaction and direct sequencing of all exons and flanking regions. A hemizygous NDP missense mutation c.362G > A (p.Arg121Gln) in exon 3 was identified in the affected members, but not in any of the unaffected family individuals. The missense mutation c.362G > A in NDP is responsible for the Norrie disease in this family. This discovery will help provide the family members with accurate and reliable genetic counseling and prenatal diagnosis.
RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

PubMed

Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

2016-10-07

RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It is useful to establish early whether the sequence of a gene product is associated with RNA-binding properties. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database.
Further, all other essential information pertaining to an RBP, like overall function annotations, are provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .
The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases.

PubMed

Udatha, D B R K Gupta; Kouskoumvekaki, Irene; Olsson, Lisbeth; Panagiotou, Gianni

2011-01-01

One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered using Self Organizing Maps followed by k-means clustering into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed.
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs. Copyright © 2010 Elsevier Inc. All rights reserved.
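Illustrative note (not part of the cited record): a minimal Python sketch of the descriptor-then-cluster idea, assuming the simplest possible descriptor (the fraction of each of the 20 standard amino acids) and plain k-means. The abstract's pipeline also used Self Organizing Maps, physico-chemical descriptors and a Support Vector Machine, none of which are reproduced here; the toy sequences and function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def sq_dist(u, v):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, init_indices, n_iter=10):
    """Plain Lloyd's k-means; initial centroids are copied from the given point indices."""
    centroids = [list(points[i]) for i in init_indices]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy sequences, not real FAEs.
toy = ["AAAAAG", "AAAAGG", "KKKKKE", "KKKKEE"]
vectors = [composition(s) for s in toy]
print(kmeans(vectors, init_indices=(0, 2)))  # [0, 0, 1, 1]
```

Fixing the initial centroids makes the toy run deterministic; a real analysis would normally use several random restarts and many more descriptors.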
A generalized least-squares framework for rare-variant analysis in family data.

PubMed

Li, Dalin; Rotter, Jerome I; Guo, Xiuqing

2014-01-01

Rare variants may, in part, explain some of the heritability missing in current genome-wide association studies. Many gene-based rare-variant analysis approaches proposed in recent years are aimed at population-based samples, although analysis strategies for family-based samples are clearly warranted since the family-based design has the potential to enhance our ability to enrich for rare causal variants. We have recently developed the generalized least squares, sequence kernel association test, or GLS-SKAT, approach for rare-variant analyses in family samples, in which the kinship matrix computed from the high-dimensional genetic data is used to decorrelate the family structure. We then applied the SKAT-O approach for gene-/region-based inference in the decorrelated data. In this study, we applied this GLS-SKAT method to the systolic blood pressure data in the simulated family sample distributed by the Genetic Analysis Workshop 18. We compared the GLS-SKAT approach to the rare-variant analysis approach implemented in family-based association test-v1 and demonstrated that the GLS-SKAT approach provides superior power and good control of type I error rate.
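Illustrative note (not part of the cited record): the abstract does not say how the decorrelation step is computed. One common way to decorrelate related individuals is Cholesky whitening of the phenotype covariance, and the Python sketch below assumes that choice. It further assumes a covariance of the form 2·h2·Φ + (1 − h2)·I, where Φ is the kinship matrix, together with a made-up pedigree, heritability and trait values.

```python
import math


def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low * x = b by forward substitution (low is lower triangular)."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


# Hypothetical trio: two unrelated parents and one child (kinship 0.25 with each parent).
kinship = [[0.5, 0.0, 0.25],
           [0.0, 0.5, 0.25],
           [0.25, 0.25, 0.5]]
h2 = 0.5  # assumed heritability, chosen only for illustration
sigma = [[2 * h2 * kinship[i][j] + (1 - h2) * (i == j) for j in range(3)]
         for i in range(3)]

low = cholesky(sigma)
phenotype = [1.2, -0.4, 0.9]  # made-up trait values
decorrelated = forward_solve(low, phenotype)  # L^{-1} y has identity covariance
print(decorrelated)
```

After this transformation the three values can be treated as uncorrelated, so a population-based test can be applied to them; the same transformation would also be applied to genotypes and covariates.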
Unravelling the complexity of microRNA-mediated gene regulation in black pepper (Piper nigrum L.) using high-throughput small RNA profiling.

PubMed

Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V

2016-01-01

Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences revealed microRNA-mediated gene regulation in black pepper ( Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis from recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for the future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.
Phylogenetic relationships and systematic position of the families Cortrematidae and Phaneropsolidae (Platyhelminthes: Digenea).

PubMed

Kanarek, Gerard; Zaleśny, Grzegorz; Sitko, Jiljí; Tkach, Vasyl V

2014-12-01

The systematic position and phylogenetic relationships of the family Cortrematidae Yamaguti, 1958 have always been controversial. In the present study, the phylogenetic relationships of this family and its constituent genera and families within the superfamily Microphalloidea were evaluated using previously published and newly obtained sequences of 28S rDNA of Cortrema magnicaudata (Bykhovskaya-Pavlovskaya, 1950) (Cortrematidae), Phaneropsolus praomydis Baer, 1971 and Microtrema barusi Sitko, 2013 (Phaneropsolidae). Results clearly demonstrate that the genus Cortrema Tang, 1951 is closest to Gyrabascus Macy 1935, both genera forming one of the clades within the family Pleurogenidae in the superfamily Microphalloidea and sharing several important morphological features. Thus, the family Cortrematidae should be considered among synonyms of the Pleurogenidae. Based on the analysis of morphology, C. corti Tang, 1951, C. testilobata (Bykhovskaya-Pavlovskaya, 1953) and C. niloticus Ashour, Ahmed et Lewis, 1994 are considered junior synonyms of C. magnicaudata. The phylogenetic position of P. praomydis as a family-level branch not showing close relationships with other families of the Microphalloidea, supports the status of the Phaneropsolidae as an independent family. The genus Parabascus Looss, 1907 previously considered within the Phaneropsolidae clearly belongs to the Pleurogenidae. In addition, the molecular phylogeny has demonstrated that the recently described phaneropsolid Microtrema barusi belongs to the microphallid genus Microphallus Ward, 1901. Therefore, Microtrema Sitko, 2013 is considered a junior synonym of Microphallus. Our analysis has also confirmed the status of Collyriclidae as a family within the Microphalloidea. Not yet sequenced representatives of other families within the Microphalloidea (e.g. Anenterotrematidae, Eumegacetidae, Renschtrematidae, Stomylotrematidae, etc.) need to be included in future molecular phylogenetic studies to better unravel the taxonomic structure and content of this diverse digenean superfamily.
Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria

PubMed Central

Bertels, Frederic; Rainey, Paul B.

2011-01-01

Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA. PMID:21698139
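Illustrative note (not part of the cited record): the two basic operations behind this kind of analysis, counting short oligonucleotides and testing whether two elements form an inverted repeat, can be sketched in Python as follows. The 6-mer and the toy sequence are made up, the helper names are hypothetical, and the sketch does not reproduce the authors' statistics for over-representation.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(genome, k):
    """Count every overlapping k-mer on the given strand."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def is_inverted_pair(left, right):
    """True if the two elements are reverse complements, as in an inverted repeat."""
    return right == reverse_complement(left)


# Hypothetical toy sequence: a made-up 6-mer, a spacer, its reverse complement,
# and a second forward copy further along.
rep = "GGCTAC"
toy = "AT" + rep + "TTTT" + reverse_complement(rep) + "AT" + rep
counts = kmer_counts(toy, 6)
print(counts[rep])                         # 2
print(counts[reverse_complement(rep)])     # 1
print(is_inverted_pair(rep, "GTAGCC"))     # True
```

In a real genome the observed count of each k-mer would be compared with an expectation (for example from shuffled or simulated sequence) before calling it over-represented.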
Mutations in ABCR (ABCA4) in patients with Stargardt macular degeneration or cone-rod degeneration.

PubMed

Briggs, C E; Rucinski, D; Rosenfeld, P J; Hirose, T; Berson, E L; Dryja, T P

2001-09-01

To determine the spectrum of ABCR mutations associated with Stargardt macular degeneration and cone-rod degeneration (CRD). One hundred eighteen unrelated patients with recessive Stargardt macular degeneration and eight with recessive CRD were screened for mutations in ABCR (ABCA4) by single-strand conformation polymorphism analysis. Variants were characterized by direct genomic sequencing. Segregation analysis was performed on the families of 20 patients in whom two or more likely pathogenic sequence changes were identified. The authors found 77 sequence changes likely to be pathogenic: 21 null mutations (15 novel), 55 missense changes (26 novel), and one deletion of a consensus glycosylation site (also novel). Fifty-two patients with Stargardt macular degeneration (44% of those screened) and five with CRD each had two of these sequence changes or were homozygous for one of them. Segregation analyses in the families of 19 of these patients were informative and revealed that the index cases and all available affected siblings were compound heterozygotes or homozygotes. The authors found one instance of an apparently de novo mutation, Ile824Thr, in a patient. Thirty-seven (31%) of the 118 patients with Stargardt disease and one with CRD had only one likely pathogenic sequence change. Twenty-nine patients with Stargardt disease (25%) and two with CRD had no identified sequence changes. This report of 42 novel mutations brings the growing number of identified likely pathogenic sequence changes in ABCR to approximately 250.
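Illustrative note (not part of the cited record): the counts quoted in the abstract are internally consistent, which a few lines of Python can confirm. All numbers below are taken from the abstract; only the variable names are introduced here.

```python
screened_stargardt = 118
screened_crd = 8

# Likely pathogenic changes reported in the abstract: (total, novel)
changes = {
    "null": (21, 15),
    "missense": (55, 26),
    "glycosylation-site deletion": (1, 1),
}
total = sum(t for t, _ in changes.values())
novel = sum(n for _, n in changes.values())
print(total, novel)  # 77 42

# Stargardt patients grouped by number of likely pathogenic changes found
stargardt = {"two changes or homozygous": 52, "one change": 37, "none": 29}
assert sum(stargardt.values()) == screened_stargardt
percentages = {k: round(100 * n / screened_stargardt) for k, n in stargardt.items()}
print(percentages)  # 44, 31 and 25 percent, matching the abstract

# CRD patients grouped the same way
crd = {"two changes or homozygous": 5, "one change": 1, "none": 2}
assert sum(crd.values()) == screened_crd
```

The three Stargardt groups sum to the 118 patients screened, the three CRD groups sum to 8, and the novel changes sum to the 42 reported.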
Cellulase Linkers Are Optimized Based on Domain Type and Function: Insights from Sequence Analysis, Biophysical Measurements, and Molecular Simulation

PubMed Central

Sammond, Deanne W.; Payne, Christina M.; Brunecky, Roman; Himmel, Michael E.; Crowley, Michael F.; Beckham, Gregg T.

2012-01-01

Cellulase enzymes deconstruct cellulose to glucose, and are often comprised of glycosylated linkers connecting glycoside hydrolases (GHs) to carbohydrate-binding modules (CBMs). Although linker modifications can alter cellulase activity, the functional role of linkers beyond domain connectivity remains unknown. Here we investigate cellulase linkers connecting GH Family 6 or 7 catalytic domains to Family 1 or 2 CBMs, from both bacterial and eukaryotic cellulases to identify conserved characteristics potentially related to function. Sequence analysis suggests that the linker lengths between structured domains are optimized based on the GH domain and CBM type, such that linker length may be important for activity. Longer linkers are observed in eukaryotic GH Family 6 cellulases compared to GH Family 7 cellulases. Bacterial GH Family 6 cellulases are found with structured domains in either N to C terminal order, and similar linker lengths suggest there is no effect of domain order on length. O-glycosylation is uniformly distributed across linkers, suggesting that glycans are required along entire linker lengths for proteolysis protection and, as suggested by simulation, for extension. Sequence comparisons show that proline content for bacterial linkers is more than double that observed in eukaryotic linkers, but with fewer putative O-glycan sites, suggesting alternative methods for extension. Conversely, near linker termini where linkers connect to structured domains, O-glycosylation sites are observed less frequently, whereas glycines are more prevalent, suggesting the need for flexibility to achieve proper domain orientations. Putative N-glycosylation sites are quite rare in cellulase linkers, while an N-P motif, which strongly disfavors the attachment of N-glycans, is commonly observed. These results suggest that linkers exhibit features that are likely tailored for optimal function, despite possessing low sequence identity. 
This study suggests that cellulase linkers may exhibit function in enzyme action, and highlights the need for additional studies to elucidate cellulase linker functions. PMID:23139804
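Illustrative note (not part of the cited record): putative N-glycosylation sites are conventionally located with the sequon rule Asn-X-Ser/Thr, where X is any residue except proline, which is why an N-P motif does not count as a site. The Python sketch below applies that rule and computes proline content; the peptide is a made-up linker-like string and the function names are hypothetical.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead consumes only the Asn, so overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glyc_sites(seq):
    """1-based positions of Asn residues in N-X-S/T sequons (X is not Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def proline_fraction(seq):
    """Fraction of residues that are proline."""
    return seq.count("P") / len(seq)


# Hypothetical linker-like peptide, not a real cellulase linker.
linker = "GNPTSPTNGSPPTT"
print(n_glyc_sites(linker))                # [8]
print(round(proline_fraction(linker), 2))  # 0.29
```

The Asn at position 2 is followed by proline and is therefore skipped, while the Asn at position 8 (N-G-S) is reported. O-glycosylation has no comparably simple sequence rule, so it is not modeled here.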
Analysis of full-length sequences of two Citrus yellow mosaic badnavirus isolates infecting Citrus jambhiri (Rough Lemon) and Citrus sinensis L. Osbeck (Sweet Orange) from a nursery in India.

PubMed

Anthony Johnson, A M; Borah, B K; Sai Gopal, D V R; Dasgupta, I

2012-12-01

Citrus yellow mosaic badnavirus (CMBV), a member of the family Caulimoviridae, genus Badnavirus, is the causative agent of mosaic disease among Citrus species in southern India. Despite its reported prevalence in several citrus species, complete functional genomic information on full-length genomes of all the CMBV isolates infecting citrus species is not available in publicly accessible databases. CMBV isolates from Rough Lemon and Sweet Orange collected from a nursery were cloned and sequenced. The analysis revealed high sequence homology of the two CMBV isolates with previously reported CMBV sequences, implying that they represent new variants. Based on computational analysis of the predicted secondary structures, the possible functions of some CMBV proteins have been analyzed.
Whole-exome sequencing analysis of Waardenburg syndrome in a Chinese family

PubMed Central

Chen, Dezhong; Zhao, Na; Wang, Jing; Li, Zhuoyu; Wu, Changxin; Fu, Jie; Xiao, Han

2017-01-01

Waardenburg syndrome (WS) is a dominantly inherited, genetically heterogeneous auditory-pigmentary syndrome characterized by non-progressive sensorineural hearing loss and iris discoloration. By whole-exome sequencing (WES), we identified a nonsense mutation (c.598C>T) in the PAX3 gene, predicted to be disease causing by in silico analysis. This is the first report of a genetically diagnosed case of WS with the PAX3 c.598C>T nonsense mutation in a patient of Chinese ethnic origin, identified by WES and in silico functional prediction methods. PMID:28690861
New Sequences with Low Correlation and Large Family Size

NASA Astrophysics Data System (ADS)

Zeng, Fanxin

In direct-sequence code-division multiple-access (DS-CDMA) communication systems and direct-sequence ultra wideband (DS-UWB) radios, sequences with low correlation and large family size are important for reducing multiple access interference (MAI) and accepting more active users, respectively. In this paper, a new collection of families of sequences of length $p^n-1$, which includes three constructions, is proposed. The maximum number of cyclically distinct families without GMW sequences in each construction is $\frac{\varphi(p^n-1)}{n}\cdot\frac{\varphi(p^m-1)}{m}$, where $p$ is a prime number, $n$ is an even number, and $n=2m$, and these sequences can be binary or polyphase depending upon the choice of the parameter $p$. In Construction I, there are $p^n$ distinct sequences within each family and the new sequences have at most $d+2$ nontrivial periodic correlation values $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, dp^m-1\}$. In Construction II, the new sequences have large family size $p^{2n}$ and possibly take the nontrivial correlation values in $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (3d-4)p^m-1\}$. In Construction III, the new sequences possess the largest family size $p^{(d-1)n}$ and have at most $2d$ correlation levels $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (2d-2)p^m-1\}$. The three constructions are near-optimal with respect to the Welch bound because the values of their Welch ratios are moderate, $WR \approx d$, $WR \approx 3d-4$ and $WR \approx 2d-2$, respectively. Each family in Constructions I, II and III contains a GMW sequence. In addition, Helleseth sequences and Niho sequences are special cases in Constructions I and III, and their restriction conditions on the integers $m$ and $n$, $p^m \not\equiv 2 \pmod 3$ and $n \equiv 0 \pmod 4$, respectively, are removed in our sequences. Our sequences in Construction III include the sequences with Niho type decimation $3\cdot 2^m-2$, too. Finally, some open questions are pointed out and an example that illustrates the performance of these sequences is given.
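Illustrative note (not part of the cited record): the family-count formula in the abstract, read here as φ(p^n − 1)/n · φ(p^m − 1)/m with n = 2m and φ taken to be Euler's totient function, can be evaluated for small parameters with the Python sketch below. The function names are hypothetical and the brute-force totient is only suitable for small arguments.

```python
from math import gcd


def phi(n):
    """Euler's totient by direct counting; fine for the small n used here."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def family_count(p, n):
    """phi(p^n - 1)/n * phi(p^m - 1)/m with n = 2m, as quoted in the abstract."""
    if n % 2:
        raise ValueError("n must be even")
    m = n // 2
    return (phi(p ** n - 1) // n) * (phi(p ** m - 1) // m)


print(family_count(2, 4))  # 2   (phi(15)/4 = 2, phi(3)/2 = 1)
print(family_count(2, 6))  # 12  (phi(63)/6 = 6, phi(7)/3 = 2)
```

For the binary case with n = 4 the sequences have length 2^4 − 1 = 15, and the formula gives two cyclically distinct families; the count grows quickly with n.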
A Patient With Desmoid Tumors and Familial FAP Having Frame Shift Mutation of the APC Gene.

PubMed

Sadighi, Sanambar; Ghaffari-Moghaddam, Mahsa; Saffari, Mojtaba; Mohagheghi, Mohammad Ali; Shirkoohi, Reza

2017-02-01

Desmoid tumors, characterized by monoclonal proliferation of myofibroblasts, can occur in 5-10% of patients with familial adenomatous polyposis (FAP) as an extra-colonic manifestation of the disease. FAP can develop when there is a germ-line mutation in the adenomatous polyposis coli gene. Although mild or attenuated FAP may follow mutations in the 5′ end of the gene, mutations in the 3′ end are more likely to have a more severe manifestation of the disease. A 28-year-old woman was admitted to the Cancer Institute of Iran with a painful abdominal mass. She had a strong family history of FAP and had undergone prophylactic total colectomy. Pre-operative CT scans revealed a large mass. Microscopic observation showed diffuse fibroblast cell infiltration of the adjacent tissue structures. Peripheral blood DNA extraction followed by exon-by-exon sequencing of the adenomatous polyposis coli gene was performed to investigate the mutation in this gene. Analysis of DNA sequencing demonstrated a 4-bp deletion at codons 1309-1310 of exon 16 of the adenomatous polyposis coli gene sequence, which was repeated in 3 members of the family. Some of them had desmoid tumors without a classical FAP history. Even when there is no familial history of adenomatous polyposis, the adenomatous polyposis coli gene mutation should be investigated in cases of familial desmoid tumors for suitable prevention. The 3′ end of the adenomatous polyposis coli gene remains the most likely location of the mutation in such families.
A novel mutation of the beta myosin heavy chain gene responsible for familial hypertrophic cardiomyopathy.

PubMed

Wang, Juan; Xu, Shi-Jie; Zhou, Hua; Wang, Li-Jie; Hu, Bo; Fang, Fang; Zhang, Xu-Min; Luo, Yi-Wei; He, Xiao-Yan; Zhuang, Shao-Wei; Li, Xin-Ming; Liu, Zhong-Ming; Hu, Da-Yi

2009-09-01

Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disorder and shows high variability in genetic heterogeneity and phenotypic characteristics. The genetic etiology responsible for HCM in many individuals remains unclear. This investigation sought to identify novel genetic determinants for familial hypertrophic cardiomyopathy. Six unrelated Chinese families with HCM were studied. For each of the 13 established HCM-susceptibility genes, 3 to 5 microsatellite markers were selected to perform genotyping and haplotype analysis. The linked genes were sequenced. Haplotype analyses on candidate genetic loci revealed cosegregation of the gene beta-myosin heavy chain (MYH7) with HCM in a single family. A novel double heterozygous missense mutation of Ala26Val plus Arg719Trp in MYH7 was subsequently identified by sequencing in this family and was associated with a severe phenotype of HCM. The novel double mutation of Ala26Val plus Arg719Trp in MYH7 identified in a Chinese family highlights the remarkable genetic heterogeneity of HCM, which provides important information for genetic counseling, accurate diagnosis, prognostic evaluation, and appropriate clinical management. Copyright 2009 Wiley Periodicals, Inc.
Genome-Wide Analysis of the Musa WRKY Gene Family: Evolution and Differential Expression during Development and Stress

PubMed Central

Goel, Ridhi; Pandey, Ashutosh; Trivedi, Prabodh K.; Asif, Mehar H.

2016-01-01

The WRKY gene family plays an important role in the development and stress responses in plants. As information is not available on the WRKY gene family in Musa species, genome-wide analysis has been carried out in this study using available genomic information from two species, Musa acuminata and Musa balbisiana. Analysis identified 147 and 132 members of the WRKY gene family in M. acuminata and M. balbisiana, respectively. Evolutionary analysis suggests that the WRKY gene family expanded well before speciation of the two species. Most of the orthologs retained in the two species were from the γ duplication event, which occurred prior to the α and β genome-wide duplication (GWD) events. Analysis also suggests that subtle changes in nucleotide sequences during the course of evolution have led to the development of new motifs which might be involved in neo-functionalization of different WRKY members in the two species. Expression and cis-regulatory motif analysis suggest possible involvement of Group II and Group III WRKY members during various stresses and growth/development, including the fruit ripening process, respectively. PMID:27014321
Genome-Wide Analysis of the Musa WRKY Gene Family: Evolution and Differential Expression during Development and Stress.

PubMed

Goel, Ridhi; Pandey, Ashutosh; Trivedi, Prabodh K; Asif, Mehar H

2016-01-01

The WRKY gene family plays an important role in the development and stress responses in plants. As information is not available on the WRKY gene family in Musa species, genome-wide analysis has been carried out in this study using available genomic information from two species, Musa acuminata and Musa balbisiana. Analysis identified 147 and 132 members of the WRKY gene family in M. acuminata and M. balbisiana, respectively. Evolutionary analysis suggests that the WRKY gene family expanded much before the speciation in both the species. Most of the orthologs retained in two species were from the γ duplication event which occurred prior to α and β genome-wide duplication (GWD) events. Analysis also suggests that subtle changes in nucleotide sequences during the course of evolution have led to the development of new motifs which might be involved in neo-functionalization of different WRKY members in two species. Expression and cis-regulatory motif analysis suggest possible involvement of Group II and Group III WRKY members during various stresses and growth/development including fruit ripening process respectively.
Diversified clinical presentations associated with a novel sal-like 4 gene mutation in a Chinese pedigree with Duane retraction syndrome.

PubMed

Yang, Ming-ming; Ho, Mary; Lau, Henry H W; Tam, Pancy O S; Young, Alvin L; Pang, Chi Pui; Yip, Wilson W K; Chen, LiJia

2013-01-01

To determine the underlying genetic cause of Duane retraction syndrome (DRS) in a non-consanguineous Chinese Han family. Detailed ophthalmic and physical examinations were performed on all members from a pedigree with DRS. All exons and their adjacent splicing junctions of the sal-like 4 (SALL4) gene were amplified with polymerase chain reaction and analyzed with direct sequencing in all the recruited family members and 200 unrelated control subjects. Clinical examination revealed a broad spectrum of phenotypes in the DRS family. Mutation analysis of SALL4 identified a novel heterozygous duplication mutation, c.1919dupT, which was completely cosegregated with the disease in the family and absent in controls. This mutation was predicted to cause a frameshift, introducing a premature stop codon, when translated, resulting in a truncated SALL4 protein, i.e., p.Met640IlefsX25. Bioinformatics analysis showed that the affected region of SALL4 shared a highly conserved sequence across different species. Diversified clinical manifestations were observed in the c.1919dupT carriers of the family. We identified a novel truncating mutation in the SALL4 gene that leads to diversified clinical features of DRS in a Chinese family. This mutation is predicted to result in a truncated SALL4 protein affecting two functional domains and cause disease development due to haploinsufficiency through nonsense-mediated mRNA decay.
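
The relationship between the coding-DNA name c.1919dupT and the protein-level name p.Met640Ile... can be checked with simple arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes 1-based c. numbering starting at the A of the start codon, and it relies on the fact that methionine is encoded only by ATG. The function names and the `NNN` placeholder for the unknown downstream bases are illustrative.

```python
# Only the two codons needed for this illustration.
CODON_TABLE_SUBSET = {"ATG": "Met", "ATT": "Ile"}

def codon_of(c_pos):
    """Return (codon number, 0-based offset in the codon) for a 1-based c. position."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3

def duplicate_base(seq, index):
    """Return seq with the base at 0-based index duplicated in place."""
    return seq[: index + 1] + seq[index] + seq[index + 1 :]

codon_number, offset = codon_of(1919)

# Methionine has a single codon (ATG), so reference codon 640 must be ATG.
# "NNN" stands in for the downstream bases, which the record does not give.
mutated = duplicate_base("ATG" + "NNN", offset)
new_first_codon = mutated[:3]

print(codon_number, offset, CODON_TABLE_SUBSET[new_first_codon])
```

Position 1919 falls on the second base of codon 640; duplicating that T turns ATG into ATT (isoleucine) and shifts every downstream codon by one base, which is consistent with the reported frameshift starting at Met640Ile.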

Basfia succiniciproducens gen. nov., sp. nov., a new member of the family Pasteurellaceae isolated from bovine rumen.

PubMed

Kuhnert, Peter; Scholten, Edzard; Haefner, Stefan; Mayor, Désirée; Frey, Joachim

2010-01-01

Gram-negative, coccoid, non-motile bacteria that are catalase-, urease- and indole-negative, facultatively anaerobic and oxidase-positive were isolated from the bovine rumen using an improved selective medium for members of the Pasteurellaceae. All strains produced significant amounts of succinic acid under anaerobic conditions with glucose as substrate. Phenotypic characterization and multilocus sequence analysis (MLSA) using 16S rRNA, rpoB, infB and recN genes were performed on seven independent isolates. All four genes showed high sequence similarity to their counterparts in the genome sequence of the patent strain MBEL55E, but less than 95 % 16S rRNA gene sequence similarity to any other species of the Pasteurellaceae. Genetically these strains form a very homogeneous group in individual as well as combined phylogenetic trees, clearly separated from other genera of the family from which they can also be separated based on phenotypic markers. Genome relatedness as deduced from the recN gene showed high interspecies similarities, but again low similarity to any of the established genera of the family. No toxicity towards bovine, human or fish cells was observed and no RTX toxin genes were detected in members of the new taxon. Based on phylogenetic clustering in the MLSA analysis, the low genetic similarity to other genera and the phenotypic distinction, we suggest to classify these bovine rumen isolates as Basfia succiniciproducens gen. nov., sp. nov. The type strain is JF4016(T) (=DSM 22022(T) =CCUG 57335(T)).
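
The 95 % 16S rRNA similarity figure above is a pairwise comparison between aligned sequences. As a rough illustration only, the sketch below computes percent identity over two already-aligned strings, skipping any column that contains a gap. The toy sequences are invented, and gap handling is an assumption; different tools define identity differently, and the record does not say which definition was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

# Toy aligned fragments (hypothetical, not real 16S rRNA data).
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(f"{identity:.1f} % identity")
```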
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

PubMed

Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

2016-01-01

Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower as compared to that of the nuclear genome and neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provides the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
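
The Ti/Tv ratio mentioned above is the number of transitions (purine to purine or pyrimidine to pyrimidine) divided by the number of transversions. A minimal sketch of that calculation follows; the substitution list is invented toy data, not the coconut variants, and the function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True if ref>alt stays within purines or within pyrimidines."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES

def ti_tv_ratio(substitutions):
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    ti = sum(1 for ref, alt in substitutions if is_transition(ref, alt))
    tv = len(substitutions) - ti
    return ti / tv if tv else float("inf")

# Hypothetical single-nucleotide substitutions.
toy_variants = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(toy_variants))
```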
The production of Multiple Small Peptaibol Families by Single 14-Module Peptide Synthetases in Trichoderma/Hypocrea

DOE Office of Scientific and Technical Information (OSTI.GOV)

Degenkolb, Thomas; Aghchehb, Razieh Karimi; Dieckmann, Ralf

2012-03-01

The most common peptaibiotic structures are 11-residue peptaibols found widely distributed in the genus Trichoderma/Hypocrea. Frequently associated are 14-residue peptaibols sharing partial sequence identity. Genome sequencing projects of 3 Trichoderma strains of the major clades reveal the presence of up to 3 types of nonribosomal peptide synthetases with 7, 14, or 18-20 amino acid adding modules. We here provide evidence that the 14-module NRPS type found in T. virens, T. reesei (teleomorph Hypocrea jecorina) and T. atroviride produces both 11- and 14-residue peptaibols based on the disruption of the respective NRPS gene of T. reesei, and bioinformatic analysis of their amino acid activating domains and modules. The structures of these peptides may be predicted from the gene structures and have been confirmed by analysis of families of 11- and 14-residue peptaibols from the strain 618, termed hypojecorins A (23 sequences determined, 4 new) and B (3 new sequences), and the recently established trichovirins A from T. virens. The distribution of 11- and 14-residue products is strain-specific and depends on growth conditions as well. Possible mechanisms of module skipping are discussed.
Genomic analysis of WCP30 Phage of Weissella cibaria for Dairy Fermented Foods.

PubMed

Lee, Young-Duck; Park, Jong-Hyun

2017-01-01

In this study, we report the morphogenetic analysis and genome sequence of a new WCP30 phage of Weissella cibaria, isolated from a fermented food. Based on its morphology, as observed by transmission electron microscopy, WCP30 phage belongs to the family Siphoviridae. Genomic analysis of WCP30 phage showed that it had a 33,697-bp double-stranded DNA genome with 41.2% G+C content. Bioinformatics analysis of the genome revealed 35 open reading frames. A BLASTN search showed that WCP30 phage had low sequence similarity compared to other phages infecting lactic acid bacteria. This is the first report of the morphological features and complete genome sequence of WCP30 phage, which may be useful for controlling the fermentation of dairy foods.
StructRNAfinder: an automated pipeline and web server for RNA families prediction.

PubMed

Arias-Carrasco, Raúl; Vásquez-Morán, Yessenia; Nakaya, Helder I; Maracaja-Coutinho, Vinicius

2018-02-17

The function of many noncoding RNAs (ncRNAs) depends upon their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally annotate RNAs into RNA families. However, to fully perform this analysis, researchers should utilize multiple tools, which require the constant parsing and processing of several intermediate files. This makes the large-scale prediction and annotation of RNAs a daunting task even to researchers with good computational or bioinformatics skills. We present an automated pipeline named StructRNAfinder that predicts and annotates RNA families in transcript or genome sequences. This single tool not only displays the sequence/structural consensus alignments for each RNA family, according to Rfam database but also provides a taxonomic overview for each assigned functional RNA. Moreover, we implemented a user-friendly web service that allows researchers to upload their own nucleotide sequences in order to perform the whole analysis. Finally, we provided a stand-alone version of StructRNAfinder to be used in large-scale projects. The tool was developed under GNU General Public License (GPLv3) and is freely available at http://structrnafinder.integrativebioinformatics.me. The main advantage of StructRNAfinder relies on the large-scale processing and integrating the data obtained by each tool and database employed along the workflow, of which several files are generated and displayed in user-friendly reports, useful for downstream analyses and data exploration.
The Liverwort Contains a Lectin That Is Structurally and Evolutionary Related to the Monocot Mannose-Binding Lectins

PubMed Central

Peumans, Willy J.; Barre, Annick; Bras, Julien; Rougé, Pierre; Proost, Paul; Van Damme, Els J.M.

2002-01-01

A mannose (Man)-binding lectin has been isolated and characterized from the thallus of the liverwort Marchantia polymorpha. N-terminal sequencing indicated that the M. polymorpha agglutinin (Marpola) shares sequence similarity with the superfamily of monocot Man-binding lectins. Searches in the databases yielded expressed sequence tags encoding Marpola. Sequence analysis, molecular modeling, and docking experiments revealed striking structural similarities between Marpola and the monocot Man-binding lectins. Activity and specificity studies further indicated that Marpola is a much stronger agglutinin than the Galanthus nivalis agglutinin and exhibits a preference for methylated Man and glucose, which is unprecedented within the family of monocot Man-binding lectins. The discovery of Marpola allows us, for the first time, to corroborate the evolutionary relationship between a lectin from a lower plant and a well-established lectin family from flowering plants. In addition, the identification of Marpola sheds a new light on the molecular evolution of the superfamily of monocot Man-binding lectins. Beside evolutionary considerations, the occurrence of a G. nivalis agglutinin homolog in a lower plant necessitates the rethinking of the physiological role of the whole family of monocot Man-binding lectins. PMID:12114560
Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation

PubMed Central

Wegrzyn, Jill L.; Liechty, John D.; Stevens, Kristian A.; Wu, Le-Shin; Loopstra, Carol A.; Vasquez-Gross, Hans A.; Dougherty, William M.; Lin, Brian Y.; Zieve, Jacob J.; Martínez-García, Pedro J.; Holt, Carson; Yandell, Mark; Zimin, Aleksey V.; Yorke, James A.; Crepeau, Marc W.; Puiu, Daniela; Salzberg, Steven L.; de Jong, Pieter J.; Mockaitis, Keithanne; Main, Doreen; Langley, Charles H.; Neale, David B.

2014-01-01

The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%. PMID:24653211
GNE missense mutation in recessive familial amyotrophic lateral sclerosis.

PubMed

Köroğlu, Çiğdem; Yılmaz, Rezzak; Sorgun, Mine Hayriye; Solakoğlu, Seyhun; Şener, Özden

2017-12-01

Amyotrophic lateral sclerosis (ALS) is a motor neuron disease eventually leading to death from respiratory failure. Recessive inheritance is very rare. Here, we describe the clinical findings in a consanguineous family with five men afflicted with recessive ALS and the identification of the homozygous mutation responsible for the disorder. The onset of the disease ranged from 12 to 35 years of age, with variable disease progressions. We performed clinical investigations including metabolic and paraneoplastic screening, cranial and cervical imaging, and electrophysiology. We mapped the disease gene to 9p21.1-p12 with a LOD score of 5.2 via linkage mapping using genotype data for single-nucleotide polymorphism markers and performed exome sequence analysis to identify the disease-causing gene variant. We also Sanger sequenced all coding sequences of SIGMAR1, a gene reported as responsible for juvenile ALS in a family. We did not find any mutation in SIGMAR1. Instead, we identified a novel homozygous missense mutation p.(His705Arg) in GNE which was predicted as damaging by online tools. GNE has been associated with inclusion body myopathy and is expressed in many tissues. We propose that the GNE mutation underlies the pathology in the family.
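
For readers unfamiliar with LOD scores, a LOD of 5.2 means the data are 10^5.2 times more likely under linkage than under free recombination. The sketch below shows the textbook two-point, phase-known form, LOD(θ) = log10[θ^k (1−θ)^(n−k) / 0.5^n], for k recombinants in n informative meioses. This is an assumption-laden simplification: the study used SNP-based linkage mapping, whose software and model are not described in the record, and the meiosis counts below are hypothetical.

```python
import math

def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD score for phase-known meioses at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = n_recombinants, n_meioses
    if theta == 0.0:
        if k > 0:
            return float("-inf")  # any recombinant rules out theta = 0
        log_l_theta = 0.0
    else:
        log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    return log_l_theta - n * math.log10(0.5)

def odds_from_lod(lod):
    """Likelihood ratio (odds in favour of linkage) implied by a LOD score."""
    return 10.0 ** lod

# Hypothetical example: 10 informative meioses, no recombinants, theta = 0.
print(round(lod_score(10, 0, 0.0), 2))
# Odds implied by the LOD score reported in the abstract.
print(round(odds_from_lod(5.2)))
```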
Seven novel mutations in the long isoform of the USH2A gene in Chinese families with nonsyndromic retinitis pigmentosa and Usher syndrome Type II

PubMed Central

Xu, Wenjun; Dai, Hanjun; Lu, Tingting; Zhang, Xiaohui; Dong, Bing

2011-01-01

Purpose: To describe the clinical and genetic findings in one Chinese family with autosomal recessive retinitis pigmentosa (arRP) and in three unrelated Chinese families with Usher syndrome type II (USH2). Methods: One family (FR1) with arRP and three unrelated families (F6, F7, and F8) with Usher syndrome (USH), including eight affected members and seven unaffected family individuals were examined clinically. The study included 100 normal Chinese individuals as normal controls. After obtaining informed consent, peripheral blood samples from all participants were collected and genomic DNA was extracted. Genotyping and haplotyping analyses were performed on the known genetic loci for arRP with a panel of polymorphic markers in family FR1. In all four families, the coding region (exons 2–72), including the intron-exon boundary of the USH2A (Usher syndrome type −2A protein) gene, was screened by PCR and direct DNA sequencing. Whenever substitutions were identified in a patient, a restriction fragment length polymorphism (RFLP) analysis, single strand conformation polymorphism (SSCP) analysis, or high resolution melt curve analysis (HRM) was performed on all available family members and on the 100 normal controls. Results: The affected individuals presented with typical fundus features of retinitis pigmentosa (RP), including narrowing of the vessels, bone-spicule pigmentation, and waxy optic discs. The electroretinogram (ERG) wave amplitudes of the available probands were undetectable. Audiometric tests in the affected individuals in family FR1 were normal, while indicating moderate to severe sensorineural hearing impairment in the affected individuals in families F6, F7, and F8. Vestibular function was normal in all patients from all four families. The disease-causing gene in family FR1 was mapped to the USH2A locus on chromosome 1q41. Seven novel mutations (two missenses, one 7-bp deletion, two small deletions, and two nonsenses) were detected in the four families after sequencing analysis of USH2A. Conclusions: The results further support that mutations of USH2A are also responsible for non-syndromic RP. The mutation spectrum among Chinese patients might differ from that among European Caucasians. PMID:21686329
Integrated databanks access and sequence/structure analysis services at the PBIL.

PubMed

Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

2003-07-01

The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools of nucleic acid and protein sequence analyses. This server allows users to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An original feature of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.
Quantiprot - a Python package for quantitative analysis of protein sequences.

PubMed

Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

2017-07-17

The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
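
To make the n-gram and Zipf's law measures concrete, here is a minimal standard-library sketch. It deliberately does not use the Quantiprot API, whose function names are not given in the record; `ngram_counts` and `zipf_slope` are illustrative names, the protein fragment is invented, and the Zipf coefficient is taken here to be the least-squares slope of log(frequency) against log(rank), which may differ in detail from the package's definition.

```python
from collections import Counter
import math

def ngram_counts(sequence, n):
    """Count overlapping n-grams in a sequence."""
    return Counter(sequence[i : i + n] for i in range(len(sequence) - n + 1))

def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank)."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical protein fragment, for illustration only.
toy_protein = "MKVLAAGIVALLLAAGLAAG"
bigrams = ngram_counts(toy_protein, 2)
print(bigrams.most_common(3))
print(round(zipf_slope(bigrams), 3))
```

A slope near -1 corresponds to the classic Zipf distribution; short sequences such as the toy fragment give noisy estimates.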
Novel mutations in CRB1 and ABCA4 genes cause Leber congenital amaurosis and Stargardt disease in a Swedish family

PubMed Central

Jonsson, Frida; Burstedt, Marie S; Sandgren, Ola; Norberg, Anna; Golovleva, Irina

2013-01-01

This study aimed to identify genetic mechanisms underlying severe retinal degeneration in one large family from northern Sweden, members of which presented with early-onset autosomal recessive retinitis pigmentosa and juvenile macular dystrophy. The clinical records of affected family members were analysed retrospectively and ophthalmological and electrophysiological examinations were performed in selected cases. Mutation screening was initially performed with microarrays, interrogating known mutations in the genes associated with recessive retinitis pigmentosa, Leber congenital amaurosis and Stargardt disease. Searching for homozygous regions with putative causative disease genes was done by high-density SNP-array genotyping, followed by segregation analysis of the family members. Two distinct phenotypes of retinal dystrophy, Leber congenital amaurosis and Stargardt disease, were present in the family. In the family, four patients with Leber congenital amaurosis were homozygous for a novel c.2557C>T (p.Q853X) mutation in the CRB1 gene, while of two cases with Stargardt disease, one was homozygous for c.5461-10T>C in the ABCA4 gene and another was a carrier of the same mutation and a novel ABCA4 mutation c.4773+3A>G. Sequence analysis of the entire ABCA4 gene in patients with Stargardt disease revealed complex alleles with additional sequence variants, which were evaluated by bioinformatics tools. In conclusion, the presence of different genetic mechanisms resulting in variable phenotype within the family is not rare and can challenge molecular geneticists, ophthalmologists and genetic counsellors. PMID:23443024
Exome Sequencing Identifies a Novel CEACAM16 Mutation Associated with Autosomal Dominant Nonsyndromic Hearing Loss DFNA4B in a Chinese Family

PubMed Central

He, Chufeng; Li, Haibo; Qing, Jie; Grati, Mhamed; Hu, Zhengmao; Li, Jiada; Hu, Yiqiao; Xia, Kun; Mei, Lingyun; Wang, Xingwei; Yu, Jianjun; Chen, Hongsheng; Jiang, Lu; Liu, Yalan; Men, Meichao; Zhang, Hailin; Guan, Liping; Xiao, Jingjing; Zhang, Jianguo; Liu, Xuezhong; Feng, Yong

2014-01-01

Autosomal dominant nonsyndromic hearing loss (ADNSHL/DFNA) is a highly genetically heterogeneous disorder. Hitherto only about 30 ADNSHL-causing genes have been identified and many unknown genes remain to be discovered. In this research, genome-wide linkage analysis mapped the disease locus to a 4.3 Mb region on chromosome 19q13 in SY-026, a five-generation nonconsanguineous Chinese family affected by late-onset and progressive ADNSHL. This linkage region showed partial overlap with the previously reported DFNA4. Simultaneously, probands were analyzed using exome capture followed by next generation sequencing. Encouragingly, a heterozygous missense mutation, c.505G>A (p.G169R) in exon 3 of the CEACAM16 gene (carcinoembryonic antigen-related cell adhesion molecule 16), was identified via this combined strategy. Sanger sequencing verified that the mutation co-segregated with hearing loss in the family and that it was not present in 200 unrelated control subjects with matched ancestry. This is the second report in the literature of a family with ADNSHL caused by CEACAM16 mutation. Immunofluorescence staining and Western blots also prove CEACAM16 to be a secreted protein. Furthermore, our studies in transfected HEK293T cells show that the secretion efficacy of the mutant CEACAM16 is much lower than that of the wild-type, suggesting a deleterious effect of the sequence variant. PMID:25589040
Exome sequencing identifies a novel CEACAM16 mutation associated with autosomal dominant nonsyndromic hearing loss DFNA4B in a Chinese family.

PubMed

Wang, Honghan; Wang, Xinwei; He, Chufeng; Li, Haibo; Qing, Jie; Grati, Mhamed; Hu, Zhengmao; Li, Jiada; Hu, Yiqiao; Xia, Kun; Mei, Lingyun; Wang, Xingwei; Yu, Jianjun; Chen, Hongsheng; Jiang, Lu; Liu, Yalan; Men, Meichao; Zhang, Hailin; Guan, Liping; Xiao, Jingjing; Zhang, Jianguo; Liu, Xuezhong; Feng, Yong

2015-03-01

Autosomal dominant nonsyndromic hearing loss (ADNSHL/DFNA) is a highly genetically heterogeneous disorder. Hitherto only about 30 ADNSHL-causing genes have been identified and many unknown genes remain to be discovered. In this research, genome-wide linkage analysis mapped the disease locus to a 4.3 Mb region on chromosome 19q13 in SY-026, a five-generation nonconsanguineous Chinese family affected by late-onset and progressive ADNSHL. This linkage region showed partial overlap with the previously reported DFNA4. Simultaneously, probands were analyzed using exome capture followed by next-generation sequencing. Encouragingly, a heterozygous missense mutation, c.505G>A (p.G169R) in exon 3 of the CEACAM16 gene (carcinoembryonic antigen-related cell adhesion molecule 16), was identified via this combined strategy. Sanger sequencing verified that the mutation co-segregated with hearing loss in the family and that it was not present in 200 unrelated control subjects with matched ancestry. This is the second report in the literature of a family with ADNSHL caused by CEACAM16 mutation. Immunofluorescence staining and western blots also prove CEACAM16 to be a secreted protein. Furthermore, our studies in transfected HEK293T cells show that the secretion efficacy of the mutant CEACAM16 is much lower than that of the wild type, suggesting a deleterious effect of the sequence variant.
First report on an X-linked hypohidrotic ectodermal dysplasia family with X chromosome inversion: Breakpoint mapping reveals the pathogenic mechanism and preimplantation genetics diagnosis achieves an unaffected birth.

PubMed

Wu, Tonghua; Yin, Biao; Zhu, Yuanchang; Li, Guangui; Ye, Lijun; Liang, Desheng; Zeng, Yong

2017-12-01

To investigate the etiology of X-linked hypohidrotic ectodermal dysplasia (XLHED) in a family with an inversion of the X chromosome [inv(X)(p21q13)] and to achieve a healthy birth following preimplantation genetic diagnosis (PGD). Next generation sequencing (NGS) and Sanger sequencing analysis were carried out to define the inversion breakpoint. Multiple displacement amplification, amplification of breakpoint junction fragments, Sanger sequencing of exon 1 of ED1, haplotyping of informative short tandem repeat markers and gender determination were performed for PGD. NGS data of the proband sample revealed that the size of the possible inverted fragment was over 42 Mb, spanning from position 26,814,206 to position 69,231,915 on the X chromosome. The breakpoints were confirmed by Sanger sequencing. A total of 5 blastocyst embryos underwent trophectoderm biopsy. Two embryos were diagnosed as carriers and three were unaffected. Two unaffected blastocysts were transferred and a singleton pregnancy was achieved. Following confirmation by prenatal diagnosis, a healthy baby was delivered. This is the first report of an XLHED family with inv(X). ED1 is disrupted by the X chromosome inversion in this XLHED family and embryos with the X chromosomal abnormality can be accurately identified by means of PGD. Copyright © 2017. Published by Elsevier B.V.
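
The "over 42 Mb" figure follows directly from the two breakpoint positions reported above. A short arithmetic check, assuming both values are base-pair coordinates on the same X chromosome reference (the record does not state the genome build or whether the interval is inclusive):

```python
# Breakpoint coordinates as reported in the abstract (digit grouping removed).
proximal_breakpoint = 26_814_206
distal_breakpoint = 69_231_915

# Simple difference; an inclusive interval would add one base pair.
inverted_span_bp = distal_breakpoint - proximal_breakpoint
inverted_span_mb = inverted_span_bp / 1_000_000

print(f"{inverted_span_bp:,} bp (about {inverted_span_mb:.1f} Mb)")
```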
An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

PubMed Central

Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.

2017-01-01

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422
An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

PubMed

Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S

2017-04-01

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification-amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
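
The F-measure referred to above combines precision and recall into a single score, F1 = 2PR / (P + R). The sketch below computes it from raw counts. The counts are hypothetical, chosen only so that recall matches the 96% true positive rate quoted in the abstract; the study's actual counts and its exact definition of the false positive rate are not given in the record.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 96 hits found, 4 missed, 4 spurious.
precision, recall, f1 = precision_recall_f1(tp=96, fp=4, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```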
Mutations in DSTYK and dominant urinary tract malformations.

PubMed

Sanna-Cherchi, Simone; Sampogna, Rosemary V; Papeta, Natalia; Burgess, Katelyn E; Nees, Shannon N; Perry, Brittany J; Choi, Murim; Bodria, Monica; Liu, Yan; Weng, Patricia L; Lozanovski, Vladimir J; Verbitsky, Miguel; Lugani, Francesca; Sterken, Roel; Paragas, Neal; Caridi, Gianluca; Carrea, Alba; Dagnino, Monica; Materna-Kiryluk, Anna; Santamaria, Giuseppe; Murtas, Corrado; Ristoska-Bojkovska, Nadica; Izzi, Claudia; Kacak, Nilgun; Bianco, Beatrice; Giberti, Stefania; Gigante, Maddalena; Piaggio, Giorgio; Gesualdo, Loreto; Vukic, Durdica Kosuljandic; Vukojevic, Katarina; Saraga-Babic, Mirna; Saraga, Marijan; Gucev, Zoran; Allegri, Landino; Latos-Bielenska, Anna; Casu, Domenica; State, Matthew; Scolari, Francesco; Ravazzolo, Roberto; Kiryluk, Krzysztof; Al-Awqati, Qais; D'Agati, Vivette D; Drummond, Iain A; Tasic, Velibor; Lifton, Richard P; Ghiggeri, Gian Marco; Gharavi, Ali G

2013-08-15

Congenital abnormalities of the kidney and the urinary tract are the most common cause of pediatric kidney failure. These disorders are highly heterogeneous, and the etiologic factors are poorly understood. We performed genomewide linkage analysis and whole-exome sequencing in a family with an autosomal dominant form of congenital abnormalities of the kidney or urinary tract (seven affected family members). We also performed a sequence analysis in 311 unrelated patients, as well as histologic and functional studies. Linkage analysis identified five regions of the genome that were shared among all affected family members. Exome sequencing identified a single, rare, deleterious variant within these linkage intervals, a heterozygous splice-site mutation in the dual serine-threonine and tyrosine protein kinase gene (DSTYK). This variant, which resulted in aberrant splicing of messenger RNA, was present in all affected family members. Additional, independent DSTYK mutations, including nonsense and splice-site mutations, were detected in 7 of 311 unrelated patients. DSTYK is highly expressed in the maturing epithelia of all major organs, localizing to cell membranes. Knockdown in zebrafish resulted in developmental defects in multiple organs, which suggested loss of fibroblast growth factor (FGF) signaling. Consistent with this finding is the observation that DSTYK colocalizes with FGF receptors in the ureteric bud and metanephric mesenchyme. DSTYK knockdown in human embryonic kidney cells inhibited FGF-stimulated phosphorylation of extracellular-signal-regulated kinase (ERK), the principal signal downstream of receptor tyrosine kinases. We detected independent DSTYK mutations in 2.3% of patients with congenital abnormalities of the kidney or urinary tract, a finding that suggests that DSTYK is a major determinant of human urinary tract development, downstream of FGF signaling. (Funded by the National Institutes of Health and others.).
Mutations in DSTYK and Dominant Urinary Tract Malformations

PubMed Central

Sanna-Cherchi, Simone; Nees, Shannon N.; Perry, Brittany J.; Choi, Murim; Bodria, Monica; Liu, Yan; Weng, Patricia L.; Lozanovski, Vladimir J.; Verbitsky, Miguel; Lugani, Francesca; Sterken, Roel; Paragas, Neal; Caridi, Gianluca; Carrea, Alba; Dagnino, Monica; Materna-Kiryluk, Anna; Santamaria, Giuseppe; Murtas, Corrado; Ristoska-Bojkovska, Nadica; Izzi, Claudia; Kacak, Nilgun; Bianco, Beatrice; Giberti, Stefania; Gigante, Maddalena; Piaggio, Giorgio; Gesualdo, Loreto; Vukic, Durdica Kosuljandic; Vukojevic, Katarina; Saraga-Babic, Mirna; Saraga, Marijan; Gucev, Zoran; Allegri, Landino; Latos-Bielenska, Anna; Casu, Domenica; State, Matthew; Scolari, Francesco; Ravazzolo, Roberto; Kiryluk, Krzysztof; Al-Awqati, Qais; D'Agati, Vivette D.; Drummond, Iain A.; Tasic, Velibor; Lifton, Richard P.; Ghiggeri, Gian Marco; Gharavi, Ali G.

2013-01-01

BACKGROUND Congenital abnormalities of the kidney and the urinary tract are the most common cause of pediatric kidney failure. These disorders are highly heterogeneous, and the etiologic factors are poorly understood. METHODS We performed genomewide linkage analysis and whole-exome sequencing in a family with an autosomal dominant form of congenital abnormalities of the kidney or urinary tract (seven affected family members). We also performed a sequence analysis in 311 unrelated patients, as well as histologic and functional studies. RESULTS Linkage analysis identified five regions of the genome that were shared among all affected family members. Exome sequencing identified a single, rare, deleterious variant within these linkage intervals, a heterozygous splice-site mutation in the dual serine–threonine and tyrosine protein kinase gene (DSTYK). This variant, which resulted in aberrant splicing of messenger RNA, was present in all affected family members. Additional, independent DSTYK mutations, including nonsense and splice-site mutations, were detected in 7 of 311 unrelated patients. DSTYK is highly expressed in the maturing epithelia of all major organs, localizing to cell membranes. Knockdown in zebrafish resulted in developmental defects in multiple organs, which suggested loss of fibroblast growth factor (FGF) signaling. Consistent with this finding is the observation that DSTYK colocalizes with FGF receptors in the ureteric bud and metanephric mesenchyme. DSTYK knockdown in human embryonic kidney cells inhibited FGF-stimulated phosphorylation of extracellular-signal-regulated kinase (ERK), the principal signal downstream of receptor tyrosine kinases. CONCLUSIONS We detected independent DSTYK mutations in 2.3% of patients with congenital abnormalities of the kidney or urinary tract, a finding that suggests that DSTYK is a major determinant of human urinary tract development, downstream of FGF signaling. 
(Funded by the National Institutes of Health and others.) PMID:23862974
A Protein Domain and Family Based Approach to Rare Variant Association Analysis.

PubMed

Richardson, Tom G; Shihab, Hashem A; Rivas, Manuel A; McCarthy, Mark I; Campbell, Colin; Timpson, Nicholas J; Gaunt, Tom R

2016-01-01

It has become common practice to analyse large scale sequencing data with statistical approaches based around the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded as a more discrete definition of a biologically functional unit. Using Pfam definitions, we collapsed rare variants (Minor Allele Frequency ≤ 1%) together in three different ways: 1) variants within single genomic regions which map to individual protein domains; 2) variants within two individual protein domain regions which are predicted to be responsible for a protein-protein interaction; 3) all variants within combined regions from multiple genes responsible for coding the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-Quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to those of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. We have collapsed rare variants together using protein domain and family coordinates to present an alternative approach over collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
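As a rough illustration of the collapsing step described above, the sketch below groups rare variants by the named region they fall in. The `Variant` and `Region` types, the function name, and the toy coordinates are all assumptions made for this example; the abstract does not describe the authors' pipeline at this level, and the association test itself (SKAT) is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    maf: float  # minor allele frequency


class Region(NamedTuple):
    name: str   # e.g. a domain or family label (hypothetical)
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def collapse_rare_variants(
    variants: Iterable[Variant],
    regions: Iterable[Region],
    maf_max: float = 0.01,
) -> Dict[str, List[Variant]]:
    """Group variants with MAF <= maf_max by the name of each region they fall in."""
    regions = list(regions)
    groups: Dict[str, List[Variant]] = defaultdict(list)
    for v in variants:
        if v.maf > maf_max:
            continue  # common variant, not collapsed
        for r in regions:
            if v.chrom == r.chrom and r.start <= v.pos <= r.end:
                groups[r.name].append(v)
    return dict(groups)


# Two intervals share one label, so variants from both loci collapse into
# a single family-level set (the third collapsing scheme in the abstract).
variants = [
    Variant("1", 100, 0.005),
    Variant("1", 150, 0.20),
    Variant("2", 500, 0.01),
    Variant("1", 900, 0.001),
]
regions = [Region("PF_demo", "1", 90, 160), Region("PF_demo", "2", 450, 550)]
collapsed = collapse_rare_variants(variants, regions)
```

Giving each interval its own label instead would correspond to the per-domain scheme; the resulting variant sets are what a region-based test would then take as input.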

Genome sequence analysis of five Canadian isolates of strawberry mottle virus reveals extensive intra-species diversity and a longer RNA2 with increased coding capacity compared to a previously characterized European isolate.

PubMed

Bhagwat, Basdeo; Dickison, Virginia; Ding, Xinlun; Walker, Melanie; Bernardy, Michael; Bouthillier, Michel; Creelman, Alexa; DeYoung, Robyn; Li, Yinzi; Nie, Xianzhou; Wang, Aiming; Xiang, Yu; Sanfaçon, Hélène

2016-06-01

In this study, we report the genome sequence of five isolates of strawberry mottle virus (family Secoviridae, order Picornavirales) from strawberry field samples with decline symptoms collected in Eastern Canada. The Canadian isolates differed from the previously characterized European isolate 1134 in that they had a longer RNA2, resulting in a 239-amino-acid extension of the C-terminal region of the polyprotein. Sequence analysis suggests that reassortment and recombination occurred among the isolates. Phylogenetic analysis revealed that the Canadian isolates are diverse, grouping in two separate branches along with isolates from Europe and the Americas.
Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

PubMed

Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

2012-01-01

Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes.
Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.
Cloning and sequencing of the cDNA species for mammalian dimeric dihydrodiol dehydrogenases.

PubMed Central

Arimitsu, E; Aoki, S; Ishikura, S; Nakanishi, K; Matsuura, K; Hara, A

1999-01-01

Cynomolgus and Japanese monkey kidneys, dog and pig livers and rabbit lens contain dimeric dihydrodiol dehydrogenase (EC 1.3.1.20) associated with high carbonyl reductase activity. Here we have isolated cDNA species for the dimeric enzymes by reverse transcriptase-PCR from human intestine in addition to the above five animal tissues. The amino acid sequences deduced from the monkey, pig and dog cDNA species perfectly matched the partial sequences of peptides digested from the respective enzymes of these animal tissues, and active recombinant proteins were expressed in a bacterial system from the monkey and human cDNA species. Northern blot analysis revealed the existence of a single 1.3 kb mRNA species for the enzyme in these animal tissues. The human enzyme shared 94%, 85%, 84% and 82% amino acid identity with the enzymes of the two monkey strains (their sequences were identical), the dog, the pig and the rabbit respectively. The sequences of the primate enzymes consisted of 335 amino acid residues and lacked one amino acid compared with the other animal enzymes. In contrast with previous reports that other types of dihydrodiol dehydrogenase, carbonyl reductases and enzymes with either activity belong to the aldo-keto reductase family or the short-chain dehydrogenase/reductase family, dimeric dihydrodiol dehydrogenase showed no sequence similarity with the members of the two protein families. The dimeric enzyme aligned with low degrees of identity (14-25%) with several prokaryotic proteins, in which 47 residues are strictly or highly conserved. Thus dimeric dihydrodiol dehydrogenase has a primary structure distinct from the previously known mammalian enzymes and is suggested to constitute a novel protein family with the prokaryotic proteins. PMID:10477285
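Percent identity figures like those quoted above are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over aligned columns where neither sequence has a gap. The abstract does not say which convention or alignment program was used, so the function and the toy sequences are illustrative assumptions only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not real enzyme sequences.
identity = percent_identity("MKTAY", "MKSAY")  # 4 of 5 columns match
```

Other conventions divide by the full alignment length or by the shorter sequence length, which gives lower values when one sequence lacks a residue the other has.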
A novel nonsense mutation in CRYBB1 associated with autosomal dominant congenital cataract

PubMed Central

Yang, Juhua; Zhu, Yihua; Gu, Feng; He, Xiang; Cao, Zongfu; Li, Xuexi; Tong, Yi

2008-01-01

Purpose To identify the molecular defect underlying an autosomal dominant congenital nuclear cataract in a Chinese family. Methods Twenty-two members of a three-generation pedigree were recruited, clinical examinations were performed, and genomic DNA was extracted from peripheral blood leukocytes. All members were genotyped with polymorphic microsatellite markers adjacent to each of the known cataract-related genes. Linkage analysis was performed after genotyping. Candidate genes were screened for mutation using direct sequencing. Individuals were screened for presence of a mutation by restriction fragment length polymorphism (RFLP) analysis. Results Linkage analysis identified a maximum LOD score of 3.31 (recombination fraction [θ]=0.0) with marker D22S1167 on chromosome 22, which flanks the β-crystallin gene cluster (CRYBB3, CRYBB2, CRYBB1, and CRYBA4). Sequencing the coding regions and the flanking intronic sequences of these four candidate genes identified a novel, heterozygous C→T transition in exon 6 of CRYBB1 in the affected individuals of the family. This single nucleotide change introduced a novel BfaI site and was predicted to result in a nonsense mutation at codon 223 that changed a phylogenetically conserved amino acid to a stop codon (p.Q223X). RFLP analysis confirmed that this mutation co-segregated with the disease phenotype in all available family members and was not found in 100 normal unrelated individuals from the same ethnic background. Conclusions This study has identified a novel nonsense mutation in CRYBB1 (p.Q223X) associated with autosomal dominant congenital nuclear cataract. PMID:18432316
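For readers unfamiliar with LOD scores, the sketch below computes a two-point LOD for the simplest textbook case: phase-known meioses with directly counted recombinants. This is not the method the authors used (pedigree linkage software handles unknown phase and missing genotypes), and the count of eleven meioses in the usage line is a hypothetical value chosen for illustration; the abstract does not report it.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10( L(theta) / L(0.5) ) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    likelihood = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    if likelihood == 0.0:
        # Any observed recombinant rules out theta = 0.
        return float("-inf")
    return math.log10(likelihood) - meioses * math.log10(0.5)


# With no recombinants at theta = 0, each meiosis contributes log10(2) ~ 0.301.
lod = two_point_lod(recombinants=0, meioses=11, theta=0.0)  # about 3.31
```

A LOD above 3 is the conventional threshold for declaring linkage, which is why a maximum score of 3.31 at θ = 0 is reported as evidence for the chromosome 22 locus.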
Rather than by direct acquisition via lateral gene transfer, GHF5 cellulases were passed on from early Pratylenchidae to root-knot and cyst nematodes.

PubMed

Rybarczyk-Mydłowska, Katarzyna; Maboreke, Hazel Ruvimbo; van Megen, Hanny; van den Elsen, Sven; Mooyman, Paul; Smant, Geert; Bakker, Jaap; Helder, Johannes

2012-11-21

Plant parasitic nematodes are unusual Metazoans as they are equipped with genes that allow for symbiont-independent degradation of plant cell walls. Among the cell wall-degrading enzymes, glycoside hydrolase family 5 (GHF5) cellulases are relatively well characterized, especially for high impact parasites such as root-knot and cyst nematodes. Interestingly, ancestors of extant nematodes most likely acquired these GHF5 cellulases from a prokaryote donor by one or multiple lateral gene transfer events. To obtain insight into the origin of GHF5 cellulases among evolutionarily advanced members of the order Tylenchida, cellulase biodiversity data from less distal family members were collected and analyzed. Single nematodes were used to obtain (partial) genomic sequences of cellulases from representatives of the genera Meloidogyne, Pratylenchus, Hirschmanniella and Globodera. Combined Bayesian analysis of ≈ 100 cellulase sequences revealed three types of catalytic domains (A, B, and C). Represented by 84 sequences, type B is numerically dominant, and the overall topology of the catalytic domain type shows remarkable resemblance with trees based on neutral (= pathogenicity-unrelated) small subunit ribosomal DNA sequences. Bayesian analysis further suggested a sister relationship between the lesion nematode Pratylenchus thornei and all type B cellulases from root-knot nematodes. Yet, the relationship between the three catalytic domain types remained unclear. Superposition of intron data onto the cellulase tree suggests that types B and C are related, and together distinct from type A that is characterized by two unique introns. All Tylenchida members investigated here harbored one or multiple GHF5 cellulases. Three types of catalytic domains are distinguished, and the presence of at least two types is relatively common among plant parasitic Tylenchida.
Analysis of coding sequences of cellulases suggests that root-knot and cyst nematodes did not acquire this gene directly by lateral gene transfer. More likely, these genes were passed on by ancestors of a family nowadays known as the Pratylenchidae.
How Primary Care Providers Talk to Patients about Genome Sequencing Results: Risk, Rationale, and Recommendation.

PubMed

Vassy, Jason L; Davis, J Kelly; Kirby, Christine; Richardson, Ian J; Green, Robert C; McGuire, Amy L; Ubel, Peter A

2018-06-01

Genomics will play an increasingly prominent role in clinical medicine. To describe how primary care physicians (PCPs) discuss and make clinical recommendations about genome sequencing results. Qualitative analysis. PCPs and their generally healthy patients undergoing genome sequencing. Patients received clinical genome reports that included four categories of results: monogenic disease risk variants (if present), carrier status, five pharmacogenetics results, and polygenic risk estimates for eight cardiometabolic traits. Patients' office visits with their PCPs were audio-recorded, and summative content analysis was used to describe how PCPs discussed genomic results. For each genomic result discussed in 48 PCP-patient visits, we identified a "take-home" message (recommendation), categorized as continuing current management, further treatment, further evaluation, behavior change, remembering for future care, or sharing with family members. We analyzed how PCPs came to each recommendation by identifying 1) how they described the risk or importance of the given result and 2) the rationale they gave for translating that risk into a specific recommendation. Quantitative analysis showed that continuing current management was the most commonly coded recommendation across results overall (492/749, 66%) and for each individual result type except monogenic disease risk results. Pharmacogenetics was the most common result type to prompt a recommendation to remember for future care (94/119, 79%); carrier status was the most common type prompting a recommendation to share with family members (45/54, 83%); and polygenic results were the most common type prompting a behavior change recommendation (55/58, 95%). One-quarter of recommendation codes associated with monogenic results were for further evaluation (6/24, 25%). Rationales for these recommendations included patient context, family context, and scientific/clinical limitations of sequencing.
PCPs distinguish substantive differences among categories of genome sequencing results and use clinical judgment to justify continuing current management in generally healthy patients with genomic results.
Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae)

PubMed Central

Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

2013-01-01

Background and Aims The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. Methods A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100–500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Key Results Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S–5·8S–25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. Conclusions The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species. PMID:23666888
Natural killer cell receptor genes in the family Equidae: not only Ly49.

PubMed

Futas, Jan; Horin, Petr

2013-01-01

Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. 
The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for evolutionary biology of NKR genes.
Natural Killer Cell Receptor Genes in the Family Equidae: Not only Ly49

PubMed Central

Futas, Jan; Horin, Petr

2013-01-01

Natural killer (NK) cells have important functions in immunity. NK recognition in mammals can be mediated through killer cell immunoglobulin-like receptors (KIR) and/or killer cell lectin-like Ly49 receptors. Genes encoding highly variable NK cell receptors (NKR) represent rapidly evolving genomic regions. No single conservative model of NKR genes was observed in mammals. Single-copy low polymorphic NKR genes present in one mammalian species may expand into highly polymorphic multigene families in other species. In contrast to other non-rodent mammals, multiple Ly49-like genes appear to exist in the horse, while no functional KIR genes were observed in this species. In this study, Ly49 and KIR were sought and their evolution was characterized in the entire family Equidae. Genomic sequences retrieved showed the presence of at least five highly conserved polymorphic Ly49 genes in horses, asses and zebras. These findings confirmed that the expansion of Ly49 occurred in the entire family. Several KIR-like sequences were also identified in the genome of Equids. Besides a previously identified non-functional KIR-Immunoglobulin-like transcript fusion gene (KIR-ILTA) and two putative pseudogenes, a KIR3DL-like sequence was analyzed. In contrast to previous observations made in the horse, the KIR3DL sequence, genomic organization and mRNA expression suggest that all Equids might produce a functional KIR receptor protein molecule with a single non-mutated immune tyrosine-based inhibition motif (ITIM) domain. No evidence for positive selection in the KIR3DL gene was found. Phylogenetic analysis including rhinoceros and tapir genomic DNA and deduced amino acid KIR-related sequences showed differences between families and even between species within the order Perissodactyla. 
The results suggest that the order Perissodactyla and its family Equidae with expanded Ly49 genes and with a potentially functional KIR gene may represent an interesting model for evolutionary biology of NKR genes. PMID:23724088
Splice-site mutations identified in PDE6A responsible for retinitis pigmentosa in consanguineous Pakistani families

PubMed Central

Khan, Shahid Y.; Ali, Shahbaz; Naeem, Muhammad Asif; Khan, Shaheen N.; Husnain, Tayyab; Butt, Nadeem H.; Qazi, Zaheeruddin A.; Akram, Javed; Riazuddin, Sheikh; Ayyagari, Radha; Hejtmancik, J. Fielding

2015-01-01

Purpose This study was conducted to localize and identify causal mutations associated with autosomal recessive retinitis pigmentosa (RP) in consanguineous familial cases of Pakistani origin. Methods Ophthalmic examinations that included funduscopy and electroretinography (ERG) were performed to confirm the affectation status. Blood samples were collected from all participating individuals, and genomic DNA was extracted. A genome-wide scan was performed, and two-point logarithm of odds (LOD) scores were calculated. Sanger sequencing was performed to identify the causative variants. Subsequently, we performed whole exome sequencing to rule out the possibility of a second causal variant within the linkage interval. Sequence conservation was assessed with alignment analyses of PDE6A orthologs, and in silico splicing analysis was completed with Human Splicing Finder version 2.4.1. Results A large multigenerational consanguineous family diagnosed with early-onset RP was ascertained. An ophthalmic clinical examination consisting of fundus photography and electroretinography confirmed the diagnosis of RP. A genome-wide scan was performed, and suggestive two-point LOD scores were observed with markers on chromosome 5q. Haplotype analyses identified the region; however, the region did not segregate with the disease phenotype in the family. Subsequently, we performed a second genome-wide scan that excluded the entire genome except the chromosome 5q region harboring PDE6A. Next-generation whole exome sequencing identified a splice acceptor site mutation in intron 16: c.2028–1G>A, which was completely conserved in PDE6A orthologs and was absent in 350 ethnically matched control chromosomes, the 1000 Genomes database, and the NHLBI Exome Sequencing Project. Subsequently, we investigated our entire cohort of RP familial cases and identified a second family who harbored a splice acceptor site mutation in intron 10: c.1408–2A>G.
In silico analysis suggested that these mutations will result in the elimination of wild-type splice acceptor sites that would result in either skipping of the respective exon or the creation of a new cryptic splice acceptor site; both possibilities would result in retinal photoreceptor cells that lack PDE6A wild-type protein. Conclusions We report two splice acceptor site variations in PDE6A in consanguineous Pakistani families who manifested cardinal symptoms of RP. Taken together with our previously published work, our data suggest that mutations in PDE6A account for about 2% of the total genetic load of RP in our cohort and possibly in the Pakistani population as well. PMID:26321862
Association Between Germline Mutation in VSIG10L and Familial Barrett Neoplasia.

PubMed

Fecteau, Ryan E; Kong, Jianping; Kresak, Adam; Brock, Wendy; Song, Yeunjoo; Fujioka, Hisashi; Elston, Robert; Willis, Joseph E; Lynch, John P; Markowitz, Sanford D; Guda, Kishore; Chak, Amitabh

2016-10-01

Esophageal adenocarcinoma and its precursor lesion Barrett esophagus have seen a dramatic increase in incidence over the past 4 decades yet marked genetic heterogeneity of this disease has precluded advances in understanding its pathogenesis and improving treatment. To identify novel disease susceptibility variants in a familial syndrome of esophageal adenocarcinoma and Barrett esophagus, termed familial Barrett esophagus, by using high-throughput sequencing in affected individuals from a large, multigenerational family. We performed whole exome sequencing (WES) from peripheral lymphocyte DNA on 4 distant relatives from our multiplex, multigenerational familial Barrett esophagus family to identify candidate disease susceptibility variants. Gene variants were filtered, verified, and segregation analysis performed to identify a single candidate variant. Gene expression analysis was done with both quantitative real-time polymerase chain reaction and in situ RNA hybridization. A 3-dimensional organotypic cell culture model of esophageal maturation was utilized to determine the phenotypic effects of our gene variant. We used electron microscopy on esophageal mucosa from an affected family member carrying the gene variant to assess ultrastructural changes. Identification of a novel, germline disease susceptibility variant in a previously uncharacterized gene. A multiplex, multigenerational family with 14 members affected (3 members with esophageal adenocarcinoma and 11 with Barrett esophagus) was identified, and whole-exome sequencing identified a germline mutation (S631G) at a highly conserved serine residue in the uncharacterized gene VSIG10L that segregated in affected members. Transfection of S631G variant into a 3-dimensional organotypic culture model of normal esophageal squamous cells dramatically inhibited epithelial maturation compared with the wild-type. 
VSIG10L exhibited high expression in normal squamous esophagus with marked loss of expression in Barrett-associated lesions. Electron microscopy of squamous esophageal mucosa harboring the S631G variant revealed dilated intercellular spaces and reduced desmosomes. This study presents VSIG10L as a candidate familial Barrett esophagus susceptibility gene, with a putative role in maintaining normal esophageal homeostasis. Further research assessing VSIG10L function may reveal pathways important for esophageal maturation and the pathogenesis of Barrett esophagus and esophageal adenocarcinoma.
Association Between Germline Mutation in VSIG10L and Familial Barrett Neoplasia

PubMed Central

Fecteau, Ryan E.; Kong, Jianping; Kresak, Adam; Brock, Wendy; Song, Yeunjoo; Fujioka, Hisashi; Elston, Robert; Willis, Joseph E.; Lynch, John P.; Markowitz, Sanford D.; Guda, Kishore; Chak, Amitabh

2016-01-01

IMPORTANCE Esophageal adenocarcinoma and its precursor lesion Barrett esophagus have seen a dramatic increase in incidence over the past 4 decades yet marked genetic heterogeneity of this disease has precluded advances in understanding its pathogenesis and improving treatment. OBJECTIVE To identify novel disease susceptibility variants in a familial syndrome of esophageal adenocarcinoma and Barrett esophagus, termed familial Barrett esophagus, by using high-throughput sequencing in affected individuals from a large, multigenerational family. DESIGN, SETTING, AND PARTICIPANTS We performed whole exome sequencing (WES) from peripheral lymphocyte DNA on 4 distant relatives from our multiplex, multigenerational familial Barrett esophagus family to identify candidate disease susceptibility variants. Gene variants were filtered, verified, and segregation analysis performed to identify a single candidate variant. Gene expression analysis was done with both quantitative real-time polymerase chain reaction and in situ RNA hybridization. A 3-dimensional organotypic cell culture model of esophageal maturation was utilized to determine the phenotypic effects of our gene variant. We used electron microscopy on esophageal mucosa from an affected family member carrying the gene variant to assess ultrastructural changes. MAIN OUTCOMES AND MEASURES Identification of a novel, germline disease susceptibility variant in a previously uncharacterized gene. RESULTS A multiplex, multigenerational family with 14 members affected (3 members with esophageal adenocarcinoma and 11 with Barrett esophagus) was identified, and whole-exome sequencing identified a germline mutation (S631G) at a highly conserved serine residue in the uncharacterized gene VSIG10L that segregated in affected members. Transfection of S631G variant into a 3-dimensional organotypic culture model of normal esophageal squamous cells dramatically inhibited epithelial maturation compared with the wild-type. VSIG10L exhibited high expression in normal squamous esophagus with marked loss of expression in Barrett-associated lesions. Electron microscopy of squamous esophageal mucosa harboring the S631G variant revealed dilated intercellular spaces and reduced desmosomes. CONCLUSIONS AND RELEVANCE This study presents VSIG10L as a candidate familial Barrett esophagus susceptibility gene, with a putative role in maintaining normal esophageal homeostasis. Further research assessing VSIG10L function may reveal pathways important for esophageal maturation and the pathogenesis of Barrett esophagus and esophageal adenocarcinoma. PMID:27467440
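The filtering and segregation step described in this abstract can be illustrated with a minimal sketch. It assumes that variant calls are available as one set per sequenced affected relative and that a population frequency table is at hand. The function name, the variant labels, and the 1% frequency ceiling are hypothetical placeholders, not values or methods reported by the authors.

```python
def segregating_candidates(calls_by_relative, population_freq, max_freq=0.01):
    """Return rare variants carried by every sequenced affected relative.

    calls_by_relative: dict mapping a relative's label to a set of variant labels.
    population_freq:   dict mapping a variant label to its population frequency;
                       variants missing from the table are treated as unobserved.
    max_freq:          hypothetical rarity ceiling (an assumption of this sketch).
    """
    # A variant segregates with disease here only if all affected relatives carry it.
    shared = set.intersection(*calls_by_relative.values())
    # Discard variants that are common in the reference population.
    return sorted(v for v in shared if population_freq.get(v, 0.0) <= max_freq)


# Toy data with made-up variant labels for four affected relatives.
calls_by_relative = {
    "relative_1": {"GENE_A:c.10A>G", "GENE_B:c.55C>T", "GENE_C:c.7G>A"},
    "relative_2": {"GENE_A:c.10A>G", "GENE_B:c.55C>T"},
    "relative_3": {"GENE_A:c.10A>G", "GENE_B:c.55C>T", "GENE_D:c.3T>C"},
    "relative_4": {"GENE_A:c.10A>G", "GENE_B:c.55C>T", "GENE_C:c.7G>A"},
}
population_freq = {"GENE_B:c.55C>T": 0.20}

candidates = segregating_candidates(calls_by_relative, population_freq)
print(candidates)
```

Sequencing distant relatives, as the study did, helps because distant relatives share little of their genome by descent, so the intersection shrinks quickly.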
Sarcocystis neurona merozoites express a family of immunogenic surface antigens that are orthologues of the Toxoplasma gondii surface antigens (SAGs) and SAG-related sequences.

PubMed

Howe, Daniel K; Gaji, Rajshekhar Y; Mroz-Barrett, Meaghan; Gubbels, Marc-Jan; Striepen, Boris; Stamper, Shelby

2005-02-01

Sarcocystis neurona is a member of the Apicomplexa that causes myelitis and encephalitis in horses but normally cycles between the opossum and small mammals. Analysis of an S. neurona expressed sequence tag (EST) database revealed four paralogous proteins that exhibit clear homology to the family of surface antigens (SAGs) and SAG-related sequences of Toxoplasma gondii. The primary peptide sequences of the S. neurona proteins are consistent with the two-domain structure that has been described for the T. gondii SAGs, and each was predicted to have an amino-terminal signal peptide and a carboxyl-terminal glycolipid anchor addition site, suggesting surface localization. All four proteins were confirmed to be membrane associated and displayed on the surface of S. neurona merozoites. Due to their surface localization and homology to T. gondii surface antigens, these S. neurona proteins were designated SnSAG1, SnSAG2, SnSAG3, and SnSAG4. Consistent with their homology, the SnSAGs elicited a robust immune response in infected and immunized animals, and their conserved structure further suggests that the SnSAGs similarly serve as adhesins for attachment to host cells. Whether the S. neurona SAG family is as extensive as the T. gondii SAG family remains unresolved, but it is probable that additional SnSAGs will be revealed as more S. neurona ESTs are generated. The existence of an SnSAG family in S. neurona indicates that expression of multiple related surface antigens is not unique to the ubiquitous organism T. gondii. Instead, the SAG gene family is a common trait that presumably has an essential, conserved function(s).
Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea).

PubMed

Gao, Feng; Song, Weibo; Katz, Laura A

2014-08-01

In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach

PubMed Central

Pal Choudhury, Pabitra

2017-01-01

Periplasmic c7-type cytochrome A (PpcA) has been identified in Geobacter sulfurreducens along with four homologs (PpcB-E). Crystal structures show that PpcA can bind deoxycholate (DXCA), while its homologs do not. The reason for this difference has yet to be established with certainty from primary protein sequence information. This study is based on primary protein sequence analysis through the chemical properties of the constituent amino acids. First, we compute chemical group-specific scores for the amino acids and develop a new methodology for phylogenetic analysis based on chemical group dissimilarities of amino acids. This methodology is applied to the cytochrome c7 family members to pinpoint how a particular sequence differs from the others. Second, we build a graph-theoretic model from the amino acid sequences, which is also applied to the cytochrome c7 family members, and highlight some unique characteristics and their domains. Third, we search for unique patterns, as subsequences, that are common to the group or specific to an individual member. In all cases, we are able to show distinct features of PpcA that set it apart from its homologs and relate to its binding of deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD, compared with the other homologs, are also brought out. Further, because the five members of the cytochrome family are homologous proteins, they must share some significant common features, which are also enumerated in this study. PMID:28362850
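The idea of comparing sequences by chemical group rather than by residue identity can be sketched as follows. The seven-group partition of the 20 amino acids and the L1 distance between group-frequency profiles are assumptions chosen for illustration; the abstract does not specify the authors' grouping or scoring scheme, and the sequences below are toy strings, not PpcA-E.

```python
# Hypothetical partition of the 20 standard amino acids into chemical groups.
GROUPS = {
    "acidic": "DE",
    "basic": "KRH",
    "amide": "NQ",
    "hydroxyl": "ST",
    "sulfur": "CM",
    "aromatic": "FWY",
    "aliphatic": "AGILPV",
}
GROUP_OF = {aa: name for name, letters in GROUPS.items() for aa in letters}


def group_profile(seq):
    """Fraction of residues in each chemical group for one protein sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = {name: 0 for name in GROUPS}
    for aa in seq:
        counts[GROUP_OF[aa]] += 1  # raises KeyError on non-standard residues
    return {name: c / len(seq) for name, c in counts.items()}


def dissimilarity(seq_a, seq_b):
    """L1 distance between group profiles: 0 means identical composition, 2 is the maximum."""
    pa, pb = group_profile(seq_a), group_profile(seq_b)
    return sum(abs(pa[g] - pb[g]) for g in GROUPS)


print(dissimilarity("ACDK", "ACDK"))
print(dissimilarity("DDDD", "KKKK"))
```

A pairwise matrix of such distances could then be passed to any standard tree-building method. A composition-based distance ignores residue order, so it is only a rough stand-in for the position-aware analysis the abstract describes.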
Identification of five novel mutations in the long isoform of the USH2A gene in Chinese families with Usher syndrome type II.

PubMed

Dai, Hanjun; Zhang, Xiaohui; Zhao, Xin; Deng, Ting; Dong, Bing; Wang, Jingzhao; Li, Yang

2008-01-01

Usher syndrome type II (USH2) is the most common form of Usher syndrome, an autosomal recessive disorder characterized by moderate to severe hearing loss, postpuberal onset of retinitis pigmentosa (RP), and normal vestibular function. Mutations in the USH2A gene have been shown to be responsible for most cases of USH2. To further elucidate the role of USH2A in USH2, mutation screening was undertaken in three Chinese families with USH2. Three unrelated Chinese families, consisting of six patients and 10 unaffected relatives, were examined clinically, and 100 normal Chinese individuals served as controls. Genomic DNA was extracted from the venous blood of all participants. The coding region (exons 2-72), including the intron-exon boundaries of USH2A, was amplified by polymerase chain reaction (PCR). The PCR products amplified from the three probands were analyzed using direct sequencing to screen for sequence variants. Whenever substitutions were identified in a patient, restriction fragment length polymorphism analysis or single strand conformation polymorphism analysis was performed on all available family members and the control group. Fundus examination revealed typical fundus features of RP, including narrowing of the vessels, bone-spicule pigmentation, and waxy optic discs. The ERG wave amplitudes of three probands were undetectable. Audiometric tests indicated moderate to severe sensorineural hearing impairment. Vestibular function was normal. Five novel mutations (one small insertion, one small deletion, one nonsense, one missense, and one splice site) were detected in three families after sequence analysis of USH2A. Of the five mutations, four were located in exons 22-72, specific to the long isoform of USH2A. The mutations found in our study broaden the spectrum of USH2A mutations. Our results further indicate that the long isoform of USH2A may harbor even more mutations of the USH2A gene.
Anticipation in a family with primary familial brain calcification caused by an SLC20A2 variant.

PubMed

Konno, Takuya; Blackburn, Patrick R; Rozen, Todd D; van Gerpen, Jay A; Ross, Owen A; Atwal, Paldeep S; Wszolek, Zbigniew K

2018-04-11

To describe a family with primary familial brain calcification (PFBC) due to an SLC20A2 variant showing possible genetic anticipation. We conducted historical, genealogical, clinical, and radiologic studies of a family with PFBC. Clinical evaluations, including neurological examination and head computed tomography (CT) scans, of a proband and her father were performed. They provided additional information regarding other family members. To identify a causative gene variant, we performed whole-exome sequencing for the proband followed by segregation analysis in other affected members using direct sequencing. In this family, nine affected members were identified over four generations. The proband suffered from chronic daily headache including thunderclap headache. We identified an SLC20A2 (c.509delT, p.(Leu170*)) variant in three affected members over three generations. Interestingly, the age of onset became younger as the disease passed through successive generations, suggestive of genetic anticipation. For clinical purposes, it is important to consider thunderclap headache and genetic anticipation in PFBC caused by SLC20A2 variants. Further investigation is required to validate our observation. Copyright © 2018 Polish Neurological Society. Published by Elsevier Urban & Partner Sp. z o.o. All rights reserved.
Phylogenetic analysis of β-xylanase SRXL1 of Sporisorium reilianum and its relationship with families (GH10 and GH11) of Ascomycetes and Basidiomycetes

PubMed Central

Álvarez-Cervantes, Jorge; Díaz-Godínez, Gerardo; Mercado-Flores, Yuridia; Gupta, Vijai Kumar; Anducho-Reyes, Miguel Angel

2016-01-01

In this paper, the amino acid sequence of the β-xylanase SRXL1 of Sporisorium reilianum, a pathogenic fungus of maize, was used as a model protein to determine its phylogenetic relationship with other xylanases of Ascomycetes and Basidiomycetes; the information obtained allowed a hypothesis of monophyly and of biological role to be established. A total of 84 amino acid sequences of β-xylanases obtained from the GenBank database were used. Higher-level grouping analysis in the Pfam database showed that the proteins under study were classified into the GH10 and GH11 families, based on regions of highly conserved amino acids (233–318 and 180–193, respectively) in which glutamate residues are responsible for catalysis. PMID:27040368
Identification of a novel MIP frameshift mutation associated with congenital cataract in a Chinese family by whole-exome sequencing and functional analysis.

PubMed

Long, Xigui; Huang, Yanru; Tan, Hu; Li, Zhuo; Zhang, Rui; Linpeng, Siyuan; Lv, Weigang; Cao, Yingxi; Li, Haoxian; Liang, Desheng; Wu, Lingqian

2018-04-26

To investigate the underlying pathogenesis of congenital cataract in a four-generation Chinese family. Whole-exome sequencing (WES) of family members (III:4, IV:4, and IV:6) was performed. Sanger sequencing and bioinformatics analysis were subsequently conducted. Full-length WT-MIP or K228fs-MIP fused to an HA tag at the N-terminus was transfected into HeLa cells. Next, quantitative real-time PCR, western blotting and immunofluorescence confocal laser scanning were performed. Male patients developed nonsyndromic cataracts by 1 year of age, earlier than female patients, who exhibited onset in adulthood. A novel c.682_683delAA (p.K228fs230X) mutation in major intrinsic protein (MIP) cosegregated with the cataract phenotype. Bioinformatics analysis predicted an increased instability index and more unfolded states for the truncated MIP. The mRNA transcription level of K228fs-MIP was reduced compared with that of WT-MIP, and K228fs-MIP protein expression was also lower than that of WT-MIP. Immunofluorescence images showed that WT-MIP principally localized to the plasma membrane, whereas the mutant protein was trapped in the cytoplasm. Our study generated genetic and primary functional evidence for a novel c.682_683delAA mutation in MIP that expands the variant spectrum of MIP and helps us better understand the molecular basis of cataract.
A Streamlined Protocol for Molecular Testing of the DMD Gene within a Diagnostic Laboratory: A Combination of Array Comparative Genomic Hybridization and Bidirectional Sequence Analysis

PubMed Central

Marquis-Nicholson, Renate; Lai, Daniel; Love, Jennifer M.; Love, Donald R.

2013-01-01

Purpose. The aim of this study was to develop a streamlined mutation screening protocol for the DMD gene in order to confirm a clinical diagnosis of Duchenne or Becker muscular dystrophy in affected males and to clarify the carrier status of female family members. Methods. Sequence analysis and array comparative genomic hybridization (aCGH) were used to identify mutations in the dystrophin (DMD) gene. We analysed genomic DNA from six individuals with a range of previously characterised mutations and from eight individuals who had not previously undergone any form of molecular analysis. Results. We successfully identified the known mutations in all six patients. A molecular diagnosis was also made in three of the four patients with a clinical diagnosis who had not undergone prior genetic screening, and testing for familial mutations was successfully completed for the remaining four patients. Conclusion. The mutation screening protocol described here meets best practice guidelines for molecular testing of the DMD gene in a diagnostic laboratory. The aCGH method is a superior alternative to more conventional assays such as multiplex ligation-dependent probe amplification (MLPA). The combination of aCGH and sequence analysis will detect mutations in 98% of patients with Duchenne or Becker muscular dystrophy. PMID:23476807

A novel ATTR L32V mutation causes familial amyloid polyneuropathy in a Bolivian family.

PubMed

Martínez-Ulloa, Pedro L; Vallejo, Manuela; Corral, Iñigo; García-Barragán, Nuria; Alcazar, Alberto; Martínez-Alonso, Emma; Martínez-Poles, Javier; Pian, Hector; Jiménez-Escrig, Adriano

2017-09-01

We report a new transthyretin (ATTR) gene c.272C>G mutation and variant protein, p.Leu32Val, in a kindred of Bolivian origin with a rapidly progressive peripheral neuropathy and cardiomyopathy. Three individuals from a kindred with peripheral nerve and cardiac amyloidosis were examined. Analysis of the TTR gene was performed by Sanger direct sequencing. Neuropathologic examination was performed on the index patient, with a mass spectrometry study of the ATTR deposition. Direct DNA sequence analysis of exons 2, 3, and 4 of the TTR gene demonstrated a c.272C>G mutation in exon 2 (p.L32V). Sural nerve biopsy revealed massive amyloid deposition in the perineurium, endoneurium and vasa nervorum. Mass spectrometric analyses of ATTR immunoprecipitated from nerve biopsy showed the presence of both wild-type and variant proteins. The observed mass results for the wild-type and variant proteins were consistent with the predicted values calculated from the genetic analysis data. The ATTR L32V is associated with a severe course. This has implications for treatment of affected individuals and counseling of family members. © 2017 Peripheral Nerve Society.
Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu

PubMed Central

Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

2015-01-01

Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381
Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu.

PubMed

Zhang, Yanlin; Luo, Guangbin; Liu, Dongcheng; Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

2015-01-01

Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat.
A novel thermophilic and halophilic esterase from Janibacter sp. R02, the first member of a new lipase family (Family XVII).

PubMed

Castilla, Agustín; Panizza, Paola; Rodríguez, Diego; Bonino, Luis; Díaz, Pilar; Irazoqui, Gabriela; Rodríguez Giordano, Sonia

2017-03-01

Janibacter sp. strain R02 (BNM 560) was isolated in our laboratory from an Antarctic soil sample. A remarkable trait of the strain was its high lipolytic activity, detected in Rhodamine-olive oil supplemented plates. Supernatants of Janibacter sp. R02 displayed superb activity on transesterification of acyl glycerols, making the strain a good candidate for lipase prospecting. Considering the lack of information concerning lipases of the genus Janibacter, we focused on the identification, cloning, expression and characterization of the extracellular lipases of this strain. By means of sequence alignment and clustering of consensus nucleotide sequences, a DNA fragment of 1272 bp was amplified, cloned and expressed in E. coli. The resulting recombinant enzyme, named LipJ2, showed preference for short to medium chain-length substrates, and displayed maximum activity at 80°C and pH 8-9, being strongly activated by a mixture of Na+ and K+. The enzyme presented outstanding stability with respect to both pH and temperature. Bioinformatics analysis of the amino acid sequence of LipJ2 revealed the presence of a consensus catalytic triad and a canonical pentapeptide. However, two additional rare motifs were found in LipJ2: an SXXL β-lactamase motif and two putative Y-type oxyanion holes (YAP). Although some of these features could allow LipJ2 to be assigned to the bacterial lipase families VIII or X, the phylogenetic analysis showed that LipJ2 clusters apart from other members of known lipase families, indicating that the newly isolated Janibacter esterase LipJ2 would be the first characterized member of a new family of bacterial lipases. Published by Elsevier Inc.
Small RNA profiling and degradome analysis reveal regulation of microRNA in peanut embryogenesis and early pod development.

PubMed

Gao, Chao; Wang, Pengfei; Zhao, Shuzhen; Zhao, Chuanzhi; Xia, Han; Hou, Lei; Ju, Zheng; Zhang, Ye; Li, Changsheng; Wang, Xingjun

2017-03-02

Peanut is a typical geocarpic plant, and its embryogenesis and pod development are complex processes involving many gene regulatory pathways and controlled by appropriate hormone levels. MicroRNAs (miRNAs) are small non-coding RNAs that play indispensable roles in post-transcriptional gene regulation. Recently, identification and characterization of peanut miRNAs have been described. However, whether miRNAs participate in the regulation of peanut embryogenesis and pod development has yet to be explored. In this study, small RNA and degradome libraries from peanut early pods at different developmental stages were constructed and sequenced. A total of 70 known and 24 novel miRNA families were discovered. Among them, 16 miRNA families were legume-specific and 12 families were peanut-specific. 30 known and 10 novel miRNA families were differentially expressed during pod development. In addition, 115 target genes were identified for 47 miRNA families by degradome sequencing. Several new targets that might be specific to peanut were found and further validated by RNA ligase-mediated rapid amplification of 5' cDNA ends (RLM 5'-RACE). Furthermore, we performed profiling analysis of intact and total transcripts of several target genes, demonstrating that SPL (miR156/157), NAC (miR164), PPRP (miR167 and miR1088), AP2 (miR172) and GRF (miR396) are actively modulated during early pod development. Large numbers of miRNAs and their related target genes were identified through deep sequencing. These findings provide new information on miRNA-mediated regulatory pathways in peanut pod, which will contribute to a comprehensive understanding of the molecular mechanisms governing peanut embryo and early pod development.
Reduced expression of APC-1B but not APC-1A by the deletion of promoter 1B is responsible for familial adenomatous polyposis.

PubMed

Yamaguchi, Kiyoshi; Nagayama, Satoshi; Shimizu, Eigo; Komura, Mitsuhiro; Yamaguchi, Rui; Shibuya, Tetsuo; Arai, Masami; Hatakeyama, Seira; Ikenoue, Tsuneo; Ueno, Masashi; Miyano, Satoru; Imoto, Seiya; Furukawa, Yoichi

2016-05-24

Germline mutations in the tumor suppressor gene APC are associated with familial adenomatous polyposis (FAP). Here we applied whole-genome sequencing (WGS) to the DNA of a sporadic FAP patient in whom we did not find any pathological APC mutations by direct sequencing. WGS identified a promoter deletion of approximately 10 kb encompassing promoter 1B and exon 1B of APC. Additional allele-specific expression analysis by deep cDNA sequencing revealed that the deletion reduced the expression of the mutated APC allele to as low as 11.2% of the total APC transcripts, suggesting that the residual mutant transcripts were driven by other promoter(s). Furthermore, cap analysis of gene expression (CAGE) demonstrated that the deleted promoter 1B region is responsible for the great majority of APC transcription in many tissues except the brain. The deletion decreased the transcripts of APC-1B to 39-45% in the patient compared to the healthy controls, but it did not decrease those of APC-1A. Different deletions including promoter 1B have been reported in FAP patients. Taken together, our results strengthen the evidence that analysis of structural variations in promoter 1B should be considered for FAP patients whose pathological mutations are not identified by conventional direct sequencing.
An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics

PubMed Central

Du, Ruofei; Mercante, Donald; Fang, Zhide

2013-01-01

In functional metagenomics, BLAST homology search is a common method for classifying metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate changes in abundance with environmental perturbation, clinical variation, and so on. However, the short read length associated with next-generation sequencing technologies poses a barrier to this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. No method is currently available to address this problem, and this paper is intended to fill that gap. We found that BLAST similarity scores of homologues for short reads derived from the coding sequences of COG protein members are distributed differently from the scores of reads derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combining BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. In an evaluation on three experimental metagenomic datasets with different coverages, the proposed method was robust against read coverage and consistently outperformed the E-value cut-off methods currently used in the literature. PMID:23516532
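The score cut-off filter described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: hits are assumed to be already parsed into (read, family, score) tuples, each read is assigned to its best-scoring family, and the cut-off of 40.0 and the family labels are arbitrary placeholders, since the abstract does not report the value chosen.

```python
from collections import Counter


def build_functional_profile(hits, score_cutoff):
    """Count reads per family, keeping only hits whose score reaches the cut-off.

    hits: iterable of (read_id, family_id, score) tuples, for example parsed
    from tabular BLAST output. Each surviving read counts once, for the family
    of its best-scoring hit.
    """
    best = {}
    for read_id, family_id, score in hits:
        if score < score_cutoff:
            continue  # weak matches are the ones that tend to create artificial families
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (family_id, score)
    return Counter(family for family, _ in best.values())


# Toy hits with placeholder family labels and scores.
hits = [
    ("read1", "COG0001", 62.0),
    ("read1", "COG0002", 35.5),
    ("read2", "COG0001", 58.1),
    ("read3", "COG0999", 28.9),
    ("read4", "COG0002", 71.3),
]
profile = build_functional_profile(hits, score_cutoff=40.0)
print(dict(profile))
```

In this toy input the family supported only by a low-scoring hit disappears from the profile, which is the intended effect of the filter. In practice the cut-off would be chosen from the observed score distributions, as the abstract indicates.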
Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae

PubMed Central

Lagkouvardos, Ilias; Weinmaier, Thomas; Lauro, Federico M; Cavicchioli, Ricardo; Rattei, Thomas; Horn, Matthias

2014-01-01

In the era of metagenomics and amplicon sequencing, comprehensive analyses of available sequence data remain a challenge. Here we describe an approach exploiting metagenomic and amplicon data sets from public databases to elucidate phylogenetic diversity of defined microbial taxa. We investigated the phylum Chlamydiae whose known members are obligate intracellular bacteria that represent important pathogens of humans and animals, as well as symbionts of protists. Despite their medical relevance, our knowledge about chlamydial diversity is still scarce. Most of the nine known families are represented by only a few isolates, while previous clone library-based surveys suggested the existence of yet uncharacterized members of this phylum. Here we identified more than 22 000 high quality, non-redundant chlamydial 16S rRNA gene sequences in diverse databases, as well as 1900 putative chlamydial protein-encoding genes. Even when applying the most conservative approach, clustering of chlamydial 16S rRNA gene sequences into operational taxonomic units revealed an unexpectedly high species, genus and family-level diversity within the Chlamydiae, including 181 putative families. These in silico findings were verified experimentally in one Antarctic sample, which contained a high diversity of novel Chlamydiae. In our analysis, the Rhabdochlamydiaceae, whose known members infect arthropods, represents the most diverse and species-rich chlamydial family, followed by the protist-associated Parachlamydiaceae, and a putative new family (PCF8) with unknown host specificity. Available information on the origin of metagenomic samples indicated that marine environments contain the majority of the newly discovered chlamydial lineages, highlighting this environment as an important chlamydial reservoir. PMID:23949660
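Clustering 16S rRNA gene sequences into operational taxonomic units (OTUs) can be illustrated with a greedy centroid sketch. It assumes pre-aligned sequences of equal length and a caller-supplied identity threshold. The abstract does not name the clustering tool or the thresholds used for species, genus and family level, so the function names, threshold, and toy sequences here are illustrative only.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Assign each sequence to the first centroid it matches at or above the threshold.

    A sequence that matches no existing centroid becomes the centroid of a new OTU.
    """
    centroids, clusters = [], []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


# Toy aligned fragments: the first two differ at one of ten positions.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
otus = greedy_otus(seqs, threshold=0.9)
print(len(otus))
```

Lowering the threshold merges more sequences into each unit, which is how a single procedure can approximate species, genus or family-level groupings. Greedy clustering also depends on input order, so real pipelines usually sort sequences before clustering.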
Molecular characterization of pea enation mosaic virus and bean leafroll virus from the Pacific Northwest, USA.

PubMed

Vemulapati, B; Druffel, K L; Eigenbrode, S D; Karasev, A; Pappu, H R

2010-10-01

The family Luteoviridae consists of eight viruses assigned to three different genera, Luteovirus, Polerovirus and Enamovirus. The complete genomic sequences of pea enation mosaic virus (genus Enamovirus) and bean leafroll virus (genus Luteovirus) from the Pacific Northwest, USA, were determined. Annotation, sequence comparisons, and phylogenetic analysis of selected genes together with those of known polero- and enamoviruses were conducted.
Graph pyramids for protein function prediction

PubMed Central

2015-01-01

Background Uncovering the hidden organizational characteristics and regularities among biological sequences is a key issue for a detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522
Graph pyramids for protein function prediction.

PubMed

Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun

2015-01-01

Uncovering the hidden organizational characteristics and regularities among biological sequences is a key issue for a detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
Molecular variability analysis of five new complete cacao swollen shoot virus genomic sequences.

PubMed

Muller, E; Sackey, S

2005-01-01

Cacao swollen shoot virus (CSSV), a member of the family Caulimoviridae, genus Badnavirus, occurs in all the main cacao-growing areas of West Africa. We amplified, cloned and sequenced complete genomes of five new isolates, two originating from Togo and three originating from Ghana. The genomes of these five newly sequenced isolates all contain the five putative open reading frames I, II, III, X and Y described for the first sequenced CSSV isolate, Agou1, originating from Togo. Their genomes have been aligned with the genome of Agou1. The nucleotide and amino acid sequence identities between isolates have been calculated and a phylogenetic analysis has been made including other pararetroviruses. Maximum nucleotide sequence variability between complete genomes of CSSV isolates was 29.4%. Geographical differentiation between isolates appears more important than differentiation between mild and severe isolates. ORF X differs greatly in size and sequence between the Togolese isolates Nyongbo2 and Agou1 and the four other isolates; its functional role is therefore clearly questionable.
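A figure such as "maximum nucleotide sequence variability" can be derived from pairwise identities over an alignment, as in the following sketch. The abstract does not state how gaps were treated, so the convention here (columns where both sequences have a gap are skipped, a gap against a base counts as a mismatch) is an assumption, and the isolate names and sequences are hypothetical placeholders rather than CSSV data.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # assumed convention: ignore columns that are gaps in both
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def max_variability(aligned: dict[str, str]) -> float:
    """Largest pairwise divergence (100 minus identity) among all sequence pairs."""
    return max(
        100.0 - percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    )


# Hypothetical aligned fragments, for illustration only.
aligned = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACG-ACGAAT",
}
variability = max_variability(aligned)
```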
A Novel Missense Mutation p.Gly162Glu of the Gene MYL2 Involved in Hypertrophic Cardiomyopathy: A Pedigree Analysis of a Proband.

PubMed

Renaudin, Pauline; Janin, Alexandre; Millat, Gilles; Chevalier, Philippe

2018-04-01

Hypertrophic cardiomyopathy (HCM), a common and clinically heterogeneous disease characterized by unexplained ventricular myocardial hypertrophy, is mostly caused by mutations in sarcomeric genes. Identifying the genetic cause is important for management, therapy, and genetic counseling. A molecular diagnosis was performed on a 51-year-old woman diagnosed with HCM using a next-generation sequencing workflow based on a panel designed for sequencing the most prevalent cardiomyopathy-causing genes. Segregation analysis was performed on the woman's family. A novel myosin regulatory light chain (MYL2) missense variant, NM_000432.3:c.485G>A, p.Gly162Glu, was identified and initially considered a putative pathogenic mutation. Among the 27 family members tested, 16 were carriers of the MYL2-p.Gly162Glu mutation, of whom 12 were phenotype-positive. None of the 11 family members without the mutation had cardiomyopathy. Genetic analysis combined with a segregation study allowed us to classify this novel MYL2 variation, p.Gly162Glu, as a novel pathogenic mutation leading to a familial form of HCM. Owing to the absence of fast in vitro approaches to evaluate the functional impact of missense variants on HCM-causing genes, segregation studies remain, when possible, the easiest approach to evaluate the putative pathogenicity of novel gene variants, more particularly missense ones.
Massive sequencing of 70 genes reveals a myriad of missing genes or mechanisms to be uncovered in hereditary spastic paraplegias

PubMed Central

Morais, Sara; Raymond, Laure; Mairey, Mathilde; Coutinho, Paula; Brandão, Eva; Ribeiro, Paula; Loureiro, José Leal; Sequeiros, Jorge; Brice, Alexis; Alonso, Isabel; Stevanin, Giovanni

2017-01-01

Hereditary spastic paraplegias (HSP) are neurodegenerative disorders characterized by lower limb spasticity and weakness that can be complicated by other neurological or non-neurological signs. Despite a high genetic heterogeneity (>60 causative genes), 40–70% of the families remain without a molecular diagnosis. Analysis of one of the pioneer cohorts of 193 HSP families generated in the early 1990s in Portugal highlighted that SPAST and SPG11 are the most frequent diagnoses. We have now explored 98 unsolved families from this series using custom next generation sequencing panels analyzing up to 70 candidate HSP genes. We identified the likely disease-causing variant in 20 of the 98 families with KIF5A being the most frequently mutated gene. We also found 52 variants of unknown significance (VUS) in 38% of the cases. These new diagnoses resulted in 42% of solved cases in the full Portuguese cohort (81/193). Segregation of the variants was not always compatible with the presumed inheritance, indicating that the analysis of all HSP genes regardless of the inheritance mode can help to explain some cases. Our results show that there is still a large set of unknown genes responsible for HSP and most likely novel mechanisms or inheritance modes leading to the disease to be uncovered, but this will require international collaborative efforts, particularly for the analysis of VUS. PMID:28832565
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.

PubMed

Herold, Keith E; Rasooly, Avraham

2003-12-01

Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
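The idea of tiling a target with overlapping, temperature-matched oligonucleotides of variable length can be sketched as follows. This is not the Oligo Design implementation: the abstract does not say which melting-temperature model the program uses, so the sketch substitutes the simple Wallace rule (2 °C per A/T, 4 °C per G/C), and the function names, step size and length limits are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in deg C by the Wallace rule (assumed model)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def tile_oligos(seq: str, target_tm: int, step: int = 1,
                min_len: int = 8, max_len: int = 30) -> list[tuple[int, str, int]]:
    """At each start position, grow the oligo until its Tm reaches target_tm.

    Returns (start, oligo, tm) tuples; starts where no length within the limits
    reaches the target are skipped.
    """
    tiles = []
    for start in range(0, len(seq) - min_len + 1, step):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(seq):
                break
            oligo = seq[start:end]
            tm = wallace_tm(oligo)
            if tm >= target_tm:
                tiles.append((start, oligo, tm))
                break
    return tiles


# Hypothetical target: AT-rich at the 5' end, GC-rich at the 3' end.
tiles = tile_oligos("ATATATATGCGCGCGC", target_tm=24, step=4, min_len=8, max_len=12)
```

In this toy target the AT-rich tile needs 10 nucleotides to reach the target temperature while the GC-rich tiles need only 8, which is the relationship between melting temperature and GC content that the abstract alludes to.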
Isolation and characterization of the chicken trypsinogen gene family.

PubMed Central

Wang, K; Gan, L; Lee, I; Hood, L

1995-01-01

Based on genomic Southern hybridizations and cDNA sequence analyses, the chicken trypsinogen gene family can be divided into two multi-member subfamilies, a six-member trypsinogen I subfamily which encodes the cationic trypsin isoenzymes and a three-member trypsinogen II subfamily which encodes the anionic trypsin isoenzymes. The chicken cDNA and genomic clones containing these two subfamilies were isolated and characterized by DNA sequence analysis. The results indicated that the chicken trypsinogen genes encoded a signal peptide of 15 to 16 amino acid residues, an activation peptide of 9 to 10 residues and a trypsin of 223 amino acid residues. The chicken trypsinogens contain all the common catalytic and structural features for trypsins, including the catalytic triad His, Asp and Ser and the six disulphide bonds. The trypsinogen I and II subfamilies share approximately 70% sequence identity at the nucleotide and amino acid level. The sequence comparison among chicken trypsinogen subfamily members and trypsin sequences from other species suggested that the chicken trypsinogen genes may have evolved in coincidental or concerted fashion. PMID:7733885
Outcome of ABCA4 disease-associated alleles in autosomal recessive retinal dystrophies: retrospective analysis in 420 Spanish families.

PubMed

Riveiro-Alvarez, Rosa; Lopez-Martinez, Miguel-Angel; Zernant, Jana; Aguirre-Lamban, Jana; Cantalapiedra, Diego; Avila-Fernandez, Almudena; Gimenez, Ascension; Lopez-Molina, Maria-Isabel; Garcia-Sandoval, Blanca; Blanco-Kelly, Fiona; Corton, Marta; Tatu, Sorina; Fernandez-San Jose, Patricia; Trujillo-Tiebas, Maria-Jose; Ramos, Carmen; Allikmets, Rando; Ayuso, Carmen

2013-11-01

To provide a comprehensive overview of all detected mutations in the ABCA4 gene in Spanish families with autosomal recessive retinal disorders, including Stargardt's disease (arSTGD), cone-rod dystrophy (arCRD), and retinitis pigmentosa (arRP), and to assess genotype-phenotype correlation and disease progression in 10 years by considering the type of variants and age at onset. Case series. A total of 420 unrelated Spanish families: 259 arSTGD, 86 arCRD, and 75 arRP. Spanish families were analyzed through a combination of ABCR400 genotyping microarray, denaturing high-performance liquid chromatography, and high-resolution melting scanning. Direct sequencing was used as a confirmation technique for the identified variants. Screening by multiple ligation probe analysis was used to detect possible large deletions or insertions in the ABCA4 gene. Selected families were analyzed further by next generation sequencing. DNA sequence variants, mutation detection rates, haplotypes, age at onset, central or peripheral vision loss, and night blindness. Overall, we detected 70.5% and 36.6% of all expected ABCA4 mutations in arSTGD and arCRD patient cohorts, respectively. In the fraction of the cohort where the ABCA4 gene was sequenced completely, the detection rates reached 73.6% for arSTGD and 66.7% for arCRD. However, the frequency of possibly pathogenic ABCA4 alleles in arRP families was only slightly higher than that in the general population. Moreover, in some families, mutations in other known arRP genes segregated with the disease phenotype. An increasing understanding of causal ABCA4 alleles in arSTGD and arCRD facilitates disease diagnosis and prognosis and also is paramount in selecting patients for emerging clinical trials of therapeutic interventions. Because ABCA4-associated diseases are evolving retinal dystrophies, assessment of age at onset, accurate clinical diagnosis, and genetic testing are crucial. 
We suggest that ABCA4 mutations may be associated with a retinitis pigmentosa-like phenotype often as a consequence of severe (null) mutations, in cases of long-term, advanced disease, or both. Patients with classical arRP phenotypes, especially from the onset of the disease, should be screened first for mutations in known arRP genes and not ABCA4. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

PubMed

Wyszyńska-Koko, J; Kurył, J

2004-01-01

The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified and sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein shows high conservation among species, with 99 and 97% identity when comparing pig with cow and human, respectively, and 93% when comparing pig with mouse and rat. A single nucleotide polymorphism (SNP) was revealed within the promoter region, which appeared to be a T --> C transition recognized by the MspI restriction enzyme.
Phylogenetic diversity and biological activity of culturable Actinobacteria isolated from freshwater fish gut microbiota.

PubMed

Jami, Mansooreh; Ghanbari, Mahdi; Kneifel, Wolfgang; Domig, Konrad J

2015-06-01

The diversity of Actinobacteria isolated from the gut microbiota of two freshwater fish species namely Schizothorax zarudnyi and Schizocypris altidorsalis was investigated employing classical cultivation techniques, repetitive sequence-based PCR (rep-PCR), partial and full 16S rDNA sequencing followed by phylogenetic analysis. A total of 277 isolates were cultured by applying three different agar media. Based on rep-PCR profile analysis a subset of 33 strains was selected for further phylogenetic investigations, antimicrobial activity testing and diversity analysis of secondary-metabolite biosynthetic genes. The identification based on 16S rRNA gene sequencing revealed that the isolates belong to eight genera distributed among six families. At the family level, 72% of the 277 isolates belong to the family Streptomycetaceae. Among the non-streptomycetes group, the most dominant group could be allocated to the family of Pseudonocardiaceae followed by the members of Micromonosporaceae. Phylogenetic analysis clearly showed that many of the isolates in the genera Streptomyces, Saccharomonospora, Micromonospora, Nocardiopsis, Arthrobacter, Kocuria, Microbacterium and Agromyces formed a single and distinct cluster with the type strains. Notably, there is no report so far about the occurrence of these Actinobacteria in the microbiota of freshwater fish. Of the 33 isolates, all the strains exhibited antibacterial activity against a set of tested human and fish pathogenic bacteria. Then, to study their associated potential capacity to synthesize diverse bioactive natural products, diversity of genes associated with secondary-metabolite biosynthesis including PKS I, PKS II, NRPS, the enzyme PhzE of the phenazine pathways, the enzyme dTGD of 6-deoxyhexoses glycosylation pathway, the enzyme Halo of halogenation pathway and the enzyme CYP in polyene polyketide biosynthesis were investigated among the isolates. 
All the strains possess at least two types of the investigated biosynthetic genes, and one-fourth of them harbour more than four. This study demonstrates the significant diversity of Actinobacteria in the fish gut microbiota and its potential to produce biologically active compounds. Copyright © 2015 Elsevier GmbH. All rights reserved.
Novel compound heterozygous mutations in MYO7A in a Chinese family with Usher syndrome type 1

PubMed Central

Liu, Fei; Li, Pengcheng; Liu, Ying; Li, Weirong; Wong, Fulton; Du, Rong; Wang, Lei; Li, Chang; Jiang, Fagang; Tang, Zhaohui

2013-01-01

Purpose To identify the disease-causing mutation(s) in a Chinese family with autosomal recessive Usher syndrome type 1 (USH1). Methods An ophthalmic examination and an audiometric test were conducted to ascertain the phenotype of two affected siblings. The microsatellite marker D11S937, which is close to the candidate gene MYO7A (USH1B locus), was selected for genotyping. From the DNA of the proband, all coding exons and exon-intron boundaries of MYO7A were sequenced to identify the disease-causing mutation(s). Restriction fragment length polymorphism (RFLP) analysis was performed to exclude the alternative conclusion that the mutations are non-pathogenic rare polymorphisms. Results Based on severe hearing impairment, unintelligible speech, and retinitis pigmentosa, a clinical diagnosis of Usher syndrome type 1 was made. The genotyping results did not exclude the USH1B locus, which suggested that the MYO7A gene was likely the gene associated with the disease-causing mutation(s) in the family. With direct DNA sequencing of MYO7A, two novel compound heterozygous mutations (c.3742G>A and c.6051+1G>A) of MYO7A were identified in the proband. DNA sequence analysis and RFLP analysis of other family members showed that the mutations cosegregated with the disease. Unaffected members, including the parents, uncle, and sister of the proband, carry only one of the two mutations. The mutations were not present in the controls (100 normal Chinese subjects=200 chromosomes) according to the RFLP analysis. Conclusions In this study, we identified two novel mutations, c.3742G>A (p.E1248K) and c.6051+1G>A (donor splice site mutation in intron 44), of MYO7A in a Chinese non-consanguineous family with USH1. The mutations cosegregated with the disease and most likely cause the phenotype in the two affected siblings who carry these mutations compound heterozygously. Our finding expands the mutational spectrum of MYO7A. PMID:23559863

Mutation analysis in a German family identified a new cataract-causing allele in the CRYBB2 gene

PubMed Central

Pauli, Silke; Söker, Torben; Klopp, Norman; Illig, Thomas; Engel, Wolfgang

2007-01-01

Purpose The study demonstrates the functional candidate gene analysis in a cataract family of German descent. Methods We screened a German family, clinically documented to have congenital cataracts, for mutations in the candidate genes CRYG (A to D) and CRYBB2 through polymerase chain reaction analyses and sequencing. Results Congenital cataract was first observed in a daughter of healthy parents. Her two children (a boy and a girl) also suffer from congenital cataracts and were operated on within the first weeks after birth. Morphologically, the cataract is characterized as nuclear with an additional ring-shaped cortical opacity. Molecular analysis revealed no causative mutation in any of the CRYG genes. However, sequencing of the exons of the CRYBB2 gene identified a sequence variation in exon 5 (383 A>T) with a substitution of Asp to Val at position 128. All three affected family members revealed this change but it was not observed in any of the unaffected persons of the family. The putative mutation creates a restriction site for the enzyme TaiI. This mutation was checked for in controls of randomly selected DNA samples from ophthalmologically normal individuals from the population-based KORA S4 study (n=96) and no mutation was observed. Moreover, the Asp at position 128 is within a stretch of 12 amino acids, which are highly conserved throughout the animal kingdom. For the mutant protein, the isoelectric point is raised from pH 6.50 to 6.75. Additionally, the random coil structure of the protein between the amino acids 126-139 is interrupted by a short extended strand structure. In addition, this region becomes hydrophobic (from neutral to +1) and the electrostatic potential in the region surrounding the exchanged amino acid alters from a mainly negative potential to an enlarged positive potential. Conclusions The D128V mutation segregates only in affected family members and is not seen in representative controls. 
It represents the first mutation outside exon 6 of the human CRYBB2 gene. PMID:17653036
Identification and sequence analyses of novel lipase encoding novel thermophillic bacilli isolated from Armenian geothermal springs.

PubMed

Shahinyan, Grigor; Margaryan, Armine; Panosyan, Hovik; Trchounian, Armen

2017-05-02

Among the huge diversity of thermophilic bacteria, mainly bacilli have been reported as active thermostable lipase producers. Geothermal springs serve as the main source for isolation of thermostable lipase-producing bacilli. Thermostable lipolytic enzymes, which function in harsh conditions, have promising applications in processing of organic chemicals, detergent formulation, synthesis of biosurfactants, pharmaceutical processing, etc. In order to study the distribution of lipase-producing thermophilic bacilli and their specific lipase protein primary structures, three lipase producers from different genera were isolated from mesothermal (27.5-70 °C) springs distributed on the territory of Armenia and Nagorno Karabakh. Based on phenotypic characteristics and 16S rRNA gene sequencing, the isolates were identified as Geobacillus sp., Bacillus licheniformis and Anoxibacillus flavithermus strains. The lipase genes of the isolates were sequenced by using initially designed primer sets. Multiple alignments generated from primary structures of the lipase proteins and annotated lipase protein sequences, conserved regions analysis and amino acid composition have illustrated the similarity (98-99%) of the lipases with true lipases (family I) and the GDSL esterase family (family II). A conserved sequence block that determines the thermostability has been identified in the multiple alignments of the lipase proteins. The results shed light on the distribution of lipase-producing bacilli in geothermal springs in Armenia and Nagorno Karabakh. Newly isolated bacilli strains could be a prospective source of thermostable lipases and their genes.
Genes encoding calmodulin-binding proteins in the Arabidopsis genome

NASA Technical Reports Server (NTRS)

Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

2002-01-01

Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.
Ebolavirus comparative genomics

DOE PAGES

Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

2015-07-14

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
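Oligomer frequency analysis, as mentioned in this abstract, compares genomes by how often each short word (k-mer) occurs. The sketch below builds normalized k-mer profiles and measures the Euclidean distance between them. The abstract does not specify the oligomer length or the distance measure the authors used, so both choices, along with the function names and toy sequences, are assumptions.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping k-mer in seq."""
    s = seq.upper()
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two k-mer profiles (absent k-mers count as 0)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


# Hypothetical toy sequences, not viral genomes.
profile_a = kmer_profile("ACGTACGT", k=2)
profile_b = kmer_profile("AAAAAAAA", k=2)
distance = profile_distance(profile_a, profile_b)
```

Genomes whose profiles lie close together under such a distance would group together, which is one way a family could appear as a distinct cluster.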
A novel recessive mutation in the gene ELOVL4 causes a neuro-ichthyotic disorder with variable expressivity

PubMed Central

2014-01-01

Background A rare neuro-ichthyotic disorder characterized by ichthyosis, spastic quadriplegia and intellectual disability and caused by recessive mutations in ELOVL4, encoding elongase-4 protein has recently been described. The objective of the study was to search for sequence variants in the gene ELOVL4 in three affected individuals of a consanguineous Pakistani family exhibiting features of neuro-ichthyotic disorder. Methods Linkage in the family was searched by genotyping microsatellite markers linked to the gene ELOVL4, mapped at chromosome 6p14.1. Exons and splice junction sites of the gene ELOVL4 were polymerase chain reaction amplified and sequenced in an automated DNA sequencer. Results DNA sequence analysis revealed a novel homozygous nonsense mutation (c.78C > G; p.Tyr26*). Conclusions Our report further confirms the recently described ELOVL4-related neuro-ichthyosis and shows that the neurological phenotype can be absent in some individuals. PMID:24571530
MACSIMS : multiple alignment of complete sequences information management system

PubMed Central

Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

2006-01-01

Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820
DNA Barcode Identification of Freshwater Snails in the Family Bithyniidae from Thailand

PubMed Central

Kulsantiwong, Jutharat; Prasopdee, Sattrachai; Ruangsittichai, Jiraporn; Ruangjirachuporn, Wipaporn; Boonmars, Thidarut; Viyanant, Vithoon; Pierossi, Paola; Hebert, Paul D. N.; Tesana, Smarn

2013-01-01

Freshwater snails in the family Bithyniidae are the first intermediate host for Southeast Asian liver fluke (Opisthorchis viverrini), the causative agent of opisthorchiasis. Unfortunately, the subtle morphological characters that differentiate species in this group are not easily discerned by non-specialists. This is a serious matter because the identification of bithyniid species is a fundamental prerequisite for better understanding of the epidemiology of this disease. Because DNA barcoding, the analysis of sequence diversity in the 5’ region of the mitochondrial COI gene, has shown strong performance in other taxonomic groups, we decided to test its capacity to resolve 10 species/ subspecies of bithyniids from Thailand. Our analysis of 217 specimens indicated that COI sequences delivered species-level identification for 9 of 10 currently recognized species. The mean intraspecific divergence of COI was 2.3% (range 0-9.2 %), whereas sequence divergences between congeneric species averaged 8.7% (range 0-22.2 %). Although our results indicate that DNA barcoding can differentiate species of these medically-important snails, we also detected evidence for the presence of one overlooked species and one possible case of synonymy. PMID:24223896
The genome of the Erwinia amylovora phage PhiEaH1 reveals greater diversity and broadens the applicability of phages for the treatment of fire blight.

PubMed

Meczker, Katalin; Dömötör, Dóra; Vass, János; Rákhely, Gábor; Schneider, György; Kovács, Tamás

2014-01-01

The enterobacterium Erwinia amylovora is the causal agent of fire blight. This study presents the analysis of the complete genome of phage PhiEaH1, isolated from the soil surrounding an E. amylovora-infected apple tree in Hungary. Its genome is 218 kb in size, containing 244 ORFs. PhiEaH1 is the second E. amylovora-infecting phage from the Siphoviridae family whose complete genome sequence was determined. Besides PhiEaH2, PhiEaH1 is the other active component of Erwiphage, the first bacteriophage-based pesticide on the market against E. amylovora. Comparative genome analysis in this study has revealed that PhiEaH1 differs not only from the 10 formerly sequenced E. amylovora bacteriophages belonging to other phage families, but also from PhiEaH2. Sequencing of more Siphoviridae phage genomes might reveal further diversity, providing opportunities for the development of even more effective biological control agents, phage cocktails against Erwinia fire blight disease of commercial fruit crops.
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

PubMed

Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

2015-12-01

The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance for identifying similarities between protein structures and inferring their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low-frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present the Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method for protein comparison is more efficient than the commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as a significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
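As a rough illustration of the transform named in this record: the Ramanujan sum c_q(n) is the sum of exp(2*pi*i*k*n/q) over all k in 1..q that are coprime to q, and one common form of an RFT coefficient projects a numeric signal onto c_q. The Python sketch below uses this direct definition, not the fast algorithm the abstract refers to. The normalization, the function names, and the toy signal are assumptions made for illustration, and the numeric encoding of amino acids used with the RRM is not reproduced here.

```python
import cmath
from math import gcd


def ramanujan_sum(q, n):
    """Ramanujan sum c_q(n), computed directly from its definition."""
    total = sum(
        cmath.exp(2j * cmath.pi * k * n / q)
        for k in range(1, q + 1)
        if gcd(k, q) == 1
    )
    # c_q(n) is always an integer; rounding removes floating-point noise.
    return round(total.real)


def euler_phi(q):
    """Euler's totient: number of integers in 1..q coprime to q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)


def rft_coefficient(signal, q):
    """Projection of a finite signal x(1..N) onto c_q.

    Assumed normalization: (1 / (phi(q) * N)) * sum_n x(n) * c_q(n).
    Other conventions exist in the literature.
    """
    n_samples = len(signal)
    acc = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
    return acc / (euler_phi(q) * n_samples)


# Hypothetical toy signal with period 3 (two periods of c_3 itself).
toy_signal = [ramanujan_sum(3, n) for n in range(1, 7)]
spectrum = {q: rft_coefficient(toy_signal, q) for q in range(1, 5)}
print(spectrum)
```

With this toy input the coefficient at q = 3 dominates, which is the kind of shared periodic component the abstract describes as a family feature.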
Evolutionary Origin of the Scombridae (Tunas and Mackerels): Members of a Paleogene Adaptive Radiation with 14 Other Pelagic Fish Families

PubMed Central

Miya, Masaki; Friedman, Matt; Satoh, Takashi P.; Takeshima, Hirohiko; Sado, Tetsuya; Iwasaki, Wataru; Yamanoue, Yusuke; Nakatani, Masanori; Mabuchi, Kohji; Inoue, Jun G.; Poulsen, Jan Yde; Fukunaga, Tsukasa; Sato, Yukuto; Nishida, Mutsumi

2013-01-01

Uncertainties surrounding the evolutionary origin of the epipelagic fish family Scombridae (tunas and mackerels) are symptomatic of the difficulties in resolving suprafamilial relationships within Percomorpha, a hyperdiverse teleost radiation that contains approximately 17,000 species placed in 13 ill-defined orders and 269 families. Here we find that scombrids share a common ancestry with 14 families based on (i) bioinformatic analyses using partial mitochondrial and nuclear gene sequences from all percomorphs deposited in GenBank (10,733 sequences) and (ii) subsequent mitogenomic analysis based on 57 species from those targeted 15 families and 67 outgroup taxa. Morphological heterogeneity among these 15 families is so extraordinary that they have been placed in six different perciform suborders. However, members of the 15 families are either coastal or oceanic pelagic in their ecology with diverse modes of life, suggesting that they represent a previously undetected adaptive radiation in the pelagic realm. Time-calibrated phylogenies imply that scombrids originated from a deep-ocean ancestor and began to radiate after the end-Cretaceous when large predatory epipelagic fishes were selective victims of the Cretaceous-Paleogene mass extinction. We name this clade of open-ocean fishes containing Scombridae “Pelagia” in reference to the common habitat preference that links the 15 families. PMID:24023883
Molecular genetic characterization of the RD-114 gene family of endogenous feline retroviral sequences.

PubMed Central

Reeves, R H; O'Brien, S J

1984-01-01

RD-114 is a replication-competent, xenotropic retrovirus which is homologous to a family of moderately repetitive DNA sequences present at ca. 20 copies in the normal cellular genome of domestic cats. To examine the extent and character of genomic divergence of the RD-114 gene family as well as to assess their positional association within the cat genome, we have prepared a series of molecular clones of endogenous RD-114 DNA segments from a genomic library of cat cellular DNA. Their restriction endonuclease maps were compared with each other as well as to that of the prototype-inducible RD-114 which was molecularly cloned from a chronically infected human cell line. The endogenous sequences analyzed were similar to each other in that they were colinear with RD-114 proviral DNA, were bounded by long terminal redundancies, and conserved many restriction sites in the gag and pol regions. However, the env regions of many of the sequences examined were substantially deleted. Several of the endogenous RD-114 genomes contained a novel envelope sequence which was unrelated to the env gene of the prototype RD-114 but which, like RD-114 and endogenous feline leukemia virus provirus, was found only in species of the genus Felis, and not in other closely related Felidae genera. The endogenous RD-114 sequences each had a distinct cellular flank which indicates that these sequences are not tandem but dispersed nonspecifically throughout the genome. Southern analysis of cat cellular DNA confirmed the conclusions about conserved restriction sites in endogenous sequences and indicated that a single locus may be responsible for the production of the major inducible form of RD-114. PMID:6090693
Conservation and diversification of Msx protein in metazoan evolution.

PubMed

Takahashi, Hirokazu; Kamiya, Akiko; Ishiguro, Akira; Suzuki, Atsushi C; Saitou, Naruya; Toyoda, Atsushi; Aruga, Jun

2008-01-01

Msx (/msh) family genes encode homeodomain (HD) proteins that control ontogeny in many animal species. We compared the structures of Msx genes from a wide range of Metazoa (Porifera, Cnidaria, Nematoda, Arthropoda, Tardigrada, Platyhelminthes, Mollusca, Brachiopoda, Annelida, Echiura, Echinodermata, Hemichordata, and Chordata) to gain an understanding of the role of these genes in phylogeny. Exon-intron boundary analysis suggested that the position of the intron located N-terminally to the HDs was widely conserved in all the genes examined, including those of cnidarians. Amino acid (aa) sequence comparison revealed 3 new evolutionarily conserved domains, as well as very strong conservation of the HDs. Two of the three domains were associated with Groucho-like protein binding in both a vertebrate and a cnidarian Msx homolog, suggesting that the interaction between Groucho-like proteins and Msx proteins was established in eumetazoan ancestors. Pairwise comparison among the collected HDs and their C-flanking aa sequences revealed that the degree of sequence conservation varied depending on the animal taxa from which the sequences were derived. Highly conserved Msx genes were identified in the Vertebrata, Cephalochordata, Hemichordata, Echinodermata, Mollusca, Brachiopoda, and Anthozoa. The wide distribution of the conserved sequences in the animal phylogenetic tree suggested that metazoan ancestors had already acquired a set of conserved domains of the current Msx family genes. Interestingly, although strongly conserved sequences were recovered from the Vertebrata, Cephalochordata, and Anthozoa, the sequences from the Urochordata and Hydrozoa showed weak conservation. Because the Vertebrata-Cephalochordata-Urochordata and Anthozoa-Hydrozoa represent sister groups in the Chordata and Cnidaria, respectively, Msx sequence diversification may have occurred differentially in the course of evolution. 
We speculate that selective loss of the conserved domains in Msx family proteins contributed to the diversification of animal body organization.
Evidence for 5S rDNA Horizontal Transfer in the toadfish Halobatrachus didactylus (Schneider, 1801) based on the analysis of three multigene families

PubMed Central

2012-01-01

Background The Batrachoididae family is a group of marine teleosts that includes several species with more complicated physiological characteristics, such as their excretory, reproductive, cardiovascular and respiratory systems. Previous studies of the 5S rDNA gene family carried out in four species from the Western Atlantic showed two types of this gene in two species but only one in the other two, under processes of concerted evolution and birth-and-death evolution with purifying selection. Here we present results of the 5S rDNA and another two gene families in Halobatrachus didactylus, an Eastern Atlantic species, and draw evolutionary inferences regarding the gene families. In addition we have also mapped the genes on the chromosomes by two-colour fluorescence in situ hybridization (FISH). Results Two types of 5S rDNA were observed, named type α and type β. Molecular analysis of the 5S rDNA indicates that H. didactylus does not share the non-transcribed spacer (NTS) sequences with four other species of the family; therefore, it must have evolved in isolation. Amplification with the type β specific primers amplified a specific band in 9 specimens of H. didactylus and two of Sparus aurata. Both types showed regulatory regions and a secondary structure which mark them as functional genes. However, the U2 snRNA gene and the ITS-1 sequence showed one electrophoretic band and with one type of sequence. The U2 snRNA sequence was the most variable of the three multigene families studied. Results from two-colour FISH showed no co-localization of the gene coding from three multigene families and provided the first map of the chromosomes of the species. Conclusions A highly significant finding was observed in the analysis of the 5S rDNA, since two such distant species as H. didactylus and Sparus aurata share a 5S rDNA type. 
This 5S rDNA type has been detected in other species belonging to the Batrachoidiformes and Perciformes orders, but not in the Pleuronectiformes and Clupeiformes orders. Two hypotheses have been outlined: one is the possible vertical permanence of the shared type in some fish lineages, and the other is the possibility of a horizontal transference event between ancient species of the Perciformes and Batrachoidiformes orders. This finding opens a new perspective in fish evolution and in the knowledge of the dynamism of the 5S rDNA. Cytogenetic analysis allowed some evolutionary trends to be roughed out, such as the progressive change in the U2 snDNA and the organization of (GATA)n repeats, from dispersed to localized in one locus. The accumulation of (GATA)n repeats in one chromosome pair could be implicated in the evolution of a pair of proto-sex chromosomes. This possibility could situate H. didactylus as the most highly evolved of the Batrachoididae family in terms of sex chromosome biology. PMID:23039906
Genomic characterization and taxonomic position of a rhabdovirus from a hybrid snakehead.

PubMed

Zeng, Weiwei; Wang, Qing; Wang, Yingying; Liu, Cun; Liang, Hongru; Fang, Xiang; Wu, Shuqin

2014-09-01

A new rhabdovirus, tentatively designated as hybrid snakehead rhabdovirus C1207 (HSHRV-C1207), was first isolated from a moribund hybrid snakehead (Channa maculata×Channa argus) in China. We present the complete genome sequence of HSHRV-C1207 and a comprehensive sequence comparison between HSHRV-C1207 and other rhabdoviruses. Sequence alignment and phylogenetic analysis revealed that HSHRV-C1207 shared the highest degree of homology with Monopterus albus rhabdovirus and Siniperca chuatsi rhabdovirus. All three viruses clustered into a single group that was distinct from the recognized genera in the family Rhabdoviridae. Our analysis suggests that HSHRV-C1207, as well as MARV and SCRV, should be assigned to a new rhabdovirus genus.
Characterization of Urtica dioica agglutinin isolectins and the encoding gene family.

PubMed

Does, M P; Ng, D K; Dekker, H L; Peumans, W J; Houterman, P M; Van Damme, E J; Cornelissen, B J

1999-01-01

Urtica dioica agglutinin (UDA) has previously been found in roots and rhizomes of stinging nettles as a mixture of UDA-isolectins. Protein and cDNA sequencing have shown that mature UDA is composed of two hevein domains and is processed from a precursor protein. The precursor contains a signal peptide, two in-tandem hevein domains, a hinge region and a carboxyl-terminal chitinase domain. Genomic fragments encoding precursors for UDA-isolectins have been amplified by five independent polymerase chain reactions on genomic DNA from stinging nettle ecotype Weerselo. One amplified gene was completely sequenced. As compared to the published cDNA sequence, the genomic sequence contains, besides two basepair substitutions, two introns located at the same positions as in other plant chitinases. By partial sequence analysis of 40 amplified genes, 16 different genes were identified which encode seven putative UDA-isolectins. The deduced amino acid sequences share 78.9-98.9% identity. In extracts of roots and rhizomes of stinging nettle ecotype Weerselo six out of these seven isolectins were detected by mass spectrometry. One of them is an acidic form, which has not been identified before. Our results demonstrate that UDA is encoded by a large gene family.
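A range such as the 78.9-98.9% identity reported in this record is typically obtained by comparing every pair of aligned sequences. The Python sketch below shows one simple way to compute such a range. The sequences are invented placeholders rather than UDA isolectin data, and excluding gapped columns is only one of several conventions, so the abstract's exact procedure may differ.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


def identity_range(aligned):
    """Minimum and maximum pairwise identity across a dict of aligned sequences."""
    values = [
        percent_identity(a, b) for a, b in combinations(aligned.values(), 2)
    ]
    return min(values), max(values)


# Hypothetical aligned fragments, for illustration only.
toy_alignment = {
    "isolectin_1": "CGSQGGG",
    "isolectin_2": "CGSQAGG",
    "isolectin_3": "CGTQ-GG",
}
lowest, highest = identity_range(toy_alignment)
print(f"pairwise identity: {lowest:.1f}-{highest:.1f}%")
```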
Homoeologous cloning of omega-secalin gene family in a wheat 1BL/1RS translocation.

PubMed

Chai, Jian Fang; Liu, Xu; Jia, Ji Zeng

2005-08-01

Wheat 1BL/1RS translocations are widely planted in China as well as in most wheat-producing areas of the world for their good disease resistance and high yield. However, 1BL/1RS translocations have poor bread-making quality, partially caused by a family of small monomeric proteins, omega-secalins, which are encoded by genes on 1RS. Based on the published sequence of a rye omega-secalin gene, we designed a pair of primers to cover the whole mature protein coding sequence. A major band could be amplified from 1BL/1RS translocations but not from euploid wheat. Using this primer set, we conducted PCR amplification with high-fidelity Pfu polymerase on the genomic DNAs and cDNAs purified from a 1BL/1RS translocation, Lankao 906. Sequencing analysis indicated that this gene family contains several members of 1150 bp, 1076 bp, 1075 bp, 1052 bp and 1004 bp, including two pseudogenes and three active genes. The gene transcripts were differentially expressed in developing seeds.
Type III Bartter-like syndrome in an infant boy with Gitelman syndrome and autosomal dominant familial neurohypophyseal diabetes insipidus.

PubMed

Brugnara, Milena; Gaudino, Rossella; Tedeschi, Silvana; Syrèn, Marie-Louise; Perrotta, Silverio; Maines, Evelina; Zaffanello, Marco

2014-09-01

We report the case of an infant boy with polyuria and a familial history of central diabetes insipidus. Laboratory blood tests disclosed hypokalemia, metabolic alkalosis, hyperreninemia, and hyperaldosteronism. Plasma magnesium concentration was slightly low. Urine analysis showed hypercalciuria, hyposthenuria, and high excretion of potassium. Such findings oriented toward type III Bartter syndrome (BSIII). Direct sequencing of the CLCNKB gene revealed no disease-causing mutations. The water deprivation test was positive. Magnetic resonance imaging showed a lack of posterior pituitary hyperintensity. Finally, direct sequencing of the AVP-NPII gene showed a point mutation (c.1884G>A) in a heterozygous state, confirming an autosomal dominant familial neurohypophyseal diabetes insipidus (adFNDI). This condition did not explain the patient's phenotype; thus, we investigated for Gitelman syndrome (GS). A direct sequencing of the SLC12A3 gene showed c.269A>C and c.1205C>A new mutations. In conclusion, the patient had a genetic combination of GS and adFNDI with a BSIII-like phenotype.
Molecular phylogeny of Systellognatha (Plecoptera: Arctoperlaria) inferred from mitochondrial genome sequences.

PubMed

Chen, Zhi-Teng; Zhao, Meng-Yuan; Xu, Cheng; Du, Yu-Zhou

2018-05-01

The infraorder Systellognatha is the most species-rich clade in the insect order Plecoptera and includes six families in two superfamilies: Pteronarcyoidea (Pteronarcyidae, Peltoperlidae, and Styloperlidae) and Perloidea (Perlidae, Perlodidae, and Chloroperlidae). To resolve the debatable phylogeny of Systellognatha, we carried out the first mitochondrial phylogenetic analysis covering all the six families, including three newly sequenced mitogenomes from two families (Perlodidae and Peltoperlidae) and 15 published mitogenomes. The three newly reported mitogenomes share conserved mitogenomic features with other sequenced stoneflies. For phylogenetic analyses, we assembled five datasets with two inference methods to assess their influence on topology and nodal support within Systellognatha. The results indicated that inclusion of the third codon positions of PCGs, exclusion of rRNA genes, the use of nucleotide datasets and Bayesian inference could improve the phylogenetic reconstruction of Systellognatha. The monophyly of Perloidea was supported in the mitochondrial phylogeny, but Pteronarcyoidea was recovered as paraphyletic and remained controversial. In this mitochondrial phylogenetic study, the relationships within Systellognatha were recovered as (((Perlidae + (Perlodidae + Chloroperlidae)) + (Pteronarcyidae + Styloperlidae)) + Peltoperlidae). Copyright © 2018 Elsevier B.V. All rights reserved.
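The parenthetical topology in this record can be checked mechanically against the two superfamily definitions given in its first sentence. The Python sketch below encodes the tree as nested tuples and tests whether each superfamily forms a single clade. It treats the topology as rooted exactly as written, and the helper names are illustrative.

```python
# Topology as written in the abstract, encoded as nested tuples.
SYSTELLOGNATHA = (
    (
        ("Perlidae", ("Perlodidae", "Chloroperlidae")),
        ("Pteronarcyidae", "Styloperlidae"),
    ),
    "Peltoperlidae",
)


def collect_clades(tree):
    """Return (leaves of this subtree, leaf sets of every clade inside it)."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def is_monophyletic(tree, taxa):
    """True if the taxa are exactly the leaves of some clade in the rooted tree."""
    _, all_clades = collect_clades(tree)
    return frozenset(taxa) in all_clades


perloidea = {"Perlidae", "Perlodidae", "Chloroperlidae"}
pteronarcyoidea = {"Pteronarcyidae", "Peltoperlidae", "Styloperlidae"}

print("Perloidea monophyletic:", is_monophyletic(SYSTELLOGNATHA, perloidea))
print("Pteronarcyoidea monophyletic:", is_monophyletic(SYSTELLOGNATHA, pteronarcyoidea))
```

The result agrees with the abstract: Perloidea forms one clade, while Pteronarcyoidea does not, because Peltoperlidae sits outside the group containing Pteronarcyidae and Styloperlidae.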
Structural organization and classification of cytochrome P450 genes in flax (Linum usitatissimum L.).

PubMed

Babu, Peram Ravindra; Rao, Khareedu Venkateswara; Reddy, Vudem Dashavantha

2013-01-15

Flax CYPome analysis resulted in the identification of 334 putative cytochrome P450 (CYP450) genes in the cultivated flax genome. Classification of flax CYP450 genes based on the sequence similarity with Arabidopsis orthologs and CYP450 nomenclature, revealed 10 clans representing 44 families and 98 subfamilies. CYP80, CYP83, CYP92, CYP702, CYP705, CYP708, CYP728, CYP729, CYP733 and CYP736 families are absent in the flax genome. The subfamily members exhibited conserved sequences, length of exons and phasing of introns. Similarity search of the genomic resources of wild flax species Linum bienne with CYP450 coding sequences of the cultivated flax, revealed the presence of 127 CYP450 gene orthologs, indicating amplification of novel CYP450 genes in the cultivated flax. Seven families CYP73, 74, 75, 76, 77, 84 and 709, coding for enzymes associated with phenylpropanoid/fatty acid metabolism, showed extensive gene amplification in the flax. About 59% of the flax CYP450 genes were present in the EST libraries. Copyright © 2012 Elsevier B.V. All rights reserved.
A novel DSPP mutation causes dentinogenesis imperfecta type II in a large Mongolian family

PubMed Central

2010-01-01

Background Several studies have shown that the clinical phenotypes of dentinogenesis imperfecta type II (DGI-II) may be caused by mutations in dentin sialophosphoprotein (DSPP). However, no previous studies have documented the clinical phenotype and genetic basis of DGI-II in a Mongolian family from China. Methods We identified a large five-generation Mongolian family from China with DGI-II, comprising 64 living family members of whom 22 were affected. Linkage analysis of five polymorphic markers flanking DSPP gene was used to genotype the families and to construct the haplotypes of these families. All five DSPP exons including the intron-exon boundaries were PCR-amplified and sequenced in 48 members of this large family. Results All affected individuals showed discoloration and severe attrition of their teeth, with obliterated pulp chambers and without progressive high frequency hearing loss or skeletal abnormalities. No recombination was found at five polymorphic markers flanking DSPP in the family. Direct DNA sequencing identified a novel A→G transition mutation adjacent to the donor splicing site within intron 3 in all affected individuals but not in the unaffected family members and 50 unrelated Mongolian individuals. Conclusion This study identified a novel mutation (IVS3+3A→G) in DSPP, which caused DGI-II in a large Mongolian family. This expands the spectrum of mutations leading to DGI-II. PMID:20146806

Phylogenetic Status of an Unrecorded Species of Curvularia, C. spicifera, Based on Current Classification System of Curvularia and Bipolaris Group Using Multi Loci.

PubMed

Jeon, Sun Jeong; Nguyen, Thi Thuong Thuong; Lee, Hyang Burm

2015-09-01

A seed-borne fungus, Curvularia sp. EML-KWD01, was isolated from an indigenous wheat seed by standard blotter method. This fungus was characterized based on the morphological characteristics and molecular phylogenetic analysis. Phylogenetic status of the fungus was determined using sequences of three loci: rDNA internal transcribed spacer, large ribosomal subunit, and glyceraldehyde 3-phosphate dehydrogenase gene. Multi loci sequencing analysis revealed that this fungus was Curvularia spicifera within Curvularia group 2 of family Pleosporaceae.
Exome Sequencing Links Mutations in PARN and RTEL1 with Familial Pulmonary Fibrosis and Telomere Shortening

PubMed Central

Stuart, Bridget D.; Choi, Jungmin; Zaidi, Samir; Xing, Chao; Holohan, Brody; Chen, Rui; Choi, Mihwa; Dharwadkar, Pooja; Torres, Fernando; Girod, Carlos E.; Weissler, Jonathan; Fitzgerald, John; Kershaw, Corey; Klesney-Tait, Julia; Mageto, Yolanda; Shay, Jerry W.; Ji, Weizhen; Bilguvar, Kaya; Mane, Shrikant; Lifton, Richard P.; Garcia, Christine Kim

2015-01-01

Idiopathic pulmonary fibrosis (IPF) is an age-related disease featuring progressive lung scarring. To elucidate the molecular basis of IPF, we performed exome sequencing of familial pulmonary fibrosis kindreds. Gene burden analysis comparing 78 European cases and 2,816 controls implicated PARN, an exoribonuclease with no prior connection to telomere biology or disease, with five novel heterozygous damaging mutations in unrelated cases and none in controls (P-value = 1.3 × 10^-8); mutations were shared by all affected relatives (odds in favor of linkage = 4,096:1). RTEL1, an established locus for dyskeratosis congenita, harbored significantly more novel damaging and missense variants at conserved residues in cases than controls (P = 1.6 × 10^-6). PARN and RTEL1 mutation carriers had shortened leukocyte telomere lengths and epigenetic inheritance of short telomeres was seen in family members. Together these genes explain ~7% of familial pulmonary fibrosis and strengthen the link between lung fibrosis and telomere dysfunction. PMID:25848748
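To give a sense of scale for a case-control burden comparison like the PARN result in this record, the Python sketch below computes a one-sided Fisher exact probability for the situation where every carrier falls among the cases. It assumes the five mutations correspond to five distinct carriers among the 78 cases and none among the 2,816 controls. This is an illustrative calculation only; the abstract does not specify the statistical test the authors used.

```python
from math import comb


def prob_all_carriers_in_cases(n_cases, n_controls, n_carriers):
    """One-sided Fisher exact probability that all carriers are cases.

    Under the null hypothesis, carriers are a random subset of all
    individuals, so the probability is a hypergeometric point mass:
    C(n_cases, n_carriers) / C(n_cases + n_controls, n_carriers).
    """
    total = n_cases + n_controls
    return comb(n_cases, n_carriers) / comb(total, n_carriers)


# Counts taken from the abstract; one carrier per mutation is an assumption.
p_value = prob_all_carriers_in_cases(n_cases=78, n_controls=2816, n_carriers=5)
print(f"one-sided P = {p_value:.2e}")
```

Under these assumptions the result is on the order of 10^-8, the same order of magnitude as the P-value reported in the abstract.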
Exome sequencing links mutations in PARN and RTEL1 with familial pulmonary fibrosis and telomere shortening.

PubMed

Stuart, Bridget D; Choi, Jungmin; Zaidi, Samir; Xing, Chao; Holohan, Brody; Chen, Rui; Choi, Mihwa; Dharwadkar, Pooja; Torres, Fernando; Girod, Carlos E; Weissler, Jonathan; Fitzgerald, John; Kershaw, Corey; Klesney-Tait, Julia; Mageto, Yolanda; Shay, Jerry W; Ji, Weizhen; Bilguvar, Kaya; Mane, Shrikant; Lifton, Richard P; Garcia, Christine Kim

2015-05-01

Idiopathic pulmonary fibrosis (IPF) is an age-related disease featuring progressive lung scarring. To elucidate the molecular basis of IPF, we performed exome sequencing of familial kindreds with pulmonary fibrosis. Gene burden analysis comparing 78 European cases and 2,816 controls implicated PARN, an exoribonuclease with no previous connection to telomere biology or disease, with five new heterozygous damaging mutations in unrelated cases and none in controls (P = 1.3 × 10^-8); mutations were shared by all affected relatives (odds in favor of linkage = 4,096:1). RTEL1, an established locus for dyskeratosis congenita, harbored significantly more new damaging and missense variants at conserved residues in cases than in controls (P = 1.6 × 10^-6). PARN and RTEL1 mutation carriers had shortened leukocyte telomere lengths, and we observed epigenetic inheritance of short telomeres in family members. Together, these genes explain ~7% of familial pulmonary fibrosis and strengthen the link between lung fibrosis and telomere dysfunction.
Linkage and association study of late-onset Alzheimer disease families linked to 9p21.3.

PubMed

Züchner, S; Gilbert, J R; Martin, E R; Leon-Guerrero, C R; Xu, P-T; Browning, C; Bronson, P G; Whitehead, P; Schmechel, D E; Haines, J L; Pericak-Vance, M A

2008-11-01

A chromosomal locus for late-onset Alzheimer disease (LOAD) has previously been mapped to 9p21.3. The most significant results were reported in a sample of autopsy-confirmed families. Linkage to this locus has been independently confirmed in AD families from a consanguineous Israeli-Arab community. In the present study we analyzed an expanded clinical sample of 674 late-onset AD families, independently ascertained by three different consortia. Sample subsets were stratified by site and autopsy-confirmation. Linkage analysis of a dense array of SNPs across the chromosomal locus revealed the most significant results in the 166 autopsy-confirmed families of the NIMH sample. Peak HLOD scores of 4.95 at D9S741 and 2.81 at the nearby SNP rs2772677 were obtained in a dominant model. The linked region included the cyclin-dependent kinase inhibitor 2A gene (CDKN2A), which has been suggested as an AD candidate gene. By re-sequencing all exons in the vicinity of CDKN2A in 48 AD cases, we identified and genotyped four novel SNPs, including a non-synonymous, a synonymous, and two variations located in untranslated RNA sequences. Family-based allelic and genotypic association analysis yielded significant results in CDKN2A (rs11515: PDT p = 0.003, genotype-PDT p = 0.014). We conclude that CDKN2A is a promising new candidate gene potentially contributing to AD susceptibility on chromosome 9p.
Genetic analysis of fructose-1,6-bisphosphatase (FBPase) deficiency in nine consanguineous Pakistani families.

PubMed

Ijaz, Sadaqat; Zahoor, Muhammad Yasir; Imran, Muhammad; Ramzan, Khushnooda; Bhinder, Munir Ahmad; Shakeel, Hussain; Iqbal, Muhammad; Aslam, Asim; Shehzad, Wasim; Cheema, Huma Arshad; Rehman, Habib

2017-10-26

Fructose-1,6-bisphosphatase (FBPase) deficiency is a rare inherited metabolic disorder characterized by recurrent episodes of hypoglycemia, ketosis and lactic acidosis. FBPase is encoded by FBP1 gene and catalyzes the hydrolysis of fructose-1,6-bisphosphate to fructose-6-phosphate in the last step of gluconeogenesis. We report here FBP1 mutations in nine consanguineous Pakistani families affected with FBPase deficiency. Nine families having one or two individuals affected with FBPase deficiency were enrolled over a period of 3 years. All FBP1 exonic regions including splicing sites were PCR-amplified and sequenced bidirectionally. Familial cosegregation of mutations with disease was confirmed by direct sequencing and PCR-RFLP analysis. Three different FBP1 mutations were identified. Each of two previously reported mutations (c.472C>T (p.Arg158Trp) and c.841G>A (p.Glu281Lys)) was carried by four different families. The ninth family carried a novel 4-bp deletion (c.609_612delAAAA), which is predicted to result in frameshift (p.Lys204Argfs*72) and loss of FBPase function. The novel variant was not detected in any of 120 chromosomes from normal ethnically matched individuals. FBPase deficiency is often fatal in the infancy and early childhood. Early diagnosis and prompt treatment is therefore crucial to preventing early mortality. We recommend the use of c.472C>T and c.841G>A mutations as first choice genetic markers for molecular diagnosis of FBPase deficiency in Pakistan.
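The coding-DNA and protein descriptions of the two recurrent mutations in this record can be cross-checked with simple arithmetic: coding position p falls in codon (p - 1) // 3 + 1. The Python sketch below performs that check and lists which reference codons would be compatible with each reported amino acid change under the standard genetic code. It does not use the actual FBP1 reference sequence, so it is a consistency check only, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_position(cdna_pos):
    """Map a 1-based coding position to (codon number, 0-based offset in codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3


def compatible_codons(ref_aa, alt_aa, ref_base, alt_base, offset):
    """Reference codons that encode ref_aa and yield alt_aa after the substitution."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[offset] != ref_base:
            continue
        mutated = codon[:offset] + alt_base + codon[offset + 1:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append(codon)
    return hits


# c.472C>T (p.Arg158Trp) and c.841G>A (p.Glu281Lys), as given in the abstract.
for pos, ref, alt, ref_aa, alt_aa in [(472, "C", "T", "R", "W"),
                                      (841, "G", "A", "E", "K")]:
    codon_number, offset = codon_position(pos)
    codons = compatible_codons(ref_aa, alt_aa, ref, alt, offset)
    print(f"c.{pos}{ref}>{alt}: codon {codon_number}, offset {offset}, "
          f"compatible reference codons {codons}")
```

Both positions map to the codon numbers stated in the protein-level names (158 and 281), so the paired descriptions are internally consistent.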
A Novel Missense Mutation of Doublecortin: Mutation Analysis of Korean Patients with Subcortical Band Heterotopia

PubMed Central

Kim, Myeong-Kyu; Park, Man-Seok; Kim, Byeong-Chae; Cho, Ki-Hyun; Kim, Young-Seon; Kim, Jin-Hee; Heo, Tag; Kim, Eun-Young

2005-01-01

The neuronal migration disorders, X-linked lissencephaly syndrome (XLIS) and subcortical band heterotopia (SBH), also called "double cortex", have been linked to missense, nonsense, aberrant splicing, deletion, and insertion mutations in doublecortin (DCX) in families and sporadic cases. Most DCX mutations identified to date are located in two evolutionarily conserved domains. We performed mutation analysis of DCX in two Korean patients with SBH. The SBH patients had mild to moderate developmental delays, drug-resistant generalized seizures, and diffuse thick SBH upon brain MRI. Sequence analysis of the DCX coding region in Patient 1 revealed a c.386 C>T change in exon 3. The sequence variation results in a serine to leucine amino acid change at position 129 (S129L), which has not been found in other family members of Patient 1 or in a large panel of 120 control X-chromosomes. We report here a novel c.386 C>T mutation of DCX that is responsible for SBH. PMID:16100463
Comprehensive analysis of orthologous protein domains using the HOPS database.

PubMed

Storm, Christian E V; Sonnhammer, Erik L L

2003-10-01

One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.
Contrasting patterns of evolution of 45S and 5S rDNA families uncover new aspects in the genome constitution of the agronomically important grass Thinopyrum intermedium (Triticeae).

PubMed

Mahelka, Václav; Kopecky, David; Baum, Bernard R

2013-09-01

We employed sequencing of clones and in situ hybridization (genomic and fluorescent in situ hybridization [GISH and rDNA-FISH]) to characterize both the sequence variation and genomic organization of 45S (herein ITS1-5.8S-ITS2 region) and 5S (5S gene + nontranscribed spacer) ribosomal DNA (rDNA) families in the allohexaploid grass Thinopyrum intermedium. Both rDNA families are organized within several rDNA loci within all three subgenomes of the allohexaploid species. Both families have undergone different patterns of evolution. The 45S rDNA family has evolved in a concerted manner: internal transcribed spacer (ITS) sequences residing within the arrays of two subgenomes out of three got homogenized toward one major ribotype, whereas the third subgenome contained a minor proportion of distinct unhomogenized copies. Homogenization mechanisms such as unequal crossover and/or gene conversion were coupled with the loss of certain 45S rDNA loci. Unlike in the 45S family, the data suggest that neither interlocus homogenization among homeologous chromosomes nor locus loss occurred in 5S rDNA. Consistently with other Triticeae, the 5S rDNA family in intermediate wheatgrass comprised two distinct array types-the long- and short-spacer unit classes. Within the long and short units, we distinguished five and three different types, respectively, likely representing homeologous unit classes donated by putative parental species. Although the major ITS ribotype corresponds in our phylogenetic analysis to the E-genome species, the minor ribotype corresponds to Dasypyrum. 5S sequences suggested the contributions from Pseudoroegneria, Dasypyrum, and Aegilops. The contribution from Aegilops to the intermediate wheatgrass' genome is a new finding with implications in wheat improvement. We discuss rDNA evolution and potential origin of intermediate wheatgrass.
SEPT9 Mutations and a Conserved 17q25 Sequence in Sporadic and Hereditary Brachial Plexus Neuropathy

PubMed Central

Klein, Christopher J.; Wu, Yanhong; Cunningham, Julie M.; Windebank, Anthony J.; Dyck, P. James B.; Friedenberg, Scott M.; Klein, Diane M.; Dyck, Peter J.

2009-01-01

Background The clinical characteristics of sporadic brachial plexus neuropathy (S-BPN) and hereditary brachial plexus neuropathy (H-BPN) are similar. At times of attack inflammation in brachial plexus nerves has been identified in both conditions. SEPT-9 mutations (Arg88Trp, Ser93Phe, 5UTR-131G to C) occur in some families with H-BPN. These mutations were not found in American H-BPN kindreds with a conserved 500 Kb sequence of DNA at 17q25 (the location of SEPT-9) where a founder mutation has been suggested. Objective To study 17q25 and SEPT-9 in S-BPN (56 patients) and H-BPN (13 kindreds). Methods Allele analysis at 17q25, SEPT-9 DNA sequencing and mRNA analysis from lymphoblast cultures. Results A conserved 17q25 sequence was found in 5 of 13 H-BPN kindreds and one S-BPN patient. This conserved sequence was not found in the family with a SEPT-9 mutation (Arg88Trp) or controls (182). SEPT-9 mRNA expression did not differ between forms of H-BPN and controls. No known mutations of SEPT-9 were found in S-BPN. Conclusions/Relevance Rare S-BPN patients have the same conserved 17q25 sequence found in many American H-BPN kindreds. BPN patients with this conserved sequence do not appear to have SEPT-9 mutations or alterations of its mRNA expression levels in lymphoblast cultures. BPN patients with this conserved sequence may have the most common genetic cause in the Americas by a founder effect mutation. PMID:19204161
[Gene mutation analysis and prenatal diagnosis of a family with Bartter syndrome].

PubMed

Li, Long; Ma, Na; Li, Xiu-Rong; Gong, Fei; Du, Juan

2016-08-01

To investigate the mutation of related genes and prenatal diagnosis of a family with Bartter syndrome (BS). The high-throughput capture sequencing technique and PCR-Sanger sequencing were used to detect pathogenic genes in the proband of this family and analyze the whole family at the genomic level. After the genetic cause was clarified, the amniotic fluid was collected from the proband's mother who was pregnant for 5 months for prenatal diagnosis. The proband carried compound heterozygous mutations of c.88C>T(p.Arg30*) and c.968+2T>A in the CLCNKB gene; c.88C>T(p.Arg30*) had been reported as a pathogenic mutation, and c.968+2T>A was a new mutation. Pedigree analysis showed that the two mutations were inherited from the mother and father, respectively. Prenatal diagnosis showed that the fetus did not inherit the mutations from parents and had no mutations at the two loci. The follow-up visit confirmed that the infant was in a healthy state, which proved the accuracy of genetic diagnosis and prenatal diagnosis. The compound heterozygous mutations c.88C>T(p.Arg30*) and c.968+2T>A in the CLCNKB gene are the cause of BS in the proband, and prenatal diagnosis can prevent the risk of recurrence of BS in this family.
[Analysis of USH2A gene mutation in a Chinese family affected with Usher syndrome].

PubMed

Li, Pengcheng; Liu, Fei; Zhang, Mingchang; Wang, Qiufen; Liu, Mugen

2015-08-01

To investigate the disease-causing mutation in a Chinese family affected with Usher syndrome type II. All of the 11 members from the family underwent comprehensive ophthalmologic examination and hearing test, and their genomic DNA were isolated from venous leukocytes. PCR and direct sequencing of USH2A gene were performed for the proband. Wild type and mutant type minigene vectors containing exon 42, intron 42 and exon 43 of the USH2A gene were constructed and transfected into Hela cells by lipofectamine reagent. Reverse transcription (RT)-PCR was carried out to verify the splicing of the minigenes. Pedigree analysis and clinical diagnosis indicated that the patients have suffered from autosomal recessive Usher syndrome type II. DNA sequencing has detected a homozygous c.8559-2A>G mutation of the USH2A gene in the proband, which has co-segregated with the disease in the family. The mutation has affected a conserved splice site in intron 42, which has led to inactivation of the splice site. Minigene experiment has confirmed the retaining of intron 42 in mature mRNA. The c.8559-2A>G mutation in the USH2A gene probably underlies the Usher syndrome type II in this family. The splice site mutation has resulted in abnormal splicing of USH2A pre-mRNA.
A novel mutation in FRMD7 causing X-linked idiopathic congenital nystagmus in a large family

PubMed Central

He, Xiang; Gu, Feng; Wang, Yujing; Yan, Jinting; Zhang, Meng; Huang, Shangzhi

2008-01-01

Purpose To identify the gene responsible for causing an X-linked idiopathic congenital nystagmus (XLICN) in a six-generation Chinese family. Methods Forty-nine members of an XLICN family were recruited and examined after obtaining informed consent. Affected male individuals were genotyped with microsatellite markers around the FRMD7 locus. Mutations were comprehensively screened by direct sequencing using gene specific primers. An X-inactivation pattern was investigated by X chromosome methylation analysis. Results The patients showed phenotypes consistent with XLICN. Genotype analysis showed that male affected individuals in the family shared a common haplotype with the selected markers. Sequencing FRMD7 revealed a G>T transversion (c.812G>T) in exon 9, which caused a conservative substitution of Cys to Phe at codon 271 (p.C271F). This mutation co-segregated with all affected individuals and was present in the obligate, non-penetrant female carriers. However, the mutation was not observed in unaffected familial males or 400 control males. Females with the mutant gene could be affected or carrier and they shared the same inactivated X chromosome harboring the mutation in blood cells, which showed there is no clear causal link between X-inactivation pattern and phenotype. Conclusions We identified a novel mutation in FRMD7 and confirmed the role of this mutation in the pathogenesis of X-linked congenital nystagmus. PMID:18246032
Novel gene fusion of PRCC-MITF defines a new member of MiT family translocation renal cell carcinoma: clinicopathological analysis and detection of the gene fusion by RNA sequencing and FISH.

PubMed

Xia, Qiu-Yuan; Wang, Xiao-Tong; Ye, Sheng-Bing; Wang, Xuan; Li, Rui; Shi, Shan-Shan; Fang, Ru; Zhang, Ru-Song; Ma, Heng-Hui; Lu, Zhen-Feng; Shen, Qin; Bao, Wei; Zhou, Xiao-Jun; Rao, Qiu

2018-04-01

MITF, TFE3, TFEB and TFEC belong to the same microphthalmia-associated transcription factor family (MiT). Two transcription factors in this family have been identified in two unusual types of renal cell carcinoma (RCC): Xp11 translocation RCC harbouring TFE3 gene fusions and t(6;11) RCC harbouring a MALAT1-TFEB gene fusion. The 2016 World Health Organisation classification of renal neoplasia grouped these two neoplasms together under the category of MiT family translocation RCC. RCCs associated with the other two MiT family members, MITF and TFEC, have rarely been reported. Herein, we identify a case of MITF translocation RCC with the novel PRCC-MITF gene fusion by RNA sequencing. Histological examination of the present tumour showed typical features of MiT family translocation RCCs, overlapping with Xp11 translocation RCC and t(6;11) RCC. However, this tumour showed negative results in TFE3 and TFEB immunochemistry and split fluorescence in-situ hybridisation (FISH) assays. The other MiT family members, MITF and TFEC, were tested further immunochemically and also showed negative results. RNA sequencing and reverse transcription-polymerase chain reaction confirmed the presence of a PRCC-MITF gene fusion: a fusion of PRCC exon 5 to MITF exon 4. We then developed FISH assays covering MITF break-apart probes and PRCC-MITF fusion probes to detect the MITF gene rearrangement. This study both proves the recurring existence of MITF translocation RCC and expands the genotype spectrum of MiT family translocation RCCs. © 2017 John Wiley & Sons Ltd.
Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

PubMed Central

Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

2015-01-01

Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 shares 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggests that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242
A novel WFS1 mutation in a family with dominant low frequency sensorineural hearing loss with normal VEMP and EcochG findings

PubMed Central

Bramhall, Naomi F; Kallman, Jeremy C; Verrall, Aimee M; Street, Valerie A

2008-01-01

Background Low frequency sensorineural hearing loss (LFSNHL) is an uncommon clinical finding. Mutations within three different identified genes (DIAPH1, MYO7A, and WFS1) are known to cause LFSNHL. The majority of hereditary LFSNHL is associated with heterozygous mutations in the WFS1 gene (wolframin protein). The goal of this study was to use genetic analysis to determine if a small American family's hereditary LFSNHL is linked to a mutation in the WFS1 gene and to use VEMP and EcochG testing to further characterize the family's audiovestibular phenotype. Methods The clinical phenotype of the American family was characterized by audiologic testing, vestibular evoked myogenic potentials (VEMP), and electrocochleography (EcochG) evaluation. Genetic characterization was performed by microsatellite analysis and direct sequencing of WFS1 for mutation detection. Results Sequence analysis of the WFS1 gene revealed a novel heterozygous mutation at c.2054G>C predicting a p.R685P amino acid substitution in wolframin. The c.2054G>C mutation segregates faithfully with hearing loss in the family and is absent in 230 control chromosomes. The p.R685 residue is located within the hydrophilic C-terminus of wolframin and is conserved across species. The VEMP and EcochG findings were normal in individuals segregating the WFS1 c.2054G>C mutation. Conclusion We discovered a novel heterozygous missense mutation in exon 8 of WFS1 predicting a p.R685P amino acid substitution that is likely to underlie the LFSNHL phenotype in the American family. For the first time, we describe VEMP and EcochG findings for individuals segregating a heterozygous WFS1 mutation. PMID:18518985
A novel WFS1 mutation in a family with dominant low frequency sensorineural hearing loss with normal VEMP and EcochG findings.

PubMed

Bramhall, Naomi F; Kallman, Jeremy C; Verrall, Aimee M; Street, Valerie A

2008-06-02

Low frequency sensorineural hearing loss (LFSNHL) is an uncommon clinical finding. Mutations within three different identified genes (DIAPH1, MYO7A, and WFS1) are known to cause LFSNHL. The majority of hereditary LFSNHL is associated with heterozygous mutations in the WFS1 gene (wolframin protein). The goal of this study was to use genetic analysis to determine if a small American family's hereditary LFSNHL is linked to a mutation in the WFS1 gene and to use VEMP and EcochG testing to further characterize the family's audiovestibular phenotype. The clinical phenotype of the American family was characterized by audiologic testing, vestibular evoked myogenic potentials (VEMP), and electrocochleography (EcochG) evaluation. Genetic characterization was performed by microsatellite analysis and direct sequencing of WFS1 for mutation detection. Sequence analysis of the WFS1 gene revealed a novel heterozygous mutation at c.2054G>C predicting a p.R685P amino acid substitution in wolframin. The c.2054G>C mutation segregates faithfully with hearing loss in the family and is absent in 230 control chromosomes. The p.R685 residue is located within the hydrophilic C-terminus of wolframin and is conserved across species. The VEMP and EcochG findings were normal in individuals segregating the WFS1 c.2054G>C mutation. We discovered a novel heterozygous missense mutation in exon 8 of WFS1 predicting a p.R685P amino acid substitution that is likely to underlie the LFSNHL phenotype in the American family. For the first time, we describe VEMP and EcochG findings for individuals segregating a heterozygous WFS1 mutation.
Linkage and mutational analysis of familial Alzheimer disease kindreds for the APP gene region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kamino, K.; Anderson, L.; O'dahl, S.

1992-11-01

A large number of familial Alzheimer disease (FAD) kindreds were examined to determine whether mutations in the amyloid precursor protein (APP) gene could be responsible for the disease. Previous studies have identified three mutations at APP codon 717 which are pathogenic for Alzheimer disease (AD). Samples from affected subjects were examined for mutations in exons 16 and 17 of the APP gene. A combination of direct sequencing and single-strand conformational polymorphism analysis was used. Sporadic AD and normal controls were also examined by the same methods. Five sequence variants were identified. One variant at APP codon 693 resulted in a Glu→Gly change. This is the same codon as the hereditary cerebral hemorrhage with amyloidosis-Dutch type Glu→Gln mutation. Another single-base change at APP codon 708 did not alter the amino acid encoded at this site. Two point mutations and a 6-bp deletion were identified in the intronic sequences surrounding exon 17. None of the variants could be unambiguously determined to be responsible for FAD. The larger families were also analyzed by testing for linkage of FAD to a highly polymorphic short tandem repeat marker (D21S210) that is tightly linked to APP. Highly negative LOD scores were obtained for the family groups tested, and linkage was formally excluded beyond θ = .10 for the Volga German kindreds, θ = .20 for early-onset non-Volga Germans, and θ = .10 for late-onset families. LOD scores for linkage of FAD to markers centromeric to APP (D21S1/S11, D21S13, and D21S215) were also negative in the three family groups. These studies show that APP mutations account for AD in only a small fraction of FAD kindreds. 49 refs., 6 figs., 4 tabs.
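The linkage-exclusion statement in the record above can be illustrated with a minimal sketch of a two-point LOD score for phase-known meioses. This is not the authors' analysis: the function name, the example counts, and the exclusion threshold of -2 (the conventional cut-off, which the abstract does not state) are assumptions made for illustration only.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses (a simplification).
    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)  # free recombination
    return log_l_theta - log_l_null


# Hypothetical counts, not data from the study: 5 recombinants in 10 meioses.
EXCLUSION_THRESHOLD = -2.0  # conventional value, assumed here
score = lod_score(recombinants=5, meioses=10, theta=0.10)
print(f"LOD at theta=0.10: {score:.2f}, excluded: {score < EXCLUSION_THRESHOLD}")
```

Under these assumptions, many recombinants at a small θ drive the score strongly negative, which is the pattern the abstract describes as formal exclusion of linkage.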
Comparative analysis of six genome sequences of three novel picornaviruses, turdiviruses 1, 2 and 3, in dead wild birds, and proposal of two novel genera, Orthoturdivirus and Paraturdivirus, in the family Picornaviridae.

PubMed

Woo, Patrick C Y; Lau, Susanna K P; Huang, Yi; Lam, Carol S F; Poon, Rosana W S; Tsoi, Hoi-Wah; Lee, Paul; Tse, Herman; Chan, Allen S L; Luk, Geraldine; Chan, Kwok-Hung; Yuen, Kwok-Yung

2010-10-01

In this territory-wide molecular epidemiology study of picornaviruses, involving 6765 dead wild birds from 201 species in 50 families over a 12 month period, three novel picornaviruses, turdiviruses 1, 2 and 3 (TV1, TV2 and TV3), were identified from birds of different genera in the family Turdidae. In contrast to many other viruses in birds of the family Turdidae or viruses of the family Picornaviridae, TV1, TV2 and TV3 were found exclusively in the autumn and winter months. Two genomes each of TV1, TV2 and TV3 were sequenced. Regions P1, P2 and P3 of the three turdiviruses possessed, respectively, <40, <40 and <50 % amino acid identities with those of other picornaviruses. Moreover, P1, P2 and P3 of TV1 also possessed, respectively, <40, <40 and <50 % amino acid identities with those of TV2 and TV3. Phylogenetic analysis revealed that TV1, TV2 and TV3 were distantly related to members of the genus Kobuvirus. Among the three turdiviruses, TV2 and TV3 were always clustered together, with high bootstrap supports of 1000. The genomic features of TV2 and TV3 were also distinct from TV1, including lower G+C contents, shorter leader protein and a preference for codon sequence NNT rather than NNC for amino acids that can use either NNT or NNC as codons (P<0.001 by χ(2)-test). Based on our results we propose two novel genera, Orthoturdivirus for TV1, and Paraturdivirus for TV2 and TV3, in the family Picornaviridae. The type of internal ribosomal entry site for TV1, TV2 and TV3 remains to be determined.
Ready to clone: CNV detection and breakpoint fine-mapping in breast and ovarian cancer susceptibility genes by high-resolution array CGH.

PubMed

Hackmann, Karl; Kuhlee, Franziska; Betcheva-Krajcir, Elitza; Kahlert, Anne-Karin; Mackenroth, Luisa; Klink, Barbara; Di Donato, Nataliya; Tzschach, Andreas; Kast, Karin; Wimberger, Pauline; Schrock, Evelin; Rump, Andreas

2016-10-01

Detection of predisposing copy number variants (CNV) in 330 families affected with hereditary breast and ovarian cancer (HBOC). In order to complement mutation detection with Illumina's TruSight Cancer panel, we designed a customized high-resolution 8 × 60k array for CGH (aCGH) that covers all 94 genes from the panel. Copy number variants with immediate clinical relevance were detected in 12 families (3.6%). Besides 3 known CNVs in CHEK2, RAD51C, and BRCA1, we identified 3 novel pathogenic CNVs in BRCA1 (deletion of exons 4-13, deletion of exons 12-18) and ATM (deletion of exons 57-63) plus an intragenic duplication of BRCA2 (exons 3-11) and an intronic BRCA1 variant with unknown pathogenicity. The precision of high-resolution aCGH enabled straightforward breakpoint amplification of a BRCA1 deletion, which subsequently allowed for fast and economic CNV verification in family members of the index patient. Furthermore, we used our aCGH data to validate an algorithm that was able to detect all identified copy number changes from next-generation sequencing (NGS) data. Copy number detection is a mandatory analysis in HBOC families, at least if no predisposing mutations were found by sequencing. Currently, high-resolution array CGH is our first choice of method of analysis due to unmatched detection precision. Although it seems possible to detect CNV from sequencing data, there currently is no satisfying tool to do so in a routine diagnostic setting.
VPS35 Mutations in Parkinson Disease

PubMed Central

Vilariño-Güell, Carles; Wider, Christian; Ross, Owen A.; Dachsel, Justus C.; Kachergus, Jennifer M.; Lincoln, Sarah J.; Soto-Ortolaza, Alexandra I.; Cobb, Stephanie A.; Wilhoite, Greggory J.; Bacon, Justin A.; Behrouz, Bahareh; Melrose, Heather L.; Hentati, Emna; Puschmann, Andreas; Evans, Daniel M.; Conibear, Elizabeth; Wasserman, Wyeth W.; Aasly, Jan O.; Burkhard, Pierre R.; Djaldetti, Ruth; Ghika, Joseph; Hentati, Faycal; Krygowska-Wajs, Anna; Lynch, Tim; Melamed, Eldad; Rajput, Alex; Rajput, Ali H.; Solida, Alessandra; Wu, Ruey-Meei; Uitti, Ryan J.; Wszolek, Zbigniew K.; Vingerhoets, François; Farrer, Matthew J.

2011-01-01

The identification of genetic causes for Mendelian disorders has been based on the collection of multi-incident families, linkage analysis, and sequencing of genes in candidate intervals. This study describes the application of next-generation sequencing technologies to a Swiss kindred presenting with autosomal-dominant, late-onset Parkinson disease (PD). The family has tremor-predominant dopa-responsive parkinsonism with a mean onset of 50.6 ± 7.3 years. Exome analysis suggests that an aspartic-acid-to-asparagine mutation within vacuolar protein sorting 35 (VPS35 c.1858G>A; p.Asp620Asn) is the genetic determinant of disease. VPS35 is a central component of the retromer cargo-recognition complex, is critical for endosome-trans-golgi trafficking and membrane-protein recycling, and is evolutionarily highly conserved. VPS35 c.1858G>A was found in all affected members of the Swiss kindred and in three more families and one patient with sporadic PD, but it was not observed in 3,309 controls. Further sequencing of familial affected probands revealed only one other missense variant, VPS35 c.946C>T (p.Pro316Ser), in a pedigree with one unaffected and two affected carriers, and thus the pathogenicity of this mutation remains uncertain. Retromer-mediated sorting and transport is best characterized for acid hydrolase receptors. However, the complex has many types of cargo and is involved in a diverse array of biologic pathways from developmental Wnt signaling to lysosome biogenesis. Our study implicates disruption of VPS35 and retromer-mediated trans-membrane protein sorting, rescue, and recycling in the neurodegenerative process leading to PD. PMID:21763482

Molecular phylogeny of Cyclophyllidea (Cestoda: Eucestoda): an in-silico analysis based on mtCOI gene.

PubMed

Sharma, Sunil; Lyngdoh, Damanbha; Roy, Bishnupada; Tandon, Veena

2016-09-01

Order Cyclophyllidea (of cestode platyhelminths) has a rich diversity of parasites and includes many families and species that are known to cause serious medical conditions in humans and domestic and wild animals. Despite various attempts to resolve phylogenetic relationships at the inter-family level, uncertainty remains. In order to add resolution to the existing phylogeny of the order, we generated partial mtCOI sequences for some commonly occurring cyclophyllidean cestodes and combined them with available sequences from GenBank. Phylogeny was inferred from a total of 83 representative species spanning 8 families using Bayesian analysis. The phylogenetic tree revealed Dilepididae as the most basal taxon, diverging early in the tree. Paruterinidae, Taeniidae and Anoplocephalidae showed non-monophyletic assemblage; our result suggests that the family Paruterinidae may represent a polyphyletic group. The diverse family Taeniidae appeared in two separate clades: one included all the members of the genus Echinococcus and also Versteria, while the representatives of the genera Taenia and Hydatigera grouped in the other clade. A close affinity of Dipylidiidae with Taenia and Hydatigera was seen, and a close relationship between Mesocestoididae and Echinococcus (of Taeniidae) is also demonstrated. The crown group comprised the families Anoplocephalidae, Davaineidae, Hymenolepididae and Mesocestoididae, and also all species of the genus Echinococcus and Versteria mustelae; monophyly of these families (excepting Anoplocephalidae) and the genus Echinococcus as well as its sister-taxon relation with V. mustelae is also confirmed. Furthermore, non-monophyly of Anoplocephalidae is suggested to be correlated with divergence in the host selection.
Debunking Occam's razor: Diagnosing multiple genetic diseases in families by whole-exome sequencing.

PubMed

Balci, T B; Hartley, T; Xi, Y; Dyment, D A; Beaulieu, C L; Bernier, F P; Dupuis, L; Horvath, G A; Mendoza-Londono, R; Prasad, C; Richer, J; Yang, X-R; Armour, C M; Bareke, E; Fernandez, B A; McMillan, H J; Lamont, R E; Majewski, J; Parboosingh, J S; Prasad, A N; Rupar, C A; Schwartzentruber, J; Smith, A C; Tétreault, M; Innes, A M; Boycott, K M

2017-09-01

Recent clinical whole exome sequencing (WES) cohorts have identified unanticipated multiple genetic diagnoses in single patients. However, the frequency of multiple genetic diagnoses in families is largely unknown. We set out to identify the rate of multiple genetic diagnoses in probands and their families referred for analysis in two national research programs in Canada. We retrospectively analyzed WES results for 802 undiagnosed probands referred over the past 5 years in either the FORGE or Care4Rare Canada WES initiatives. Of the 802 probands, 226 (28.2%) were diagnosed based on mutations in known disease genes. Eight (3.5%) had two or more genetic diagnoses explaining their clinical phenotype, a rate in keeping with the large published studies (average 4.3%; 1.4 - 7.2%). Seven of the 8 probands had family members with one or more of the molecularly diagnosed diseases. Consanguinity and multisystem disease appeared to increase the likelihood of multiple genetic diagnoses in a family. Our findings highlight the importance of comprehensive clinical phenotyping of family members to ultimately provide accurate genetic counseling. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
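The diagnostic rates quoted in the record above can be re-derived from the reported counts. The short sketch below uses only figures from the abstract; the helper name `percent` is hypothetical, and the denominator for the multiple-diagnosis rate (the 226 diagnosed probands) is inferred from the reported 3.5%.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


probands = 802            # undiagnosed probands referred for WES
diagnosed = 226           # diagnosed via mutations in known disease genes
multiple_diagnoses = 8    # probands with two or more genetic diagnoses

diagnostic_yield = percent(diagnosed, probands)
multiple_rate = percent(multiple_diagnoses, diagnosed)

print(f"Diagnostic yield: {diagnostic_yield}%")
print(f"Multiple diagnoses among diagnosed probands: {multiple_rate}%")
```

Both values agree with the abstract (28.2% and 3.5%), which indicates that the multiple-diagnosis rate is expressed relative to diagnosed probands rather than to all 802 referrals.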
Two Cases of Meningococcal Disease in One Family Separated by an Extended Period - Colorado, 2015-2016.

PubMed

Spence Davizon, Emily; Soeters, Heidi M; Miller, Lisa; Barnes, Meghan

2018-03-30

On April 26, 2015, a case of meningococcal disease in a woman aged 75 years was reported to the Colorado Department of Public Health and Environment (CDPHE). As part of routine public health investigation and control activities, all seven family contacts of the patient were advised to receive appropriate postexposure prophylaxis (PEP) to eradicate nasopharyngeal carriage of meningococci and prevent secondary disease (1), although it is not known whether the family contacts complied with PEP recommendations. Fifteen months later, on June 6, 2016, CDPHE was notified that the grandchild of the first patient, a male infant aged 3 months who lived with the first patient, also had meningococcal disease. The infant's immediate family members (parents and one sibling) were among family contacts for whom PEP was recommended in 2015. Neisseria meningitidis isolates from both patients were found to be serogroup C at the CDPHE laboratory. Whole genome sequence (WGS) analysis at CDC found that both isolates had the same sequence type, indicating close genetic relatedness. These cases represent a possible instance of meningococcal disease transmission within a family, despite appropriate PEP recommendations and with a long interval between cases.
Consolidation of glycosyl hydrolase family 30 : a dual domain 4/7 hydrolase family consisting of two structurally distinct groups

Treesearch

Franz J. St John; Javier M. Gonzalez; Edwin Pozharski

2010-01-01

In this work glycosyl hydrolase (GH) family 30 (GH30) is analyzed and shown to consist of its currently classified member sequences as well as several homologous sequence groups currently assigned within family GH5. A large scale amino acid sequence alignment and a phylogenetic tree were generated and GH30 groups and subgroups were designated. A partial rearrangement...
Application of Broad-Spectrum Resequencing Microarray for Genotyping Rhabdoviruses

PubMed Central

Dacheux, Laurent; Berthet, Nicolas; Dissard, Gabriel; Holmes, Edward C.; Delmas, Olivier; Larrous, Florence; Guigon, Ghislaine; Dickinson, Philip; Faye, Ousmane; Sall, Amadou A.; Old, Iain G.; Kong, Katherine; Kennedy, Giulia C.; Manuguerra, Jean-Claude; Cole, Stewart T.; Caro, Valérie; Gessain, Antoine; Bourhy, Hervé

2010-01-01

The rapid and accurate identification of pathogens is critical in the control of infectious disease. To this end, we analyzed the capacity for viral detection and identification of a newly described high-density resequencing microarray (RMA), termed PathogenID, which was designed for multiple pathogen detection using database similarity searching. We focused on one of the largest and most diverse viral families described to date, the family Rhabdoviridae. We demonstrate that this approach has the potential to identify both known and related viruses for which precise sequence information is unavailable. In particular, we demonstrate that a strategy based on consensus sequence determination for analysis of RMA output data enabled successful detection of viruses exhibiting up to 26% nucleotide divergence with the closest sequence tiled on the array. Using clinical specimens obtained from rabid patients and animals, this method also shows a high species level concordance with standard reference assays, indicating that it is amenable for the development of diagnostic assays. Finally, 12 animal rhabdoviruses which were currently unclassified, unassigned, or assigned as tentative species within the family Rhabdoviridae were successfully detected. These new data allowed an unprecedented phylogenetic analysis of 106 rhabdoviruses and further suggest that the principles and methodology developed here may be used for the broad-spectrum surveillance and the broader-scale investigation of biodiversity in the viral world. PMID:20610710
Genomic identification, phylogeny, and expression analysis of MLO genes involved in susceptibility to powdery mildew in Fragaria vesca.

PubMed

Miao, L X; Jiang, M; Zhang, Y C; Yang, X F; Zhang, H Q; Zhang, Z F; Wang, Y Z; Jiang, G H

2016-08-05

The MLO (powdery mildew locus O) gene family is important in resistance to powdery mildew (PM). In this study, all of the members of the MLO family were identified and analyzed in the strawberry (Fragaria vesca) genome. The strawberry contains at least 20 members of the MLO family, and the protein sequence contained between 171 and 1485 amino acids, with 0-34 introns. Chromosomal localization showed that the MLOs were unevenly distributed on each of the chromosomes, except for chromosome 4. The greatest number of MLOs (seven) was found on chromosome 3. A phylogenetic tree showed that the MLOs were divided into seven groups (I-VII), four of which consisted of MLOs from strawberry, Arabidopsis thaliana, rice, and maize, suggesting that these genes may have evolved after the divergence of monocots and dicots. Multiple sequence alignment showed that strawberry MLO candidates related to powdery mildew resistance possessed seven highly conserved transmembrane domains, a calmodulin-binding domain, and two conserved regions, all of which are important domains for powdery mildew resistance genes. Expressed sequence tag analysis revealed that the MLOs were induced by multiple abiotic stressors, including low and high temperature, drought, and high salinity. These findings will contribute to the functional characterization of MLOs related to PM susceptibility, and will assist in the development of disease resistance in strawberries.
Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily.

PubMed

Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju

2015-01-01

Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis on each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single member families. Also structure-based phylogenetic approach has provided pointers to assign putative function for the domains of unknown function in lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity.
Molecular characterization of a family of ligands for eph-related tyrosine kinase receptors.

PubMed Central

Beckmann, M P; Cerretti, D P; Baum, P; Vanden Bos, T; James, L; Farrah, T; Kozlosky, C; Hollingsworth, T; Shilling, H; Maraskovsky, E

1994-01-01

A family of tyrosine kinase receptors related to the product of the eph gene has been described recently. One of these receptors, elk, has been shown to be expressed only in brain and testes. Using a direct expression cloning technique, a ligand for the elk receptor has been isolated by screening a human placenta cDNA library with a fusion protein containing the extracellular domain of the receptor. This isolated cDNA encodes a transmembrane protein. While the sequence of the ligand cDNA is unique, it is related to a previously described sequence known as B61. Northern blot analysis of human tissue mRNA showed that the elk ligand's mRNA is 3.5 kb long and is found in placenta, heart, lung, liver, skeletal muscle, kidney and pancreas. Southern blot analysis showed that the gene is highly conserved in a wide variety of species. Both elk ligand and B61 mRNAs are inducible by tumour necrosis factor in human umbilical vein endothelial cells. In addition, both proteins show promiscuity in binding to the elk and the related hek receptors. Since these two ligand sequences are similar, and since elk and hek are members of a larger family of eph-related receptor molecules, we refer to these ligands as LERKs (ligands for eph-related kinases). PMID:8070404
Characterization of Farmington virus, a novel virus from birds that is distantly related to members of the family Rhabdoviridae.

PubMed

Palacios, Gustavo; Forrester, Naomi L; Savji, Nazir; Travassos da Rosa, Amelia P A; Guzman, Hilda; Detoy, Kelly; Popov, Vsevolod L; Walker, Peter J; Lipkin, W Ian; Vasilakis, Nikos; Tesh, Robert B

2013-07-01

Farmington virus (FARV) is a rhabdovirus that was isolated from a wild bird during an outbreak of epizootic eastern equine encephalitis on a pheasant farm in Connecticut, USA. Analysis of the nearly complete genome sequence of the prototype CT AN 114 strain indicates that it encodes the five canonical rhabdovirus structural proteins (N, P, M, G and L) with alternative ORFs (> 180 nt) in the N and G genes. Phenotypic and genetic characterization of FARV has confirmed that it is a novel rhabdovirus and probably represents a new species within the family Rhabdoviridae. In sum, our analysis indicates that FARV represents a new species within the family Rhabdoviridae.
Whole exome sequencing in an Italian family with isolated maxillary canine agenesis and canine eruption anomalies.

PubMed

Barbato, Ersilia; Traversa, Alice; Guarnieri, Rosanna; Giovannetti, Agnese; Genovesi, Maria Luce; Magliozzi, Maria Rosa; Paolacci, Stefano; Ciolfi, Andrea; Pizzi, Simone; Di Giorgio, Roberto; Tartaglia, Marco; Pizzuti, Antonio; Caputo, Viviana

2018-07-01

The aim of this study was the clinical and molecular characterization of a family segregating a trait consisting of a phenotype specifically involving the maxillary canines, including agenesis, impaction and ectopic eruption, characterized by incomplete penetrance and variable expressivity. Clinical standardized assessment of 14 family members and a whole-exome sequencing (WES) of three affected subjects were performed. WES data analyses (sequence alignment, variant calling, annotation and prioritization) were carried out using an in-house implemented pipeline. Variant filtering retained coding and splice-site high quality private and rare variants. Variant prioritization was performed taking into account both the disruptive impact and the biological relevance of individual variants and genes. Sanger sequencing was performed to validate the variants of interest and to carry out segregation analysis. Prioritization of variants "by function" allowed the identification of multiple variants contributing to the trait, including two concomitant heterozygous variants in EDARADD (c.308C>T, p.Ser103Phe) and COL5A1 (c.1588G>A, p.Gly530Ser), specifically associated with a more severe phenotype (i.e. canine agenesis). Differently, heterozygous variants in genes encoding proteins with a role in the WNT pathway were shared by subjects showing a phenotype of impacted/ectopic erupted canines. This study characterized the genetic contribution underlying a complex trait consisting of isolated canine anomalies in a medium-sized family, highlighting the role of WNT and EDA cell signaling pathways in tooth development. Copyright © 2018 Elsevier Ltd. All rights reserved.
A new family of β-helix proteins with similarities to the polysaccharide lyases

DOE PAGES

Close, Devin W.; D'Angelo, Sara; Bradbury, Andrew R. M.

2014-09-27

Microorganisms that degrade biomass produce diverse assortments of carbohydrate-active enzymes and binding modules. Despite tremendous advances in the genomic sequencing of these organisms, many genes do not have an ascribed function owing to low sequence identity to genes that have been annotated. Consequently, biochemical and structural characterization of genes with unknown function is required to complement the rapidly growing pool of genomic sequencing data. A protein with previously unknown function (Cthe_2159) was recently isolated in a genome-wide screen using phage display to identify cellulose-binding protein domains from the biomass-degrading bacterium Clostridium thermocellum. Here, the crystal structure of Cthe_2159 is presented and it is shown that it is a unique right-handed parallel β-helix protein. Despite very low sequence identity to known β-helix or carbohydrate-active proteins, Cthe_2159 displays structural features that are very similar to those of polysaccharide lyase (PL) families 1, 3, 6 and 9. Cthe_2159 is conserved across bacteria and some archaea and is a member of the domain of unknown function family DUF4353. This suggests that Cthe_2159 is the first representative of a previously unknown family of cellulose and/or acid-sugar binding β-helix proteins that share structural similarities with PLs. More importantly, these results demonstrate how functional annotation by biochemical and structural analysis remains a critical tool in the characterization of new gene products.
A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3

PubMed Central

Dietmann, Sabine; Park, Jong; Notredame, Cedric; Heger, Andreas; Lappe, Michael; Holm, Liisa

2001-01-01

The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families. PMID:11125048
'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers.

PubMed Central

Marck, C

1988-01-01

DNA Strider is a new integrated DNA and protein sequence analysis program written in C for the Macintosh Plus, SE and II computers. It has been designed as an easy to learn and use program as well as a fast and efficient tool for day-to-day sequence analysis work. The program consists of a multi-window sequence editor and various DNA and protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can handle simultaneously 6 sequences of any type up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated, such as a graphic restriction map, a hydrophobicity profile and the CAI plot (codon adaptation index according to Sharp and Li). The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can therefore be computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds and its multiline restriction map obtained in a scrolling window within 5 seconds. PMID:2832831
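The restriction site search described in this record can be illustrated with a minimal Python sketch. This is not the hexamer look-ahead algorithm of DNA Strider, whose details the abstract does not give; it is a plain substring scan, written under the assumption that each enzyme is represented only by an unambiguous recognition sequence. The function name `find_sites` and the three-enzyme library are hypothetical and chosen for illustration; only the recognition sequences of EcoRI, BamHI and HindIII are standard.

```python
# Naive restriction-site scan (illustrative; NOT DNA Strider's hexamer
# look-ahead algorithm). Reports 1-based start positions of each site.

# Hypothetical three-enzyme library; recognition sequences are the standard ones.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(seq, enzymes=ENZYMES, circular=False):
    """Return {enzyme name: [1-based start positions]} for each recognition site."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        # For a circular molecule, append the first len(site)-1 bases so that
        # sites spanning the origin are also found.
        text = seq + seq[: len(site) - 1] if circular else seq
        positions = []
        start = text.find(site)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based numbering
            start = text.find(site, start + 1)
        hits[name] = positions
    return hits


if __name__ == "__main__":
    print(find_sites("AAGAATTCGGGATCCTTGAATTCAA"))
    print(find_sites("TTCAAAGAA", circular=True))
```

The `circular` flag matters for plasmids such as pBR322, where a site may straddle the point at which the sequence file begins. Cut positions within each site and degenerate recognition sequences are deliberately left out of this sketch.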
Gene and domain duplication in the chordate Otx gene family: insights from amphioxus Otx.

PubMed

Williams, N A; Holland, P W

1998-05-01

We report the genomic organization and deduced protein sequence of a cephalochordate member of the Otx homeobox gene family (AmphiOtx) and show its probable single-copy state in the genome. We also present molecular phylogenetic analysis indicating that there was a single ancestral Otx gene in the first chordates, which was duplicated in the vertebrate lineage after it had split from the lineage leading to the cephalochordates. Duplication of a C-terminal protein domain has occurred specifically in the vertebrate lineage, strengthening the case for a single Otx gene in an ancestral chordate whose gene structure has been retained in an extant cephalochordate. Comparative analysis of protein sequences and published gene expression patterns suggests that the ancestral chordate Otx gene had roles in patterning the anterior mesendoderm and central nervous system. These roles were elaborated following Otx gene duplication in vertebrates, accompanied by regulatory and structural divergence, particularly of Otx1 descendant genes.
Characterization of a filamentous virus from Bermuda grass and its molecular, serological and biological comparison with Spartina mottle virus.

PubMed

Hosseini, A; Koohi Habibi, M; Izadpanah, K; Mosahebi, G H; Rubies-Autonell, C; Ratti, C

2010-10-01

Bermuda grass plants with mosaic symptoms have been found in many parts of Iran. No serological correlation was observed between two isolates of this filamentous virus and any of the members of the family Potyviridae that were tested. Aphid transmission was demonstrated at low efficiency for isolates of this virus, whereas no transmission through seed was observed. A DNA fragment corresponding to the 3' end of the viral genome of these two isolates from Iran and one isolate from Italy was amplified and sequenced. A BLAST search showed that these isolates are more closely related to Spartina mottle virus (SpMV) than to any other virus in the family Potyviridae. Specific serological assays confirmed the phylogenetic analysis. Sequence and phylogenetic analysis suggested that these isolates could be considered as divergent strains of SpMV in the proposed genus Sparmovirus.
A recurrent deletion mutation in OPA1 causes autosomal dominant optic atrophy in a Chinese family

NASA Astrophysics Data System (ADS)

Zhang, Liping; Shi, Wei; Song, Liming; Zhang, Xiao; Cheng, Lulu; Wang, Yanfang; Ge, Xianglian; Li, Wei; Zhang, Wei; Min, Qingjie; Jin, Zi-Bing; Qu, Jia; Gu, Feng

2014-11-01

Autosomal dominant optic atrophy (ADOA) is the most frequent form of hereditary optic neuropathy and occurs due to the degeneration of the retinal ganglion cells. To identify the genetic defect in a family with putative ADOA, we performed capture next generation sequencing (CNGS) to screen known retinal disease genes. However, six exons of the optic atrophy 1 gene (OPA1) failed to be sequenced by CNGS. Sequencing of those exons identified a 4 bp deletion mutation (c.2983-1_2985del) in OPA1. Furthermore, we sequenced the transcripts of OPA1 from the patient's skin fibroblasts and found a six-nucleotide deletion (c.2984-c.2989, AGAAAG). Quantitative PCR and Western blotting showed no obvious difference in OPA1 mRNA and protein expression between patient skin fibroblasts and controls. The analysis of protein structure by molecular modeling suggests that the mutation may change the structure of OPA1 by formation of an alpha helix protruding into an existing pocket. Taken together, we identified an OPA1 mutation in a family with ADOA by filling in the missing CNGS data. We also showed that this mutation affects the structural intactness of OPA1. It provides molecular insights for clinical genetic diagnosis and treatment of optic atrophy.
A novel ABCD1 mutation detected by next generation sequencing in presumed hereditary spastic paraplegia: A 30-year diagnostic delay caused by misleading biochemical findings.

PubMed

Koutsis, Georgios; Lynch, David S; Tucci, Arianna; Houlden, Henry; Karadima, Georgia; Panas, Marios

2015-08-15

To present a Greek family in which 5 male and 2 female members developed progressive spastic paraplegia. Plasma very long chain fatty acids (VLCFA) were reportedly normal at first testing in an affected male and for over 30 years the presumed diagnosis was hereditary spastic paraplegia (HSP). Targeted next generation sequencing (NGS) was used as a further diagnostic tool. Targeted exome sequencing in the proband, followed by Sanger sequencing confirmation; mutation segregation testing in multiple family members and plasma VLCFA measurement in the proband. NGS of the proband revealed a novel frameshift mutation in ABCD1 (c.1174_1178del, p.Leu392Serfs*7), bringing an end to diagnostic uncertainty by establishing the diagnosis of adrenomyeloneuropathy (AMN), the myelopathic phenotype of X-linked adrenoleukodystrophy (ALD). The mutation segregated in all family members and the diagnosis of AMN/ALD was confirmed by plasma VLCFA measurement. Confounding factors that delayed the diagnosis are presented. This report highlights the diagnostic utility of NGS in patients with undiagnosed spastic paraplegia, establishing a molecular diagnosis of AMN, allowing proper genetic counseling and management, and overcoming the diagnostic delay that can be rarely caused by false negative VLCFA analysis. Copyright © 2015 Elsevier B.V. All rights reserved.
A recurrent deletion mutation in OPA1 causes autosomal dominant optic atrophy in a Chinese family.

PubMed

Zhang, Liping; Shi, Wei; Song, Liming; Zhang, Xiao; Cheng, Lulu; Wang, Yanfang; Ge, Xianglian; Li, Wei; Zhang, Wei; Min, Qingjie; Jin, Zi-Bing; Qu, Jia; Gu, Feng

2014-11-06

Autosomal dominant optic atrophy (ADOA) is the most frequent form of hereditary optic neuropathy and occurs due to the degeneration of the retinal ganglion cells. To identify the genetic defect in a family with putative ADOA, we performed capture next generation sequencing (CNGS) to screen known retinal disease genes. However, six exons of the optic atrophy 1 gene (OPA1) failed to be sequenced by CNGS. Sequencing of those exons identified a 4 bp deletion mutation (c.2983-1_2985del) in OPA1. Furthermore, we sequenced the transcripts of OPA1 from the patient's skin fibroblasts and found a six-nucleotide deletion (c.2984-c.2989, AGAAAG). Quantitative PCR and Western blotting showed no obvious difference in OPA1 mRNA and protein expression between patient skin fibroblasts and controls. The analysis of protein structure by molecular modeling suggests that the mutation may change the structure of OPA1 by formation of an alpha helix protruding into an existing pocket. Taken together, we identified an OPA1 mutation in a family with ADOA by filling in the missing CNGS data. We also showed that this mutation affects the structural intactness of OPA1. It provides molecular insights for clinical genetic diagnosis and treatment of optic atrophy.
Comparative Genomics of the Balsaminaceae Sister Genera Hydrocera triflora and Impatiens pinfanensis

PubMed Central

Li, Zhi-Zhong; Saina, Josphat K.; Gichira, Andrew W.; Kyalo, Cornelius M.; Wang, Qing-Feng

2018-01-01

The family Balsaminaceae, which consists of the economically important genus Impatiens and the monotypic genus Hydrocera, lacks a reported or published complete chloroplast genome sequence. Therefore, chloroplast genome sequences of the two sister genera are significant to give insight into the phylogenetic position and understanding the evolution of the Balsaminaceae family among the Ericales. In this study, complete chloroplast (cp) genomes of Impatiens pinfanensis and Hydrocera triflora were characterized and assembled using a high-throughput sequencing method. The complete cp genomes were found to possess the typical quadripartite structure of land plants chloroplast genomes with double-stranded molecules of 154,189 bp (Impatiens pinfanensis) and 152,238 bp (Hydrocera triflora) in length. A total of 115 unique genes were identified in both genomes, of which 80 are protein-coding genes, 31 are distinct transfer RNA (tRNA) and four distinct ribosomal RNA (rRNA). Thirty codons, of which 29 had A/T ending codons, revealed relative synonymous codon usage values of >1, whereas those with G/C ending codons displayed values of <1. The simple sequence repeats comprise mostly the mononucleotide repeats A/T in all examined cp genomes. Phylogenetic analysis based on 51 common protein-coding genes indicated that the Balsaminaceae family formed a lineage with Ebenaceae together with all the other Ericales. PMID:29360746
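The relative synonymous codon usage (RSCU) values mentioned in this record follow a simple definition: the observed count of a codon divided by the mean count of all codons encoding the same amino acid. The Python sketch below illustrates that calculation under stated assumptions. It uses the standard genetic code, treats the three stop codons as one synonymous family, and takes a single in-frame coding sequence as input. The function name `rscu` is hypothetical and this is not the authors' pipeline.

```python
# Relative synonymous codon usage (RSCU), illustrative sketch.
# RSCU(codon) = count(codon) * n_synonymous / total count for that amino acid.
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, members in families.items():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in members:
            values[c] = counts[c] * len(members) / total
    return values


if __name__ == "__main__":
    # Alanine only: GCT used three times, GCA once, GCC and GCG never.
    print(rscu("GCTGCTGCTGCA"))
```

A value above 1 means the codon is used more often than expected under equal usage of synonymous codons, and a value below 1 means it is used less often. Single-codon amino acids (Met, Trp) always score exactly 1 and are usually left out of such comparisons.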

Evolutionary history of the enolase gene family.

PubMed

Tracy, M R; Hedges, S B

2000-12-23

The enzyme enolase [EC 4.2.1.11] is found in all organisms, with vertebrates exhibiting tissue-specific isozymes encoded by three genes: alpha (α), beta (β), and gamma (γ) enolase. Limited taxonomic sampling of enolase has obscured the timing of gene duplication events. To help clarify the evolutionary history of the gene family, cDNAs were sequenced from six taxa representing major lineages of vertebrates: Chiloscyllium punctatum (shark), Amia calva (bowfin), Salmo trutta (trout), Latimeria chalumnae (coelacanth), Lepidosiren paradoxa (South American lungfish), and Neoceratodus forsteri (Australian lungfish). Phylogenetic analysis of all enolase and related gene sequences revealed an early gene duplication event prior to the last common ancestor of living organisms. Several distantly related archaebacterial sequences were designated as 'enolase-2', whereas all other enolase sequences were designated 'enolase-1'. Two of the three isozymes of enolase-1, alpha- and beta-enolase, were discovered in actinopterygian, sarcopterygian, and chondrichthian fishes. Phylogenetic analysis of vertebrate enolases revealed that the two gene duplications leading to the three isozymes of enolase-1 occurred subsequent to the divergence of living agnathans, near the Proterozoic/Phanerozoic boundary (approximately 550 Mya). Two copies of enolase, designated alpha(1) and alpha(2), were found in the trout and are presumed to be the result of a genome duplication event.
Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

PubMed

Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

2011-03-04

Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the Mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 million reads), and a complete genome re-sequencing (45.3 million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome

PubMed Central

Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O.; Alawad, Abdullah O.; Al-Sadi, Abdullah M.; Hu, Songnian; Yu, Jun

2016-01-01

Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants. PMID:27736909
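Two summary statistics quoted in this record, GC content and the transition/transversion (Ti/Tv) ratio, are easy to state precisely. The Python sketch below shows the usual definitions under stated assumptions: GC content is computed over unambiguous A, C, G and T bases only, and substitutions are supplied as (reference, alternate) base pairs. The function names `gc_content` and `ti_tv_ratio` are hypothetical, and the record does not say how its authors computed these values.

```python
# GC content and transition/transversion ratio, illustrative sketch.

PURINES = {"A", "G"}
VALID = set("ACGT")


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def ti_tv_ratio(substitutions):
    """Ti/Tv for an iterable of (ref, alt) single-base substitutions.

    Transitions stay within a chemical class (A<->G or C<->T);
    transversions switch between purine and pyrimidine.
    """
    ti = tv = 0
    for ref, alt in substitutions:
        ref, alt = ref.upper(), alt.upper()
        if ref not in VALID or alt not in VALID or ref == alt:
            continue  # skip ambiguous bases and non-substitutions
        if (ref in PURINES) == (alt in PURINES):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

Because there are twice as many possible transversions as transitions, a ratio of 0.5 is what equal rates for all substitution types would produce; this is one common reading of a "neutral expectation", though the abstract does not define the term.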
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bahl, C.; Morisseau, C; Bomberger, J

Cystic fibrosis transmembrane conductance regulator (CFTR) inhibitory factor (Cif) is a virulence factor secreted by Pseudomonas aeruginosa that reduces the quantity of CFTR in the apical membrane of human airway epithelial cells. Initial sequence analysis suggested that Cif is an epoxide hydrolase (EH), but its sequence violates two strictly conserved EH motifs and also is compatible with other α/β hydrolase family members with diverse substrate specificities. To investigate the mechanistic basis of Cif activity, we have determined its structure at 1.8-Å resolution by X-ray crystallography. The catalytic triad consists of residues Asp129, His297, and Glu153, which are conserved across the family of EHs. At other positions, sequence deviations from canonical EH active-site motifs are stereochemically conservative. Furthermore, detailed enzymatic analysis confirms that Cif catalyzes the hydrolysis of epoxide compounds, with specific activity against both epibromohydrin and cis-stilbene oxide, but with a relatively narrow range of substrate selectivity. Although closely related to two other classes of α/β hydrolase in both sequence and structure, Cif does not exhibit activity as either a haloacetate dehalogenase or a haloalkane dehalogenase. A reassessment of the structural and functional consequences of the H269A mutation suggests that Cif's effect on host-cell CFTR expression requires the hydrolysis of an extended endogenous epoxide substrate.
Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase

PubMed Central

2014-01-01

Background: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full length and over 90 half size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases, whereas the C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; however, these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and their possible functional implications. Conclusions: Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively. PMID:24742328
Gitelman syndrome in a South African family presenting with hypokalaemia and unusual food cravings.

PubMed

van der Merwe, Pieter Du Toit; Rensburg, Megan A; Haylett, William L; Bardien, Soraya; Davids, M Razeen

2017-01-26

Gitelman syndrome (GS) is an autosomal recessive renal tubular disorder characterised by renal salt wasting with hypokalaemia, metabolic alkalosis, hypomagnesaemia and hypocalciuria. It is caused by mutations in SLC12A3 encoding the sodium-chloride cotransporter on the apical membrane of the distal convoluted tubule. We report a South African family with five affected individuals presenting with hypokalaemia and unusual food cravings. The affected individuals and two unaffected first degree relatives were enrolled into the study. Phenotypes were evaluated through history, physical examination and biochemical analysis of blood and urine. Mutation screening was performed by sequencing of SLC12A3, and determining the allele frequencies of the sequence variants found in this family in 117 ethnically matched controls. The index patient, her sister, father and two aunts had a history of severe salt cravings, fatigue and tetanic episodes, leading to consumption of large quantities of salt and vinegar. All affected individuals demonstrated hypokalaemia with renal potassium wasting. Genetic analysis revealed that the pseudo-dominant pattern of inheritance was due to compound heterozygosity with two novel mutations: a S546G substitution in exon 13, and insertion of AGCCCC at c.1930 in exon 16. These variants were present in the five affected individuals, but only one variant each in the unaffected family members. Neither variant was found in any of the controls. The diagnosis of GS was established in five members of a South African family through clinical assessment, biochemical analysis and mutation screening of the SLC12A3 gene, which identified two novel putative pathogenic mutations.
SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

PubMed

Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

2016-01-01

Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. There is a need for improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented similarity-based and other methods in predicting diverse classes of proteins, including distantly related proteins and homologous proteins of different functions. Since its publication in 2003, we have made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.

PubMed

Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S

2007-10-11

By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function.
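The idea of selecting alignment columns and then computing percent identities can be illustrated with a minimal Python sketch. This is not PFAAT's code or API: the function name `percent_identity`, the toy sequences, and the convention of skipping any column where either sequence has a gap are all assumptions made for illustration.

```python
def percent_identity(seq_a, seq_b, columns=None):
    """Percent identity between two aligned sequences.

    Assumption for this sketch: columns where either sequence has a gap ('-')
    are skipped, and identity is matches / compared columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if columns is None:
        columns = range(len(seq_a))
    compared = 0
    matches = 0
    for i in columns:
        a, b = seq_a[i], seq_b[i]
        if a == "-" or b == "-":
            continue  # gap in at least one sequence: not compared
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned sequences (toy data, not from any real family).
row_a = "MKT-LLV"
row_b = "MRTALIV"

overall = percent_identity(row_a, row_b)           # all columns
selected = percent_identity(row_a, row_b, [0, 2])  # e.g. annotated columns only
```

Other definitions of percent identity (for example, dividing by alignment length or by the shorter sequence) give different values, so the denominator should always be stated alongside the result.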
Question 7: Comparative Genomics and Early Cell Evolution: A Cautionary Methodological Note

NASA Astrophysics Data System (ADS)

Islas, Sara; Hernández-Morales, Ricardo; Lazcano, Antonio

2007-10-01

Inventories of the gene content of the last common ancestor (LCA), i.e., the cenancestor, include sequences that may have undergone horizontal transfer events, as well as sequences that have originated in different pre-cenancestral epochs. However, the universal distribution of highly conserved genes involved in RNA metabolism provides insights into early stages of cell evolution during which RNA played a much more conspicuous biological role, and is consistent with the hypothesis that extant living systems were preceded by an RNA/protein world. Insights into the traits of primitive entities from which the LCA evolved may be derived from the analysis of paralogous gene families, including those formed by sequences that resulted from internal elongation events. Three major types of paralogous gene families can be recognized. The importance of this grouping for understanding the traits of early cells is discussed.
Repeated Evolution of the Pyrrolizidine Alkaloid–Mediated Defense System in Separate Angiosperm Lineages

PubMed Central

Reimann, Andreas; Nurhayati, Niknik; Backenköhler, Anita; Ober, Dietrich

2004-01-01

Species of several unrelated families within the angiosperms are able to constitutively produce pyrrolizidine alkaloids as a defense against herbivores. In pyrrolizidine alkaloid (PA) biosynthesis, homospermidine synthase (HSS) catalyzes the first specific step. HSS was recruited during angiosperm evolution from deoxyhypusine synthase (DHS), an enzyme involved in the posttranslational activation of eukaryotic initiation factor 5A. Phylogenetic analysis of 23 cDNA sequences coding for HSS and DHS of various angiosperm species revealed at least four independent recruitments of HSS from DHS: one within the Boraginaceae, one within the monocots, and two within the Asteraceae family. Furthermore, sequence analyses indicated elevated substitution rates within HSS-coding sequences after each gene duplication, with an increased level of nonsynonymous mutations. However, the contradiction between the polyphyletic origin of the first enzyme in PA biosynthesis and the structural identity of the final biosynthetic PA products needs clarification. PMID:15466410
Genome mining of ascomycetous fungi reveals their genetic potential for ergot alkaloid production.

PubMed

Gerhards, Nina; Matuschek, Marco; Wallwey, Christiane; Li, Shu-Ming

2015-06-01

Ergot alkaloids are important as mycotoxins or as drugs. Naturally occurring ergot alkaloids as well as their semisynthetic derivatives have been used as pharmaceuticals in modern medicine for decades. We identified 196 putative ergot alkaloid biosynthetic genes belonging to at least 31 putative gene clusters in 31 fungal species by genome mining of the 360 available genome sequences of ascomycetous fungi with known proteins. Detailed analysis showed that these fungi belong to the families Aspergillaceae, Clavicipitaceae, Arthrodermataceae, Helotiaceae and Thermoascaceae. Within the identified families, only a small number of taxa are represented. Literature search revealed a large diversity of ergot alkaloid structures in different fungi of the phylum Ascomycota. However, ergot alkaloid accumulation was only observed in 15 of the sequenced species. Therefore, this study provides genetic basis for further study on ergot alkaloid production in the sequenced strains.
Novel skeletal muscle ryanodine receptor mutation in a large Brazilian family with malignant hyperthermia.

PubMed

McWilliams, S; Nelson, T; Sudo, R T; Zapata-Sudo, G; Batti, M; Sambuughin, N

2002-07-01

Malignant hyperthermia (MH) is an autosomal dominant disorder that predisposes susceptible individuals to a potentially life-threatening crisis when exposed to commonly used anesthetics. Mutations in the skeletal muscle calcium release channel, ryanodine receptor (RYR1) are associated with MH in over 50% of affected families. Linkage analysis of the RYR1 gene region at 19q13 was performed in a large Brazilian family and a distinct disease co-segregating haplotype was revealed in the majority of members with diagnosis of MH. Subsequent sequencing of RYR1 mutational hot spots revealed a nucleotide substitution of C to T at position 7062, causing a novel amino acid change from Arg2355 to Cys associated with MH in the family. Haplotype analysis of the RYR1 gene area at 19q13 in the family with multiple MH members is an important tool in identification of genetic cause underlying this disease.
Trends in genome dynamics among major orders of insects revealed through variations in protein families.

PubMed

Rappoport, Nadav; Linial, Michal

2015-08-07

Insects belong to a class that accounts for the majority of animals on earth. With over one million identified species, insects display a huge diversity and occupy extreme environments. At present, there are dozens of fully sequenced insect genomes that cover a range of habitats, social behavior and morphologies. In view of such diverse collection of genomes, revealing evolutionary trends and charting functional relationships of proteins remain challenging. We analyzed the relatedness of 17 complete proteomes representative of proteomes from insects including louse, bee, beetle, ants, flies and mosquitoes, as well as an out-group from the crustaceans. The analyzed proteomes mostly represented the orders of Hymenoptera and Diptera. The 287,405 protein sequences from the 18 proteomes were automatically clustered into 20,933 families, including 799 singletons. A comprehensive analysis based on statistical considerations identified the families that were significantly expanded or reduced in any of the studied organisms. Among all the tested species, ants are characterized by an exceptionally high rate of family gain and loss. By assigning annotations to hundreds of species-specific families, the functional diversity among species and between the major clades (Diptera and Hymenoptera) is revealed. We found that many species-specific families are associated with receptor signaling, stress-related functions and proteases. The highest variability among insects associates with the function of transposition and nucleic acids processes (collectively coined TNAP). Specifically, the wasp and ants have an order of magnitude more TNAP families and proteins relative to species that belong to Diptera (mosquitoes and flies). An unsupervised clustering methodology combined with a comparative functional analysis unveiled proteomic signatures in the major clades of winged insects. 
We propose that the expansion of TNAP families in Hymenoptera potentially contributes to the accelerated genome dynamics that characterize the wasp and ants.
Jannovar: a java library for exome annotation.

PubMed

Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

2014-05-01

Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
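The interval-tree lookup mentioned above can be sketched in Python as follows. This is a minimal centered interval tree written for illustration, not Jannovar's Java implementation; the names `Node`, `build` and `query`, the closed-interval convention, and the transcript coordinates are all hypothetical. For brevity each node scans its own intervals linearly instead of keeping them sorted by endpoint.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (start, end, transcript_id); closed intervals on one chromosome (assumption).
Interval = Tuple[int, int, str]


@dataclass
class Node:
    center: int
    overlapping: List[Interval]  # intervals that contain `center`
    left: Optional["Node"]       # intervals entirely left of `center`
    right: Optional["Node"]      # intervals entirely right of `center`


def build(intervals: List[Interval]) -> Optional[Node]:
    if not intervals:
        return None
    endpoints = sorted(p for start, end, _ in intervals for p in (start, end))
    center = endpoints[len(endpoints) // 2]
    here = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    left = [iv for iv in intervals if iv[1] < center]
    right = [iv for iv in intervals if iv[0] > center]
    return Node(center, here, build(left), build(right))


def query(node: Optional[Node], pos: int) -> List[str]:
    """Return the ids of all intervals containing position `pos`."""
    hits: List[str] = []
    while node is not None:
        hits.extend(iv[2] for iv in node.overlapping if iv[0] <= pos <= iv[1])
        if pos < node.center:
            node = node.left
        elif pos > node.center:
            node = node.right
        else:
            node = None  # intervals in the subtrees cannot contain the center
    return sorted(hits)


# Hypothetical transcript coordinates.
transcripts = [(100, 500, "tx1"), (300, 900, "tx2"), (1000, 1200, "tx3")]
tree = build(transcripts)
affected = query(tree, 350)
```

The point of the structure is that a variant position only visits one root-to-leaf path instead of every transcript, which matters when annotating many variants against a whole transcriptome.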
Genomewide Function Conservation and Phylogeny in the Herpesviridae

PubMed Central

Albà, M. Mar; Das, Rhiju; Orengo, Christine A.; Kellam, Paul

2001-01-01

The Herpesviridae are a large group of well-characterized double-stranded DNA viruses for which many complete genome sequences have been determined. We have extracted protein sequences from all predicted open reading frames of 19 herpesvirus genomes. Sequence comparison and protein sequence clustering methods have been used to construct herpesvirus protein homologous families. This resulted in 1692 proteins being clustered into 243 multiprotein families and 196 singleton proteins. Predicted functions were assigned to each homologous family based on genome annotation and published data and each family classified into seven broad functional groups. Phylogenetic profiles were constructed for each herpesvirus from the homologous protein families and used to determine conserved functions and genomewide phylogenetic trees. These trees agreed with molecular-sequence-derived trees and allowed greater insight into the phylogeny of ungulate and murine gammaherpesviruses. PMID:11156614
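A phylogenetic profile of the kind described above can be thought of as a presence/absence vector over protein families. The following Python sketch uses hypothetical toy data (three made-up genomes and five made-up families). The Hamming distance is chosen only for illustration and is not necessarily the measure used by the authors.

```python
from itertools import combinations

# Hypothetical toy data: genome -> set of homologous protein families it encodes.
genome_families = {
    "virusA": {"F1", "F2", "F3", "F4"},
    "virusB": {"F1", "F2", "F3"},
    "virusC": {"F1", "F2", "F5"},
}

# Fixed family order so every profile uses the same columns.
all_families = sorted(set().union(*genome_families.values()))


def profile(genome):
    """Presence (1) / absence (0) vector of families for one genome."""
    present = genome_families[genome]
    return [1 if family in present else 0 for family in all_families]


profiles = {genome: profile(genome) for genome in genome_families}

# Families found in every genome: candidates for conserved functions.
core_families = sorted(set.intersection(*genome_families.values()))


def hamming(p, q):
    """Number of families present in one genome but not the other."""
    return sum(a != b for a, b in zip(p, q))


# Pairwise distances that a tree-building method could take as input.
distances = {
    (g, h): hamming(profiles[g], profiles[h])
    for g, h in combinations(sorted(genome_families), 2)
}
```

In this toy example the two genomes sharing the most families end up closest, which is the intuition behind building genome-wide trees from family content.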
Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

PubMed

Diao, Wei-Ping; Snyder, John C; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

2016-01-01

The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators, with members regulating multiple biological processes, especially defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation for pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed based on WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper.
A missense mutation in the vasopressin-neurophysin precursor gene cosegregates with human autosomal dominant neurohypophyseal diabetes insipidus.

PubMed Central

Bahnsen, U; Oosting, P; Swaab, D F; Nahke, P; Richter, D; Schmale, H

1992-01-01

Familial neurohypophyseal diabetes insipidus in humans is a rare disease transmitted as an autosomal dominant trait. Affected individuals have very low or undetectable levels of circulating vasopressin and suffer from polydipsia and polyuria. An obvious candidate gene for the disease is the vasopressin-neurophysin (AVP-NP) precursor gene on human chromosome 20. The 2 kb gene with three exons encodes a composite precursor protein consisting of the neuropeptide vasopressin and two associated proteins, neurophysin and a glycopeptide. Cloning and nucleotide sequence analysis of both alleles of the AVP-NP gene present in a Dutch ADNDI family reveals a point mutation in one allele of the affected family members. Comparison of the nucleotide sequences shows a G→T transversion within the neurophysin-encoding exon B. This missense mutation converts a highly conserved glycine (Gly17 of neurophysin) to a valine residue. RFLP analysis of six related family members indicates cosegregation of the mutant allele with the DI phenotype. The mutation is not present in 96 chromosomes of an unrelated control group. These data suggest that a single amino acid exchange within a highly conserved domain of the human vasopressin-associated neurophysin is the primary cause of one form of ADNDI. PMID:1740104
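Why a single G-to-T change can turn glycine into valine follows from the standard genetic code: all glycine codons are GGN and all valine codons are GTN, so the substitution must hit the second codon position. The Python sketch below checks this for every glycine codon. The abstract does not state which codon is used at Gly17, so no specific codon is assumed, and the helper `substitute` is a hypothetical name.

```python
# Subset of the standard genetic code (DNA alphabet): glycine and valine codons.
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
}


def substitute(codon, position, base):
    """Return the codon with `base` placed at 0-based `position`."""
    return codon[:position] + base + codon[position + 1:]


glycine_codons = [c for c, aa in CODON_TABLE.items() if aa == "Gly"]

# Apply G->T at the second codon position (index 1) to each glycine codon.
mutated = {codon: substitute(codon, 1, "T") for codon in glycine_codons}
outcome = {codon: CODON_TABLE[new] for codon, new in mutated.items()}
```

Every glycine codon maps to a valine codon under this change, so the reported missense effect does not depend on which synonymous codon the wild-type allele carries.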
Use of designed sequences in protein structure recognition.

PubMed

Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran

2018-05-09

Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

PubMed

Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

2012-01-01

Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.
SLC52A2 mutations cause SCABD2 phenotype: A second report.

PubMed

Babanejad, Mojgan; Adeli, Omid Ali; Nikzat, Nooshin; Beheshtian, Maryam; Azarafra, Hakimeh; Sadeghnia, Farnaz; Mohseni, Marzieh; Najmabadi, Hossein; Kahrizi, Kimia

2018-01-01

Autosomal recessive cerebellar ataxias (ARCAs) are a large group of neurodegenerative disorders that manifest mainly in children and young adults. Most ARCAs are heterogeneous with respect to age at onset, severity of disease progression, and frequency of extracerebellar and systemic signs. The phenotype of a consanguineous Iranian family was characterized using clinical testing and pedigree analysis. Whole-exome sequencing was used to identify the disease-causing gene in this family. Using whole exome sequencing (WES), a novel missense mutation in SLC52A2 gene is reported in a consanguineous Iranian family with progressive severe hearing loss, optic atrophy and ataxia. This is the second report of the genotype-phenotype correlation between this syndrome named spinocerebellar ataxia with blindness and deafness type 2 (SCABD2) and SLC52A2 gene. Copyright © 2017 Elsevier B.V. All rights reserved.

Sequence of the non-phosphorylating glyceraldehyde-3-phosphate dehydrogenase from Nicotiana plumbaginifolia and phylogenetic origin of the gene family.

PubMed

Habenicht, A; Quesada, A; Cerff, R

1997-10-01

A cDNA-library has been constructed from Nicotiana plumbaginifolia seedlings, and the non-phosphorylating glyceraldehyde-3-phosphate dehydrogenase (GapN, EC 1.2.1.9) was isolated by plaque hybridization using the cDNA from pea as a heterologous probe. The cDNA comprises the entire GapN coding region. A putative polyadenylation signal is identified. Phylogenetic analysis based on the deduced amino acid sequences revealed that the GapN gene family represents a separate ancient branch within the aldehyde dehydrogenase superfamily. It can be shown that the GapN gene family and other distinct branches of the superfamily have their phylogenetic origin before the separation of primary life-forms. This further demonstrates that already very early in evolution, a broad diversification of the aldehyde dehydrogenases led to the formation of the superfamily.
HIPPI: highly accurate protein family classification with ensembles of HMMs.

PubMed

Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy

2016-11-11

Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .
Variations in Nuclear Localization Strategies Among Pol X Family Enzymes.

PubMed

Kirby, Thomas W; Pedersen, Lars C; Gabel, Scott A; Gassman, Natalie R; London, Robert E

2018-06-22

Despite the essential roles of pol X family enzymes in DNA repair, information about the structural basis of their nuclear import is limited. Recent studies revealed the unexpected presence of a functional NLS in DNA polymerase β, indicating the importance of active nuclear targeting, even for enzymes likely to leak into and out of the nucleus. The current studies further explore the active nuclear transport of these enzymes by identifying and structurally characterizing the functional NLS sequences in the three remaining human pol X enzymes: terminal deoxynucleotidyl transferase (TdT), DNA polymerase μ (pol μ), and DNA polymerase λ (pol λ). NLS identifications are based on Importin α (Impα) binding affinity determined by fluorescence polarization of fluorescein-labeled NLS peptides, X-ray crystallographic analysis of the Impα∆IBB•NLS complexes, and fluorescence-based subcellular localization studies. All three polymerases use NLS sequences located near their N-terminus; TdT and pol μ utilize monopartite NLS sequences, while pol λ utilizes a bipartite sequence, unique among the pol X family members. The pol μ NLS has relatively weak measured affinity for Impα, due in part to its proximity to the N-terminus that limits non-specific interactions of flanking residues preceding the NLS. However, this effect is partially mitigated by an N-terminal sequence unsupportive of Met1 removal by methionine aminopeptidase, leading to a 3-fold increase in affinity when the N-terminal methionine is present. Nuclear targeting is unique to each pol X family enzyme with variations dependent on the structure and unique functional role of each polymerase. This article is protected by copyright. All rights reserved.
Identification of novel microRNAs in Hevea brasiliensis and computational prediction of their targets

PubMed Central

2012-01-01

Background: Plants respond to external stimuli through fine regulation of gene expression partially ensured by small RNAs. Of these, microRNAs (miRNAs) play a crucial role. They negatively regulate gene expression by targeting the cleavage or translational inhibition of target messenger RNAs (mRNAs). In Hevea brasiliensis, environmental and harvesting stresses are known to affect natural rubber production. This study set out to identify abiotic stress-related miRNAs in Hevea using next-generation sequencing and bioinformatic analysis. Results: Deep sequencing of small RNAs was carried out on plantlets subjected to severe abiotic stress using the Solexa technique. By combining the LeARN pipeline, data from the Plant microRNA database (PMRD) and Hevea EST sequences, we identified 48 conserved miRNA families already characterized in other plant species, and 10 putatively novel miRNA families. The results showed the most abundant size for miRNAs to be 24 nucleotides, except for seven families. Several MIR genes produced both 20-22 nucleotides and 23-27 nucleotides. The two miRNA class sizes were detected for both conserved and putative novel miRNA families, suggesting their functional duality. The EST databases were scanned with conserved and novel miRNA sequences. MiRNA targets were computationally predicted and analysed. The predicted targets involved in "responses to stimuli" and to "antioxidant" and "transcription activities" are presented. Conclusions: Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs when the complete genome is not yet available. Our study provided additional information for evolutionary studies and revealed potentially specific regulation of the control of redox status in Hevea. PMID:22330773
Comparative Mitogenomics of Plant Bugs (Hemiptera: Miridae): Identifying the AGG Codon Reassignments between Serine and Lysine

PubMed Central

Wang, Pei; Song, Fan; Cai, Wanzhi

2014-01-01

Insect mitochondrial genomes are important for understanding molecular evolution as well as for phylogenetic and phylogeographic studies of insects. The Miridae are the largest family of Heteroptera, encompassing more than 11,000 described species, and are of great economic importance. To better understand the diversity and the evolution of plant bugs, we sequenced five new mitochondrial genomes and present the first comparative analysis of nine mitochondrial genomes of mirids available to date. Our results showed that gene content, gene arrangement, base composition and sequences of mitochondrial transcription termination factor were conserved in plant bugs. Intra-genus species shared more conserved genomic characteristics, such as nucleotide and amino acid composition of protein-coding genes, secondary structure and anticodon mutations of tRNAs, and non-coding sequences. The control region possessed several distinct characteristics, including variable size, abundant tandem repetitions, and intra-genus conservation, and was useful in evolutionary and population genetic studies. The AGG codon reassignments between serine and lysine were investigated in the genus Adelphocoris and other cimicomorphans. Our analysis revealed correlated evolution between reassignments of the AGG codon and specific point mutations at the anticodons of tRNALys and tRNASer(AGN). Phylogenetic analysis indicated that mitochondrial genome sequences were useful in resolving family level relationship of Cimicomorpha. Comparative evolutionary analysis of plant bug mitochondrial genomes allowed the identification of previously neglected coding genes or non-coding regions as potential molecular markers. The finding of the AGG codon reassignments between serine and lysine indicated the parallel evolution of the genetic code in Hemiptera mitochondrial genomes. PMID:24988409
Phylogenetic analysis of dissimilatory Fe(III)-reducing bacteria

USGS Publications Warehouse

Lonergan, D.J.; Jenter, H.L.; Coates, J.D.; Phillips, E.J.P.; Schmidt, T.M.; Lovley, D.R.

1996-01-01

Evolutionary relationships among strictly anaerobic dissimilatory Fe(III)-reducing bacteria obtained from a diversity of sedimentary environments were examined by phylogenetic analysis of 16S rRNA gene sequences. Members of the genera Geobacter, Desulfuromonas, Pelobacter, and Desulfuromusa formed a monophyletic group within the delta subdivision of the class Proteobacteria. On the basis of their common ancestry and the shared ability to reduce Fe(III) and/or S0, we propose that this group be considered a single family, Geobacteraceae. Bootstrap analysis, characteristic nucleotides, and higher-order secondary structures support the division of Geobacteraceae into two subgroups, designated the Geobacter and Desulfuromonas clusters. The genus Desulfuromusa and Pelobacter acidigallici make up a distinct branch with the Desulfuromonas cluster. Several members of the family Geobacteraceae, none of which reduce sulfate, were found to contain the target sequences of probes that have been previously used to define the distribution of sulfate-reducing bacteria and sulfate-reducing bacterium-like microorganisms. The recent isolations of Fe(III)-reducing microorganisms distributed throughout the domain Bacteria suggest that development of 16S rRNA probes that would specifically target all Fe(III) reducers may not be feasible. However, all of the evidence suggests that if a 16S rRNA sequence falls within the family Geobacteraceae, then the organism has the capacity for Fe(III) reduction. The suggestion, based on geological evidence, that Fe(III) reduction was the first globally significant process for oxidizing organic matter back to carbon dioxide is consistent with the finding that acetate-oxidizing Fe(III) reducers are phylogenetically diverse.
Identification of an additional member of the protein-tyrosine-phosphatase family: evidence for alternative splicing in the tyrosine phosphatase domain.

PubMed Central

Matthews, R J; Cahir, E D; Thomas, M L

1990-01-01

Protein-tyrosine-phosphatases (protein-tyrosine-phosphate phosphohydrolase, EC 3.1.3.48) have been implicated in the regulation of cell growth; however, to date few tyrosine phosphatases have been characterized. To identify additional family members, the cDNA for the human tyrosine phosphatase leukocyte common antigen (LCA; CD45) was used to screen, under low stringency, a mouse pre-B-cell cDNA library. Two cDNA clones were isolated and sequence analysis predicts a protein sequence of 793 amino acids. We have named the molecule LRP (LCA-related phosphatase). RNA transfer analysis indicates that the cDNAs were derived from a 3.2-kilobase mRNA. The LRP mRNA is transcribed in a wide variety of tissues. The predicted protein structure can be divided into the following structural features: a short 19-amino acid leader sequence, an exterior domain of 123 amino acids that is predicted to be highly glycosylated, a 24-amino acid membrane-spanning region, and a 627-amino acid cytoplasmic region. The cytoplasmic region contains two approximately 260-amino acid domains, each with homology to the tyrosine phosphatase family. One of the cDNA clones differed in that it had a 108-base-pair insertion that, while preserving the reading frame, would disrupt the first protein-tyrosine-phosphatase domain. Analysis of genomic DNA indicates that the insertion is due to an alternatively spliced exon. LRP appears to be evolutionarily conserved as a putative homologue has been identified in the invertebrate Styela plicata. PMID:2162042
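The figures quoted in this abstract can be cross-checked with simple arithmetic: the four structural segments should add up to the predicted protein length, and an insertion preserves the reading frame only if its length is a multiple of three. The Python sketch below uses only numbers stated in the abstract; the variable names are chosen for illustration.

```python
# Segment lengths in amino acids, as stated in the abstract.
segments = {
    "leader": 19,
    "exterior": 123,
    "membrane_spanning": 24,
    "cytoplasmic": 627,
}
predicted_length = 793  # amino acids, as stated in the abstract

# The four segments should account for the whole predicted protein.
total_residues = sum(segments.values())
segments_consistent = total_residues == predicted_length

# An insertion keeps the reading frame only if it adds whole codons.
insertion_bp = 108
preserves_frame = insertion_bp % 3 == 0
inserted_codons = insertion_bp // 3
```

Both checks pass: the segments sum to 793 residues, and the 108-base-pair insertion corresponds to 36 whole codons, consistent with the statement that the reading frame is preserved.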
Protein family clustering for structural genomics.

PubMed

Yan, Yongpan; Moult, John

2005-10-28

A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
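The abstract above does not say which clustering algorithm was used, so the following is only a generic sketch of how pairwise similarity hits can be grouped into families. It assumes single-linkage clustering implemented with a union-find structure; the function name `cluster_families`, the protein identifiers, and the similarity pairs are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def cluster_families(ids, similar_pairs):
    """Group sequence ids into families by single linkage (assumed method)."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two families of every pair judged similar.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    # Largest families first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical input: five proteins, two similarity hits.
families = cluster_families(
    ["p1", "p2", "p3", "p4", "p5"],
    [("p1", "p2"), ("p2", "p3")],
)
print(families)  # [['p1', 'p2', 'p3'], ['p4'], ['p5']]

# Families with three or more members, the size class the abstract reports on.
large = [f for f in families if len(f) >= 3]
print(len(large))  # 1
```

Single linkage is a deliberately simple choice here: it tends to chain remote members together, which is one reason real pipelines add benchmarking against false positives, as the abstract describes.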
MSH6 and MSH3 are rarely involved in genetic predisposition to nonpolypotic colon cancer.

PubMed

Huang, J; Kuismanen, S A; Liu, T; Chadwick, R B; Johnson, C K; Stevens, M W; Richards, S K; Meek, J E; Gao, X; Wright, F A; Mecklin, J P; Järvinen, H J; Grönberg, H; Bisgaard, M L; Lindblom, A; Peltomäki, P

2001-02-15

A set of 90 nonpolypotic colon cancer families in which germ-line mutations of MSH2 and MLH1 had been excluded were screened for mutations in two additional DNA mismatch repair genes, MSH6 and MSH3. Kindreds fulfilling and not fulfilling the Amsterdam I criteria, showing early and late onset colorectal (and other) cancers, and having microsatellite stable and unstable tumors were included. Two partly parallel approaches were used: genetic linkage analysis (19 large families) and the protein truncation test (85, mostly smaller, families). Whereas MSH3 was not involved in any family, a large Amsterdam-positive, late-onset family showed a novel germ-line mutation in MSH6 (deletion of CT at nucleotide 3052 in exon 4). The mutation was identified through genetic linkage (multipoint lod score 2.4) and subsequent sequencing of MSH6. Furthermore, the entire MSH6 gene was sequenced exon by exon in families with frameshift mutations in the (C)8 tract in tumors, previously suggested as a predictor of MSH6 germ-line mutations; no mutations were found. We conclude that germ-line involvement of MSH6 and MSH3 is rare and that other genes are likely to account for a majority of MSH2-, MLH1-mutation negative families with nonpolypotic colon cancer.
Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

PubMed

Szymanski, Maciej; Karlowski, Wojciech M

2016-01-01

In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.
Diversity and Evolutionary Analysis of Iron-Containing (Type-III) Alcohol Dehydrogenases in Eukaryotes

PubMed Central

Gaona-López, Carlos; Julián-Sánchez, Adriana

2016-01-01

Background Alcohol dehydrogenase (ADH) activity is widely distributed in the three domains of life. Currently, there are three non-homologous NAD(P)+-dependent ADH families reported: Type I ADH comprises Zn-dependent ADHs; type II ADH comprises short-chain ADHs described first in Drosophila; and, type III ADH comprises iron-containing ADHs (FeADHs). These three families arose independently throughout evolution and possess different structures and mechanisms of reaction. While types I and II ADHs have been extensively studied, analyses about the evolution and diversity of (type III) FeADHs have not been published yet. Therefore in this work, a phylogenetic analysis of FeADHs was performed to get insights into the evolution of this protein family, as well as explore the diversity of FeADHs in eukaryotes. Principal Findings Results showed that FeADHs from eukaryotes are distributed in thirteen protein subfamilies, eight of them possessing protein sequences distributed in the three domains of life. Interestingly, none of these protein subfamilies possess protein sequences found simultaneously in animals, plants and fungi. Many FeADHs are activated by or contain Fe2+, but many others bind to a variety of metals, or even lack a metal cofactor. Animal FeADHs are found in just one protein subfamily, the hydroxyacid-oxoacid transhydrogenase (HOT) subfamily (which includes protein sequences widely distributed in fungi, but not in plants), and in several taxa from lower eukaryotes, bacteria and archaea. Fungi FeADHs are found mainly in two subfamilies: HOT and maleylacetate reductase (MAR), but some can be found also in other three different protein subfamilies. Plant FeADHs are found only in chlorophyta but not in higher plants, and are distributed in three different protein subfamilies. Conclusions/Significance FeADHs are a diverse and ancient protein family that shares a common 3D scaffold with a patchy distribution in eukaryotes. The majority of sequenced FeADHs from eukaryotes are distributed in just two subfamilies, HOT and MAR (found mainly in animals and fungi). These two subfamilies comprise almost 85% of all sequenced FeADHs in eukaryotes. PMID:27893862
X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

PubMed

Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

2016-01-01

X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.
X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

PubMed Central

Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

2016-01-01

X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4−/− mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases. PMID:25644381
Hereditary spastic paraplegias: identification of a novel SPG57 variant affecting TFG oligomerization and description of HSP subtypes in Sudan.

PubMed

Elsayed, Liena E O; Mohammed, Inaam N; Hamed, Ahlam A A; Elseed, Maha A; Johnson, Adam; Mairey, Mathilde; Mohamed, Hassab Elrasoul S A; Idris, Mohamed N; Salih, Mustafa A M; El-Sadig, Sarah M; Koko, Mahmoud E; Mohamed, Ashraf Y O; Raymond, Laure; Coutelier, Marie; Darios, Frédéric; Siddig, Rayan A; Ahmed, Ahmed K M A; Babai, Arwa M A; Malik, Hiba M O; Omer, Zulfa M B M; Mohamed, Eman O E; Eltahir, Hanan B; Magboul, Nasr Aldin A; Bushara, Elfatih E; Elnour, Abdelrahman; Rahim, Salah M Abdel; Alattaya, Abdelmoneim; Elbashir, Mustafa I; Ibrahim, Muntaser E; Durr, Alexandra; Audhya, Anjon; Brice, Alexis; Ahmed, Ammar E; Stevanin, Giovanni

2016-01-01

Hereditary spastic paraplegias (HSP) are the second most common type of motor neuron disease recognized worldwide. We investigated a total of 25 consanguineous families from Sudan. We used next-generation sequencing to screen 74 HSP-related genes in 23 families. Linkage analysis and candidate gene sequencing were performed in two other families. We established a genetic diagnosis in six families with autosomal recessive HSP (SPG11 in three families and TFG/SPG57, SACS and ALS2 in one family each). A heterozygous mutation in a gene involved in an autosomal dominant HSP (ATL1/SPG3A) was also identified in one additional family. Six out of seven identified variants were novel. The c.64C>T (p.(Arg22Trp)) TFG/SPG57 variant (PB1 domain) is the second identified that underlies HSP, and we demonstrated its impact on TFG oligomerization in vitro. Patients did not present with visual impairment as observed in a previously reported SPG57 family (c.316C>T (p.(Arg106Cys)) in coiled-coil domain), suggesting unique contributions of the PB1 and coiled-coil domains in TFG complex formation/function and a possible phenotype correlation to variant location. Some families manifested marked phenotypic variations implying the possibility of modifier factors complicated by high inbreeding. Finally, additional genetic heterogeneity is expected in Sudanese HSP families. The remaining families might unravel new genes or uncommon modes of inheritance.
Hereditary spastic paraplegias: identification of a novel SPG57 variant affecting TFG oligomerization and description of HSP subtypes in Sudan

PubMed Central

Elsayed, Liena E O; Mohammed, Inaam N; Hamed, Ahlam A A; Elseed, Maha A; Johnson, Adam; Mairey, Mathilde; Mohamed, Hassab Elrasoul S A; Idris, Mohamed N; Salih, Mustafa A M; El-sadig, Sarah M; Koko, Mahmoud E; Mohamed, Ashraf Y O; Raymond, Laure; Coutelier, Marie; Darios, Frédéric; Siddig, Rayan A; Ahmed, Ahmed K M A; Babai, Arwa M A; Malik, Hiba M O; Omer, Zulfa M B M; Mohamed, Eman O E; Eltahir, Hanan B; Magboul, Nasr Aldin A; Bushara, Elfatih E; Elnour, Abdelrahman; Rahim, Salah M Abdel; Alattaya, Abdelmoneim; Elbashir, Mustafa I; Ibrahim, Muntaser E; Durr, Alexandra; Audhya, Anjon; Brice, Alexis; Ahmed, Ammar E; Stevanin, Giovanni

2017-01-01

Hereditary spastic paraplegias (HSP) are the second most common type of motor neuron disease recognized worldwide. We investigated a total of 25 consanguineous families from Sudan. We used next-generation sequencing to screen 74 HSP-related genes in 23 families. Linkage analysis and candidate gene sequencing were performed in two other families. We established a genetic diagnosis in six families with autosomal recessive HSP (SPG11 in three families and TFG/SPG57, SACS and ALS2 in one family each). A heterozygous mutation in a gene involved in an autosomal dominant HSP (ATL1/SPG3A) was also identified in one additional family. Six out of seven identified variants were novel. The c.64C>T (p.(Arg22Trp)) TFG/SPG57 variant (PB1 domain) is the second identified that underlies HSP, and we demonstrated its impact on TFG oligomerization in vitro. Patients did not present with visual impairment as observed in a previously reported SPG57 family (c.316C>T (p.(Arg106Cys)) in coiled-coil domain), suggesting unique contributions of the PB1 and coiled-coil domains in TFG complex formation/function and a possible phenotype correlation to variant location. Some families manifested marked phenotypic variations implying the possibility of modifier factors complicated by high inbreeding. Finally, additional genetic heterogeneity is expected in Sudanese HSP families. The remaining families might unravel new genes or uncommon modes of inheritance. PMID:27601211
Sequence and Analysis of the Genome of the Pathogenic Yeast Candida orthopsilosis

PubMed Central

Riccombeni, Alessandro; Vidanes, Genevieve; Proux-Wéra, Estelle; Wolfe, Kenneth H.; Butler, Geraldine

2012-01-01

Candida orthopsilosis is closely related to the fungal pathogen Candida parapsilosis. However, whereas C. parapsilosis is a major cause of disease in immunosuppressed individuals and in premature neonates, C. orthopsilosis is more rarely associated with infection. We sequenced the C. orthopsilosis genome to facilitate the identification of genes associated with virulence. Here, we report the de novo assembly and annotation of the genome of a Type 2 isolate of C. orthopsilosis. The sequence was obtained by combining data from next generation sequencing (454 Life Sciences and Illumina) with paired-end Sanger reads from a fosmid library. The final assembly contains 12.6 Mb on 8 chromosomes. The genome was annotated using an automated pipeline based on comparative analysis of genomes of Candida species, together with manual identification of introns. We identified 5700 protein-coding genes in C. orthopsilosis, of which 5570 have an ortholog in C. parapsilosis. The time of divergence between C. orthopsilosis and C. parapsilosis is estimated to be twice as great as that between Candida albicans and Candida dubliniensis. There has been an expansion of the Hyr/Iff family of cell wall genes and the JEN family of monocarboxylic transporters in C. parapsilosis relative to C. orthopsilosis. We identified one gene from a Maltose/Galactoside O-acetyltransferase family that originated by horizontal gene transfer from a bacterium to the common ancestor of C. orthopsilosis and C. parapsilosis. We report that TFB3, a component of the general transcription factor TFIIH, undergoes alternative splicing by intron retention in multiple Candida species. We also show that an intein in the vacuolar ATPase gene VMA1 is present in C. orthopsilosis but not C. parapsilosis, and has a patchy distribution in Candida species. Our results suggest that the difference in virulence between C. parapsilosis and C. orthopsilosis may be associated with expansion of gene families. PMID:22563396
BMPR1B mutation causes Pierre Robin sequence

PubMed Central

Yao, Xu; Zhang, Rong; Yang, Hui; Zhao, Rui; Guo, Jihong; Jin, Ke; Mei, Haibo; Luo, Yongqi; Zhao, Liu; Tu, Ming; Zhu, Yimin

2017-01-01

Background We investigated a large family with Pierre Robin sequence (PRS). Aim of the study This study aims to determine the genetic cause of PRS. Results The reciprocal translocation t(4;6)(q22;p21) was found to segregate with PRS in a three-generation family. Whole-genome sequencing and Sanger sequencing successfully detected breakpoints in the intragenic regions of BMPR1B and GRM4. We hypothesized that PRS in this family was caused by (i) haploinsufficiency for BMPR1B or (ii) a gain of function mechanism mediated by the BMPR1B-GRM4 fusion gene. In an unrelated family, we identified another BMPR1B-splicing mutation that co-segregated with PRS. Conclusion We detected two BMPR1B mutations in two unrelated PRS families, suggesting that BMPR1B disruption is probably a cause of human PRS. Methods GTG banding, comparative genomic hybridization, whole-genome sequencing, and Sanger sequencing were performed to identify the gene causing PRS. PMID:28418932
The WRKY Transcription Factor Family in Citrus: Valuable and Useful Candidate Genes for Citrus Breeding.

PubMed

Ayadi, M; Hanana, M; Kharrat, N; Merchaoui, H; Marzoug, R Ben; Lauvergeat, V; Rebaï, A; Mzid, R

2016-10-01

WRKY transcription factors belong to a large family of plant transcriptional regulators whose members have been reported to be involved in a wide range of biological roles including plant development, adaptation to environmental constraints and response to several diseases. However, little information is available about WRKYs in Citrus. The recent release of completely assembled genome sequences of Citrus sinensis and Citrus clementina and the availability of EST sequences from other citrus species allowed us to perform a genome survey for Citrus WRKY proteins. In the present study, we identified 100 WRKY members from C. sinensis (51), C. clementina (48) and Citrus unshiu (1), and analyzed their chromosomal distribution, gene structure, gene duplication, syntenic relation and phylogeny. A phylogenetic tree of 100 Citrus WRKY sequences with their orthologs from Arabidopsis has distinguished seven groups. The CsWRKY genes were distributed across all ten sweet orange chromosomes. A comprehensive approach and an integrative analysis of Citrus WRKY gene expression revealed variable profiles of expression within tissues and stress conditions indicating functional diversification. Thus, candidate Citrus WRKY genes have been proposed as potentially involved in fruit acidification, essential oil biosynthesis and abiotic/biotic stress tolerance. Our results provided essential prerequisites for further WRKY gene cloning and functional analysis with an aim of citrus crop improvement.
A family of long intergenic non-coding RNA genes in human chromosomal region 22q11.2 carry a DNA translocation breakpoint/AT-rich sequence

PubMed Central

2018-01-01

FAM230C, a long intergenic non-coding RNA (lincRNA) gene in human chromosome 13 (chr13), is a member of a group of lincRNA genes termed family with sequence similarity 230. An analysis using bioinformatics search tools and alignment programs was undertaken to determine properties of FAM230C and its related genes. Results reveal that the DNA translocation element, the Translocation Breakpoint Type A (TBTA) sequence, which consists of satellite DNA, Alu elements, and AT-rich sequences, is embedded in the FAM230C gene. Eight lincRNA genes related to FAM230C also carry the TBTA sequences. These genes were formed from a large segment of the 3’ half of the FAM230C sequence duplicated in chr22, and are specifically in regions of low copy repeats (LCR22s), in or close to the 22q11.2 region. 22q11.2 is a chromosomal segment that undergoes a high rate of DNA translocation and is prone to genetic deletions. FAM230C-related genes present in other chromosomes do not carry the TBTA motif and were formed from the 5’ half region of the FAM230C sequence. These findings identify a high specificity in lincRNA gene formation by gene sequence duplication in different chromosomes. PMID:29668722
Identifying different transcribed proteins in the newly described Theraphosidae Pamphobeteus verdolaga.

PubMed

Estrada-Gómez, Sebastian; Vargas-Muñoz, Leidy Johana; Saldarriaga-Córdoba, Mónica; Cifuentes, Yeimy; Perafan, Carlos

2017-04-01

Theraphosidae spiders are well known to possess a complex mixture of protein and non-protein compounds in their venom. The objective of this study was to report and identify different proteins translated from the venom gland DNA information of the recently described Theraphosidae spider Pamphobeteus verdolaga. Using a venom gland transcriptomic analysis, we reported a set of the first complete sequences of seven different proteins of the recently described Theraphosidae spider P. verdolaga. Protein analysis indicates the presence of different proteins in the venom composition of this new spider, some of them uncommon in the Theraphosidae family. MS/MS analysis of P. verdolaga showed different fragments matching sphingomyelinases (sicaritoxin), barytoxins, hexatoxins, latroinsectotoxins, and linear (zadotoxins) peptides. Only four of the MS/MS fragments showed 100% sequence similarity with one of the transcribed proteins. Transcriptomic analysis showed the presence of different groups of proteins like phospholipases, hyaluronidases, inhibitory cysteine knots (ICK) peptides among others. The three databases of protein domains used in this study (Pfam, SMART and CDD) showed congruency in the search of unique conserved protein domain for only four of the translated proteins. Those proteins matched with EF-hand proteins, cysteine rich secretory proteins, jingzhaotoxins, theraphotoxins and hexatoxins, from different Mygalomorphae spiders belonging to the families Theraphosidae, Barychelidae and Hexathelidae. None of the analyzed sequences showed a complete 100% similarity.

Diversity in the architecture of ATLs, a family of plant ubiquitin-ligases, leads to recognition and targeting of substrates in different cellular environments.

PubMed

Aguilar-Hernández, Victor; Aguilar-Henonin, Laura; Guzmán, Plinio

2011-01-01

Ubiquitin-ligases or E3s are components of the ubiquitin proteasome system (UPS) that coordinate the transfer of ubiquitin to the target protein. A major class of ubiquitin-ligases consists of RING-finger domain proteins that include the substrate recognition sequences in the same polypeptide; these are known as single-subunit RING finger E3s. We are studying a particular family of RING finger E3s, named ATL, that contain a transmembrane domain and the RING-H2 finger domain; none of the members of the family contains any other previously described domain. Although the study of a few members in A. thaliana and O. sativa has been reported, the role of this family in the life cycle of a plant is still vague. To provide tools to advance the functional analysis of this family we have undertaken a phylogenetic analysis of ATLs in twenty-four plant genomes. ATLs were found in all the 24 plant species analyzed, in numbers ranging from 20-28 in two basal species to 162 in soybean. Analysis of ATLs arrayed in tandem indicates that sets of genes are expanding in a species-specific manner. To get insights into the domain architecture of ATLs we generated 75 pHMM LOGOs from 1815 ATLs, and unraveled potential protein-protein interaction regions by means of yeast two-hybrid assays. Several ATLs were found to interact with DSK2a/ubiquilin through a region at the amino-terminal end, suggesting that this is a widespread interaction that may assist in the mode of action of ATLs; the region was traced to a distinct sequence LOGO. Our analysis provides significant observations on the evolution and expansion of the ATL family in addition to information on the domain structure of this class of ubiquitin-ligases that may be involved in plant adaptation to environmental stress.
Implications of the plastid genome sequence of Typha (Typhaceae, Poales) for understanding genome evolution in Poaceae.

PubMed

Guisinger, Mary M; Chumley, Timothy W; Kuehl, Jennifer V; Boore, Jeffrey L; Jansen, Robert K

2010-02-01

Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but a comprehensive comparative analysis of genome evolution has not been performed that includes any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three gene losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes.
A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica

PubMed Central

2012-01-01

Background The Azadirachta indica (neem) tree is a source of a large number of natural products, including the potent biopesticide azadirachtin. In spite of its widespread applications in agriculture and medicine, the molecular aspects of the biosynthesis of neem terpenoids remain largely unexplored. The current report describes the draft genome and four transcriptomes of A. indica and attempts to contextualise the sequence information in terms of its molecular phylogeny, transcript expression and terpenoid biosynthesis pathways. A. indica is the first member of the family Meliaceae to be sequenced using a next-generation sequencing approach. Results The genome and transcriptomes of A. indica were sequenced using multiple sequencing platforms and libraries. The A. indica genome is AT-rich, bears few repetitive DNA elements and comprises about 20,000 genes. The molecular phylogenetic analyses grouped A. indica together with Citrus sinensis from the Rutaceae family, validating its conventional taxonomic classification. Comparative transcript expression analysis showed either exclusive or enhanced expression of known genes involved in neem terpenoid biosynthesis pathways compared to other sequenced angiosperms. Genome and transcriptome analyses in A. indica led to the identification of repeat elements, nucleotide composition and expression profiles of genes in various organs. Conclusions This study on the A. indica genome and transcriptomes will provide a model for characterization of metabolic pathways involved in synthesis of bioactive compounds and for comparative evolutionary studies among various Meliaceae family members, and will help annotate their genomes. A better understanding of molecular pathways involved in azadirachtin synthesis in A. indica will pave the way for bulk production of environment-friendly biopesticides. PMID:22958331
The complete chloroplast genome sequence of Epipremnum aureum and its comparative analysis among eight Araceae species

PubMed Central

Han, Limin; Chen, Chen; Wang, Zhezhi

2018-01-01

Epipremnum aureum is an important foliage plant in the Araceae family. In this study, we have sequenced the complete chloroplast genome of E. aureum by using Illumina Hiseq sequencing platforms. This genome is a double-stranded circular DNA sequence of 164,831 bp that contains 35.8% GC. The two inverted repeats (IRa and IRb; 26,606 bp) are spaced by a small single-copy region (22,868 bp) and a large single-copy region (88,751 bp). The chloroplast genome has 131 (113 unique) functional genes, including 86 (79 unique) protein-coding genes, 37 (30 unique) tRNA genes, and eight (four unique) rRNA genes. Tandem repeats comprise the majority of the 43 long repetitive sequences. In addition, 111 simple sequence repeats are present, with mononucleotides being the most common type and di- and tetranucleotides being infrequent events. Positive selection pressure on rps12 in the E. aureum chloroplast has been demonstrated via synonymous and nonsynonymous substitution rates and selection pressure sites analyses. Ycf15 and infA are pseudogenes in this species. We constructed a Maximum Likelihood phylogenetic tree based on the complete chloroplast genomes of 38 species from 13 families. Those results strongly indicated that E. aureum is positioned as the sister of Colocasia esculenta within the Araceae family. This work may provide information for further study of the molecular phylogenetic relationships within Araceae, as well as molecular markers and breeding novel varieties by chloroplast genetic-transformation of E. aureum in particular. PMID:29529038
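The region lengths reported in the abstract above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes the usual quadripartite plastome layout (one large single-copy region, one small single-copy region, and two inverted-repeat copies) rather than reproducing the authors' analysis.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length for one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide string (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Region sizes in bp as stated in the abstract.
total = quadripartite_length(lsc=88_751, ssc=22_868, ir=26_606)
print(total)  # 164831

# Toy sequence, not real data: two of four bases are G or C.
print(gc_fraction("ATGC"))  # 0.5
```

The computed total agrees with the 164,831 bp genome size given in the abstract, which is a quick consistency check worth running on any reported plastome annotation.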
Homology-based Modeling of Rhodopsin-like Family Members in the Inactive State: Structural Analysis and Deduction of Tips for Modeling and Optimization.

PubMed

Pappalardo, Matteo; Rayan, Mahmoud; Abu-Lafi, Saleh; Leonardi, Martha E; Milardi, Danilo; Guccione, Salvatore; Rayan, Anwar

2017-08-01

Modeling G-Protein Coupled Receptors (GPCRs) is an emergent field of research, since the utility of high-quality models in receptor structure-based strategies might facilitate the discovery of interesting drug candidates. The findings from a quantitative analysis of eighteen resolved structures of rhodopsin family "A" receptors crystallized with antagonists and 153 pairs of structures are described. A strategy termed endeca-amino acids fragmentation was used to analyze the structural models, aiming to detect the relationship between sequence identity and Root Mean Square Deviation (RMSD) at each trans-membrane domain. Moreover, we have applied the leave-one-out strategy to study the shiftiness likelihood of the helices. The type of correlation between sequence identity and RMSD was studied using the aforementioned set of receptors as representatives of membrane proteins and 98 serine proteases with 4753 pairs of structures as representatives of globular proteins. Data analysis using the fragmentation strategy revealed that there is some extent of correlation between sequence identity and global RMSD of 11AA width windows. However, spatial conservation is not always close to the endoplasmic side as was reported before. A comparative study with globular proteins shows that GPCRs have higher standard deviation and higher slope in the graph with correlation between sequence identity and RMSD. The extracted information disclosed in this paper could be incorporated in modeling protocols when using techniques for model optimization and refinement. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
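The pair counts quoted in this abstract follow from comparing every structure with every other one: n structures give n(n-1)/2 unordered pairs. The sketch below checks both figures and shows one hypothetical way to compute sequence identity over 11-residue windows of two pre-aligned, equal-length sequences. The windowing details and function names are assumptions for illustration, not the authors' protocol.

```python
from math import comb


def n_pairs(n_structures: int) -> int:
    """Number of unordered pairs among n structures."""
    return comb(n_structures, 2)


def window_identity(a: str, b: str, width: int = 11) -> list[float]:
    """Fraction of identical positions in each sliding window of two
    pre-aligned sequences of equal length (assumed input format)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    identities = []
    for start in range(len(a) - width + 1):
        matches = sum(
            x == y for x, y in zip(a[start:start + width], b[start:start + width])
        )
        identities.append(matches / width)
    return identities


if __name__ == "__main__":
    print(n_pairs(18))  # 153 GPCR structure pairs
    print(n_pairs(98))  # 4753 serine protease structure pairs
```

Each window's identity could then be plotted against the RMSD of the same window to examine the kind of correlation the abstract describes.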
Frequent germline deleterious mutations in DNA repair genes in familial prostate cancer cases are associated with advanced disease.

PubMed

Leongamornlert, D; Saunders, E; Dadaev, T; Tymrakiewicz, M; Goh, C; Jugurnauth-Little, S; Kozarewa, I; Fenwick, K; Assiotis, I; Barrowdale, D; Govindasami, K; Guy, M; Sawyer, E; Wilkinson, R; Antoniou, A C; Eeles, R; Kote-Jarai, Z

2014-03-18

Prostate cancer (PrCa) is one of the most common diseases to affect men worldwide and among the leading causes of cancer-related death. The purpose of this study was to use second-generation sequencing technology to assess the frequency of deleterious mutations in 22 tumour suppressor genes in familial PrCa and estimate the relative risk of PrCa if these genes are mutated. Germline DNA samples from 191 men with 3 or more cases of PrCa in their family were sequenced for 22 tumour suppressor genes using Agilent target enrichment and Illumina technology. Analysis for genetic variation was carried out by using a pipeline consisting of BWA, Genome Analysis Toolkit (GATK) and ANNOVAR. Clinical features were correlated with mutation status using standard statistical tests. Modified segregation analysis was used to determine the relative risk of PrCa conferred by the putative loss-of-function (LoF) mutations identified. We discovered 14 putative LoF mutations in 191 samples (7.3%) and these mutations were more frequently associated with nodal involvement, metastasis or T4 tumour stage (P=0.00164). Segregation analysis of probands with European ancestry estimated that LoF mutations in any of the studied genes confer a relative risk of PrCa of 1.94 (95% CI: 1.56-2.42). These findings show that LoF mutations in DNA repair pathway genes predispose to familial PrCa and advanced disease and therefore warrant further investigation. The clinical utility of these findings will become increasingly important as targeted screening and therapies become more widespread.
Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes

PubMed Central

2009-01-01

Background One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. Results An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). Methods and assay sequences are reported in this paper. Conclusion This automated process allows laboratories to discover DNA variations in a short time and at low cost. PMID:19835634
Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes.

PubMed

Bennett, Richard R; Schneider, Hal E; Estrella, Elicia; Burgess, Stephanie; Cheng, Andrew S; Barrett, Caitlin; Lip, Va; Lai, Poh San; Shen, Yiping; Wu, Bai-Lin; Darras, Basil T; Beggs, Alan H; Kunkel, Louis M

2009-10-18

One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). Methods and assay sequences are reported in this paper. This automated process allows laboratories to discover DNA variations in a short time and at low cost.
No novel, high penetrant gene might remain to be found in Japanese patients with unknown MODY.

PubMed

Horikawa, Yukio; Hosomichi, Kazuyoshi; Enya, Mayumi; Ishiura, Hiroyuki; Suzuki, Yutaka; Tsuji, Shoji; Sugano, Sumio; Inoue, Ituro; Takeda, Jun

2018-07-01

MODY 5 and 6 have been shown to be low-penetrant MODYs. As the genetic background of unknown MODY is assumed to be similar, a new analytical strategy is applied here to elucidate genetic predispositions to unknown MODY. We used SNP linkage analysis in Japanese patients to examine whether major MODY gene loci remain to be identified. Whole-exome sequencing was performed in seven families with typical MODY. Candidates for novel MODY genes were examined in combination with in silico network analysis. Some peaks were found only in either parametric or non-parametric analysis; however, none of these peaks showed a LOD score greater than 3.7, which is accepted as the significance threshold of evidence for linkage. Exome sequencing revealed that three mutated genes were common among three families and 42 mutated genes were common in two families. Only one of these genes, MYO5A, having rare amino acid mutations p.R849Q and p.V1601G, was involved in the biological network of known MODY genes through the intermediary of INS. Although one promising candidate gene, MYO5A, was identified, there may be no novel, highly penetrant MODY genes remaining to be found in Japanese MODY.
Disruption of the methyltransferase-like 23 gene METTL23 causes mild autosomal recessive intellectual disability

PubMed Central

Bernkopf, Marie; Webersinke, Gerald; Tongsook, Chanakan; Koyani, Chintan N.; Rafiq, Muhammad A.; Ayaz, Muhammad; Müller, Doris; Enzinger, Christian; Aslam, Muhammad; Naeem, Farooq; Schmidt, Kurt; Gruber, Karl; Speicher, Michael R.; Malle, Ernst; Macheroux, Peter; Ayub, Muhammad; Vincent, John B.; Windpassinger, Christian; Duba, Hans-Christoph

2014-01-01

We describe the characterization of a gene for mild nonsyndromic autosomal recessive intellectual disability (ID) in two unrelated families, one from Austria, the other from Pakistan. Genome-wide single nucleotide polymorphism microarray analysis enabled us to define a region of homozygosity by descent on chromosome 17q25. Whole-exome sequencing and analysis of this region in an affected individual from the Austrian family identified a 5 bp frameshifting deletion in the METTL23 gene. By means of Sanger sequencing of METTL23, a nonsense mutation was detected in a consanguineous ID family from Pakistan for which homozygosity-by-descent mapping had identified a region on 17q25. Both changes lead to truncation of the putative METTL23 protein, which disrupts the predicted catalytic domain and alters the cellular localization. 3D-modelling of the protein indicates that METTL23 is strongly predicted to function as an S-adenosyl-methionine (SAM)-dependent methyltransferase. Expression analysis of METTL23 indicated a strong association with heat shock proteins, which suggests that these may act as a putative substrate for methylation by METTL23. A number of methyltransferases have been described recently in association with ID. Disruption of METTL23 presented here supports the importance of methylation processes for intact neuronal function and brain development. PMID:24626631
Basic leucine zipper family in barley: genome-wide characterization of members and expression analysis.

PubMed

Pourabed, Ehsan; Ghane Golmohamadi, Farzan; Soleymani Monfared, Peyman; Razavi, Seyed Morteza; Shobbar, Zahra-Sadat

2015-01-01

The basic leucine zipper (bZIP) family is one of the largest and most diverse transcription factor families in eukaryotes and participates in many essential plant processes. We identified 141 bZIP proteins encoded by 89 genes from the Hordeum vulgare genome. HvbZIPs were classified into 11 groups based on their DNA-binding motif. Amino acid sequence alignment of the HvbZIP basic-hinge regions revealed some highly conserved residues within each group. The leucine zipper heptads were analyzed to predict their dimerization properties. Thirty-four conserved motifs were identified outside the bZIP domain. Phylogenetic analysis indicated that major diversification within the bZIP family predated the monocot/dicot divergence, although intra-species duplication and parallel evolution seem to have occurred afterward. Localization of HvbZIPs on the barley chromosomes revealed that different groups are distributed across the seven chromosomes of barley. Six types of intron pattern were detected within the basic-hinge regions. Most of the detected cis-elements in the promoter and UTR sequences were involved in seed development or abiotic stress response. Microarray data analysis revealed differential expression patterns of HvbZIPs in response to ABA treatment, drought, and cold stresses and during barley grain development and germination. This information would be helpful for functional characterization of bZIP transcription factors in barley.
Genome analysis of Hibiscus syriacus provides insights of polyploidization and indeterminate flowering in woody plants

PubMed Central

Kim, Yong-Min; Kim, Seungill; Koo, Namjin; Shin, Ah-Young; Yeom, Seon-In; Seo, Eunyoung; Park, Seong-Jin; Kang, Won-Hee; Kim, Myung-Shin; Park, Jieun; Jang, Insu; Kim, Pan-Gyu; Byeon, Iksu; Kim, Min-Seo; Choi, JinHyuk; Ko, Gunhwan; Hwang, JiHye; Yang, Tae-Jin; Choi, Sang-Bong; Lee, Je Min; Lim, Ki-Byung; Lee, Jungho; Choi, Ik-Young; Park, Beom-Seok; Kwon, Suk-Yoon; Choi, Doil

2017-01-01

Hibiscus syriacus (L.) (rose of Sharon) is one of the most widespread garden shrubs in the world. We report a draft of the H. syriacus genome comprised of a 1.75 Gb assembly that covers 92% of the genome with only 1.7% (33 Mb) gap sequences. Predicted gene modeling detected 87,603 genes, mostly supported by deep RNA sequencing data. To define gene family distribution among relatives of H. syriacus, orthologous gene sets containing 164,660 genes in 21,472 clusters were identified by OrthoMCL analysis of five plant species, including H. syriacus, Arabidopsis thaliana, Gossypium raimondii, Theobroma cacao and Amborella trichopoda. We inferred their evolutionary relationships based on divergence times among Malvaceae plant genes and found that gene families involved in flowering regulation and disease resistance were more highly divergent and expanded in H. syriacus than in its close relatives, G. raimondii (DD) and T. cacao. Clustered gene families and gene collinearity analysis revealed that two recent rounds of whole-genome duplication (WGD) were followed by diploidization of the H. syriacus genome after speciation. Copy number variation and phylogenetic divergence indicate that WGDs and subsequent diploidization led to unequal duplication and deletion of flowering-related genes in H. syriacus and may affect its unique floral morphology. PMID:28011721
An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

PubMed Central

Li, Yushuang; Yang, Jiasheng; Zhang, Yi

2016-01-01

In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440-dimensional feature vector consisting of a 400-dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20-dimensional content ratio vector, and a 20-dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method to the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanase-containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of five other popular alignment-free methods. In addition, we successfully separate the xylanase sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector. PMID:27918587
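The encoding described in this abstract (400 + 20 + 20 = 440 dimensions, compared by Euclidean distance) can be sketched as follows. The abstract does not give the exact formulas, so the definitions used here are assumptions: the transition entry for (a, b) is the count of adjacent pairs "ab" divided by the count of a, the content ratio is the count of a divided by the sequence length, and the position ratio is the sum of the 1-based positions of a divided by the sum of all positions. The paper's own definitions may differ.

```python
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def feature_vector(sequence: str) -> list[float]:
    """Encode a protein as a 440-dimensional vector (assumed definitions)."""
    # Keep only the 20 standard residues; anything else is dropped.
    seq = [c for c in sequence.upper() if c in INDEX]
    n = len(seq)
    if n == 0:
        raise ValueError("sequence has no standard amino acids")

    counts = [0] * 20
    position_sums = [0] * 20
    pair_counts = [[0] * 20 for _ in range(20)]

    for pos, aa in enumerate(seq, start=1):
        counts[INDEX[aa]] += 1
        position_sums[INDEX[aa]] += pos
    for a, b in zip(seq, seq[1:]):
        pair_counts[INDEX[a]][INDEX[b]] += 1

    # 400 transition entries, row-major over (a, b).
    transition = [
        pair_counts[i][j] / counts[i] if counts[i] else 0.0
        for i in range(20)
        for j in range(20)
    ]
    content = [c / n for c in counts]                  # 20 entries
    total_positions = n * (n + 1) / 2
    position = [p / total_positions for p in position_sums]  # 20 entries
    return transition + content + position


def protein_distance(seq_a: str, seq_b: str) -> float:
    """Euclidean distance between the feature vectors of two proteins."""
    return dist(feature_vector(seq_a), feature_vector(seq_b))
```

A pairwise distance matrix built with `protein_distance` could then be passed to any standard clustering or tree-building routine, which is the general way alignment-free distances are used.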
Isolation, structural analysis, and expression characteristics of the maize (Zea mays L.) hexokinase gene family.

PubMed

Zhang, Zhongbao; Zhang, Jiewei; Chen, Yajuan; Li, Ruifen; Wang, Hongzhi; Ding, Liping; Wei, Jianhua

2014-09-01

Hexokinases (HXKs, EC 2.7.1.1) play important roles in metabolism, glucose (Glc) signaling, and phosphorylation of Glc and fructose and are ubiquitous in all organisms. Despite their physiological importance, the maize HXK (ZmHXK) genes have not been analyzed systematically. We isolated and characterized nine members of the ZmHXK gene family which were distributed on 3 of the 10 maize chromosomes. A multiple sequence alignment and motif analysis revealed that the maize ZmHXK proteins share three conserved domains. Phylogenetic analysis revealed that the ZmHXK family can be divided into four subfamilies. We identified putative cis-elements in the ZmHXK promoter sequences potentially involved in phytohormone and abiotic stress responses, sugar repression, light and circadian rhythm regulation, Ca(2+) responses, seed development and germination, and CO2-responsive transcriptional activation. To study the functions of maize HXK isoforms, we characterized the expression of the ZmHXK5 and ZmHXK6 genes, which are evolutionarily related to the OsHXK5 and OsHXK6 genes from rice. Analysis of tissue-specific expression patterns using quantitative real time-PCR showed that ZmHXK5 was highly expressed in tassels, while ZmHXK6 was expressed in both tassels and leaves. ZmHXK5 and ZmHXK6 expression levels were upregulated by phytohormones and by abiotic stress.
Genomic characterization and expression profiles upon bacterial infection of a novel cystatin B homologue from disk abalone (Haliotis discus discus).

PubMed

Premachandra, H K A; Wan, Qiang; Elvitigala, Don Anushka Sandaruwan; De Zoysa, Mahanama; Choi, Cheol Young; Whang, Ilson; Lee, Jehee

2012-12-01

Cystatins are a large family of cysteine proteinase inhibitors which are involved in diverse biological and pathological processes. In the present study, we identified a gene related to the cystatin superfamily, AbCyt B, from disk abalone Haliotis discus discus by expressed sequence tag (EST) analysis and BAC library screening. The complete cDNA sequence of AbCyt B comprises 1967 nucleotides with a 306 bp open reading frame (ORF) encoding 101 amino acids. The amino acid sequence consists of a single cystatin-like domain, which has a cysteine proteinase inhibitor signature, a conserved Gly in the N-terminal region, a QVVAG motif and a variant of the PW motif. No signal peptide, disulfide bonds or carbohydrate side chains were identified. Analysis of the deduced amino acid sequence revealed that AbCyt B shares up to 44.7% identity and 65.7% similarity with the cystatin B genes from other organisms. The genomic sequence of AbCyt B is approximately 8.4 Kb, consisting of three exons and two introns. Phylogenetic tree analysis showed that AbCyt B was closely related to the cystatin B from pacific oyster (Crassostrea gigas) under the family 1. Functional analysis of recombinant AbCyt B protein exhibited inhibitory activity against papain, with almost 84% inhibition at a concentration of 3.5 μmol/L. In tissue expression analysis, AbCyt B transcripts were expressed abundantly in the hemocyte, gill, mantle, and digestive tract, but weakly in muscle, testis, and hepatopancreas. After the immune challenge with Vibrio parahemolyticus, AbCyt B showed significant (P<0.05) up-regulation of relative mRNA expression in gill and hemocytes at 24 and 6 h post infection, respectively. These results collectively suggest that AbCyt B is a potent inhibitor of cysteine proteinases and is also potentially involved in immune responses against invading bacterial pathogens in abalone. Copyright © 2012 Elsevier Ltd. All rights reserved.
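The ORF length and protein length quoted in this abstract are consistent if the ORF is counted with its stop codon, which is the usual convention: each amino acid uses one three-base codon, plus one terminating codon. A small sketch of that arithmetic follows; the function name is illustrative.

```python
def orf_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Length in base pairs of an ORF encoding n_amino_acids residues."""
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


if __name__ == "__main__":
    print(orf_length_bp(101))  # 306, matching the reported ORF length
```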
Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

PubMed

Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

2016-03-02

Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Whole-genome sequence analysis of Zika virus, amplified from urine of traveler from the Philippines.

PubMed

Gu, Se Hun; Song, Dong Hyun; Lee, Daesang; Jang, Jeyoun; Kim, Min Young; Jung, Jaehun; Woo, Koung In; Kim, Mirang; Seog, Woong; Oh, Hong Sang; Choi, Byung Seop; Ahn, Jong-Seong; Park, Quehn; Jeong, Seong Tae

2017-12-01

Zika virus (ZIKV) (genus Flavivirus, family Flaviviridae) is an emerging pathogen associated with microcephaly and Guillain-Barré syndrome. The rapid spread of ZIKV disease in over 60 countries and the large numbers of travel-associated cases have caused worldwide concern. Thus, intensified surveillance of cases among immigrants and tourists from ZIKV-endemic areas is important for disease control and prevention. In this study, using Next Generation Sequencing, we reported the first whole-genome sequence of ZIKV strain AFMC-U, amplified from the urine of a traveler returning to Korea from the Philippines. Phylogenetic analysis showed geographic-specific clustering. Our results underscore the importance of examining urine in the diagnosis of ZIKV infection.
Genetic relationships between blowflies (Calliphoridae) of forensic importance.

PubMed

Stevens, J; Wall, R

2001-08-15

Phylogenetic relationships among blowfly (Calliphoridae) species of forensic importance are explored using DNA sequence data from the large sub-unit (lsu, 28S) ribosomal RNA (rRNA) gene. The study includes representatives of a range of calliphorid species commonly encountered in forensic analysis in Britain and Europe. The data presented provide a basis to define molecular markers, including the identification of highly informative intra-sequence regions, which may be of use in the identification of larvae for forensic entomology. Phylogenetic analysis of the sequences also provides new insights into the different evolutionary patterns apparent within the family Calliphoridae which, additionally, can provide a measure of the degree of genetic variation likely to be encountered within taxonomic groups of differing forensic utility.
“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

PubMed Central

2014-01-01

Background Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results We have developed a perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
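To illustrate the kind of input this abstract refers to, the sketch below reads RepeatMasker .out lines and tallies raw hits and covered bases per repeat class/family. It is written in Python rather than perl and is not the authors' tool. It assumes the standard whitespace-delimited layout of at least 15 columns (score, three percentage columns, query name, query begin, query end, bases left, strand, repeat name, class/family, three repeat-position columns, ID), and it deliberately does not merge fragmented hits into copies, which is the problem the published tool addresses.

```python
from collections import defaultdict


def summarize_repeatmasker_out(lines):
    """Tally hits and query bases covered per repeat class/family.

    `lines` is any iterable of text lines from a RepeatMasker .out file.
    Header and blank lines are skipped because their first field is not
    an integer score.
    """
    summary = defaultdict(lambda: {"hits": 0, "bp": 0})
    for line in lines:
        fields = line.split()
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        query_begin, query_end = int(fields[5]), int(fields[6])
        family = fields[10]  # e.g. "LINE/L1"
        summary[family]["hits"] += 1
        summary[family]["bp"] += query_end - query_begin + 1
    return dict(summary)


if __name__ == "__main__":
    # Two made-up annotation lines, for illustration only.
    example = [
        "   SW   perc perc perc  query  position in query  matching repeat",
        "score   div. del. ins.  sequence  begin end (left)  repeat class/family",
        "",
        " 1320 15.6 6.2 0.0 chr1 100 199 (800) + L1 LINE/L1 1 100 (0) 1",
        "  500 10.0 0.0 0.0 chr1 300 349 (650) C L1 LINE/L1 (0) 50 1 2",
    ]
    print(summarize_repeatmasker_out(example))
```

Because two hits can belong to one interrupted element, the `hits` count from this naive tally can overestimate the number of copies, which is why a dedicated copy-reconstruction step is needed.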
Complete nucleotide sequences and genome characterization of a novel double-stranded RNA virus infecting Rosa multiflora.

PubMed

Salem, Nidá M; Golino, Deborah A; Falk, Bryce W; Rowhani, Adib

2008-01-01

Three double-stranded (ds) RNAs were detected in Rosa multiflora plants showing rose spring dwarf (RSD) symptoms. Northern blot analysis revealed three dsRNAs in preparations of both dsRNA and total RNA from R. multiflora plants. The complete sequences of the dsRNAs (referred to as dsRNA 1, dsRNA 2 and dsRNA 3) were determined based on a combination of shotgun cloning of dsRNA cDNAs and reverse transcription-polymerase chain reaction (RT-PCR). The largest dsRNA (dsRNA 1) was 1,762 bp long with a single open reading frame (ORF) that encoded a putative polypeptide containing 479 amino acid residues with a molecular mass of 55.9 kDa. This polypeptide contains amino acid sequence motifs conserved in the RNA-dependent RNA polymerases (RdRp) of members of the family Partitiviridae. Both dsRNA 2 (1,475 bp) and dsRNA 3 (1,384 bp) contained single ORFs, encoding putative proteins of unknown function. The 5' untranslated regions (UTR) of all three segments shared regions of high sequence homology. Phylogenetic analysis using the RdRp sequences of the various partitiviruses revealed that the new sequences would constitute the genome of a virus in the family Partitiviridae. This virus would cluster with Fragaria chiloensis cryptic virus and Raphanus sativus cryptic virus 2. We suggest that the three dsRNA segments constitute the genome of a novel cryptic virus infecting roses; we propose the name Rosa multiflora cryptic virus (RMCV). Detection primers were developed and used for RT-PCR detection of RMCV in rose plants.
