acid sequence homologies: Topics by Science.gov

Sample records for acid sequence homologies

Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang, Soo-Ik; Hammes, G.G.

1989-11-01

Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homology exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The NADPH-binding dinucleotide folds of the β-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the NADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.
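The two figures quoted in this abstract (percent identity, and percent matched when conservative substitutions are allowed) can be illustrated with a minimal Python sketch. It assumes a pairwise alignment is already available as two equal-length gapped strings; the function name and the conservative-substitution groups below are illustrative assumptions, not the grouping used by the authors.

```python
# Hypothetical grouping of residues treated as conservative substitutions.
# The abstract does not say which grouping the authors used.
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"),
    set("DE"), set("NQ"), set("ST"), set("AG"),
]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in CONSERVATIVE_GROUPS):
            similar += 1  # conservative substitution counts as a match
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100 * identical / compared, 100 * similar / compared


if __name__ == "__main__":
    print(identity_and_similarity("MKV-LD", "MRVALE"))
```

Skipping gapped columns is only one of several common conventions for the denominator; dividing by the full alignment length or by the shorter sequence length gives different percentages.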
Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

PubMed Central

Schaeffer, E; Sninsky, J J

1984-01-01

Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835
Detecting false positive sequence homology: a machine learning approach.

PubMed

Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

2016-02-24

Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches, that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Because they use only heuristic filtering based on significance score cutoffs and have no cluster post-processing tools available, these methods can often produce multiple clusters containing unrelated (non-homologous) sequences. Therefore, sequencing data extracted from incomplete genome/transcriptome assemblies that originate from low-coverage sequencing or are produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in the context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on an experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low-quality clusters of putative homologous genes recovered by heuristic-based approaches.
Amino acid sequence of the human fibronectin receptor

PubMed Central

1987-01-01

The amino acid sequence deduced from cDNA of the human placental fibronectin receptor is reported. The receptor is composed of two subunits: an alpha subunit of 1,008 amino acids which is processed into two polypeptides disulfide bonded to one another, and a beta subunit of 778 amino acids. Each subunit has near its COOH terminus a hydrophobic segment. This and other sequence features suggest a structure for the receptor in which the hydrophobic segments serve as transmembrane domains anchoring each subunit to the membrane and dividing each into a large ectodomain and a short cytoplasmic domain. The alpha subunit ectodomain has five sequence elements homologous to consensus Ca2+- binding sites of several calcium-binding proteins, and the beta subunit contains a fourfold repeat strikingly rich in cysteine. The alpha subunit sequence is 46% homologous to the alpha subunit of the vitronectin receptor. The beta subunit is 44% homologous to the human platelet adhesion receptor subunit IIIa and 47% homologous to a leukocyte adhesion receptor beta subunit. The high degree of homology (85%) of the beta subunit with one of the polypeptides of a chicken adhesion receptor complex referred to as integrin complex strongly suggests that the latter polypeptide is the chicken homologue of the fibronectin receptor beta subunit. These receptor subunit homologies define a superfamily of adhesion receptors. The availability of the entire protein sequence for the fibronectin receptor will facilitate studies on the functions of these receptors. PMID:2958481
[Sequence analysis of LEAFY homologous gene from Dendrobium moniliforme and application for identification of medicinal Dendrobium].

PubMed

Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu

2013-04-01

The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete sequence of the DenLFY gene was 3,575 bp and contained three exons and two introns. Comparison of the exons of LFY homologous genes by BLAST indicated that the DenLFY gene had high identity with orchid LFY homologs, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar, the LFY homologous gene in Oncidium (90%) and those in other orchids (over 80%). Using MP analysis, Dendrobium is found to be the sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were considered separately, the exons and the amino acid sequence were good markers for functional research on the DenLFY gene. The second intron can be used in authentication research of Dendrobium based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.
Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

PubMed Central

2012-01-01

Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence
Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST.

PubMed

Goonesekere, Nalin Cw

2009-01-01

The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the Structural Classification of Proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
Faster sequence homology searches by clustering subsequences.

PubMed

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2015-04-15

Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When measured with metagenomic data, GHOSTZ was ∼2.2-2.8 times faster than RAPSearch and ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/. Contact: akiyama@cs.titech.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Chemical property based sequence characterization of PpcA and its homolog proteins PpcB-E: A mathematical approach

PubMed Central

Pal Choudhury, Pabitra

2017-01-01

Periplasmic c7-type cytochrome A (PpcA) protein is found in Geobacter sulfurreducens along with its four homologs (PpcB-E). Crystal structures show that PpcA can bind deoxycholate (DXCA), while its homologs do not. However, the reason for this has yet to be established with certainty from primary protein sequence information. This study is based on primary protein sequence analysis through the chemical properties of the constituent amino acids. Firstly, we look for the chemical-group-specific score of amino acids. Along with this, we have developed a new methodology for phylogenetic analysis based on chemical group dissimilarities of amino acids. This new methodology is applied to the cytochrome c7 family members and pinpoints how a particular sequence differs from the others. Secondly, we build a graph-theoretic model using amino acid sequences, which is also applied to the cytochrome c7 family members, and some unique characteristics and their domains are highlighted. Thirdly, we search for unique patterns, as subsequences, that are common to the group or specific to an individual member. In all cases, we are able to show some distinct features of PpcA that set it apart from its homologs and may underlie its binding of deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD, compared to the other homologs, are also brought out. Further, because the five members of the cytochrome family are homologous proteins, they must have some significant features in common, which are also enumerated in this study. PMID:28362850
Establishing homologies in protein sequences

NASA Technical Reports Server (NTRS)

Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

1983-01-01

Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.
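The second approach mentioned above, finding the best alignment of two sequences, is commonly done with dynamic programming. The Python sketch below is a generic Needleman-Wunsch-style global alignment score, not the ALIGN program itself; the function name and the match, mismatch and gap values are arbitrary assumptions chosen for illustration, where real protein comparisons would use a scoring matrix.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a linear gap penalty.

    The scoring values are illustrative assumptions, not those of any
    published program.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept, so memory grows with the shorter dimension rather than the product of the lengths; recovering the alignment itself would require keeping the full table or a traceback.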
Bloom DNA Helicase Facilitates Homologous Recombination between Diverged Homologous Sequences*

PubMed Central

Kikuchi, Koji; Abdel-Aziz, H. Ismail; Taniguchi, Yoshihito; Yamazoe, Mitsuyoshi; Takeda, Shunichi; Hirota, Kouji

2009-01-01

Bloom syndrome, caused by inactivation of the Bloom DNA helicase (Blm), is characterized by increases in the level of sister chromatid exchange, that is, homologous recombination (HR) associated with cross-over. It is therefore believed that Blm works as an anti-recombinase. Meanwhile, in Drosophila, DmBlm is required specifically to promote synthesis-dependent strand annealing (SDSA), a type of HR not associated with cross-over. However, conservation of Blm function in SDSA through higher eukaryotes has been a matter of debate. Here, we demonstrate the function of Blm in SDSA-type HR in the chicken DT40 B lymphocyte line, where Ig gene conversion diversifies the immunoglobulin V gene through intragenic HR between diverged homologous segments. This reaction is initiated by activation-induced cytidine deaminase enzyme-mediated uracil formation at the V gene, which in turn is converted into an abasic site, presumably leading to a single-strand gap. Ig gene conversion frequency was drastically reduced in BLM−/− cells. In addition, BLM−/− cells used limited donor segments harboring higher identity compared with other segments in Ig gene conversion events, suggesting that Blm can promote HR between diverged sequences. To further understand the role of Blm in HR between diverged homologous sequences, we measured the frequency of gene targeting induced by an I-SceI-endonuclease-mediated double-strand break. BLM−/− cells showed a more severe defect in the gene targeting frequency as the number of heterologous sequences increased at the double-strand break site. Conversely, the overexpression of Blm, even an ATPase-defective mutant, strongly stimulated gene targeting. In summary, Blm promotes HR between diverged sequences through a novel ATPase-independent mechanism. PMID:19661064
Solid phase sequencing of double-stranded nucleic acids

DOEpatents

Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

2002-01-01

This invention relates to methods for detecting and sequencing target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.
Gentle Masking of Low-Complexity Sequences Improves Homology Search

PubMed Central

Frith, Martin C.

2011-01-01

Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with “gentle” masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0, S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to “harsh” masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search. PMID:22205972
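A minimal Python sketch of the gentle-masking score may make the rule concrete. It assumes the masked score is min(0, S), with S the unmasked score, so a masked letter can never contribute positively but still incurs mismatch penalties. The lowercase-means-masked convention, the function names and the +1/-1 scoring scheme are illustrative assumptions.

```python
def unmasked_score(x, y, match=1, mismatch=-1):
    """Toy substitution score S, ignoring case (illustrative values)."""
    return match if x.upper() == y.upper() else mismatch


def gentle_score(x, y):
    """Score a letter pair, treating lowercase letters as masked.

    Assumed rule: a pair involving a masked letter scores min(0, S).
    """
    s = unmasked_score(x, y)
    if x.islower() or y.islower():
        return min(0, s)  # masked matches score 0; mismatches keep penalty
    return s


def ungapped_score(a, b):
    """Sum of gentle scores over an ungapped, equal-length comparison."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(gentle_score(x, y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Only the unmasked prefix "AC" contributes to the score.
    print(ungapped_score("ACatatat", "ACATATAT"))
```

Under this rule a low-complexity tract cannot create a high-scoring alignment on its own, yet an alignment seeded in unmasked sequence can still extend through it without being cut off.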
DNA sequence alignment by microhomology sampling during homologous recombination

PubMed Central

Qi, Zhi; Redding, Sy; Lee, Ja Yil; Gibb, Bryan; Kwon, YoungHo; Niu, Hengyao; Gaines, William A.; Sung, Patrick

2015-01-01

Summary Homologous recombination (HR) mediates the exchange of genetic information between sister or homologous chromatids. During HR, members of the RecA/Rad51 family of recombinases must somehow search through vast quantities of DNA sequence to align and pair ssDNA with a homologous dsDNA template. Here we use single-molecule imaging to visualize Rad51 as it aligns and pairs homologous DNA sequences in real-time. We show that Rad51 uses a length-based recognition mechanism while interrogating dsDNA, enabling robust kinetic selection of 8-nucleotide (nt) tracts of microhomology, which kinetically confines the search to sites with a high probability of being a homologous target. Successful pairing with a 9th nucleotide coincides with an additional reduction in binding free energy and subsequent strand exchange occurs in precise 3-nt steps, reflecting the base triplet organization of the presynaptic complex. These findings provide crucial new insights into the physical and evolutionary underpinnings of DNA recombination. PMID:25684365
Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

PubMed

Lathe, R

1985-05-05

Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
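The cost of the first strategy, a mixture of probes covering all codon combinations, can be estimated by multiplying the number of synonymous codons for each residue of the peptide. The Python sketch below uses the codon counts of the standard genetic code; the function name and the example peptides are illustrative and not taken from the paper.

```python
import math

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def probe_degeneracy(peptide):
    """Number of distinct coding sequences that could encode the peptide."""
    return math.prod(CODON_COUNTS[residue] for residue in peptide.upper())


if __name__ == "__main__":
    # A peptide rich in Met/Trp needs few probes; Leu/Ser/Arg inflate the mix.
    print(probe_degeneracy("MWDK"))
    print(probe_degeneracy("LSRL"))
```

The count grows multiplicatively with peptide length, which is one way to see why a single longer probe can be attractive when the peptide contains several six-codon residues.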
Acid sphingomyelinase possesses a domain homologous to its activator proteins: saposins B and D.

PubMed Central

Ponting, C. P.

1994-01-01

An N-terminal region of the acid sphingomyelinase sequence (residues 89-165) is shown to be homologous to saposin-type sequences. By analogy with the known functions of saposins, this sphingomyelinase saposin-type domain may possess lipid-binding and/or sphingomyelinase-activator properties. This finding may prove to be important in the understanding of Niemann-Pick disease, which results from sphingomyelinase deficiency. PMID:8003971
The complete amino acid sequence of echinoidin, a lectin from the coelomic fluid of the sea urchin Anthocidaris crassispina. Homologies with mammalian and insect lectins.

PubMed

Giga, Y; Ikai, A; Takahashi, K

1987-05-05

The complete amino acid sequence of echinoidin, the proposed name for a lectin from the coelomic fluid of the sea urchin Anthocidaris crassispina, has been determined by sequencing the peptides obtained from tryptic, Staphylococcus aureus V8 protease, chymotryptic, and thermolysin digestions. Echinoidin is a multimeric protein (Giga, Y., Sutoh, K., and Ikai, A. (1985) Biochemistry 24, 4461-4467) whose subunit consists of a total of 147 amino acid residues and one carbohydrate chain attached to Ser38. The molecular weight of the polypeptide without carbohydrate was calculated to be 16,671. Each polypeptide chain contains seven half-cystines, and six of them form three disulfide bonds in the single polypeptide chain (Cys3-Cys14, Cys31-Cys141, and Cys116-Cys132), while Cys2 is involved in an interpolypeptide disulfide linkage. From secondary structure prediction by the method of Chou and Fasman (Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13, 211-222) the protein appears to be rich in beta-sheet and beta-turn structures and poor in alpha-helical structure. The sequence of the COOH-terminal half of echinoidin is highly homologous to those of the COOH-terminal carbohydrate recognition portions of rat liver mannose-binding protein and several other hepatic lectins. This COOH-terminal region of echinoidin is also homologous to the central portion of the lectin from the flesh fly Sarcophaga peregrina. Moreover, echinoidin contains an Arg-Gly-Asp sequence which has been proposed to be a basic functional unit in cellular recognition proteins.
Nucleotide sequence of the L1 ribosomal protein gene of Xenopus laevis: remarkable sequence homology among introns.

PubMed Central

Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F

1985-01-01

Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover, a small duplication occurred at the very end of the coding region of the L1b gene, which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively, the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch, as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and involves the block of the normal splicing of some intron sequences. PMID:3841512
Using structure to explore the sequence alignment space of remote homologs.

PubMed

Kuziemko, Andrew; Honig, Barry; Petrey, Donald

2011-10-01

Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
Isolation and sequence of partial cDNA clones of human L1: homology of human and rodent L1 in the cytoplasmic region.

PubMed

Harper, J R; Prince, J T; Healy, P A; Stuart, J K; Nauman, S J; Stallcup, W B

1991-03-01

We have isolated cDNA clones coding for the human homologue of the neuronal cell adhesion molecule L1. The nucleotide sequence of the cDNA clones and the deduced primary amino acid sequence of the carboxy terminal portion of the human L1 are homologous to the corresponding sequences of mouse L1 and rat NILE glycoprotein, with an especially high sequence identity in the cytoplasmic regions of the proteins. There is also protein sequence homology with the cytoplasmic region of the Drosophila cell adhesion molecule, neuroglian. The conservation of the cytoplasmic domain argues for an important functional role for this portion of the molecule.

Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

PubMed

Wyszyńska-Koko, J; Kurył, J

2004-01-01

The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified, sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein shows high conservation among species, with 99 and 97% identity when comparing pig with cow and human, respectively, and 93% when comparing pig with mouse and rat. A single nucleotide polymorphism (SNP) was revealed within the promoter region, which appeared to be a T → C transition recognized by the MspI restriction enzyme.
Complete amino acid sequence of bovine colostrum low-Mr cysteine proteinase inhibitor.

PubMed

Hirado, M; Tsunasawa, S; Sakiyama, F; Niinobe, M; Fujii, S

1985-07-01

The complete amino acid sequence of bovine colostrum cysteine proteinase inhibitor was determined by sequencing native inhibitor and peptides obtained by cyanogen bromide degradation, Achromobacter lysylendopeptidase digestion and partial acid hydrolysis of reduced and S-carboxymethylated protein. Achromobacter peptidase digestion was successfully used to isolate two disulfide-containing peptides. The inhibitor consists of 112 amino acids with an Mr of 12787. Two disulfide bonds were established between Cys 66 and Cys 77 and between Cys 90 and Cys 110. A high degree of homology in the sequence was found between the colostrum inhibitor and human gamma-trace, human salivary acidic protein and chicken egg-white cystatin.
CBH1 homologs and variant CBH1 cellulases

DOEpatents

Goedegebuur, Frits [Rozenlaan, NL]; Gualfetti, Peter [San Francisco, CA]; Mitchinson, Colin [Half Moon Bay, CA]; Neefe, Paulien [Zoetermeer, NL]

2011-05-31

Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.
CBH1 homologs and variant CBH1 cellulase

DOEpatents

Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Neefe, Paulien

2014-07-01

Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.
The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins.

PubMed Central

Fanning, T; Singer, M

1987-01-01

Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227
Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

PubMed

Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

2014-01-01

Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).
Nucleic Acid Homologies Among Oxidase-Negative Moraxella Species

PubMed Central

Johnson, John L.; Anderson, Robert S.; Ordal, Erling J.

1970-01-01

The deoxyribonucleic acid (DNA) base composition and DNA homologies of more than 40 strains of oxidase-negative Moraxella species were determined. These bacteria have also been identified as belonging to the Mima-Herellea-Acinetobacter group and the Bacterium anitratum group, as well as to several other genera including Achromobacter and Alcaligenes. The DNA base content of these strains ranged from 40 to 46% guanine plus cytosine. DNA–DNA competition experiments distinguished five groups whose members were determined by showing 50% or more homology to one of the reference strains: B. anitratum type B5W, Achromobacter haemolyticus var. haemolyticus, Alcaligenes haemolysans, Achromobacter metalcaligenes, and Moraxella lwoffi. A sixth group comprised those strains showing less than 50% homology to any of the reference strains. Negligible homology was found between strains of oxidase-negative and oxidase-positive Moraxella species in DNA–DNA competition experiments. However, evidence of a distant relationship between the two groups was obtained in competition experiments by using ribosomal ribonucleic acid. PMID:5413826
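The base-composition figure quoted above (40 to 46% guanine plus cytosine) can be illustrated with a short calculation. The following is a minimal sketch that assumes a sequence is available as text; the abstract does not say how base composition was measured, and the function name `gc_percent` and the example fragment are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases A, C, G, T."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical 10-base fragment: 2 G + 2 C out of 10 bases.
example = "ATGCGCATTA"
print(f"{gc_percent(example):.1f}% G+C")
```

Ambiguous symbols such as N are ignored here rather than counted in the denominator; that is a design choice of this sketch, not something stated in the record.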
The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

PubMed

Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

2007-02-14

The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses
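The tag-based assignment described in this record can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: it assumes each read begins with a dinucleotide tag (the abstract mentions dinucleotide tags), and the function name `demultiplex`, the tag table, and the reads are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_source, tag_len=2):
    """Group reads by their leading tag; reads with an unknown tag go under None."""
    bins = defaultdict(list)
    for read in reads:
        tag = read[:tag_len].upper()
        insert = read[tag_len:]  # sequence with the tag stripped off
        bins[tag_to_source.get(tag)].append(insert)
    return dict(bins)


# Hypothetical tag table and reads.
tags = {"AC": "specimen_1", "GT": "specimen_2"}
reads = ["ACTTGACCA", "GTTTGACCA", "ACTTGACCG", "NNTTGACCA"]
result = demultiplex(reads, tags)
for source, inserts in result.items():
    print(source, inserts)
```

Reads whose tag is not in the table are kept under `None` so that sequencing anomalies can be inspected rather than silently discarded.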
Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

PubMed Central

Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

2014-01-01

Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513
Homology and the optimization of DNA sequence data

NASA Technical Reports Server (NTRS)

Wheeler, W.

2001-01-01

Three methods of nucleotide character analysis are discussed. Their implications for molecular sequence homology and phylogenetic analysis are compared. The criterion of inter-data set congruence, both character based and topological, is applied to two data sets to elucidate and potentially discriminate among these parsimony-based ideas. © 2001 The Willi Hennig Society.
Sequence homology between HLA-bound cytomegalovirus and human peptides: A potential trigger for alloreactivity

PubMed Central

Koparde, Vishal N.; Jameson-Lee, Maximilian; Elnasseh, Abdelrhman G.; Scalora, Allison F.; Kobulnicky, David J.; Serrano, Myrna G.; Roberts, Catherine H.; Buck, Gregory A.; Neale, Michael C.; Nixon, Daniel E.; Toor, Amir A.

2017-01-01

Human cytomegalovirus (hCMV) reactivation may often coincide with the development of graft-versus-host-disease (GVHD) in stem cell transplantation (SCT). Seventy-seven SCT donor-recipient pairs (DRP) (HLA matched unrelated donor (MUD), n = 50; matched related donor (MRD), n = 27) underwent whole exome sequencing to identify single nucleotide polymorphisms (SNPs) generating alloreactive peptide libraries for each DRP (9-mer peptide-HLA complexes); the Human CMV CROSS (Cross-Reactive Open Source Sequence) database was compiled from NCBI; HLA class I binding affinity for each DRP's HLA was calculated by NetMHCpan 2.8; and hCMV-derived 9-mers were algorithmically compared to the alloreactive peptide-HLA complex libraries. Short consecutive (≥6) amino acid (AA) sequence homology matching hCMV to recipient peptides was considered for HLA-bound-peptide (IC50 <500 nM) cross reactivity. Of the 70,686 hCMV 9-mers contained within the hCMV CROSS database, an average of 29,658 matched the MRD DRP alloreactive peptides and 52,910 matched MUD DRP peptides (p<0.001). In silico analysis revealed multiple high affinity, immunogenic CMV-Human peptide matches (IC50 <500 nM) expressed in a GVHD-affected tissue-specific manner. hCMV+GVHD was found in 18 patients, 13 developing hCMV viremia before GVHD onset. Analysis of patients with GVHD identified potential cross reactive peptide expression within affected organs. We propose that hCMV peptide sequence homology with human alloreactive peptides may contribute to the pathophysiology of GVHD. PMID:28800601
Sequence homology between HLA-bound cytomegalovirus and human peptides: A potential trigger for alloreactivity.

PubMed

Hall, Charles E; Koparde, Vishal N; Jameson-Lee, Maximilian; Elnasseh, Abdelrhman G; Scalora, Allison F; Kobulnicky, David J; Serrano, Myrna G; Roberts, Catherine H; Buck, Gregory A; Neale, Michael C; Nixon, Daniel E; Toor, Amir A

2017-01-01

Human cytomegalovirus (hCMV) reactivation may often coincide with the development of graft-versus-host-disease (GVHD) in stem cell transplantation (SCT). Seventy-seven SCT donor-recipient pairs (DRP) (HLA matched unrelated donor (MUD), n = 50; matched related donor (MRD), n = 27) underwent whole exome sequencing to identify single nucleotide polymorphisms (SNPs) generating alloreactive peptide libraries for each DRP (9-mer peptide-HLA complexes); the Human CMV CROSS (Cross-Reactive Open Source Sequence) database was compiled from NCBI; HLA class I binding affinity for each DRP's HLA was calculated by NetMHCpan 2.8; and hCMV-derived 9-mers were algorithmically compared to the alloreactive peptide-HLA complex libraries. Short consecutive (≥6) amino acid (AA) sequence homology matching hCMV to recipient peptides was considered for HLA-bound-peptide (IC50 <500 nM) cross reactivity. Of the 70,686 hCMV 9-mers contained within the hCMV CROSS database, an average of 29,658 matched the MRD DRP alloreactive peptides and 52,910 matched MUD DRP peptides (p<0.001). In silico analysis revealed multiple high affinity, immunogenic CMV-Human peptide matches (IC50 <500 nM) expressed in a GVHD-affected tissue-specific manner. hCMV+GVHD was found in 18 patients, 13 developing hCMV viremia before GVHD onset. Analysis of patients with GVHD identified potential cross reactive peptide expression within affected organs. We propose that hCMV peptide sequence homology with human alloreactive peptides may contribute to the pathophysiology of GVHD.
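The "≥6 consecutive amino acids" criterion mentioned in this record can be illustrated as a longest-shared-run test between two peptides. This is a sketch under assumptions: the abstract does not say whether the matching run must sit at the same position in both 9-mers, so the version below allows any offset. The function names and the example peptides are hypothetical, and HLA binding affinity (the IC50 filter) is not modelled.

```python
def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest contiguous substring common to both peptides."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous residue pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def is_candidate_match(viral: str, human: str, min_run: int = 6) -> bool:
    """True if the peptides share at least min_run consecutive identical residues."""
    return longest_shared_run(viral, human) >= min_run


# Hypothetical 9-mers (not taken from the study).
print(is_candidate_match("AKLMNPQRS", "GKLMNPQRT"))  # shares KLMNPQR (7 residues)
print(is_candidate_match("AKLMNPQRS", "AKLMNWQRS"))  # longest shared run is AKLMN (5)
```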
Sequence-structure relationships in RNA loops: establishing the basis for loop homology modeling.

PubMed

Schudoma, Christian; May, Patrick; Nikiforova, Viktoria; Walther, Dirk

2010-01-01

The specific function of RNA molecules frequently resides in their seemingly unstructured loop regions. We performed a systematic analysis of RNA loops extracted from experimentally determined three-dimensional structures of RNA molecules. A comprehensive loop-structure data set was created and organized into distinct clusters based on structural and sequence similarity. We detected clear evidence of the hallmark of homology present in the sequence-structure relationships in loops. Loops differing by <25% in sequence identity fold into very similar structures. Thus, our results support the application of homology modeling for RNA loop model building. We established a threshold that may guide the sequence divergence-based selection of template structures for RNA loop homology modeling. Of all possible sequences that are, under the assumption of isosteric relationships, theoretically compatible with actual sequences observed in RNA structures, only a small fraction is contained in the Rfam database of RNA sequences and classes implying that the actual RNA loop space may consist of a limited number of unique loop structures and conserved sequences. The loop-structure data sets are made available via an online database, RLooM. RLooM also offers functionalities for the modeling of RNA loop structures in support of RNA engineering and design efforts.
Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

PubMed

Kimura, M; Kimura, J; Hatakeyama, T

1988-11-21

The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).
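Percent-identity figures such as those quoted above come from counting identical residues in a pairwise alignment. The sketch below shows one common convention, which skips any column containing a gap; the abstract does not state which denominator the authors used, and the function name `percent_identity` and the aligned fragments are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns that contain no gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a, aln_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 6 ungapped columns, 5 identical.
print(f"{percent_identity('MKT-AYLA', 'MRTGAY-A'):.1f}% identity")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so values from different papers are comparable only if the denominator is the same.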
Nucleotide sequence analysis of the L gene of Newcastle disease virus: homologies with Sendai and vesicular stomatitis viruses.

PubMed Central

Yusoff, K; Millar, N S; Chambers, P; Emmerson, P T

1987-01-01

The nucleotide sequence of the L gene of the Beaudette C strain of Newcastle disease virus (NDV) has been determined. The L gene is 6704 nucleotides long and encodes a protein of 2204 amino acids with a calculated molecular weight of 248822. Mung bean nuclease mapping of the 5' terminus of the L gene mRNA indicates that the transcription of the L gene is initiated 11 nucleotides upstream of the translational start site. Comparison with the amino acid sequences of the L genes of Sendai virus and vesicular stomatitis virus (VSV) suggests that there are several regions of homology between the sequences. These data provide further evidence for an evolutionary relationship between the Paramyxoviridae and the Rhabdoviridae. A non-coding sequence of 46 nucleotides downstream of the presumed polyadenylation site of the L gene may be part of a negative strand leader RNA. PMID:3035486
Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds

PubMed Central

Roessler, Christian G.; Hall, Branwen M.; Anderson, William J.; Ingram, Wendy M.; Roberts, Sue A.; Montfort, William R.; Cordes, Matthew H. J.

2008-01-01

Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a “stepping-stone” method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning ≈25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization. PMID:18227506
[Hepatitis C virus: sequence homology of a European isolate and divergence from the prototype].

PubMed

Seelig, R; Seelig, H P; Renz, M

1991-08-01

The polymerase chain reaction (PCR) detected specific hepatitis C viral (HCV) RNA sequences in liver biopsies from two patients with chronic hepatitis, in the tissue of a liver implant, in plasma from four chronic non-A, non-B hepatitis (NANBH) patients and, for the first time, in an infectious anti-D-immunoglobulin preparation. A comparison of the viral sequences coding for a region of the nonstructural NS3 protein from the liver tissues revealed only a very small degree of sequence divergence at the cDNA as well as at the amino acid level (between 0 and 5%). The sequence similarities of the RNA isolated from plasma of the four chronic NANBH patients and the anti-D-immunoglobulin preparation were in part somewhat lower but altogether also high (between 90 and 100%). In contrast, all eight cDNA and amino acid sequences exhibited a significantly higher degree of divergence in comparison with the HCV prototype sequence (between 29 and 32%) than among themselves (between 0 and 10%). This unexpectedly high sequence similarity of the eight European isolates and their low homology to the North American prototype sequence is indicative of the existence of different types of HCV. This will be important not only for epidemiological studies but also for the development of effective diagnostic procedures and vaccines. Concerning the pathogenesis of NANBH, a double infection or a helper mechanism has to be considered: in addition to the C virus, sequences of another virus particle were found in the infectious IgG preparation as well as in the liver biopsies.
HomPPI: a class of sequence homology based protein-protein interface prediction methods

PubMed Central

2011-01-01

Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the
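Per-residue performance measures of the kind reported in this record can be computed from a confusion matrix of predicted versus true interface labels. The sketch below uses common textbook definitions (Matthews correlation coefficient, sensitivity as TP/(TP+FN), specificity as TN/(TN+FP)); this is an assumption, since the abstract does not define its measures and the paper's definitions may differ. The function name and the label vectors are hypothetical.

```python
import math


def interface_metrics(predicted, actual):
    """Return (correlation coefficient, sensitivity, specificity) for binary labels."""
    if len(predicted) != len(actual):
        raise ValueError("label vectors must have equal length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("metrics undefined when a class or prediction is absent")
    cc = (tp * tn - fp * fn) / denom
    return cc, tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels for an 8-residue protein (1 = interface residue).
pred = [1, 1, 0, 0, 1, 0, 0, 0]
true = [1, 0, 0, 0, 1, 1, 0, 0]
cc, sens, spec = interface_metrics(pred, true)
print(f"CC={cc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```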
Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing.

PubMed

Mandelker, Diana; Schmidt, Ryan J; Ankala, Arunkanth; McDonald Gibson, Kristin; Bowser, Mark; Sharma, Himanshu; Duffy, Elizabeth; Hegde, Madhuri; Santani, Avni; Lebo, Matthew; Funke, Birgit

2016-12-01

Next-generation sequencing (NGS) is now routinely used to interrogate large sets of genes in a diagnostic setting. Regions of high sequence homology continue to be a major challenge for short-read technologies and can lead to false-positive and false-negative diagnostic errors. At the scale of whole-exome sequencing (WES), laboratories may be limited in their knowledge of genes and regions that pose technical hurdles due to high homology. We have created an exome-wide resource that catalogs highly homologous regions that is tailored toward diagnostic applications. This resource was developed using a mappability-based approach tailored to current Sanger and NGS protocols. Gene-level and exon-level lists delineate regions that are difficult or impossible to analyze via standard NGS. These regions are ranked by degree of affectedness, annotated for medical relevance, and classified by the type of homology (within-gene, different functional gene, known pseudogene, uncharacterized noncoding region). Additionally, we provide a list of exons that cannot be analyzed by short-amplicon Sanger sequencing. This resource can help guide clinical test design, supplemental assay implementation, and results interpretation in the context of high homology. Genet Med 18(12), 1282-1289.
GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering.

PubMed

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2016-01-01

Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.

GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering

PubMed Central

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2016-01-01

Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. To accelerate computing analyses, graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform. Therefore, we mapped the time-consuming steps involved in GHOSTZ, which is a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented it as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. As per results of the evaluation test involving metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads. PMID:27482905
Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).

PubMed

Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M

2001-12-01

A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding this protein were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues that are identical with BPTI's sequence, including 6 half-cystine and 5 aromatic amino acid residues. The active site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits α-chymotrypsin, but inhibits elastase and subtilisin less effectively. SETI was expressed by E. coli containing a PelB vector carrying the SETI-encoding cDNA; an expression yield of 0.68 mg/l was obtained. The phylogenetic relationship of SETI and the other BPTI-like trypsin inhibitors was analyzed using maximum likelihood inference methods.
Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs.

PubMed

Sanders, Ashley D; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Lansdorp, Peter M

2017-06-01

The ability to distinguish between genome sequences of homologous chromosomes in single cells is important for studies of copy-neutral genomic rearrangements (such as inversions and translocations), building chromosome-length haplotypes, refining genome assemblies, mapping sister chromatid exchange events and exploring cellular heterogeneity. Strand-seq is a single-cell sequencing technology that resolves the individual homologs within a cell by restricting sequence analysis to the DNA template strands used during DNA replication. This protocol, which takes up to 4 d to complete, relies on the directionality of DNA, in which each single strand of a DNA molecule is distinguished based on its 5'-3' orientation. Culturing cells in a thymidine analog for one round of cell division labels nascent DNA strands, allowing for their selective removal during genomic library construction. To preserve directionality of template strands, genomic preamplification is bypassed and labeled nascent strands are nicked and not amplified during library preparation. Each single-cell library is multiplexed for pooling and sequencing, and the resulting sequence data are aligned, mapping to either the minus or plus strand of the reference genome, to assign template strand states for each chromosome in the cell. The major adaptations to conventional single-cell sequencing protocols include harvesting of daughter cells after a single round of BrdU incorporation, bypassing of whole-genome amplification, and removal of the BrdU+ strand during Strand-seq library preparation. By sequencing just template strands, the structure and identity of each homolog are preserved.
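The final step described in this record, assigning a template strand state to each chromosome from the orientation of aligned reads, can be sketched as a simple thresholding of the plus-strand read fraction. This is an illustration only, not the published analysis software: the function name, the state labels, and the 0.2/0.8 cut-offs are hypothetical choices.

```python
def template_strand_state(plus_reads: int, minus_reads: int,
                          low: float = 0.2, high: float = 0.8):
    """Classify a chromosome by the fraction of its reads mapping to the plus strand."""
    total = plus_reads + minus_reads
    if total == 0:
        return None  # no coverage, state cannot be assigned
    frac_plus = plus_reads / total
    if frac_plus >= high:
        return "plus-only"   # both homologs contributed plus-strand templates
    if frac_plus <= low:
        return "minus-only"  # both homologs contributed minus-strand templates
    return "mixed"           # one template strand of each orientation


# Hypothetical per-chromosome read counts for one cell: (plus, minus).
counts = {"chr1": (950, 40), "chr2": (30, 870), "chr3": (480, 510)}
states = {chrom: template_strand_state(p, m) for chrom, (p, m) in counts.items()}
print(states)
```

Chromosomes in the mixed state are the ones in which the two homologs can be told apart by read orientation, which is why the protocol preserves strand directionality.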
Molecular cloning, sequence analysis and homology modeling of the first caudata amphibian antifreeze-like protein in axolotl (Ambystoma mexicanum).

PubMed

Zhang, Songyan; Gao, Jiuxiang; Lu, Yiling; Cai, Shasha; Qiao, Xue; Wang, Yipeng; Yu, Haining

2013-08-01

Antifreeze proteins (AFPs) refer to a class of polypeptides that are produced by certain vertebrates, plants, fungi, and bacteria and which permit their survival in subzero environments. In this study, we report the molecular cloning, sequence analysis and three-dimensional structure of the axolotl antifreeze-like protein (AFLP) by homology modeling of the first caudate amphibian AFLP. We constructed a full-length spleen cDNA library of axolotl (Ambystoma mexicanum). An EST having highest similarity (∼42%) with freeze-responsive liver protein Li16 from Rana sylvatica was identified, and the full-length cDNA was subsequently obtained by RACE-PCR. The axolotl antifreeze-like protein sequence represents an open reading frame for a putative signal peptide and the mature protein composed of 93 amino acids. The calculated molecular mass and the theoretical isoelectric point (pI) of this mature protein were 10128.6 Da and 8.97, respectively. The molecular characterization of this gene and its deduced protein were further performed by detailed bioinformatics analysis. The three-dimensional structure of current AFLP was predicted by homology modeling, and the conserved residues required for functionality were identified. The homology model constructed could be of use for effective drug design. This is the first report of an antifreeze-like protein identified from a caudate amphibian.
Sequence basis of Barnacle Cement Nanostructure is Defined by Proteins with Silk Homology

NASA Astrophysics Data System (ADS)

So, Christopher R.; Fears, Kenan P.; Leary, Dagmar H.; Scancella, Jenifer M.; Wang, Zheng; Liu, Jinny L.; Orihuela, Beatriz; Rittschof, Dan; Spillmann, Christopher M.; Wahl, Kathryn J.

2016-11-01

Barnacles adhere by producing a mixture of cement proteins (CPs) that organize into a permanently bonded layer displayed as nanoscale fibers. These cement proteins share no homology with any other marine adhesives, and a common sequence-basis that defines how nanostructures function as adhesives remains undiscovered. Here we demonstrate that a significant unidentified portion of acorn barnacle cement is comprised of low complexity proteins; they are organized into repetitive sequence blocks and found to maintain homology to silk motifs. Proteomic analysis of aggregate bands from PAGE gels reveals an abundance of Gly/Ala/Ser/Thr repeats exemplified by a prominent, previously unidentified, 43 kDa protein in the solubilized adhesive. Low complexity regions found throughout the cement proteome, as well as multiple lysyl oxidases and peroxidases, establish homology with silk-associated materials such as fibroin, silk gum sericin, and pyriform spidroins from spider silk. Distinct primary structures defined by homologous domains shed light on how barnacles use low complexity in nanofibers to enable adhesion, and serve as a starting point for unraveling the molecular architecture of a robust and unique class of adhesive nanostructures.
The vector homology problem in diagnostic nucleic acid hybridization of clinical specimens.

PubMed Central

Ambinder, R F; Charache, P; Staal, S; Wright, P; Forman, M; Hayward, S D; Hayward, G S

1986-01-01

Nucleic acid hybridization techniques using cloned probes are finding application in assays of clinical specimens in research and diagnostic laboratories. The probes that we and others have used are recombinant plasmids composed of viral inserts and bacterial plasmid vectors such as pBR322. We suspected that there was material homologous to pBR322 present in many clinical samples, because hybridization occurred in samples which lacked evidence of virus by other techniques. If the presence of this vector-homologous material was unrecognized, hybridization in the test sample might erroneously be interpreted as indicating the presence of viral sequences. In this paper we demonstrate specific hybridization of labeled pBR322 DNA with DNA from various clinical samples. Evidence is presented that nonspecific probe trapping could not account for this phenomenon. In mixing experiments, it is shown that contamination of clinical samples with bacteria would explain such a result. Approaches tested to circumvent this problem included the use of isolated insert probes, alternate cloning vectors, and cold competitor pBR322 DNA in prehybridization and hybridization mixes. None proved entirely satisfactory. We therefore emphasize that it is essential that all hybridization detection systems use a control probe of the vector alone in order to demonstrate the absence of material with vector homology in the specimen tested. PMID:3013928
Amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui.

PubMed

Hatakeyama, T; Hatakeyama, T

1990-07-06

The complete amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui were determined. Protein HL30 was found to be acetylated at its N-terminal amino acid and shows homology to the eukaryotic ribosomal proteins YL34 from yeast and RL31 from rat. Protein HmaL5 was homologous to the protein L5 from Escherichia coli and Bacillus stearothermophilus as well as to YL16 from yeast. HmaL5 shows more similarities to its eukaryotic counterpart than to eubacterial ones.
VITAL NMR: Using Chemical Shift Derived Secondary Structure Information for a Limited Set of Amino Acids to Assess Homology Model Accuracy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brothers, Michael C; Nesbitt, Anna E; Hallock, Michael J

2011-01-01

Homology modeling is a powerful tool for predicting protein structures, whose success depends on obtaining a reasonable alignment between a given structural template and the protein sequence being analyzed. In order to leverage greater predictive power for proteins with few structural templates, we have developed a method to rank homology models based upon their compliance to secondary structure derived from experimental solid-state NMR (SSNMR) data. Such data is obtainable in a rapid manner by simple SSNMR experiments (e.g., (13)C-(13)C 2D correlation spectra). To test our homology model scoring procedure for various amino acid labeling schemes, we generated a library of 7,474 homology models for 22 protein targets culled from the TALOS+/SPARTA+ training set of protein structures. Using subsets of amino acids that are plausibly assigned by SSNMR, we discovered that pairs of the residues Val, Ile, Thr, Ala and Leu (VITAL) emulate an ideal dataset where all residues are site specifically assigned. Scoring the models with a predicted VITAL site-specific dataset and calculating secondary structure with the Chemical Shift Index resulted in a Pearson correlation coefficient (-0.75) commensurate to the control (-0.77), where secondary structure was scored site specifically for all amino acids (ALL 20) using STRIDE. This method promises to accelerate structure procurement by SSNMR for proteins with unknown folds through guiding the selection of remotely homologous protein templates and assessing model quality.
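The ranking idea in this abstract can be illustrated with a minimal Python sketch. It assumes per-residue secondary-structure strings (for example H/E/C) for both the homology model and the chemical-shift-derived assignment. The function names `vital_agreement` and `pearson` are hypothetical, and scoring single VITAL residues rather than residue pairs is a simplification; this is not the authors' scoring procedure.

```python
import math

# Val, Ile, Thr, Ala, Leu in one-letter code (the "VITAL" subset).
VITAL = set("VITAL")

def vital_agreement(sequence, model_ss, nmr_ss, residues=VITAL):
    """Fraction of selected residues whose model secondary structure
    matches the experimentally derived one (illustrative score only)."""
    if not (len(sequence) == len(model_ss) == len(nmr_ss)):
        raise ValueError("sequence and secondary-structure strings must align")
    positions = [i for i, aa in enumerate(sequence) if aa in residues]
    if not positions:
        raise ValueError("no selected residues in the sequence")
    matches = sum(1 for i in positions if model_ss[i] == nmr_ss[i])
    return matches / len(positions)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Correlating such agreement scores with each model's deviation from the known structure would give a negative coefficient when better-scoring models are also more accurate, which matches the sign of the values reported above.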
Homology between DNA polymerases of poxviruses, herpesviruses, and adenoviruses: nucleotide sequence of the vaccinia virus DNA polymerase gene.

PubMed Central

Earl, P L; Jones, E V; Moss, B

1986-01-01

A 5400-base-pair segment of the vaccinia virus genome was sequenced and an open reading frame of 938 codons was found precisely where the DNA polymerase had been mapped by transfer of a phosphonoacetate-resistance marker. A single nucleotide substitution changing glycine at position 347 to aspartic acid accounts for the drug resistance of the mutant vaccinia virus. The 5' end of the DNA polymerase mRNA was located 80 base pairs before the methionine codon initiating the open reading frame. Correspondence between the predicted Mr 108,577 polypeptide and the 110,000 purified enzyme indicates that little or no proteolytic processing occurs. Extensive homology, extending over 435 amino acids, was found upon comparing the DNA polymerase of vaccinia virus and DNA polymerase of Epstein-Barr virus. A highly conserved sequence of 14 amino acids in the carboxyl-terminal regions of the above DNA polymerases is also present at a similar location in adenovirus DNA polymerase. This structure, which is predicted to form a turn flanked by beta-pleated sheets, may form part of an essential binding or catalytic site that accounts for its presence in DNA polymerases of poxviruses, herpesviruses, and adenoviruses. PMID:3012524
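To see why a single nucleotide substitution is enough to turn glycine into aspartic acid, the standard genetic code can be searched for one-base changes between the two residues. The sketch below is illustrative only: the abstract does not state which codon is used at position 347, and the helper name `single_base_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def single_base_changes(from_aa, to_aa):
    """List (codon, position, mutant_codon) for every one-base change
    that converts a codon for from_aa into a codon for to_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    hits.append((codon, pos + 1, mutant))
    return sorted(hits)

# Gly (G) to Asp (D): only a G-to-A change at the second codon position works.
print(single_base_changes("G", "D"))
```

Under the standard code this yields two routes, GGC to GAC and GGT to GAT, both at the second codon position.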
The amino acid sequence around the active-site cysteine and histidine residues of stem bromelain

PubMed Central

Husain, S. S.; Lowe, G.

1970-01-01

Stem bromelain that had been irreversibly inhibited with 1,3-dibromo[2-14C]-acetone was reduced with sodium borohydride and carboxymethylated with iodoacetic acid. After digestion with trypsin and α-chymotrypsin three radioactive peptides were isolated chromatographically. The amino acid sequences around the cross-linked cysteine and histidine residues were determined and showed a high degree of homology with those around the active-site cysteine and histidine residues of papain and ficin. PMID:5420046
EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

PubMed

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-07-01

EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.
PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

PubMed

Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

2016-07-08

The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

PubMed Central

2014-01-01

Background Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. Conclusions Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only
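The dissection of a total similarity score into segment-specific contributions can be sketched in a few lines of Python. This is a minimal illustration, assuming per-position score contributions and non-overlapping labelled segments are already available; the function `dissect_score`, the segment labels, and the example numbers are hypothetical and do not reproduce DissectHMMER itself.

```python
def dissect_score(position_scores, segments):
    """Split a total alignment score into per-label contributions.

    position_scores: per-position contributions to the total score.
    segments: (start, end, label) tuples, 0-based and half-open,
              assumed not to overlap.
    Positions outside every segment are summed under "unassigned".
    """
    totals = {}
    covered = [False] * len(position_scores)
    for start, end, label in segments:
        if not 0 <= start < end <= len(position_scores):
            raise ValueError(f"segment out of range: {(start, end, label)}")
        if any(covered[start:end]):
            raise ValueError(f"overlapping segment: {(start, end, label)}")
        totals[label] = totals.get(label, 0.0) + sum(position_scores[start:end])
        covered[start:end] = [True] * (end - start)
    totals["unassigned"] = sum(
        s for s, c in zip(position_scores, covered) if not c
    )
    return totals

# Hypothetical example: two fold-critical stretches and one low complexity region.
scores = [2.0, 1.5, 3.0, 0.5, 4.0, 1.0]
segments = [(0, 2, "fold_critical"), (2, 4, "low_complexity"), (4, 5, "fold_critical")]
print(dissect_score(scores, segments))
```

Following the proposal in the abstract, a homology call would then rest on whether the fold-critical total is significant on its own rather than on the undissected sum.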
Tracing the Evolutionary History of the CAP Superfamily of Proteins Using Amino Acid Sequence Homology and Conservation of Splice Sites.

PubMed

Abraham, Anup; Chandler, Douglas E

2017-10-01

Proteins of the CAP superfamily play numerous roles in reproduction, innate immune responses, cancer biology, and venom toxicology. Here we document the breadth of the CAP (Cysteine-Rich Secretory Protein (CRISP), Antigen 5, and Pathogenesis-Related) protein superfamily and trace the major events in its evolution using amino acid sequence homology and the positions of exon/intron borders within their genes. Although this is seldom acknowledged in the literature, we find that many of the CAP subfamilies present in mammals, where they were originally characterized, have distinct homologues in the invertebrate phyla. Early eukaryotic CAP genes contained only one exon inherited from prokaryotic predecessors and as evolution progressed an increasing number of introns were inserted, reaching 2-5 in the invertebrate world and 5-15 in the vertebrate world. Focusing on the CRISP subfamily, we propose that these proteins evolved in three major steps: (1) origination of the CAP/PR/SCP domain in bacteria, (2) addition of a small Hinge domain to produce the two-domain SCP-like proteins found in roundworms and anthropoids, and (3) addition of an Ion Channel Regulatory domain, borrowed from invertebrate peptide toxins, to produce full length, three-domain CRISP proteins, first seen in insects and later to diversify into multiple subtypes in the vertebrate world.
alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

PubMed Central

Long, C M; Virolle, M J; Chang, S Y; Chang, S; Bibb, M J

1987-01-01

The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzyme to an inhibitor of mammalian alpha-amylases. The amino-terminal sequence of the extracellular enzyme was determined, revealing the presence of a typical signal peptide preceding the mature form of the alpha-amylase. PMID:3500166
Hydroquinone: O-glucosyltransferase from cultivated Rauvolfia cells: enrichment and partial amino acid sequences.

PubMed

Arend, J; Warzecha, H; Stöckigt, J

2000-01-01

Plant cell suspension cultures of Rauvolfia are able to produce a high amount of arbutin by glucosylation of exogenously added hydroquinone. A four step purification procedure using anion exchange, hydrophobic interaction, hydroxyapatite-chromatography and chromatofocusing delivered in a yield of 0.5%, an approximately 390 fold enrichment of the involved glucosyltransferase. SDS-PAGE showed a M(r) for the enzyme of 52 kDa. Proteolysis of the pure enzyme with endoproteinase LysC revealed six peptide fragments with 9-23 amino acids which were sequenced. Sequence alignment of the six peptides showed high homologies to glycosyltransferases from other higher plants.
CPHmodels-3.0--remote homology modeling using structure-guided sequence profiles.

PubMed

Nielsen, Morten; Lundegaard, Claus; Lund, Ole; Petersen, Thomas Nordahl

2010-07-01

CPHmodels-3.0 is a web server predicting protein 3D structure by use of single template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. Modeling of a query sequence is first attempted using the fast CPHmodels-2.0 profile-profile scoring function suitable for close homology modeling. The new, computationally costly remote homology-modeling algorithm is engaged only if no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128); 74% of these were predicted as high reliability models (87 out of 117). These achieved an average RMSD of 4.6 Å when superimposed to the 3D structure. The remaining 26% low reliability models (30 out of 117) could superimpose to the true 3D structure with an average RMSD of 9.3 Å. These performance values place the CPHmodels-3.0 method in the group of high performing 3D prediction tools. Besides its accuracy, one of the important features of the method is its speed. For most queries, the response time of the server is <20 min. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.
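The RMSD values quoted in this abstract measure the average distance between matched atoms of a model and the experimental structure. As a minimal sketch of the formula only, the Python function below computes RMSD for two coordinate lists that are assumed to be already superimposed and matched atom by atom; it does not perform the superposition step, and the name `rmsd` is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Two atoms, each displaced by the vector (3, 4, 0), i.e. by 5 length units.
model = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
reference = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
print(rmsd(model, reference))
```

In practice an optimal rigid-body superposition is computed first, so the reported RMSD is the minimum over rotations and translations.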
Sequence homology and structural analysis of plasmepsin 4 isolated from Indian Plasmodium vivax isolates.

PubMed

Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Dixit, Rajnikant; Tiwari, P K; Sharma, Arun

2011-07-01

Plasmodium vivax malaria is a globally widespread disease responsible for 50% of human malaria cases in Central and South America, South East Asia and the Indian subcontinent. The rising severity of the disease and the emerging resistance of the parasite have emphasized the need to search for novel therapeutic targets to combat P. vivax malaria. Plasmepsin 4 (PM4), a food vacuole aspartic protease, is essential for parasite functions and viability, such as initiating hemoglobin digestion and processing of proteins, and is being looked upon as a potential drug target. Although the plasmepsins of Plasmodium falciparum have been extensively studied, the plasmepsins of P. vivax are not well characterized. This is the first report detailing complete PM4 gene analysis from Indian P. vivax isolates. BLAST results for sequences of P. vivax plasmepsin 4 (PvPM4) show 100% homology among isolates of P. vivax collected from different geographical regions of India. None of the seven Indian isolates contained an intron within the coding region. Interestingly, PvPM4 sequence analysis showed a very high degree of homology with all other sequences of Plasmodium species available in GenBank. Our results strongly suggest that PvPM4 sequences are highly conserved except for a small number of amino acid substitutions that did not modify key motifs at active site formation for the function or the structure of the enzymes. Furthermore, our study shows that PvPM4 occupies a unique phylogenetic status within the Plasmodium group and differs sufficiently from the most closely related human aspartic protease, cathepsin D. The analysis of the 3D model of PM4 showed a typical aspartic protease structure with a bi-lobed, compact and distinct peptide binding cleft in both P. vivax and P. falciparum. In order to validate appropriate use of PM4 as potential anti-malarial drug target, studies on genetic and structural variations among P. vivax plasmepsins (PvPMs) from different geographical regions are of utmost importance for
The catalytic activity for ginkgolic acid biodegradation, homology modeling and molecular dynamic simulation of salicylic acid decarboxylase.

PubMed

Hu, Yanying; Hua, Qingyuan; Sun, Guojuan; Shi, Kunpeng; Zhang, Huitu; Zhao, Kai; Jia, Shiru; Dai, Yujie; Wu, Qingli

2018-05-02

The toxic ginkgolic acids are the main safety concern for the application of Ginkgo biloba. In this study, the degradation ability of salicylic acid decarboxylase (SDC) for ginkgolic acids was examined using ginkgolic acid C15:1 as a substrate. The results indicated that the content of ginkgolic acid C15:1 in Ginkgo biloba seeds was significantly decreased after 5 h treatment with SDC at 40 °C and pH 5.5. In order to explore the structure of SDC and the interaction between SDC and substrates, homology modeling, molecular docking and molecular dynamics were performed. The results showed that SDC might also have a catalytic active center containing a Zn2+. Compared with the template structure of 2,6-dihydroxybenzoate decarboxylase, the residues surrounding the binding pocket, His10, Phe23 and Phe290, were replaced by Ala10, Tyr27 and Tyr301 in the homology constructed structure of SDC, respectively. These differences may significantly affect the substrate adaptability of SDC for salicylic acid derivatives. Copyright © 2018 Elsevier Ltd. All rights reserved.
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction.

PubMed

Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

2016-06-15

Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins

PubMed Central

2011-01-01

Background Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included into or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues. Results We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring device. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur both in single- and multi-membrane-spanning proteins essentially in any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches based on assessment of hydrophobicity and sequence complexity of the TM sequence segments. Reviewers This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092
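The two quantities named in this abstract, hydrophobicity and sequence complexity of a TM segment, can be computed with a short Python sketch. The Kyte-Doolittle hydropathy scale and Shannon entropy are used here as stand-ins, and the cutoffs in `looks_simple_tm` are hypothetical placeholders; the abstract does not give the authors' actual measures or thresholds.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy index per residue (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydrophobicity(segment):
    """Average Kyte-Doolittle hydropathy over the segment."""
    if not segment:
        raise ValueError("empty segment")
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)

def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition; low values
    indicate low sequence complexity."""
    if not segment:
        raise ValueError("empty segment")
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())

def looks_simple_tm(segment, min_hydrophobicity=2.5, max_entropy=2.5):
    """Flag a segment as a 'simple' TM candidate.
    Both cutoffs are hypothetical and chosen only for illustration."""
    return (mean_hydrophobicity(segment) >= min_hydrophobicity
            and shannon_entropy(segment) <= max_entropy)
```

A segment flagged this way would be a candidate for exclusion from the query before a homology search, in the spirit of the criterion described above.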
Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

PubMed

Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

2014-01-01

We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.
Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

PubMed Central

Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

2014-01-01

We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106
GAWK, a novel human pituitary polypeptide: isolation, immunocytochemical localization and complete amino acid sequence.

PubMed

Benjannet, S; Leduc, R; Lazure, C; Seidah, N G; Marcinkiewicz, M; Chrétien, M

1985-01-16

During the course of reverse-phase high pressure liquid chromatography (RP-HPLC) purification of a postulated big ACTH (1) from human pituitary gland extracts, a highly purified peptide bearing no resemblance to any known polypeptide was isolated. The complete sequence of this 74 amino acid polypeptide, called GAWK, has been determined. Search on a computer data bank on the possible homology to any known protein or fragment, using a mutation data matrix, failed to reveal any homology greater than 30%. An antibody produced against a synthetic fragment allowed us to detect several immunoreactive forms. The antisera also enabled us to localize the polypeptide, by immunocytochemistry, in the anterior lobe of the pituitary gland.
EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

PubMed Central

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-01-01

EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408
Composition for nucleic acid sequencing

DOEpatents

Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

2008-08-26

The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
Solid phase sequencing of biopolymers

DOEpatents

Cantor, Charles; Koster, Hubert

2010-09-28

This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.
The sequence, and its evolutionary implications, of a Thermococcus celer protein associated with transcription

NASA Technical Reports Server (NTRS)

Kaine, B. P.; Mehr, I. J.; Woese, C. R.

1994-01-01

Through random search, a gene from Thermococcus celer has been identified and sequenced that appears to encode a transcription-associated protein (110 amino acid residues). The sequence has clear homology to approximately the last half of an open reading frame reported previously for Sulfolobus acidocaldarius [Langer, D. & Zillig, W. (1993) Nucleic Acids Res. 21, 2251]. The protein translations of these two archaeal genes in turn are homologs of a small subunit found in eukaryotic RNA polymerase I (A12.2) and the counterpart of this from RNA polymerase II (B12.6). Homology is also seen with the eukaryotic transcription factor TFIIS, but it involves only the terminal 45 amino acids of the archaeal proteins. Evolutionary implications of these homologies are discussed.
Chip-based sequencing nucleic acids

DOEpatents

Beer, Neil Reginald

2014-08-26

A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.
Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deutscher, J.; Pevec, B.; Beyreuther, K.

1986-10-21

The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolytic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system, HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S. faecalis has now been determined. [32P]P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 out of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.
Complete amino acid sequences of the ribosomal proteins L25, L29 and L31 from the archaebacterium Halobacterium marismortui.

PubMed

Hatakeyama, T; Kimura, M

1988-03-15

Ribosomal proteins were extracted from 50S ribosomal subunits of the archaebacterium Halobacterium marismortui by decreasing the concentration of Mg2+ and K+, and the proteins were separated and purified by ion-exchange column chromatography on DEAE-cellulose. Ten proteins were purified to homogeneity and three of these proteins were subjected to sequence analysis. The complete amino acid sequences of the ribosomal proteins L25, L29 and L31 were established by analyses of the peptides obtained by enzymatic digestion with trypsin, Staphylococcus aureus protease, chymotrypsin and lysylendopeptidase. Proteins L25, L29 and L31 consist of 84, 115 and 95 amino acid residues with the molecular masses of 9472 Da, 12293 Da and 10418 Da respectively. A comparison of their sequences with those of other large-ribosomal-subunit proteins from other organisms revealed that protein L25 from H. marismortui is homologous to protein L23 from Escherichia coli (34.6%), Bacillus stearothermophilus (41.8%), and tobacco chloroplasts (16.3%) as well as to protein L25 from yeast (38.0%). Proteins L29 and L31 do not appear to be homologous to any other ribosomal proteins whose structures are so far known.
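The homology percentages quoted above (for example 34.6% against E. coli L23) are pairwise figures derived from aligned sequences. The abstract does not say how they were computed, so the following is only a minimal sketch of one common convention: identical residues divided by the number of aligned, ungapped columns. The function name and the toy sequences are hypothetical and are not taken from the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumes the two strings are already aligned (equal length); other
    conventions divide by the full alignment length or the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned pair (hypothetical residues): 4 identical out of 5 ungapped columns.
print(percent_identity("MKV-LA", "MRVTLA"))
```

Because the choice of denominator changes the result, percentages from different studies are comparable only when the same convention was used.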
SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

PubMed

Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

2015-01-01

Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Partial amino acid sequence of the branched chain amino acid aminotransferase (TmB) of E. coli JA199 pDU11

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feild, M.J.; Armstrong, F.B.

1987-05-01

E. coli JA199 pDU11 harbors a multicopy plasmid containing the ilv GEDAY gene cluster of S. typhimurium. TmB, gene product of ilv E, was purified, crystallized, and subjected to Edman degradation using a gas phase sequencer. The intact protein yielded an amino terminal 31 residue sequence. Both carboxymethylated apoenzyme and [3H]NaBH4-reduced holoenzyme were then subjected to digestion by trypsin. The digests were fractionated using reversed phase HPLC, and the peptides isolated were sequenced. The borohydride-treated holoenzyme was used to isolate the cofactor-binding peptide. The peptide is 27 residues long and a comparison with known sequences of other aminotransferases revealed limited homology. Peptides accounting for 211 of 288 predicted residues have been sequenced, including 9 residues of the carboxyl terminus. Comparison of peptides with the inferred amino acid sequence of the E. coli K-12 enzyme has helped determine the sequence of the amino terminal 59 residues; only two differences between the sequences are noted in this region.
SANSparallel: interactive homology search against Uniprot.

PubMed

Somervuo, Panu; Holm, Liisa

2015-07-01

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
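The suffix array neighborhood search mentioned above can be illustrated with a toy example. This is a sketch of the general idea only, under the assumption that each query suffix is located in a sorted list of database suffixes and that database sequences owning the nearby suffixes collect votes. It is not the SANSparallel implementation; the function names, the window size and the toy sequences are all hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def build_suffix_list(db):
    """Return sorted (suffix, seq_id) pairs for every suffix of every sequence."""
    suffixes = []
    for seq_id, seq in db.items():
        for start in range(len(seq)):
            suffixes.append((seq[start:], seq_id))
    suffixes.sort()
    return suffixes


def neighborhood_votes(query, suffixes, window=2):
    """Vote for sequences whose suffixes sort next to each query suffix."""
    keys = [suffix for suffix, _ in suffixes]
    votes = Counter()
    for start in range(len(query)):
        pos = bisect_left(keys, query[start:])  # insertion point in sorted order
        lo = max(0, pos - window)
        hi = min(len(suffixes), pos + window)
        for _, seq_id in suffixes[lo:hi]:
            votes[seq_id] += 1
    return votes


# Toy database (hypothetical sequences): the query is a near-copy of "A".
db = {"A": "MKVLAAGIVG", "B": "WWPPCCHHYY"}
votes = neighborhood_votes("MKVLAAGIV", build_suffix_list(db))
print(votes.most_common(1)[0][0])
```

A real implementation would store integer offsets into one concatenated text rather than copies of every suffix, and would rescore the top candidates with a proper alignment.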
Next-Generation Sequence Analysis of the Genome of RFHVMn, the Macaque Homolog of Kaposi's Sarcoma (KS)-Associated Herpesvirus, from a KS-Like Tumor of a Pig-Tailed Macaque

PubMed Central

Bruce, A. Gregory; Ryan, Jonathan T.; Thomas, Mathew J.; Peng, Xinxia; Grundhoff, Adam; Tsai, Che-Chung

2013-01-01

The complete sequence of retroperitoneal fibromatosis-associated herpesvirus Macaca nemestrina (RFHVMn), the pig-tailed macaque homolog of Kaposi's sarcoma-associated herpesvirus (KSHV), was determined by next-generation sequence analysis of a Kaposi's sarcoma (KS)-like macaque tumor. Colinearity of genes was observed with the KSHV genome, and the core herpesvirus genes had strong sequence homology to the corresponding KSHV genes. RFHVMn lacked homologs of open reading frame 11 (ORF11) and KSHV ORFs K5 and K6, which appear to have been generated by duplication of ORFs K3 and K4 after the divergence of KSHV and RFHV. RFHVMn contained positional homologs of all other unique KSHV genes, although some showed limited sequence similarity. RFHVMn contained a number of candidate microRNA genes. Although there was little sequence similarity with KSHV microRNAs, one candidate contained the same seed sequence as the positional homolog, kshv-miR-K12-10a, suggesting functional overlap. RNA transcript splicing was highly conserved between RFHVMn and KSHV, and strong sequence conservation was noted in specific promoters and putative origins of replication, predicting important functional similarities. Sequence comparisons indicated that RFHVMn and KSHV developed in long-term synchrony with the evolution of their hosts, and both viruses phylogenetically group within the RV1 lineage of Old World primate rhadinoviruses. RFHVMn is the closest homolog of KSHV to be completely sequenced and the first sequenced RV1 rhadinovirus homolog of KSHV from a nonhuman Old World primate. The strong genetic and sequence similarity between RFHVMn and KSHV, coupled with similarities in biology and pathology, demonstrate that RFHVMn infection in macaques offers an important and relevant model for the study of KSHV in humans. PMID:24109218
Initial cloning and sequencing of hydHG, an operon homologous to ntrBC and regulating the labile hydrogenase activity in Escherichia coli K-12.

PubMed Central

Stoker, K; Reijnders, W N; Oltmann, L F; Stouthamer, A H

1989-01-01

To isolate genes from Escherichia coli which regulate the labile hydrogenase activity, a plasmid library was used to transform hydL mutants lacking the labile hydrogenase. A single type of gene, designated hydG, was isolated. This gene also partially restored the hydrogenase activity in hydF mutants (which are defective in all hydrogenase isoenzymes), although the low hydrogenase 1 and 2 levels were not induced. Therefore, hydG apparently regulates, specifically, the labile hydrogenase activity. Restoration of this latter activity in hydF mutants was accompanied by a proportional increase of the H2 uptake activity, suggesting a functional relationship. H2:fumarate oxidoreductase activity was not restored in complemented hydL mutants. These latter strains may therefore lack, in addition to the labile hydrogenase, a second component (provisionally designated component R), possibly an electron carrier coupling H2 oxidation to the anaerobic respiratory chain. Sequence analysis showed an open reading frame of 1,314 base pairs for hydG. It was preceded by a ribosome-binding site but apparently lacked a promoter. Minicell experiments revealed a single polypeptide of approximately 50 kilodaltons. Comparison of the predicted amino acid sequence with a protein sequence data base revealed strong homology to NtrC from Klebsiella pneumoniae, a DNA-binding transcriptional activator. The 411 base pairs upstream from pHG40 contained a second open reading frame overlapping hydG by four bases. The deduced amino acid sequence showed considerable homology with the C-terminal part of NtrB. This sequence was therefore assumed to be part of a second gene, encoding the NtrB-like component, and was designated hydH. The labile hydrogenase activity in E. coli is apparently regulated by a multicomponent system analogous to the NtrB-NtrC system. This conclusion is in agreement with the results of Birkmann et al. (A. Birkmann, R. G. Sawers, and A. Böck, Mol. Gen. Genet. 210:535-542, 1987), who
Comparative analysis of ribosomal protein L5 sequences from bacteria of the genus Thermus.

PubMed

Jahn, O; Hartmann, R K; Boeckh, T; Erdmann, V A

1991-06-01

The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al, submitted). Genes for protein L5 from the three Thermus strains display 95% G/C in third positions of codons. Amino acid sequences deduced from the DNA sequence were shown to be identical for T flavus and T thermophilus, although the corresponding DNA sequences differed by two T to C transitions in the T thermophilus gene. Protein L5 sequences from T flavus and T thermophilus are 95% homologous to L5 from T aquaticus and 56.5% homologous to the corresponding E coli sequence. The lowest degrees of homology were found between the T flavus/T thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). From sequence comparison it becomes clear that thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility due to the introduction of amino acids with branched aliphatic side chains such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.
Biosynthesis of Lipoic Acid in Arabidopsis: Cloning and Characterization of the cDNA for Lipoic Acid Synthase

PubMed Central

Yasuno, Rie; Wada, Hajime

1998-01-01

Lipoic acid is a coenzyme that is essential for the activity of enzyme complexes such as those of pyruvate dehydrogenase and glycine decarboxylase. We report here the isolation and characterization of LIP1 cDNA for lipoic acid synthase of Arabidopsis. The Arabidopsis LIP1 cDNA was isolated using an expressed sequence tag homologous to the lipoic acid synthase of Escherichia coli. This cDNA was shown to code for Arabidopsis lipoic acid synthase by its ability to complement a lipA mutant of E. coli defective in lipoic acid synthase. DNA-sequence analysis of the LIP1 cDNA revealed an open reading frame predicting a protein of 374 amino acids. Comparisons of the deduced amino acid sequence with those of E. coli and yeast lipoic acid synthase homologs showed a high degree of sequence similarity and the presence of a leader sequence presumably required for import into the mitochondria. Southern-hybridization analysis suggested that LIP1 is a single-copy gene in Arabidopsis. Western analysis with an antibody against lipoic acid synthase demonstrated that this enzyme is located in the mitochondrial compartment in Arabidopsis cells as a 43-kD polypeptide. PMID:9808738
Heterochromatic self-association, a determinant of nuclear organization, does not require sequence homology in Drosophila.

PubMed Central

Sage, Brian T; Csink, Amy K

2003-01-01

Chromosomes of higher eukaryotes contain blocks of heterochromatin that can associate with each other in the interphase nucleus. A well-studied example of heterochromatic interaction is the brown(Dominant) (bwD) chromosome of D. melanogaster, which contains an approximately 1.6-Mbp insertion of AAGAG repeats near the distal tip of chromosome 2. This insertion causes association of the tip with the centric heterochromatin of chromosome 2 (2h), which contains megabases of AAGAG repeats. Here we describe an example, other than bwD, in which distally translocated heterochromatin associates with centric heterochromatin. Additionally, we show that when a translocation places bwD on a different chromosome, bwD tends to associate with the centric heterochromatin of this chromosome, even when the chromosome contains a small fraction of the sequence homology present elsewhere. To further test the importance of sequence homology in these interactions, we used interspecific mating to introgress the bwD allele from D. melanogaster into D. simulans, which lacks the AAGAG on the autosomes. We find that D. simulans bwD associates with 2h, which lacks the AAGAG sequence, while it does not associate with the AAGAG containing X chromosome heterochromatin. Our results show that intranuclear association of separate heterochromatic blocks does not require that they contain the same sequence. PMID:14668374
MIPS: a database for protein sequences, homology data and yeast genome information.

PubMed Central

Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

1997-01-01

The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

Method for sequencing nucleic acid molecules

DOEpatents

Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

2006-06-06

The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
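The deduction step described above, reading the target sequence from the order in which complementary bases are incorporated, can be sketched as follows. This is a toy illustration that assumes error-free base identification and standard Watson-Crick pairing. The function name is hypothetical and nothing here is taken from the patent claims.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_from_incorporations(incorporated):
    """Deduce the template (written 5'->3') from bases added to the growing strand.

    `incorporated` lists the bases in the temporal order in which the polymerase
    added them, i.e. the growing strand read 5'->3'. The template is its reverse
    complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(incorporated))


# Hypothetical observation: the polymerase added A, then T, then G, then C.
print(template_from_incorporations(["A", "T", "G", "C"]))  # GCAT
```

In practice each incorporation event would carry a confidence value, and repeated reads of the same molecule or region would be combined before a base is called.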
Method for sequencing nucleic acid molecules

DOEpatents

Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

2006-05-30

The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

PubMed

Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

2014-09-18

Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
KM+, a mannose-binding lectin from Artocarpus integrifolia: amino acid sequence, predicted tertiary structure, carbohydrate recognition, and analysis of the beta-prism fold.

PubMed Central

Rosa, J. C.; De Oliveira, P. S.; Garratt, R.; Beltramini, L.; Resing, K.; Roque-Barreira, M. C.; Greene, L. J.

1999-01-01

The complete amino acid sequence of the lectin KM+ from Artocarpus integrifolia (jackfruit), which contains 149 residues/mol, is reported and compared to those of other members of the Moraceae family, particularly that of jacalin, also from jackfruit, with which it shares 52% sequence identity. KM+ presents an acetyl-blocked N-terminus and is not posttranslationally modified by proteolytic cleavage as is the case for jacalin. Rather, it possesses a short, glycine-rich linker that unites the regions homologous to the alpha- and beta-chains of jacalin. The results of homology modeling implicate the linker sequence in sterically impeding rotation of the side chain of Asp141 within the binding site pocket. As a consequence, the aspartic acid is locked into a conformation adequate only for the recognition of equatorial hydroxyl groups on the C4 epimeric center (alpha-D-mannose, alpha-D-glucose, and their derivatives). In contrast, the internal cleavage of the jacalin chain permits free rotation of the homologous aspartic acid, rendering it capable of accepting hydrogen bonds from both possible hydroxyl configurations on C4. We suggest that, together with direct recognition of epimeric hydroxyls and the steric exclusion of disfavored ligands, conformational restriction of the lectin should be considered to be a new mechanism by which selectivity may be built into carbohydrate binding sites. Jacalin and KM+ adopt the beta-prism fold already observed in two unrelated protein families. Despite presenting little or no sequence similarity, an analysis of the beta-prism reveals a canonical feature repeatedly present in all such structures, which is based on six largely hydrophobic residues within a beta-hairpin containing two classic-type beta-bulges. We suggest the term beta-prism motif to describe this feature. PMID:10210179
Unequal homologous recombination between tandemly arranged sequences stably incorporated into cultured rat cells.

PubMed Central

Stringer, J R; Kuhn, R M; Newman, J L; Meade, J C

1985-01-01

Cultured rat cells deficient in endogenous thymidine kinase activity (tk) were stably transformed with a recombination-indicator DNA substrate constructed in vitro by rearrangement of the herpes simplex virus tk gene sequences into a partially redundant permutation of the functional gene. The recombination-indicator DNA did not express tk, but was designed to allow formation of a functional tk gene via homologous recombination. A clonal cell line (519) was isolated that harbored several permuted herpes simplex virus tk genes. 519 cells spontaneously produced progeny that survived in medium containing hypoxanthine, aminopterin, and thymidine. Acquisition of resistance to hypoxanthine, aminopterin, and thymidine was accompanied by the rearrangement of the defective tk gene to functional configuration. The rearrangement apparently occurred by unequal exchange between one permuted tk gene and a replicated copy of itself. Recombination was between 500-base-pair tracts of DNA sequence homology that were separated by 3.4 kilobases. Exchanges occurred spontaneously at a frequency of approximately 5 × 10^-6 events per cell per generation. Recombination also mediated reversion to the tk- phenotype; however, the predominant mechanism by which cells escaped death in the presence of drugs rendered toxic by thymidine kinase was not recombination, but rather inactivation of the intact tk gene. PMID:3016511
Redesigning Aldolase Stereoselectivity by Homologous Grafting.

PubMed

Bisterfeld, Carolin; Classen, Thomas; Küberl, Irene; Henßen, Birgit; Metz, Alexander; Gohlke, Holger; Pietruszka, Jörg

2016-01-01

The 2-deoxy-d-ribose-5-phosphate aldolase (DERA) offers access to highly desirable building blocks for organic synthesis by catalyzing a stereoselective C-C bond formation between acetaldehyde and certain electrophilic aldehydes. DERA's potential is particularly highlighted by the ability to catalyze sequential, highly enantioselective aldol reactions. However, its synthetic use is limited by the absence of an enantiocomplementary enzyme. Here, we introduce the concept of homologous grafting to identify stereoselectivity-determining amino acid positions in DERA. We identified such positions by structural analysis of the homologous aldolases 2-keto-3-deoxy-6-phosphogluconate aldolase (KDPG) and the enantiocomplementary enzyme 2-keto-3-deoxy-6-phosphogalactonate aldolase (KDPGal). Mutation of these positions led to a slightly inversed enantiopreference of both aldolases to the same extent. By transferring these sequence motifs onto DERA we achieved the intended change in enantioselectivity.
Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.

1987-01-01

The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with [α-32P]dCTP (3000 Ci/mmol) or [α-35S]dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.
DNA Repair: The Search for Homology.

PubMed

Haber, James E

2018-05-01

The repair of chromosomal double-strand breaks (DSBs) by homologous recombination is essential to maintain genome integrity. The key step in DSB repair is the RecA/Rad51-mediated process to match sequences at the broken end to homologous donor sequences that can be used as a template to repair the lesion. Here, in reviewing research about DSB repair, I consider the many factors that appear to play important roles in the successful search for homology by several homologous recombination mechanisms. See also the video abstract here: https://youtu.be/vm7-X5uIzS8. © 2018 WILEY Periodicals, Inc.
Amino- and carboxyl-terminal amino acid sequences of proteins coded by gag gene of murine leukemia virus

PubMed Central

Oroszlan, Stephen; Henderson, Louis E.; Stephenson, John R.; Copeland, Terry D.; Long, Cedric W.; Ihle, James N.; Gilden, Raymond V.

1978-01-01

The amino- and carboxyl-terminal amino acid sequences of proteins (p10, p12, p15, and p30) coded by the gag gene of Rauscher and AKR murine leukemia viruses were determined. Among these proteins, p15 from both viruses appears to have a blocked amino end. Proline was found to be the common NH2 terminus of both p30s and both p12s, and alanine of both p10s. The amino-terminal sequences of p30s are identical, as are those of p10s, while the p12 sequences are clearly distinctive but also show substantial homology. The carboxyl-terminal amino acids of both viral p30s and p12s are leucine and phenylalanine, respectively. Rauscher leukemia virus p15 has tyrosine as the carboxyl terminus while AKR virus p15 has phenylalanine in this position. The compositional and sequence data provide definite chemical criteria for the identification of analogous gag gene products and for the comparison of viral proteins isolated in different laboratories. On the basis of amino acid sequences and the previously proposed H-p15-p12-p30-p10-COOH peptide sequence in the precursor polyprotein, a model for cleavage sites involved in the post-translational processing of the precursor coded for by the gag gene is proposed. PMID:206897
Method for identifying and quantifying nucleic acid sequence aberrations

DOEpatents

Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

1998-01-01

A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.
Cloning and characterization of the ddc homolog encoding L-2,4-diaminobutyrate decarboxylase in Enterobacter aerogenes.

PubMed

Yamamoto, S; Mutoh, N; Tsuzuki, D; Ikai, H; Nakao, H; Shinoda, S; Narimatsu, S; Miyoshi, S I

2000-05-01

L-2,4-diaminobutyrate decarboxylase (DABA DC) catalyzes the formation of 1,3-diaminopropane (DAP) from DABA. In the present study, the ddc gene encoding DABA DC from Enterobacter aerogenes ATCC 13048 was cloned and characterized. Determination of the nucleotide sequence revealed an open reading frame of 1470 bp encoding a 53659-Da protein of 490 amino acids, whose deduced NH2-terminal sequence was identical to that of purified DABA DC from E. aerogenes. The deduced amino acid sequence was highly similar to those of Acinetobacter baumannii and Haemophilus influenzae DABA DCs encoded by the ddc genes. The lysine-307 of the E. aerogenes DABA DC was identified as the pyridoxal 5'-phosphate binding residue by site-directed mutagenesis. Furthermore, PCR analysis revealed the distribution of E. aerogenes ddc homologs in some other species of Enterobacteriaceae. Such a relatively wide occurrence of the ddc homologs implies biological significance of DABA DC and its product DAP.
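The figures reported above can be cross-checked with simple arithmetic: a 1470-bp open reading frame corresponds to 1470 / 3 = 490 codons, matching the 490 amino acids stated, and 53659 Da over 490 residues gives about 109.5 Da per residue, close to the rule-of-thumb average of roughly 110 Da. The sketch below assumes the quoted ORF length excludes the stop codon, which the abstract does not state. The function name is hypothetical.

```python
def orf_summary(orf_bp: int, protein_da: float):
    """Return (codon count, mean residue mass in Da rounded to 0.1).

    Assumes `orf_bp` counts only coding triplets (no stop codon) and that
    every codon corresponds to one residue of the mature protein.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons, round(protein_da / codons, 1)


# Values from the abstract: 1470 bp, 53659 Da.
print(orf_summary(1470, 53659))  # (490, 109.5)
```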
Comparative analysis of the prion protein gene sequences in African lion.

PubMed

Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

2006-10-01

The prion protein gene of the African lion (Panthera leo) was cloned for the first time and screened for polymorphisms. The results suggest that the prion protein gene of eight African lions is highly homogeneous. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) were found in the prion protein gene (Prnp) of African lion, but no amino acid substitutions. Sequence analysis showed higher homology to Felis catus (GenBank accession AF003087; 96.7%) and to sheep (GenBank accession M31313.1; 96.2%). With respect to all the mammalian prion protein sequences compared, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross species transmission of prion disease.
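The finding that the four SNPs cause no amino acid substitutions can be checked mechanically once a coding sequence is available. The following is a minimal Python sketch, assuming SNP positions are numbered from the first base of the coding sequence and that the standard genetic code applies; the function name `is_synonymous` and the toy sequence in the usage note are illustrative and are not taken from the study. Under that numbering assumption, positions 42, 81, 420 and 600 are all multiples of three, that is, third codon positions, where many substitutions are silent.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(cds, position, alt_base):
    """Return True if replacing the base at a 1-based CDS position keeps the amino acid."""
    start = (position - 1) // 3 * 3          # index of the first base of the affected codon
    offset = (position - 1) % 3              # 0, 1 or 2 within the codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
```

With the hypothetical coding sequence `"ATGGCCAAA"`, `is_synonymous("ATGGCCAAA", 6, "T")` compares GCC with GCT (both alanine), while a change at position 4 from G to A compares GCC with ACC (alanine versus threonine).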
Method for identifying and quantifying nucleic acid sequence aberrations

DOEpatents

Lucas, J.N.; Straume, T.; Bogen, K.T.

1998-07-21

A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.
Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

PubMed Central

Hall, L; Laird, J E; Craig, R K

1984-01-01

Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375
Comparative genomic survey, exon-intron annotation and phylogenetic analysis of NAT-homologous sequences in archaea, protists, fungi, viruses, and invertebrates

USDA-ARS's Scientific Manuscript database

We have previously published extensive genomic surveys [1-3], reporting NAT-homologous sequences in hundreds of sequenced bacterial, fungal and vertebrate genomes. We present here the results of our latest search of 2445 genomes, representing 1532 (70 archaeal, 1210 bacterial, 43 protist, 97 fungal,...
DNA homology among diverse spiroplasma strains representing several serological groups.

PubMed

Lee, I M; Davis, R E

1980-11-01

Deoxyribonucleic acid (DNA) homology among 10 strains of spiroplasma associated with plants and insects was assessed by analysis of DNA-DNA hybrids with single-strand-specific S1 nuclease. Based on DNA homology, the spiroplasmas could be divided into three genetically distinct groups (designated I, II, and III), corresponding to three separate serogroups described previously. DNA sequence homology between the three groups was less than or equal to 5%. Based on DNA homology, group I could be divided into three subgroups (A, B, and C) that corresponded to three serological subgroups of serogroup I. Subgroup A contained Spiroplasma citri strains Maroc R8A2 and C 189; subgroup B contained strains AS 576 from honey bee and G 1 from flowers; subgroup C contained corn stunt spiroplasma strains I-747 and PU 8-17. There was 27-54% DNA sequence homology among these three subgroups. Group II contained strains 23-6 and 27-31 isolated from flowers of tulip tree (Liriodendron tulipifera L.). Group III contained strains SR 3 and SR 9, other isolates from flowers of tulip tree. Based on thermal denaturation, guanine plus cytosine contents of DNA from five type strains representing all groups and subgroups were estimated to be close to 26 mol% for group I strains, close to 25 mol% for group II strains, and close to 29 mol% for group III strains. The genome molecular weights of these five type strains were all estimated to be about 10^9.
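The study estimated guanine plus cytosine content from thermal denaturation, which is a physical measurement. When a nucleotide sequence is available, the same quantity can instead be counted directly; the Python sketch below shows that substitute calculation, not the method used in the study. The function name `gc_content_mol_percent` is illustrative, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content_mol_percent(seq):
    """Return G+C content as mol% over the unambiguous bases (A, C, G, T) of a DNA sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)
```

For example, `gc_content_mol_percent("ATGC")` counts two G or C bases out of four.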
Top-Down-Assisted Bottom-Up Method for Homologous Protein Sequencing: Hemoglobin from 33 Bird Species

NASA Astrophysics Data System (ADS)

Song, Yang; Laskay, Ünige A.; Vilcins, Inger-Marie E.; Barbour, Alan G.; Wysocki, Vicki H.

2015-11-01

Ticks are vectors for disease transmission because they are indiscriminate in their feeding on multiple vertebrate hosts, transmitting pathogens between their hosts. Identifying the hosts on which ticks have fed is important for disease prevention and intervention. We have previously shown that hemoglobin (Hb) remnants from a host on which a tick fed can be used to reveal the host's identity. For the present research, blood was collected from 33 bird species that are common in the U.S. as hosts for ticks but that have unknown Hb sequences. A top-down-assisted bottom-up mass spectrometry approach with a customized searching database, based on variability in known bird hemoglobin sequences, has been devised to facilitate fast and complete sequencing of hemoglobin from birds with unknown sequences. These hemoglobin sequences will be added to a hemoglobin database and used for tick host identification. The general approach has the potential to sequence any set of homologous proteins completely in a rapid manner.
Protein conformation and disease: pathological consequences of analogous mutations in homologous proteins.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stevens, F. J.; Pokkuluri, P. R.; Schiffer, M.

2000-12-19

The antibody light chain variable domain (V_L) and myelin protein zero (MPZ) are representatives of the functionally diverse immunoglobulin superfamily. The V_L is a subunit of the antigen-binding component of antibodies, while MPZ is the major membrane-linked constituent of the myelin sheaths that coat peripheral nerves. Despite limited amino acid sequence homology, the conformations of the core structures of the two proteins are largely superimposable. Amino acid variations in V_L account for various conformational disease outcomes, including amyloidosis. However, the specific amino acid changes in V_L that are responsible for disease have been obscured by multiple concurrent primary structure alterations. Recently, certain demyelination disorders have been linked to point mutations and single amino acid polymorphisms in MPZ. We demonstrate here that some pathogenic variations in MPZ correspond to changes suspected of determining amyloidosis in V_L. This unanticipated observation suggests that studies of the biophysical origin of conformational disease in one member of a superfamily of homologous proteins may have implications throughout the superfamily. In some cases, findings may account for overt disease; in other cases, due to the natural repertoire of inherited polymorphisms, variations in a representative protein may predict subclinical impairment of homologous proteins.
PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms.

PubMed

Gan, Ruei-Chi; Chen, Ting-Wen; Wu, Timothy H; Huang, Po-Jung; Lee, Chi-Ching; Yeh, Yuan-Ming; Chiu, Cheng-Hsun; Huang, Hsien-Da; Tang, Petrus

2016-12-22

Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences, and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Here, we propose a new analysis strategy and quantification methods for expression levels that not only generate a virtual reference from sequencing data but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against the NCBI NR database to find potential homologous sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For better understanding of the biological meaning of the comparison among transcriptomes, PARRoT further provides linkage between these virtual transcripts and their potential function by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers
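The normalized values named in this abstract build on the standard RPKM and TPM definitions. The Python sketch below shows those textbook formulas only; how PARRoT derives its "estimated" variants (eRPKM, eTPM) from virtual transcripts is not described in the abstract, so this should not be read as the PARRoT implementation. The function names and the example numbers are illustrative assumptions.

```python
def rpkm(read_counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads (standard definition)."""
    total_reads = sum(read_counts)
    # count / (length in kb) / (total reads in millions) == count * 1e9 / (total * length)
    return [c * 1e9 / (total_reads * length) for c, length in zip(read_counts, lengths_bp)]


def tpm(read_counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to one million."""
    rates = [c / length for c, length in zip(read_counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]
```

One reason TPM is often preferred for cross-sample comparison is that the values in each sample sum to the same total (one million), which is not guaranteed for RPKM.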
Epitopes of human testis-specific lactate dehydrogenase deduced from a cDNA sequence

DOE Office of Scientific and Technical Information (OSTI.GOV)

Millan, J.L.; Driscoll, C.E.; LeVan, K.M.

The sequence and structure of human testis-specific L-lactate dehydrogenase (LDHC4, LDHX; (L)-lactate:NAD+ oxidoreductase, EC 1.1.1.27) has been derived from analysis of a complementary DNA (cDNA) clone comprising the complete protein coding region of the enzyme. From the deduced amino acid sequence, human LDHC4 is as different from rodent LDHC4 (73% homology) as it is from human LDHA4 (76% homology) and porcine LDHB4 (68% homology). Subunit homologies are consistent with the conclusion that the LDHC gene arose by at least two independent duplication events. Furthermore, the lower degree of homology between mouse and human LDHC4 and the appearance of this isozyme late in evolution suggests a higher rate of mutation in the mammalian LDHC genes than in the LDHA and -B genes. Comparison of exposed amino acid residues of discrete antigenic determinants of mouse and human LDHC4 reveals significant differences. Knowledge of the human LDHC4 sequence will help design human-specific peptides useful in the development of a contraceptive vaccine.
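Percentage "homology" figures such as those quoted above are, in practice, usually percent identity computed over a pairwise alignment. A minimal Python sketch of that calculation follows. It assumes the two sequences are already aligned to equal length with `-` marking gaps and that gapped columns are excluded from the denominator; other conventions exist, the abstract does not say which was used, and the function name `percent_identity` is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```

For the toy alignment `"ACDE-G"` versus `"ACDFQG"`, five columns are ungapped and four of them match.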

A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

PubMed

Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

1995-04-01

The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).
The OGCleaner: filtering false-positive homology clusters.

PubMed

Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Snell, Quinn; Bybee, Seth M

2017-01-01

Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction especially in instances of lower-quality transcriptome assemblies. Availability: https://github.com/byucsl/ogcleaner. Contact: sfujimoto@gmail.com. Supplementary data are available at Bioinformatics online.
Osteoblast-specific factor 2: cloning of a putative bone adhesion protein with homology with the insect protein fasciclin I.

PubMed Central

Takeshita, S; Kikuno, R; Tezuka, K; Amann, E

1993-01-01

A cDNA library prepared from the mouse osteoblastic cell line MC3T3-E1 was screened for the presence of specifically expressed genes by employing a combined subtraction hybridization/differential screening approach. A cDNA was identified and sequenced which encodes a protein designated osteoblast-specific factor 2 (OSF-2) comprising 811 amino acids. OSF-2 has a typical signal sequence, followed by a cysteine-rich domain, a fourfold repeated domain and a C-terminal domain. The protein lacks a typical transmembrane region. The fourfold repeated domain of OSF-2 shows homology with the insect protein fasciclin I. RNA analyses revealed that OSF-2 is expressed in bone and to a lesser extent in lung, but not in other tissues. Mouse OSF-2 cDNA was subsequently used as a probe to clone the human counterpart. Mouse and human OSF-2 show a high amino acid sequence conservation except for the signal sequence and two regions in the C-terminal domain in which 'in-frame' insertions or deletions are observed, implying alternative splicing events. On the basis of the amino acid sequence homology with fasciclin I, we suggest that OSF-2 functions as a homophilic adhesion molecule in bone formation. PMID:8363580
Sequence homology and expression profile of genes associated with DNA repair pathways in Mycobacterium leprae.

PubMed

Sharma, Mukul; Vedithi, Sundeep Chaitanya; Das, Madhusmita; Roy, Anindya; Ebenezer, Mannam

2017-01-01

Survival of Mycobacterium leprae, the causative bacterium of leprosy, in the human host is dependent to an extent on the ways in which its genome integrity is retained. DNA repair mechanisms protect bacterial DNA from damage induced by various stress factors. The current study is aimed at understanding the sequence and functional annotation of DNA repair genes in M. leprae. The genome of M. leprae was annotated using sequence alignment tools to identify DNA repair genes that have homologs in Mycobacterium tuberculosis and Escherichia coli. A set of 96 genes known to be involved in DNA repair mechanisms in E. coli and Mycobacteriaceae were chosen as a reference. Among these, 61 were identified in M. leprae based on sequence similarity and domain architecture. The 61 were classified into 36 characterized gene products (59%), 11 hypothetical proteins (18%), and 14 pseudogenes (23%). All these genes have homologs in M. tuberculosis and 49 (80.32%) in E. coli. A set of 12 genes which are absent in E. coli were present in M. leprae and in Mycobacteriaceae. These 61 genes were further investigated for their expression profiles in the whole transcriptome microarray data of M. leprae, which was obtained from the signal intensities of 60 bp probes tiling the entire genome with 10 bp overlaps. It was noted that transcripts corresponding to all the 61 genes were identified in the transcriptome data with varying expression levels ranging from 0.18 to 2.47 fold (normalized with 16S rRNA). The mRNA expression levels of a representative set of seven genes (four annotated and three hypothetical protein coding genes) were analyzed using quantitative Polymerase Chain Reaction (qPCR) assays with RNA extracted from skin biopsies of 10 newly diagnosed, untreated leprosy cases. It was noted that RNA expression levels were higher for genes involved in homologous recombination whereas the genes with a low level of expression are involved in the direct repair pathway. This study provided
Genomic perspectives of spider silk genes through target capture sequencing: Conservation of stabilization mechanisms and homology-based structural models of spidroin terminal regions.

PubMed

Collin, Matthew A; Clarke, Thomas H; Ayoub, Nadia A; Hayashi, Cheryl Y

2018-07-01

A powerful system for studying protein aggregation, particularly rapid self-assembly, is spider silk. Spider silks are proteinaceous and silk proteins are synthesized and stored within silk glands as liquid dope. As needed, liquid dope is near-instantaneously transformed into solid fibers or viscous adhesives. The dominant constituents of silks are spidroins (spider fibroins) and their terminal domains are vital for the tight control of silk self-assembly. To better understand spidroin termini, we used target capture and deep sequencing to identify spidroin gene sequences from six species representing the araneoid families of Araneidae, Nephilidae, and Theridiidae. We obtained 145 terminal regions, of which 103 are newly annotated here, as well as novel variants within nine diverse spidroin types. Our comparative analyses demonstrated the conservation of acidic, basic, and cysteine amino acid residues across spidroin types that had been proposed to be important for monomer stability, dimer formation, and self-assembly from a limited sampling of spidroins. Computational protein homology modeling revealed areas of spidroin terminal regions that are highly conserved in three dimensions despite sequence divergence across spidroin types. Analyses of our dense sampling of terminal regions suggest that most spidroins share stabilization mechanisms, dimer formation, and tertiary structure, despite producing functionally distinct materials.
Structures of Arg- and Gln-type bacterial cysteine dioxygenase homologs: Arg- and Gln-type Bacterial CDO Homologs

DOE PAGES

Driggers, Camden M.; Hartman, Steven J.; Karplus, P. Andrew

2015-01-01

In some bacteria, cysteine is converted to cysteine sulfinic acid by cysteine dioxygenases (CDO) that are only ~15–30% identical in sequence to mammalian CDOs. Among bacterial proteins having this range of sequence similarity to mammalian CDO are some that conserve an active site Arg residue (“Arg-type” enzymes) and some having a Gln substituted for this Arg (“Gln-type” enzymes). Here, we describe a structure from each of these enzyme types by analyzing structures originally solved by structural genomics groups but not published: a Bacillus subtilis “Arg-type” enzyme that has cysteine dioxygenase activity (BsCDO), and a Ralstonia eutropha “Gln-type” CDO homolog of uncharacterized activity (ReCDOhom). The BsCDO active site is well conserved with mammalian CDO, and a cysteine complex captured in the active site confirms that the cysteine binding mode is also similar. The ReCDOhom structure reveals a new active site Arg residue that is hydrogen bonding to an iron-bound diatomic molecule we have interpreted as dioxygen. Notably, the Arg position is not compatible with the mode of Cys binding seen in both rat CDO and BsCDO. As sequence alignments show that this newly discovered active site Arg is well conserved among “Gln-type” CDO enzymes, we conclude that the “Gln-type” CDO homologs are not authentic CDOs but will have substrate specificity more similar to 3-mercaptopropionate dioxygenases.
cDNA encoding a polypeptide including a hevein sequence

DOEpatents

Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

1995-03-21

A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.
A Multidimensional Strategy to Detect Polypharmacological Targets in the Absence of Structural and Sequence Homology

PubMed Central

Durrant, Jacob D.; Amaro, Rommie E.; Xie, Lei; Urbaniak, Michael D.; Ferguson, Michael A. J.; Haapalainen, Antti; Chen, Zhijun; Di Guilmi, Anne Marie; Wunder, Frank; Bourne, Philip E.; McCammon, J. Andrew

2010-01-01

Conventional drug design embraces the “one gene, one drug, one disease” philosophy. Polypharmacology, which focuses on multi-target drugs, has emerged as a new paradigm in drug discovery. The rational design of drugs that act via polypharmacological mechanisms can produce compounds that exhibit increased therapeutic potency and against which resistance is less likely to develop. Additionally, identifying multiple protein targets is also critical for side-effect prediction. One third of potential therapeutic compounds fail in clinical trials or are later removed from the market due to unacceptable side effects often caused by off-target binding. In the current work, we introduce a multidimensional strategy for the identification of secondary targets of known small-molecule inhibitors in the absence of global structural and sequence homology with the primary target protein. To demonstrate the utility of the strategy, we identify several targets of 4,5-dihydroxy-3-(1-naphthyldiazenyl)-2,7-naphthalenedisulfonic acid, a known micromolar inhibitor of Trypanosoma brucei RNA editing ligase 1. As it is capable of identifying potential secondary targets, the strategy described here may play a useful role in future efforts to reduce drug side effects and/or to increase polypharmacology. PMID:20098496
A multidimensional strategy to detect polypharmacological targets in the absence of structural and sequence homology.

PubMed

Durrant, Jacob D; Amaro, Rommie E; Xie, Lei; Urbaniak, Michael D; Ferguson, Michael A J; Haapalainen, Antti; Chen, Zhijun; Di Guilmi, Anne Marie; Wunder, Frank; Bourne, Philip E; McCammon, J Andrew

2010-01-22

Conventional drug design embraces the "one gene, one drug, one disease" philosophy. Polypharmacology, which focuses on multi-target drugs, has emerged as a new paradigm in drug discovery. The rational design of drugs that act via polypharmacological mechanisms can produce compounds that exhibit increased therapeutic potency and against which resistance is less likely to develop. Additionally, identifying multiple protein targets is also critical for side-effect prediction. One third of potential therapeutic compounds fail in clinical trials or are later removed from the market due to unacceptable side effects often caused by off-target binding. In the current work, we introduce a multidimensional strategy for the identification of secondary targets of known small-molecule inhibitors in the absence of global structural and sequence homology with the primary target protein. To demonstrate the utility of the strategy, we identify several targets of 4,5-dihydroxy-3-(1-naphthyldiazenyl)-2,7-naphthalenedisulfonic acid, a known micromolar inhibitor of Trypanosoma brucei RNA editing ligase 1. As it is capable of identifying potential secondary targets, the strategy described here may play a useful role in future efforts to reduce drug side effects and/or to increase polypharmacology.
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

PubMed Central

Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

2001-01-01

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
The complete DNA sequence of lymphocystis disease virus.

PubMed

Tidona, C A; Darai, G

1997-04-14

Lymphocystis disease virus (LCDV) is the causative agent of lymphocystis disease, which has been reported to occur in over 100 different fish species worldwide. LCDV is a member of the family Iridoviridae and the type species of the genus Lymphocystivirus. The virions contain a single linear double-stranded DNA molecule, which is circularly permuted, terminally redundant, and heavily methylated at cytosines in CpG sequences. The complete nucleotide sequence of LCDV-1 (flounder isolate) was determined by automated cycle sequencing and primer walking. The genome of LCDV-1 is 102,653 bp in length and contains 195 open reading frames with coding capacities ranging from 40 to 1199 amino acids. Computer-assisted analyses of the deduced amino acid sequences led to the identification of several putative gene products with significant homologies to entries in protein data banks, such as the two major subunits of the viral DNA-dependent RNA polymerase, DNA polymerase, several protein kinases, two subunits of the ribonucleoside diphosphate reductase, DNA methyltransferase, the viral major capsid protein, insulin-like growth factor, and tumor necrosis factor receptor homolog.
Two zebrafish G2A homologs activate multiple intracellular signaling pathways in acidic environment.

PubMed

Ichijo, Yuta; Mochimaru, Yuta; Azuma, Morio; Satou, Kazuhiro; Negishi, Jun; Nakakura, Takashi; Oshima, Natsuki; Mogi, Chihiro; Sato, Koichi; Matsuda, Kouhei; Okajima, Fumikazu; Tomura, Hideaki

2016-01-01

Human G2A is activated by various stimuli such as lysophosphatidylcholine (LPC), 9-hydroxyoctadecadienoic acid (9-HODE), and protons. The receptor is coupled to multiple intracellular signaling pathways, including the Gs-protein/cAMP/CRE, G12/13-protein/Rho/SRE, and Gq-protein/phospholipase C/NFAT pathways. In the present study, we examined whether zebrafish G2A homologs (zG2A-a and zG2A-b) could respond to these stimuli and activate multiple intracellular signaling pathways. We also examined whether the histidine residue and basic amino acid residue in the N-terminus of the homologs play roles similar to those played by human G2A residues if the homologs sense protons. We found that zG2A-a showed high CRE, SRE, and NFAT activities, whereas zG2A-b showed only high SRE activity, at pH 8.0. Extracellular acidification from pH 7.4 to 6.3 ameliorated these activities in zG2A-a-expressing cells. On the other hand, acidification ameliorated the SRE activity but not the CRE and NFAT activities in zG2A-b-expressing cells. LPC or 9-HODE did not modify any activity of either homolog. The substitution of the histidine residue at the 174th position from the N-terminus of zG2A-a with an asparagine residue attenuated proton-induced CRE and NFAT activities but not SRE activity. The substitution of the arginine residue at the 32nd position from the N-terminus of zG2A-a with an alanine residue also attenuated its high and the proton-induced CRE and NFAT activities. On the contrary, the substitution did not attenuate SRE activity. The substitution of the arginine residue at the 10th position from the N-terminus of zG2A-b with an alanine residue also did not attenuate its high or the proton-induced SRE activity. These results indicate that zebrafish G2A homologs were activated by protons but not by LPC and 9-HODE, and the activation mechanisms of the homologs were similar to those of human G2A.
Nucleic Acid Encoding A Lectin-Derived Progenitor Cell Preservation Factor

DOEpatents

Colucci, M. Gabriella; Chrispeels, Maarten J.; Moore, Jeffrey G.

2001-10-30

The invention relates to an isolated nucleic acid molecule that encodes a protein that is effective to preserve progenitor cells, such as hematopoietic progenitor cells. The nucleic acid comprises a sequence defined by SEQ ID NO:1, a homolog thereof, or a fragment thereof. The encoded protein has an amino acid sequence that comprises a sequence defined by SEQ ID NO:2, a homolog thereof, or a fragment thereof that contains an amino acid sequence TNNVLQVT. Methods of using the encoded protein for preserving progenitor cells in vitro, ex vivo, and in vivo are also described. The invention, therefore, includes methods such as myeloablation therapies for cancer treatment wherein myeloid reconstitution is facilitated by means of the specified protein. Other therapeutic utilities are also enabled through the invention, for example, expanding progenitor cell populations ex vivo to increase chances of engraftment, improving conditions for transporting and storing progenitor cells, and facilitating gene therapy to treat and cure a broad range of life-threatening hematologic diseases.
Construction and production of oncotropic vectors, derived from MVM(p), that share reduced sequence homology with helper plasmids.

PubMed

Clément, Nathalie; Velu, Thierry; Brandenburger, Annick

2002-09-01

The production of currently available vectors derived from autonomous parvoviruses requires the expression of capsid proteins in trans, from helper sequences. Cotransfection of a helper plasmid always generates significant amounts of replication-competent virus (RCV) that can be reduced by the integration of helper sequences into a packaging cell line. Although stocks of minute virus of mice (MVM)-based vectors with no detectable RCV could be produced by transfection into packaging cells, RCV appear after one or two rounds of replication, precluding further amplification of the vector stock. Indeed, once RCVs become detectable, they are efficiently amplified and rapidly take over the culture. Theoretically, RCV-free vector stocks could be produced if all homology between vector and helper DNA were eliminated, thus preventing homologous recombination. We constructed new vectors based on the structure of spontaneously occurring defective particles of MVM. Based on published observations related to the size of vectors and the sequence of the viral origin of replication, these vectors were modified by the insertion of foreign DNA sequences downstream of the transgene and by the introduction of a consensus NS-1 nick site near the origin of replication to optimize their production. In one of the vectors, the inserted fragment of mouse genomic DNA had a synergistic effect with the modified origin of replication in increasing vector production.
cDNA encoding a polypeptide including a hevein sequence

DOEpatents

Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

1999-05-04

A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.
cDNA encoding a polypeptide including a hevein sequence

DOEpatents

Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

1999-05-04

A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 12 figs.
cDNA encoding a polypeptide including a hevein sequence

DOEpatents

Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

1995-03-21

A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 11 figures.
Lactobacillus kefiri shows inter-strain variations in the amino acid sequence of the S-layer proteins.

PubMed

Malamud, Mariano; Carasi, Paula; Bronsoms, Sílvia; Trejo, Sebastián A; Serradell, María de Los Angeles

2017-04-01

The S-layer is a proteinaceous envelope constituted by subunits that self-assemble to form a two-dimensional lattice that covers the surface of different species of Bacteria and Archaea, and it could be involved in cell recognition of microbes among several other distinct functions. In this work, both proteomic and genomic approaches were used to gain knowledge about the sequences of the S-layer protein (SLP)-encoding genes expressed by six aggregative and sixteen non-aggregative strains of potentially probiotic Lactobacillus kefiri. Peptide mass fingerprint (PMF) analysis confirmed the identity of SLPs extracted from L. kefiri, and based on the homology with phylogenetically related species, primers located outside and inside the SLP genes were employed to amplify genomic DNA. The O-glycosylation site SASSAS was found in all L. kefiri SLPs. Ten strains were selected for sequencing of the complete genes. The total length of the mature proteins varies from 492 to 576 amino acids, and all SLPs have a calculated pI between 9.37 and 9.60. The N-terminal region is relatively conserved and shows a high percentage of positively charged amino acids. Major differences among strains are found in the C-terminal region. Different groups could be distinguished regarding the mature SLPs and the similarities observed in the PMF spectra. Interestingly, SLPs of the aggregative strains are 100% homologous, although these strains were isolated from different kefir grains. This knowledge provides relevant data for better understanding of the mechanisms involved in SLP functionality and could contribute to the development of products of biotechnological interest from potentially probiotic bacteria.
Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination.

PubMed

Shao, Renfu; Barker, Stephen C

2011-02-15

The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.
Identification and chromosomal localization of Atm, the mouse homolog of the ataxia-telangiectasia gene.

PubMed

Pecker, I; Avraham, K B; Gilbert, D J; Savitsky, K; Rotman, G; Harnik, R; Fukao, T; Schröck, E; Hirotsune, S; Tagle, D A; Collins, F S; Wynshaw-Boris, A; Ried, T; Copeland, N G; Jenkins, N A; Shiloh, Y; Ziv, Y

1996-07-01

Atm, the mouse homolog of the human ATM gene defective in ataxia-telangiectasia (A-T), has been identified. The entire coding sequence of the Atm transcript was cloned and found to contain an open reading frame encoding a protein of 3066 amino acids with 84% overall identity and 91% similarity to the human ATM protein. Variable levels of expression of Atm were observed in different tissues. Fluorescence in situ hybridization and linkage analysis located the Atm gene on mouse chromosome 9, band 9C, in a region homologous to the ATM region on human chromosome 11q22-q23.
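The record above reports both percent identity (84%) and percent similarity (91%), two figures that are easy to conflate. The sketch below shows one common way the two can be computed from a pairwise alignment. The residue groupings, the choice to ignore gapped columns, and the toy aligned sequences are all assumptions made for illustration; the abstract does not state how its values were derived, and published tools usually score similarity with a substitution matrix rather than fixed groups.

```python
# Hypothetical grouping of residues treated as conservative substitutions.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                  set("NQ"), set("ST"), set("AG")]


def are_similar(a: str, b: str) -> bool:
    """True if the residues are identical or fall in the same group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identity and similarity over gap-free alignment columns."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are excluded
        compared += 1
        identical += a == b
        similar += are_similar(a, b)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100 * identical / compared, 100 * similar / compared


# Toy alignment, invented for the example.
ident, sim = identity_and_similarity("MKTLLVA-GE", "MRTLIVASGD")
print(f"identity {ident:.1f}%, similarity {sim:.1f}%")
```

Because every identical pair also counts as similar, similarity is never lower than identity, which matches the ordering of the two values in the abstract.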

Detection of nucleic acid sequences by invader-directed cleavage

DOEpatents

Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

1999-01-01

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.
The mitochondrial genome of Aspergillus nidulans contains reading frames homologous to the human URFs 1 and 4.

PubMed Central

Brown, T A; Davies, R W; Ray, J A; Waring, R B; Scazzocchio, C

1983-01-01

A 2830-bp segment of the mitochondrial genome of the fungus Aspergillus nidulans was sequenced and shown to contain two unidentified reading frames (URFs). These reading frames are 352 and 488 codons in length, and would specify unmodified proteins of mol. wts. 39,000 and 54,000, respectively. The derived amino acid sequences indicate that these genes are equivalent to the human mitochondrial URFs 1 and 4, with 39% amino acid homology for URF1 and 26% for URF4. Both URFs were shown by secondary structure predictions to code for predominantly beta-sheeted proteins with strong structural conservation between the fungal and human homologues. Counterparts of mammalian URFs have not previously been identified in non-mammalian genomes, and the discovery that A. nidulans possesses reading frames so closely homologous with URF1 and URF4 shows that these genes are of general functional importance in the mitochondria of diverse species. PMID:11894959
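The molecular weights quoted above follow from the reading-frame lengths by a simple rule of thumb. As a rough check, the sketch below multiplies each codon count by an average residue mass of about 110 Da. That average is a widely used approximation and an assumption here, not a value given in the abstract, and the function name is hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, assumed for this sketch


def estimate_protein_mass(codons: int,
                          avg_mass: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate mass in Da of an unmodified protein of `codons` residues."""
    if codons < 0:
        raise ValueError("codon count must be non-negative")
    return codons * avg_mass


# Reading-frame lengths taken from the abstract.
for codons in (352, 488):
    mass = estimate_protein_mass(codons)
    print(f"{codons} codons: about {round(mass, -3):,.0f} Da")
```

Rounded to the nearest thousand, the estimates agree with the 39,000 and 54,000 figures in the abstract, which suggests those values were derived from sequence length (or composition) rather than measured.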
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

PubMed

Nishizawa, M; Nishizawa, K

2000-10-01

The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

PubMed Central

Nishizawa, Manami; Nishizawa, Kazuhisa

2000-01-01

The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273
Amino acid sequence of bovine muzzle epithelial desmocollin derived from cloned cDNA: a novel subtype of desmosomal cadherins.

PubMed

Koch, P J; Goldschmidt, M D; Walsh, M J; Zimbelmann, R; Schmelz, M; Franke, W W

1991-05-01

Desmosomes are cell-type-specific intercellular junctions found in epithelium, myocardium and certain other tissues. They consist of assemblies of molecules involved in the adhesion of specific cell types and in the anchorage of cell-type-specific cytoskeletal elements, the intermediate-size filaments, to the plasma membrane. To explore the individual desmosomal components and their functions we have isolated DNA clones encoding the desmosomal glycoprotein, desmocollin, using antibodies and a cDNA expression library from bovine muzzle epithelium. The cDNA-deduced amino-acid sequence of desmocollin (presently we cannot decide to which of the two desmocollins, DC I or DC II, this clone relates) defines a polypeptide with a calculated molecular weight of 85,000, with a single candidate sequence of 24 amino acids sufficiently long for a transmembrane arrangement, and an extracellular aminoterminal portion of 561 amino acid residues, compared to a cytoplasmic part of only 176 amino acids. Amino acid sequence comparisons have revealed that desmocollin is highly homologous to members of the cadherin family of cell adhesion molecules, including the previously sequenced desmoglein, another desmosome-specific cadherin. Using riboprobes derived from cDNAs for Northern-blot analyses, we have identified an mRNA of approximately 6 kb in stratified epithelia such as muzzle epithelium and tongue mucosa but not in two epithelial cell culture lines containing desmosomes and desmoplakins. The difference may indicate drastic differences in mRNA concentration or the existence of cell-type-specific desmocollin subforms. The molecular topology of desmocollin(s) is discussed in relation to possible functions of the individual molecular domains.
Methods and compositions for efficient nucleic acid sequencing

DOEpatents

Drmanac, Radoje

2006-07-04

Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.
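The record above concerns sequencing by hybridization with short oligonucleotide probes. The following is a minimal textbook-style sketch of the underlying idea only: if hybridization reveals which k-mers occur in a target, the target can sometimes be rebuilt by chaining k-mers that overlap by k-1 bases. It is not the patented two-probe-set method; the function names and test sequence are invented, and the sketch assumes an error-free spectrum in which every k-mer occurs once and every overlap is unambiguous.

```python
def kmer_spectrum(seq: str, k: int) -> set[str]:
    """Set of all k-mers in `seq` (stands in for idealized probe hits)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(spectrum: set[str]) -> str:
    """Rebuild a sequence by following unique (k-1)-base overlaps."""
    kmers = set(spectrum)
    if not kmers:
        return ""
    suffixes = {x[1:] for x in kmers}
    # The first k-mer is the one whose prefix is no other k-mer's suffix.
    starts = [x for x in kmers if x[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique starting k-mer")
    current = starts[0]
    result = current
    remaining = kmers - {current}
    while remaining:
        candidates = [x for x in remaining if x[:-1] == current[1:]]
        if len(candidates) != 1:
            raise ValueError("ambiguous or missing overlap; longer probes needed")
        current = candidates[0]
        result += current[-1]  # each step contributes one new base
        remaining.remove(current)
    return result


target = "ATGGCGTGCA"  # made-up sequence
print(reconstruct(kmer_spectrum(target, 4)))
```

With k = 3 the same target produces repeated overlaps and the function raises an error, which illustrates why probe length and repeat content limit this kind of reconstruction.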
Methods and compositions for efficient nucleic acid sequencing

DOEpatents

Drmanac, Radoje

2002-01-01

Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.
Pseudomonas aeruginosa Type III Secretory Toxin ExoU and Its Predicted Homologs.

PubMed

Sawa, Teiji; Hamaoka, Saeko; Kinoshita, Mao; Kainuma, Atsushi; Naito, Yoshifumi; Akiyama, Koichi; Kato, Hideya

2016-10-26

Pseudomonas aeruginosa ExoU, a type III secretory toxin and major virulence factor with patatin-like phospholipase activity, is responsible for acute lung injury and sepsis in immunocompromised patients. Through use of a recently updated bacterial genome database, protein sequences predicted to be homologous to Ps. aeruginosa ExoU were identified in 17 other Pseudomonas species (Ps. fluorescens, Ps. lundensis, Ps. weihenstephanensis, Ps. marginalis, Ps. rhodesiae, Ps. synxantha, Ps. libanensis, Ps. extremaustralis, Ps. veronii, Ps. simiae, Ps. trivialis, Ps. tolaasii, Ps. orientalis, Ps. taetrolens, Ps. syringae, Ps. viridiflava, and Ps. cannabina) and 8 Gram-negative bacteria from three other genera (Photorhabdus, Aeromonas, and Paludibacterium). In the alignment of the predicted primary amino acid sequences used for the phylogenetic analyses, both highly conserved and nonconserved parts of the toxin were discovered among the various species. Further comparative studies of the predicted ExoU homologs should provide us with more detailed information about the unique characteristics of the Ps. aeruginosa ExoU toxin.
cDNA encoding a polypeptide including a hevein sequence

DOEpatents

Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

2000-07-04

A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.
Purification, characterization, gene cloning and nucleotide sequencing of D-stereospecific amino acid amidase from soil bacterium: Delftia acidovorans.

PubMed

Hongpattarakere, Tipparat; Komeda, Hidenobu; Asano, Yasuhisa

2005-12-01

A D-amino acid amidase-producing bacterium was isolated from soil samples using an enrichment culture technique in medium broth containing D-phenylalanine amide as a sole source of nitrogen. The strain exhibiting the strongest activity was identified as Delftia acidovorans strain 16. This strain produced intracellular D-amino acid amidase constitutively. The enzyme was purified about 380-fold to homogeneity and its molecular mass was estimated to be about 50 kDa by sodium dodecyl sulfate polyacrylamide gel electrophoresis. The enzyme was active preferentially toward D-amino acid amides rather than their L-counterparts. It exhibited strong amino acid amidase activity toward aromatic amino acid amides including D-phenylalanine amide, D-tryptophan amide and D-tyrosine amide, yet it was not specifically active toward low-molecular-weight D-amino acid amides such as D-alanine amide, L-alanine amide and L-serine amide. Moreover, it was not specifically active toward oligopeptides. The enzyme showed maximum activity at 40 °C and pH 8.5 and appeared to be very stable, with 92.5% remaining activity after the reaction was performed at 45 °C for 30 min. However, it was mostly inactivated in the presence of phenylmethanesulfonyl fluoride or Cd2+, Ag+, Zn2+, Hg2+ and As3+. The NH2-terminal and internal amino acid sequences of the enzyme were determined, and the gene was cloned and sequenced. The enzyme gene damA encodes a 466-amino-acid protein (molecular mass 49,860.46 Da), and the deduced amino acid sequence exhibits homology to the D-amino acid amidase from Variovorax paradoxus (67.9% identity), the amidotransferase A subunit from Burkholderia fungorum (50% identity) and other enantioselective amidases.
DR-78, a novel Drosophila melanogaster genomic DNA fragment highly homologous to the DNA-binding domain of thyroid hormone-retinoic acid-vitamin D receptor subfamily.

PubMed

Martín-Blanco, E; Kornberg, T B

1993-11-16

Degenerate oligodeoxyribonucleotides were designed for both ends of the DNA-binding domain of members of the nuclear receptor superfamily. PCR amplified Drosophila melanogaster DNA was purified and cloned (DR plasmids). Genomic lambda DASH clones were identified at high stringency with an amplified DR-78 plasmid DNA and isolated. The partial sequence shows a very probable open reading frame which would encode a peptide highly homologous to members of the thyroid hormone-retinoic acid-vitamin D receptor subfamily. The fragment corresponds to a single copy gene and was mapped at position 78D of chromosome three by in situ hybridization.
Molecular characterization of DnaJ 5 homologs in silkworm Bombyx mori and its expression during egg diapause.

PubMed

Sirigineedi, Sasibhushan; Vijayagowri, Esvaran; Murthy, Geetha N; Rao, Guruprasada; Ponnuvel, Kangayam M

2014-12-01

A comparison of the cDNA sequence (1,056 bp) of the Bombyx mori DnaJ 5 homolog with the B. mori genome revealed that, unlike other Hsps, it has an intron of 234 bp. The DnaJ 5 homolog contains 351 amino acids, of which 70 form the conserved DnaJ domain at the N-terminal end. This homolog of B. mori has all desirable functional domains similar to other insects, and the 13 different DnaJ homologs identified in the B. mori genome were distributed on different chromosomes. The expressed sequence tag database analysis of Hsp40 gene expression revealed higher expression in wing disc followed by diapause-induced eggs. Microarray analysis revealed higher expression of the DnaJ 5 homolog at 18 h after oviposition in diapause-induced eggs. Further validation of DnaJ 5 expression through qPCR in diapause-induced and nondiapause eggs at different time intervals revealed higher expression in diapause eggs at 18 and 24 h after oviposition, which coincided with the expression of Hsp70, as Hsp40 is its co-chaperone. This study thus provides an outline of the genome organization of the Hsp40 gene and its role in egg diapause induction in B. mori. © 2013 Institute of Zoology, Chinese Academy of Sciences.
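The sequence length and protein length in this record are consistent with each other under one assumption: that the 1,056 bp figure is the coding sequence including its stop codon. The sketch below works through that arithmetic. The function name is hypothetical and the stop-codon assumption is mine, since the abstract does not say whether untranslated regions are included.

```python
def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a coding sequence of `cds_bp` bases."""
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    codons = cds_bp // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons


print(protein_length_from_cds(1056))  # compare with the 351 residues reported
```

The result matches the 351 amino acids stated in the abstract, which supports reading 1,056 bp as the open reading frame rather than the full transcript.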
A Symplectic Instanton Homology via Traceless Character Varieties

NASA Astrophysics Data System (ADS)

Horton, Henry T.

Since its inception, Floer homology has been an important tool in low-dimensional topology. Floer theoretic invariants of 3-manifolds tend to be either gauge theoretic or symplecto-geometric in nature, and there is a general philosophy that each gauge theoretic Floer homology should have a corresponding symplectic Floer homology and vice-versa. In this thesis, we construct a Lagrangian Floer invariant for any closed, oriented 3-manifold Y (called the symplectic instanton homology of Y and denoted SI(Y)) which is conjecturally equivalent to a Floer homology defined using a certain variant of Yang-Mills gauge theory. The crucial ingredient for defining SI(Y) is the use of traceless character varieties in the symplectic setting, which allow us to avoid the debilitating technical hurdles present when one attempts to define a symplectic version of instanton Floer homologies. Floer theories are also expected to roughly satisfy the axioms of a topological quantum field theory (TQFT), and furthermore Dehn surgeries on knots should induce exact triangles of Floer homologies. Following a strategy used by Ozsváth and Szabó in the context of Heegaard Floer homology, we prove that our theory is functorial with respect to connected 4-dimensional cobordisms, so that cobordisms induce homomorphisms between symplectic instanton homologies. By studying the effect of Dehn surgeries on traceless character varieties, we establish a surgery exact triangle using work of Seidel that relates the geometry of Lefschetz fibrations with exact triangles in Lagrangian Floer theory. We further prove that Dehn surgeries on a link L in a 3-manifold Y induce a spectral sequence of symplectic instanton homologies - the E2-page is isomorphic to a direct sum of symplectic instanton homologies of all possible combinations of 0- and 1-surgeries on the components of L, and the spectral sequence converges to SI(Y). For the branched double cover Sigma(L) of a link L in S^3, we show there is a link surgery
Advances in Homology Protein Structure Modeling

PubMed Central

Xiang, Zhexin

2007-01-01

Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function. PMID:16787261
Novel homologous lactate transporter improves L-lactic acid production from glycerol in recombinant strains of Pichia pastoris.

PubMed

de Lima, Pollyne Borborema Almeida; Mulder, Kelly Cristina Leite; Melo, Nadiele Tamires Moreira; Carvalho, Lucas Silva; Menino, Gisele Soares; Mulinari, Eduardo; de Castro, Virgilio H; Dos Reis, Thaila F; Goldman, Gustavo Henrique; Magalhães, Beatriz Simas; Parachin, Nádia Skorupa

2016-09-15

Crude glycerol is the main byproduct of the biodiesel industry. Although it can have different applications, its purification is costly. Therefore, in this study a biotechnological route has been proposed for further utilization of crude glycerol in the fermentative production of lactic acid. This acid is largely utilized in food, pharmaceutical, textile, and chemical industries, making it the hydroxycarboxylic acid with the highest market potential worldwide. Currently, industrial production of lactic acid is done mainly using sugar as the substrate. Thus here, for the first time, Pichia pastoris has been engineered for heterologous L-lactic acid production using glycerol as a single carbon source. For that, the Bos taurus lactate dehydrogenase gene was introduced into P. pastoris. Moreover, a heterologous and a novel homologous lactate transporter have been evaluated for L-lactic acid production. Batch fermentation of the P. pastoris X-33 strain producing LDHb allowed for lactic acid production in this yeast. Although P. pastoris is known for its respiratory metabolism, batch fermentations were performed with different oxygenation levels, indicating that lower oxygen availability increased lactic acid production by 20%, pushing the yeast towards a fermentative metabolism. Furthermore, a new putative lactate transporter from P. pastoris named PAS has been identified by similarity search with the lactate transporter from Saccharomyces cerevisiae, Jen1p. Both heterologous and homologous transporters, Jen1p and PAS, were evaluated in one strain already containing LDH activity. Fed-batch experiments of P. pastoris strains carrying the lactate transporter were performed with the batch phase at aerobic conditions followed by an aerobic oxygen-limited phase where production of lactic acid was favored. The results showed that the strain containing PAS presented the highest lactic acid titer, reaching a yield of approximately 0.7 g/g. We showed that P. pastoris has a
Cloning and characterization of WRKY gene homologs in Chieh-qua (Benincasa hispida Cogn. var. Chieh-qua How) and their expression in response to fusaric acid treatment.

PubMed

Mao, Yizhou; Jiang, Biao; Peng, Qingwu; Liu, Wenrui; Lin, Yue; Xie, Dasen; He, Xiaoming; Li, Shaoshan

2017-05-01

The WRKY transcription factors play an important role in plant resistance to biotic and abiotic stresses. In the present study, we cloned 10 WRKY gene homologs (CqWRKY) in Chieh-qua (Benincasa hispida Cogn. var. Chieh-qua) using the rapid amplification of cDNA ends (RACE) or homology-based cloning methods. We characterized the structure of these CqWRKY genes. Phylogenetic analysis of these sequences with cucumber homologs suggested possible structural conservation of these genes among cucurbit crops. We examined the expression levels of these genes in response to fusaric acid (FA) treatment between resistant and susceptible Chieh-qua lines with quantitative real-time PCR. All genes could be upregulated upon FA treatment, but four CqWRKY genes exhibited differential expression between resistant and susceptible lines before and after FA application. CqWRKY31 seemed to be a positive regulator while CqWRKY1, CqWRKY23 and CqWRKY53 were negative regulators of fusaric acid resistance. This is the first report of characterization of WRKY family genes in Chieh-qua. The results may also be useful in breeding Chieh-qua for Fusarium wilt resistance.
Homology and phylogeny and their automated inference

NASA Astrophysics Data System (ADS)

Fuellen, Georg

2008-06-01

The analysis of the ever-increasing amount of biological and biomedical data can be pushed forward by comparing the data within and among species. For example, an integrative analysis of data from the genome sequencing projects for various species traces the evolution of the genomes and identifies conserved and innovative parts. Here, I review the foundations and advantages of this “historical” approach and evaluate recent attempts at automating such analyses. Biological data is comparable if a common origin exists (homology), as is the case for members of a gene family originating via duplication of an ancestral gene. If the family has relatives in other species, we can assume that the ancestral gene was present in the ancestral species from which all the other species evolved. In particular, describing the relationships among the duplicated biological sequences found in the various species is often possible by a phylogeny, which is more informative than homology statements. Detecting and elaborating on common origins may answer how certain biological sequences developed, and predict what sequences are in a particular species and what their function is. Such knowledge transfer from sequences in one species to the homologous sequences of the other is based on the principle of ‘my closest relative looks and behaves like I do’, often referred to as ‘guilt by association’. To enable knowledge transfer on a large scale, several automated ‘phylogenomics pipelines’ have been developed in recent years, and seven of these will be described and compared. Overall, the examples in this review demonstrate that homology and phylogeny analyses, done on a large (and automated) scale, can give insights into function in biology and biomedicine.
Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

NASA Astrophysics Data System (ADS)

Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

2000-02-01

Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.
Allergenic characterization of a novel allergen, homologous to chymotrypsin, from German cockroach.

PubMed

Jeong, Kyoung Yong; Son, Mina; Lee, Jae Hyun; Hong, Chein Soo; Park, Jung Won

2015-05-01

Cockroach feces are known to be rich in IgE-reactive components. Various protease allergens were identified by proteomic analysis of German cockroach fecal extract in a previous study. In this study, we characterized a novel allergen, a chymotrypsin-like serine protease. A cDNA sequence homologous to chymotrypsin was obtained by analysis of German cockroach expressed sequence tag (EST) clones. The recombinant chymotrypsins from the German cockroach and house dust mite (Der f 6) were expressed in Escherichia coli using the pEXP5NT/TOPO vector system, and their allergenicity was investigated by ELISA. The deduced amino acid sequence of German cockroach chymotrypsin showed 32.7 to 43.1% identity with mite group 3 (trypsin) and group 6 (chymotrypsin) allergens. Sera from 8 of 28 German cockroach allergy subjects (28.6%) showed IgE binding to the recombinant protein. IgE binding to the recombinant cockroach chymotrypsin was inhibited by house dust mite chymotrypsin Der f 6, while it minimally inhibited the German cockroach whole body extract. A novel allergen homologous to chymotrypsin was identified from the German cockroach and was cross-reactive with Der f 6.
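Percent identity figures such as the 32.7 to 43.1% range above are computed from a pairwise alignment. The following is a minimal sketch of one common convention (identical residues divided by aligned, non-gap columns); the abstract does not state which denominator was used, and the function name and toy sequences are hypothetical, not taken from the study.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same pairwise alignment,
    so they must have equal length. Other conventions divide by the
    full alignment length or by the shorter sequence instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical residues): 4 identical out of 5 ungapped columns.
toy_identity = percent_identity("ACDE-F", "ACDQGF")

# The IgE-binding frequency reported above: 8 of 28 sera.
ige_binding_percent = round(100 * 8 / 28, 1)
```

Because the choice of denominator changes the result, identity values from different reports are only directly comparable when the same convention was used.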
Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

PubMed

Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

1993-02-01

A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence characteristics in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234 and also by the presence of other specific amino acid substitutions which may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions including an additional potential glycosylation site at position (315-317). The Vs virus strain shared closest nucleotide and amino acid homology (84.5% and 95.5% respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and enlarge upon previous data obtained from molecular and serological analysis.

Cloning and characterization of the Drosophila homolog of the xeroderma pigmentosum complementation-group B correcting gene, ERCC3.

PubMed Central

Koken, M H; Vreeken, C; Bol, S A; Cheng, N C; Jaspers-Dekker, I; Hoeijmakers, J H; Eeken, J C; Weeda, G; Pastink, A

1992-01-01

Previously the human nucleotide excision repair gene ERCC3 was shown to be responsible for a rare combination of the autosomal recessive DNA repair disorders xeroderma pigmentosum (complementation group B) and Cockayne's syndrome (complementation group C). The human and mouse ERCC3 proteins contain several sequence motifs suggesting that it is a nucleic acid or chromatin binding helicase. To study the significance of these domains and the overall evolutionary conservation of the gene, the homolog from Drosophila melanogaster was isolated by low stringency hybridizations using two flanking probes of the human ERCC3 cDNA. The flanking probe strategy selects for long stretches of nucleotide sequence homology, and avoids isolation of small regions with fortuitous homology. In situ hybridization localized the gene onto chromosome III 67E3/4, a region devoid of known D.melanogaster mutagen sensitive mutants. Northern blot analysis showed that the gene is continuously expressed in all stages of fly development. A slight increase (2-3 times) of ERCC3Dm transcript was observed in the later stages. Two almost full length cDNAs were isolated, which have different 5' untranslated regions (UTR). The SD4 cDNA harbours only one long open reading frame (ORF) coding for ERCC3Dm. Another clone (SD2), however, has the potential to encode two proteins: a 170 amino acids polypeptide starting at the optimal first ATG has no detectable homology with any other proteins currently in the data bases, and another ORF beginning at the suboptimal second startcodon which is identical to that of SD4. Comparison of the encoded ERCC3Dm protein with the homologous proteins of mouse and man shows a strong amino acid conservation (71% identity), especially in the postulated DNA binding region and seven 'helicase' domains. The ERCC3Dm sequence is fully consistent with the presumed functions and the high conservation of these regions strengthens their functional significance. Microinjection and DNA
Expression of an Atriplex nummularia gene encoding a protein homologous to the bacterial molecular chaperone DnaJ.

PubMed Central

Zhu, J K; Shi, J; Bressan, R A; Hasegawa, P M

1993-01-01

DnaJ is a 36-kD heat shock protein that functions together with DnaK (Hsp70) as a molecular chaperone in Escherichia coli. We have obtained a cDNA clone from the higher plant Atriplex nummularia that encodes a 46.6-kD polypeptide (ANJ1) with an overall 35.2% amino acid sequence identity with the E. coli DnaJ. ANJ1 has 43.4% overall sequence identity with the Saccharomyces cerevisiae cytoplasmic DnaJ homolog YDJ1/MAS5. Complementation of the yeast mas5 mutation indicated that ANJ1 is a functional homolog of YDJ1/MAS5. The presence of other DnaJ homologs in A. nummularia was demonstrated by the detection of proteins that are antigenically related to the yeast mitochondrial DnaJ homolog SCJ1 and the yeast DnaJ-related protein Sec63. Expression of the ANJ1 gene was compared with that of an A. nummularia Hsp70 gene. Expression of both ANJ1 and Hsp70 transcripts was coordinately induced by heat shock. However, noncoordinate accumulation of ANJ1 and Hsp70 mRNAs occurred during the cell growth cycle and in response to NaCl stress. PMID:8467224
Expression of an Atriplex nummularia gene encoding a protein homologous to the bacterial molecular chaperone DnaJ.

PubMed

Zhu, J K; Shi, J; Bressan, R A; Hasegawa, P M

1993-03-01

DnaJ is a 36-kD heat shock protein that functions together with DnaK (Hsp70) as a molecular chaperone in Escherichia coli. We have obtained a cDNA clone from the higher plant Atriplex nummularia that encodes a 46.6-kD polypeptide (ANJ1) with an overall 35.2% amino acid sequence identity with the E. coli DnaJ. ANJ1 has 43.4% overall sequence identity with the Saccharomyces cerevisiae cytoplasmic DnaJ homolog YDJ1/MAS5. Complementation of the yeast mas5 mutation indicated that ANJ1 is a functional homolog of YDJ1/MAS5. The presence of other DnaJ homologs in A. nummularia was demonstrated by the detection of proteins that are antigenically related to the yeast mitochondrial DnaJ homolog SCJ1 and the yeast DnaJ-related protein Sec63. Expression of the ANJ1 gene was compared with that of an A. nummularia Hsp70 gene. Expression of both ANJ1 and Hsp70 transcripts was coordinately induced by heat shock. However, noncoordinate accumulation of ANJ1 and Hsp70 mRNAs occurred during the cell growth cycle and in response to NaCl stress.
77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-10-29

... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...
High speed nucleic acid sequencing

DOEpatents

Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

2011-05-17

The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.
Identification of SHIP-1 and SHIP-2 homologs in channel catfish, Ictalurus punctatus

USDA-ARS's Scientific Manuscript database

Src homology domain 2 (SH2) domain-containing inositol 5’-phosphatases (SHIP) proteins have diverse roles in signal transduction. SHIP-1 and SHIP-2 homologs were identified in channel catfish, Ictalurus punctatus, based on sequence homology to murine and human SHIP sequences. Full-length cDNAs for ...
Metagenomic gene annotation by a homology-independent approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Froula, Jeff; Zhang, Tao; Salmeen, Annette

2011-06-02

Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly-related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As approximately 50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.
Cloning and nucleotide sequence of the Pseudomonas aeruginosa glucose-selective OprB porin gene and distribution of OprB within the family Pseudomonadaceae.

PubMed

Wylie, J L; Worobec, E A

1994-03-01

OprB is a glucose-selective porin known to be produced by Pseudomonas aeruginosa and Pseudomonas putida. We have cloned and sequenced the oprB gene of P. aeruginosa and obtained expression of OprB in Escherichia coli. The mature protein consists of 423 amino acid residues with a deduced molecular mass of 47597 Da. Several clusters of amino acid residues, potentially involved in the structure or function of the protein, were identified. An area of regional homology with E. coli LamB was also identified. Carbohydrate-inducible proteins, potentially homologous to OprB, were identified in several rRNA homology-group-I pseudomonads by sodium dodecyl sulfate/polyacrylamide gel electrophoresis analysis, Western immunoblotting and N-terminal amino acid sequencing. These species also contained DNA that hybridized to a P. aeruginosa oprB gene probe.
Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology.

PubMed

Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil

2014-09-07

Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To identify LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (in average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
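The five-fold cross-validation mentioned above partitions the dataset into five parts and holds each part out once for testing. The sketch below shows only that splitting step and an accuracy calculation in plain Python; it is an illustration under simple assumptions (no shuffling, no class stratification), not the authors' pipeline, and the function names are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Split sample indices 0..n_samples-1 into k (train, test) pairs.

    Assumption: samples are taken in their given order. In practice the
    data are usually shuffled (and often stratified by class) first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def accuracy_percent(true_labels, predicted_labels):
    """Overall accuracy: percentage of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracies uses every sample once for evaluation; the independent evaluation test in the abstract is a separate check on data held out from this procedure.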
The N-terminal sequence of ribosomal protein L10 from the archaebacterium Halobacterium marismortui and its relationship to eubacterial protein L6 and other ribosomal proteins.

PubMed

Dijk, J; van den Broek, R; Nasiulas, G; Beck, A; Reinhardt, R; Wittmann-Liebold, B

1987-08-01

The amino-terminal sequence of ribosomal protein L10 from Halobacterium marismortui has been determined up to residue 54, using both a liquid- and a gas-phase sequenator. The two sequences are in good agreement. The protein is clearly homologous to protein HcuL10 from the related strain Halobacterium cutirubrum. Furthermore, a weaker but distinct homology to ribosomal protein L6 from Escherichia coli and Bacillus stearothermophilus can be detected. In addition to 7 identical amino acids in the first 36 residues in all four sequences a number of conservative replacements occurs, of mainly hydrophobic amino acids. In this common region the pattern of conserved amino acids suggests the presence of a beta-alpha fold as it occurs in ribosomal proteins L12 and L30. Furthermore, several potential cases of homology to other ribosomal components of the three ur-kingdoms have been found.
Production of Fatty Acid Components of Meadowfoam Oil in Somatic Soybean Embryos

PubMed Central

Cahoon, Edgar B.; Marillia, Elizabeth-France; Stecca, Kevin L.; Hall, Sarah E.; Taylor, David C.; Kinney, Anthony J.

2000-01-01

The seed oil of meadowfoam (Limnanthes alba) and other Limnanthes spp. is enriched in the unusual fatty acid Δ5-eicosenoic acid (20:1Δ5). This fatty acid has physical and chemical properties that make the seed oil of these plants useful for a number of industrial applications. An expressed sequence tag approach was used to identify cDNAs for enzymes involved in the biosynthesis of 20:1Δ5. By random sequencing of a library prepared from developing Limnanthes douglasii seeds, a class of cDNAs was identified that encode a homolog of acyl-coenzyme A (CoA) desaturases found in animals, fungi, and cyanobacteria. Expression of a cDNA for the L. douglasii acyl-CoA desaturase homolog in somatic soybean (Glycine max) embryos behind a strong seed-specific promoter resulted in the accumulation of Δ5-hexadecenoic acid to amounts of 2% to 3% (w/w) of the total fatty acids of single embryos. Δ5-Octadecenoic acid and 20:1Δ5 also composed <1% (w/w) each of the total fatty acids of these embryos. In addition, cDNAs were identified from the L. douglasii expressed sequence tags that encode a homolog of fatty acid elongase 1 (FAE1), a β-ketoacyl-CoA synthase that catalyzes the initial step of very long-chain fatty acid synthesis. Expression of the L. douglasii FAE1 homolog in somatic soybean embryos was accompanied by the accumulation of C20 and C22 fatty acids, principally as eicosanoic acid, to amounts of 18% (w/w) of the total fatty acids of single embryos. To partially reconstruct the biosynthetic pathway of 20:1Δ5 in transgenic plant tissues, cDNAs for the L. douglasii acyl-CoA desaturase and FAE1 were co-expressed in somatic soybean embryos. In the resulting transgenic embryos, 20:1Δ5 and Δ5-docosenoic acid composed up to 12% of the total fatty acids. PMID:10982439
Production of fatty acid components of meadowfoam oil in somatic soybean embryos.

PubMed

Cahoon, E B; Marillia, E F; Stecca, K L; Hall, S E; Taylor, D C; Kinney, A J

2000-09-01

The seed oil of meadowfoam (Limnanthes alba) and other Limnanthes spp. is enriched in the unusual fatty acid Delta(5)-eicosenoic acid (20:1Delta(5)). This fatty acid has physical and chemical properties that make the seed oil of these plants useful for a number of industrial applications. An expressed sequence tag approach was used to identify cDNAs for enzymes involved in the biosynthesis of 20:1Delta(5). By random sequencing of a library prepared from developing Limnanthes douglasii seeds, a class of cDNAs was identified that encode a homolog of acyl-coenzyme A (CoA) desaturases found in animals, fungi, and cyanobacteria. Expression of a cDNA for the L. douglasii acyl-CoA desaturase homolog in somatic soybean (Glycine max) embryos behind a strong seed-specific promoter resulted in the accumulation of Delta(5)-hexadecenoic acid to amounts of 2% to 3% (w/w) of the total fatty acids of single embryos. Delta(5)-Octadecenoic acid and 20:1Delta(5) also composed <1% (w/w) each of the total fatty acids of these embryos. In addition, cDNAs were identified from the L. douglasii expressed sequence tags that encode a homolog of fatty acid elongase 1 (FAE1), a beta-ketoacyl-CoA synthase that catalyzes the initial step of very long-chain fatty acid synthesis. Expression of the L. douglasii FAE1 homolog in somatic soybean embryos was accompanied by the accumulation of C(20) and C(22) fatty acids, principally as eicosanoic acid, to amounts of 18% (w/w) of the total fatty acids of single embryos. To partially reconstruct the biosynthetic pathway of 20:1Delta(5) in transgenic plant tissues, cDNAs for the L. douglasii acyl-CoA desaturase and FAE1 were co-expressed in somatic soybean embryos. In the resulting transgenic embryos, 20:1Delta(5) and Delta(5)-docosenoic acid composed up to 12% of the total fatty acids.
Statistical alignment: computational properties, homology testing and goodness-of-fit.

PubMed

Hein, J; Wiuf, C; Knudsen, B; Møller, M B; Wibling, G

2000-09-08

The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model. Firstly, we show how to accelerate the statistical alignment algorithms several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity based alignment, to get good initial guesses of the evolutionary parameters and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis. Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins. Finally, we describe a goodness-of-fit test, that allows testing the proposed insertion-deletion (indel) process inherent to this model and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model. Copyright 2000 Academic Press.
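The speed-up described above comes largely from restricting the dynamic-programming table to a band of cells. The sketch below illustrates only that banding idea on a simple score-based global alignment; it does not implement the TKF91 likelihood recursions, and the band here is centred on the main diagonal rather than on a precomputed similarity alignment. The scoring values and function name are assumptions chosen for illustration.

```python
def banded_global_score(a, b, band, match=1, mismatch=-1, gap=-2):
    """Global alignment score using only cells with |i - j| <= band.

    Illustration of banding only: cells outside the band are treated as
    unreachable, so the work is roughly O(len(a) * band) instead of
    O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the final cell")
    neg_inf = float("-inf")
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                step = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + step)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)  # gap in b
            if j > 0:
                best = max(best, score[i][j - 1] + gap)  # gap in a
            score[i][j] = best
    return score[n][m]
```

A band that is too narrow can exclude the best alignment, which is why the abstract places the band around an initial similarity-based alignment; widening the band until the result stops changing is one simple safeguard.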
A novel mutation in TFL1 homolog affecting determinacy in cowpea (Vigna unguiculata).

PubMed

Dhanasekar, P; Reddy, K S

2015-02-01

Mutations in the widely conserved Arabidopsis Terminal Flower 1 (TFL1) gene and its homologs have been demonstrated to result in determinacy across genera, the knowledge of which is lacking in cowpea. Understanding the molecular events leading to determinacy of apical meristems could hasten development of cowpea varieties with suitable ideotypes. Isolation and characterization of a novel mutation in cowpea TFL1 homolog (VuTFL1) affecting determinacy is reported here for the first time. Cowpea TFL1 homolog was amplified using primers designed based on conserved sequences in related genera and sequence variation was analysed in three gamma ray-induced determinate mutants, their indeterminate parent "EC394763" and two indeterminate varieties. The analyses of sequence variation exposed a novel SNP distinguishing the determinate mutants from the indeterminate types. The non-synonymous point mutation in exon 4 at position 1,176 resulted from transversion of cytosine (C) to adenine (A) leading to an amino acid change (Pro-136 to His) in determinate mutants. The effect of the mutation on protein function and stability was predicted to be detrimental using different bioinformatics/computational tools. The functionally significant novel substitution mutation is hypothesized to affect determinacy in the cowpea mutants. Development of suitable regeneration protocols in this hitherto recalcitrant crop and subsequent complementation assay in mutants or over-expressing assay in parents could decisively conclude the role of the SNP in regulating determinacy in these cowpea mutants.
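The reported change (a C to A transversion giving Pro-136 to His) can be checked against the standard genetic code. The sketch below enumerates single C to A substitutions in each proline codon; it uses a deliberately partial codon table covering only the codons needed here, and the abstract does not state which proline codon is present in VuTFL1, so the result narrows the possibilities rather than identifying the codon.

```python
# Partial standard genetic code: only the codons reachable in this example.
CODON_TABLE = {
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "CAT": "His", "CAC": "His", "CAA": "Gln", "CAG": "Gln",
    "ACT": "Thr", "ACC": "Thr", "ACA": "Thr", "ACG": "Thr",
}

PROLINE_CODONS = ("CCT", "CCC", "CCA", "CCG")


def c_to_a_outcomes(codon):
    """Map each codon position (1-based) holding a C to the result of C->A there."""
    outcomes = {}
    for pos, base in enumerate(codon):
        if base == "C":
            mutant = codon[:pos] + "A" + codon[pos + 1:]
            outcomes[pos + 1] = (mutant, CODON_TABLE[mutant])
    return outcomes


# Proline codons for which some single C->A change yields histidine.
pro_to_his = [
    codon for codon in PROLINE_CODONS
    if any(aa == "His" for _, aa in c_to_a_outcomes(codon).values())
]
```

Under the standard code, only a change at the second codon position of CCT or CCC gives histidine; the same change in CCA or CCG gives glutamine, and a first-position change gives threonine.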
Odor detection of mixtures of homologous carboxylic acids and coffee aroma compounds by humans.

PubMed

Miyazawa, Toshio; Gallagher, Michele; Preti, George; Wise, Paul M

2009-11-11

Mixture summation among homologous carboxylic acids, that is, the relationship between detection probabilities for mixtures and detection probabilities for their unmixed components, varies with similarity in carbon-chain length. The current study examined detection of acetic, butyric, hexanoic, and octanoic acids mixed with three other model odorants that differ greatly from the acids in both structure and odor character, namely, 2-hydroxy-3-methylcyclopent-2-en-1-one, furan-2-ylmethanethiol, and (3-methyl-3-sulfanylbutyl) acetate. Psychometric functions were measured for both single compounds and binary mixtures (2 of 5, forced-choice method). An air dilution olfactometer delivered stimuli, with vapor-phase calibration using gas chromatography-mass spectrometry. Across the three odorants that differed from the acids, acetic and butyric acid showed approximately additive (or perhaps even supra-additive) summation at low perithreshold concentrations, but subadditive interactions at high perithreshold concentrations. In contrast, the medium-chain acids showed subadditive interactions across a wide range of concentrations. Thus, carbon-chain length appears to influence not only summation with other carboxylic acids but also summation with at least some unrelated compounds.
An oleate 12-hydroxylase from Ricinus communis L. is a fatty acyl desaturase homolog

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van De Loo, F.J.; Broun, P.; Turner, S.

1995-07-18

Recent spectroscopic evidence implicating a binuclear iron site at the reaction center of fatty acyl desaturases suggested to us that certain fatty acyl hydroxylases may share significant amino acid sequence similarity with desaturases. To test this theory, we prepared a cDNA library from developing endosperm of the castor-oil plant (Ricinus communis L.) and obtained partial nucleotide sequences for 468 anonymous clones that were not expressed at high levels in leaves, a tissue deficient in 12-hydroxyoleic acid. This resulted in the identification of several cDNA clones encoding a polypeptide of 387 amino acids with a predicted molecular weight of 44,407 and with approximately 67% sequence homology to microsomal oleate desaturase from Arabidopsis. Expression of a full-length clone under control of the cauliflower mosaic virus 35S promoter in transgenic tobacco resulted in the accumulation of low levels of 12-hydroxyoleic acid in seeds, indicating that the clone encodes the castor oleate hydroxylase. These results suggest that fatty acyl desaturases and hydroxylases share similar reaction mechanisms and provide an example of enzyme evolution. 26 refs., 6 figs., 1 tab.
Kit for detecting nucleic acid sequences using competitive hybridization probes

DOEpatents

Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

2001-01-01

A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the
Sequence of a cDNA encoding pancreatic preprosomatostatin-22.

PubMed Central

Magazin, M; Minth, C D; Funckes, C L; Deschenes, R; Tavianini, M A; Dixon, J E

1982-01-01

We report the nucleotide sequence of a precursor to somatostatin that upon proteolytic processing may give rise to a hormone of 22 amino acids. The nucleotide sequence of a cDNA from the channel catfish (Ictalurus punctatus) encodes a precursor to somatostatin that is 105 amino acids (Mr, 11,500). The cDNA coding for somatostatin-22 consists of 36 nucleotides in the 5' untranslated region, 315 nucleotides that code for the precursor to somatostatin-22, 269 nucleotides at the 3' untranslated region, and a variable length of poly(A). The putative preprohormone contains a sequence of hydrophobic amino acids at the amino terminus that has the properties of a "signal" peptide. A connecting sequence of approximately 57 amino acids is followed by a single Arg-Arg sequence, which immediately precedes the hormone. Somatostatin-22 is homologous to somatostatin-14 in 7 of the 14 amino acids, including the Phe-Trp-Lys sequence. Hybridization selection of mRNA, followed by its translation in a wheat germ cell-free system, resulted in the synthesis of a single polypeptide having a molecular weight of approximately 10,000 as estimated on Na-DodSO4/polyacrylamide gels. Images PMID:6127673
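The segment lengths quoted in this abstract can be checked with simple arithmetic. The sketch below is only a consistency check on the reported numbers. The constant and function names are illustrative, and the abstract does not say whether a stop codon is counted within the 315 coding nucleotides.

```python
# Segment lengths (nucleotides) as reported in the abstract.
FIVE_PRIME_UTR = 36
CODING_REGION = 315
THREE_PRIME_UTR = 269
REPORTED_MR = 11_500  # relative molecular mass of the precursor


def codons(n_nucleotides: int) -> int:
    """Number of complete codons in a coding region of the given length."""
    if n_nucleotides % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return n_nucleotides // 3


# 315 / 3 = 105, matching the 105-residue precursor in the abstract.
precursor_length = codons(CODING_REGION)

# cDNA length excluding the variable-length poly(A) tail.
cdna_length = FIVE_PRIME_UTR + CODING_REGION + THREE_PRIME_UTR

# Mean residue mass implied by the reported Mr (about 109.5 Da).
mean_residue_mass = REPORTED_MR / precursor_length
```

The implied mean residue mass of roughly 110 Da is in the range usually assumed for proteins, so the reported length and Mr are mutually consistent.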
Invariant glycines and prolines flanking in loops the strand beta 2 of various (alpha/beta)8-barrel enzymes: a hidden homology?

PubMed Central

Janecek, S.

1996-01-01

The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. This trend is also in agreement with the
Invariant glycines and prolines flanking in loops the strand beta 2 of various (alpha/beta)8-barrel enzymes: a hidden homology?

PubMed

Janecek, S

1996-06-01

The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. This trend is also in agreement with the
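As a rough illustration of template-based searching of the kind described here, the sketch below scans a protein sequence for nine-residue windows that begin with glycine and end with proline, then counts identities to the GFTAIWITP template. It is a minimal sketch, not the author's procedure: the function name and the toy sequence are hypothetical, and the abstract does not describe how candidate windows were actually scored.

```python
TEMPLATE = "GFTAIWITP"  # template quoted in the abstract (A. oryzae alpha-amylase)


def scan_for_template(seq: str, template: str = TEMPLATE):
    """Return (start_index, identities) for every window of the template's
    length whose first and last residues match the template's (G ... P)."""
    width = len(template)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window[0] == template[0] and window[-1] == template[-1]:
            identities = sum(a == b for a, b in zip(window, template))
            hits.append((i, identities))
    return hits


# Hypothetical toy sequence: an exact copy of the template and a weak candidate.
toy = "MKA" + "GFTAIWITP" + "LL" + "GAAAAAAAP" + "Q"
hits = scan_for_template(toy)  # [(3, 9), (14, 3)]
```

Indices are zero-based. A window with few identities beyond the flanking G and P, like the second hit, is the kind of weak match that would need structural support before being called homologous.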

Partial mapping and sequencing of a fish iridovirus genome reveals genes homologous to the frog virus 3 p31, p40 and human eIF2alpha.

PubMed

Yu, Y X; Béarzotti, M; Vende, P; Ahne, W; Brémont, M

1999-09-01

Iridovirus-like pathogens have been recognized in recent years as a cause of serious systemic diseases among feral, cultured and ornamental fish. Fish mortalities of 30-100% due to systemic iridovirus infection have been observed in Europe, Australia, Japan and Thailand. Up to now, the molecular biology of these important pathogens has been poorly documented. To gain better insight into the genomic organization of these piscine iridoviruses, we constructed a cosmid viral DNA library from the epizootic hematopoietic necrosis virus (EHNV). Two recombinant cosmids (Cos7 and Cos12) were selected for systematic sequencing. Cos7 and Cos12 lie side by side along the genome and cover about two-thirds of the total EHNV genome, which has been estimated to be approximately 101.47 kb in length. Thirty-five kilobase pairs (kbp) from Cos7 and 10 kbp from Cos12 have been determined. Sequence analysis revealed open reading frames (ORFs) sharing homologies with sequences from Frog virus 3, such as the p31 and p40 proteins. Among the other identified ORFs, some showed homologies with known protein sequences, such as the human eIF2alpha protein, and some did not show any significant homology with sequences available in the databases. However, none were related to Lymphocystis virus, a member of the Iridoviridae family for which the full genome nucleotide sequence has been determined.
[Sequencing and analysis of the complete genome of a rabies virus isolate from Sika deer].

PubMed

Zhao, Yun-Jiao; Guo, Li; Huang, Ying; Zhang, Li-Shi; Qian, Ai-Dong

2008-05-01

A rabies virus strain (DRV) was isolated from Sika deer brain and sequenced. Nine overlapping gene fragments were amplified by RT-PCR together with 3'-RACE and 5'-RACE, and the complete DRV genome sequence was assembled. The complete genome is 11,863 bp long. The DRV genome organization was similar to that of other rabies viruses: it is composed of five genes, and the initiation and termination sites are highly conserved. There were mutated amino acids in important antigenic sites of the nucleoprotein and glycoprotein. The nucleotide and amino acid homologies of the N, P, M, G and L genes were compared among strains with completed genome sequences. A phylogenetic tree was constructed by comparison with the N gene sequences of other typical rabies viruses. These results indicated that DRV belongs to genotype 1. The highest homology, 94%, was with the Chinese vaccine strain 3aG, and the lowest, 71%, was with WCBV. These findings provide a theoretical reference for further research on rabies virus.
The primary structures of two yeast enolase genes. Homology between the 5' noncoding flanking regions of yeast enolase and glyceraldehyde-3-phosphate dehydrogenase genes.

PubMed

Holland, M J; Holland, J P; Thill, G P; Jackson, K A

1981-02-10

Segments of yeast genomic DNA containing two enolase structural genes have been isolated by subculture cloning procedures using a cDNA hybridization probe synthesized from purified yeast enolase mRNA. Based on restriction endonuclease and transcriptional maps of these two segments of yeast DNA, each hybrid plasmid contains a region of extensive nucleotide sequence homology which forms hybrids with the cDNA probe. The DNA sequences which flank this homologous region in the two hybrid plasmids are nonhomologous indicating that these sequences are nontandemly repeated in the yeast genome. The complete nucleotide sequence of the coding as well as the flanking noncoding regions of these genes has been determined. The amino acid sequence predicted from one reading frame of both structural genes is extremely similar to that determined for yeast enolase (Chin, C. C. Q., Brewer, J. M., Eckard, E., and Wold, F. (1981) J. Biol. Chem. 256, 1370-1376), confirming that these isolated structural genes encode yeast enolase. The nucleotide sequences of the coding regions of the genes are approximately 95% homologous, and neither gene contains an intervening sequence. Codon utilization in the enolase genes follows the same biased pattern previously described for two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). DNA blotting analysis confirmed that the isolated segments of yeast DNA are colinear with yeast genomic DNA and that there are two nontandemly repeated enolase genes per haploid yeast genome. The noncoding portions of the two enolase genes adjacent to the initiation and termination codons are approximately 70% homologous and contain sequences thought to be involved in the synthesis and processing messenger RNA. Finally there are regions of extensive homology between the two enolase structural genes and two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes within the 5
Genomic cloning and chromosomal localization of HRY, the human homolog to the Drosophila segmentation gene, hairy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feder, J.N.; Jan, L.Y.; Jan, Y.N.

The Drosophila hairy gene encodes a basic helix-loop-helix protein that functions in at least two steps during Drosophila development: (1) during embryogenesis, when it partakes in the establishment of segments, and (2) during the larval stage, when it functions negatively in determining the pattern of sensory bristles on the adult fly. In the rat, a structurally homologous gene (RHL) behaves as an immediate-early gene in its response to growth factors and can, like that in Drosophila, suppress neuronal differentiation events. Here, the authors report the genomic cloning of the human hairy gene homolog (HRY). The coding region of the gene is contained within four exons. The predicted amino acid sequence reveals only four amino acid differences between the human and rat genes. Analysis of the DNA sequence 5′ to the coding region reveals a putative untranslated exon. To increase the value of the HRY gene as a genetic marker and to assess its potential involvement in genetic disorders, they sublocalized the locus to chromosome 3q28-q29 by fluorescence in situ hybridization. 34 refs., 4 figs., 1 tab.
PCV: An Alignment Free Method for Finding Homologous Nucleotide Sequences and its Application in Phylogenetic Study.

PubMed

Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar

2017-06-01

Online retrieval of homologous nucleotide sequences from a given sequence database through existing alignment techniques is a common practice. The salient point of these techniques is their dependence on local alignment techniques and scoring matrices, the reliability of which is limited by computational complexity and accuracy. To address this, this work offers a novel numerical representation of genes that can further help in dividing the data space into smaller partitions, helping the formation of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV), which is representative of a particular nucleotide sequence and was created through adaptation of the stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320). The PCV construct uses information on the physicochemical properties of nucleotides and their positional distribution pattern within a gene. It is observed that the PCV representation of a gene reduces the computational cost of calculating distances between a pair of genes while remaining consistent with the existing methods. The validity of the PCV-based method was further tested by using it to construct molecular phylogenies and comparing these with phylogenies built using existing sequence alignment methods.
DNA sequences of three beta-1,4-endoglucanase genes from Thermomonospora fusca.

PubMed Central

Lao, G; Ghangas, G S; Jung, E D; Wilson, D B

1991-01-01

The DNA sequences of the Thermomonospora fusca genes encoding cellulases E2 and E5 and the N-terminal end of E4 were determined. Each sequence contains an identical 14-bp inverted repeat upstream of the initiation codon. There were no significant homologies between the coding regions of the three genes. The E2 gene is 73% identical to the celA gene from Microbispora bispora, but this was the only homology found with other cellulase genes. E2 belongs to a family of cellulases that includes celA from M. bispora, cenA from Cellulomonas fimi, casA from an alkalophilic Streptomyces strain, and cellobiohydrolase II from Trichoderma reesei. E4 shows 44% identity to an avocado cellulase, while E5 belongs to the Bacillus cellulase family. There were strong similarities between the amino acid sequences of the E2 and E5 cellulose binding domains, and these regions also showed homology with C. fimi and Pseudomonas fluorescens cellulose binding domains. PMID:1904434
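Percent-identity figures such as the 73% and 44% quoted above are computed from a pairwise alignment. The sketch below shows one common convention: identities divided by the number of aligned, non-gap columns. This is an assumption, since the abstract does not state which denominator was used, and the function name and example strings are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two already-aligned sequences of equal length.

    Columns containing a gap in either sequence are skipped, so the
    denominator is the number of aligned residue pairs.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Toy alignment: 8 compared columns, 7 identical.
value = percent_identity("ACGT-ACGT", "ACGTTACGA")  # 87.5
```

Other conventions divide by the full alignment length or by the shorter sequence, which is one reason identity values from different papers are not always directly comparable.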
Productive Homologous and Non-homologous Recombination of Hepatitis C Virus in Cell Culture

PubMed Central

Li, Yi-Ping; Mikkelsen, Lotte S.; Gottwein, Judith M.; Bukh, Jens

2013-01-01

Genetic recombination is an important mechanism for increasing diversity of RNA viruses, and constitutes a viral escape mechanism to host immune responses and to treatment with antiviral compounds. Although rare, epidemiologically important hepatitis C virus (HCV) recombinants have been reported. In addition, recombination is an important regulatory mechanism of cytopathogenicity for the related pestiviruses. Here we describe recombination of HCV RNA in cell culture leading to production of infectious virus. Initially, hepatoma cells were co-transfected with a replicating JFH1ΔE1E2 genome (genotype 2a) lacking functional envelope genes and strain J6 (2a), which has functional envelope genes but does not replicate in culture. After an initial decrease in the number of HCV positive cells, infection spread after 13–36 days. Sequencing of recovered viruses revealed non-homologous recombinants with J6 sequence from the 5′ end to the NS2–NS3 region followed by JFH1 sequence from Core to the 3′ end. These recombinants carried duplicated sequence of up to 2400 nucleotides. HCV replication was not required for recombination, as recombinants were observed in most experiments even when two replication incompetent genomes were co-transfected. Reverse genetic studies verified the viability of representative recombinants. After serial passage, subsequent recombination events reducing or eliminating the duplicated region were observed for some but not all recombinants. Furthermore, we found that inter-genotypic recombination could occur, but at a lower frequency than intra-genotypic recombination. Productive recombination of attenuated HCV genomes depended on expression of all HCV proteins and tolerated duplicated sequence. In general, no strong site specificity was observed. Non-homologous recombination was observed in most cases, while few homologous events were identified. A better understanding of HCV recombination could help identification of natural recombinants
A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models

PubMed Central

2011-01-01

Background Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM). Results We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology when using SVM performs significantly better than some of the state of the art methods, and comparable to other. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function. Conclusions The strategy of selecting only the most frequent patterns is effective for the remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions. PMID:21429187
A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models.

PubMed

Bernardes, Juliana S; Carbone, Alessandra; Zaverucha, Gerson

2011-03-23

Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM). We use the SCOP database to perform our experiments by evaluating protein recognition within the same superfamily. Our results show that our methodology when using SVM performs significantly better than some of the state of the art methods, and comparable to other. However, our method provides a comprehensible set of logical rules that can help to understand what determines a protein function. The strategy of selecting only the most frequent patterns is effective for the remote homology detection. This is possible through a suitable first-order logical representation of homologous properties, and through a set of frequent patterns, found by an ILP system, that summarizes essential features of protein functions.
Prefiltering Model for Homology Detection Algorithms on GPU.

PubMed

Retamosa, Germán; de Pedro, Luis; González, Ivan; Tamames, Javier

2016-01-01

Homology detection has evolved over time from heavy algorithms based on dynamic programming approaches to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup and find exact matches with the original results. Thus, their acceleration is essential. The aim of this article was to prefilter a sequence database. To make this work, we have implemented a groundbreaking heuristic model based on NVIDIA's graphics processing units (GPUs) and multicore processors. Depending on the sensitivity settings, this makes it possible to quickly reduce the sequence database by between 50% and 95%, while rejecting no significant sequences. Furthermore, this prefiltering application can be used together with multiple homology detection algorithms as a part of a next-generation sequencing system. Extensive performance and accuracy tests have been carried out in the Spanish National Centre for Biotechnology (NCB). The results show that GPU hardware can reduce the execution times of existing homology detection applications, such as the National Centre for Biotechnology Information (NCBI) Basic Local Alignment Search Tool for Proteins (BLASTP), by up to a factor of 4.
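The general idea of prefiltering a database before running a slower homology search can be sketched with a shared k-mer count. This is not the authors' GPU heuristic, which the abstract does not describe. It is a generic CPU-side illustration, and the function names, parameters and toy sequences are assumptions.

```python
def kmers(seq: str, k: int) -> set:
    """Set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefilter(query: str, database: dict, k: int = 3, min_shared: int = 2) -> list:
    """Return names of database sequences sharing at least min_shared
    distinct k-mers with the query; the rest are dropped before the
    expensive alignment step."""
    query_kmers = kmers(query, k)
    return [
        name
        for name, seq in database.items()
        if len(query_kmers & kmers(seq, k)) >= min_shared
    ]


# Hypothetical toy database.
db = {"near": "MKVLSAGIVG", "far": "PPPPPPPPPP"}
kept = prefilter("MKVLAAGIVG", db)  # ["near"]
```

Unlike the method in the abstract, which reports rejecting no significant sequences, a simple k-mer filter like this gives no such guarantee. Lowering `min_shared` or `k` trades database reduction for sensitivity.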
Developmental rearrangement of cyanobacterial nif genes: nucleotide sequence, open reading frames, and cytochrome P-450 homology of the Anabaena sp. strain PCC 7120 nifD element.

PubMed Central

Lammers, P J; McLaughlin, S; Papin, S; Trujillo-Provencio, C; Ryncarz, A J

1990-01-01

An 11-kbp DNA element of unknown function interrupts the nifD gene in vegetative cells of Anabaena sp. strain PCC 7120. In developing heterocysts the nifD element excises from the chromosome via site-specific recombination between short repeat sequences that flank the element. The nucleotide sequence of the nifH-proximal half of the element was determined to elucidate the genetic potential of the element. Four open reading frames with the same relative orientation as the nifD element-encoded xisA gene were identified in the sequenced region. Each of the open reading frames was preceded by a reasonable ribosome-binding site and had biased codon utilization preferences consistent with low levels of expression. Open reading frame 3 was highly homologous with three cytochrome P-450 omega-hydroxylase proteins and showed regional homology to functionally significant domains common to the cytochrome P-450 superfamily. The sequence encoding open reading frame 2 was the most highly conserved portion of the sequenced region based on heterologous hybridization experiments with three genera of heterocystous cyanobacteria. Images PMID:2123860
High-efficiency transformation of Pichia stipitis based on its URA3 gene and a homologous autonomous replication sequence, ARS2.

PubMed Central

Yang, V W; Marks, J A; Davis, B P; Jeffries, T W

1994-01-01

This paper describes the first high-efficiency transformation system for the xylose-fermenting yeast Pichia stipitis. The system includes integrating and autonomously replicating plasmids based on the gene for orotidine-5'-phosphate decarboxylase (URA3) and an autonomous replicating sequence (ARS) element (ARS2) isolated from P. stipitis CBS 6054. Ura- auxotrophs were obtained by selecting for resistance to 5-fluoroorotic acid and were identified as ura3 mutants by transformation with P. stipitis URA3. P. stipitis URA3 was cloned by its homology to Saccharomyces cerevisiae URA3, with which it is 69% identical in the coding region. P. stipitis ARS elements were cloned functionally through plasmid rescue. These sequences confer autonomous replication when cloned into vectors bearing the P. stipitis URA3 gene. P. stipitis ARS2 has features similar to those of the consensus ARS of S. cerevisiae and other ARS elements. Circular plasmids bearing the P. stipitis URA3 gene with various amounts of flanking sequences produced 600 to 8,600 Ura+ transformants per microgram of DNA by electroporation. Most transformants obtained with circular vectors arose without integration of vector sequences. One vector yielded 5,200 to 12,500 Ura+ transformants per microgram of DNA after it was linearized at various restriction enzyme sites within the P. stipitis URA3 insert. Transformants arising from linearized vectors produced stable integrants, and integration events were site specific for the genomic ura3 in 20% of the transformants examined. Plasmids bearing the P. stipitis URA3 gene and ARS2 element produced more than 30,000 transformants per microgram of plasmid DNA. Autonomously replicating plasmids were stable for at least 50 generations in selection medium and were present at an average of 10 copies per nucleus. PMID:7811063
Shotgun Protein Sequencing with Meta-contig Assembly*

PubMed Central

Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

2012-01-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
Shotgun protein sequencing with meta-contig assembly.

PubMed

Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno

2012-10-01

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
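The overlap-and-extend idea behind assembling short contigs into longer ones can be illustrated on plain peptide strings. Meta-SPS itself works on MS/MS spectra rather than letter strings, so the sketch below is not that algorithm. It is a generic greedy suffix-prefix merge, and the function names, the `min_len` parameter and the toy peptides are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b,
    or 0 if no overlap of at least min_len exists."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def merge_contigs(contigs, min_len: int = 3):
    """Greedily merge the pair with the largest overlap until none remain."""
    contigs = list(contigs)
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


# Three toy contigs that chain into one sequence.
assembled = merge_contigs(["PEPTIDEK", "IDEKLAGS", "AGSWQR"])
# ["PEPTIDEKLAGSWQR"]
```

Exact string matching is a simplification. With real de novo contigs the overlaps would have to tolerate sequencing errors and mass ambiguities, such as residues of equal or near-equal mass.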
Cloning, sequencing, and characterization of the Bacillus subtilis biotin biosynthetic operon.

PubMed

Bower, S; Perkins, J B; Yocum, R R; Howitt, C L; Rahaim, P; Pero, J

1996-07-01

A 10-kb region of the Bacillus subtilis genome that contains genes involved in biotin-biosynthesis was cloned and sequenced. DNA sequence analysis indicated that B. subtilis contains homologs of the Escherichia coli and Bacillus sphaericus bioA, bioB, bioD, and bioF genes. These four genes and a homolog of the B. sphaericus bioW gene are arranged in a single operon in the order bioWAFDR and are followed by two additional genes, bioI and orf2. bioI and orf2 show no similarity to any other known biotin biosynthetic genes. The bioI gene encodes a protein with similarity to cytochrome P-450s and was able to complement mutations in either bioC or bioH of E. coli. Mutations in bioI caused B. subtilis to grow poorly in the absence of biotin. The bradytroph phenotype of bioI mutants was overcome by pimelic acid, suggesting that the product of bioI functions at a step prior to pimelic acid synthesis. The B. subtilis bio operon is preceded by a putative vegetative promoter sequence and contains just downstream a region of dyad symmetry with homology to the bio regulatory region of B. sphaericus. Analysis of a bioW-lacZ translational fusion indicated that expression of the biotin operon is regulated by biotin and the B. subtilis birA gene.
Cloning, sequencing, and characterization of the Bacillus subtilis biotin biosynthetic operon.

PubMed Central

Bower, S; Perkins, J B; Yocum, R R; Howitt, C L; Rahaim, P; Pero, J

1996-01-01

A 10-kb region of the Bacillus subtilis genome that contains genes involved in biotin-biosynthesis was cloned and sequenced. DNA sequence analysis indicated that B. subtilis contains homologs of the Escherichia coli and Bacillus sphaericus bioA, bioB, bioD, and bioF genes. These four genes and a homolog of the B. sphaericus bioW gene are arranged in a single operon in the order bioWAFDR and are followed by two additional genes, bioI and orf2. bioI and orf2 show no similarity to any other known biotin biosynthetic genes. The bioI gene encodes a protein with similarity to cytochrome P-450s and was able to complement mutations in either bioC or bioH of E. coli. Mutations in bioI caused B. subtilis to grow poorly in the absence of biotin. The bradytroph phenotype of bioI mutants was overcome by pimelic acid, suggesting that the product of bioI functions at a step prior to pimelic acid synthesis. The B. subtilis bio operon is preceded by a putative vegetative promoter sequence and contains just downstream a region of dyad symmetry with homology to the bio regulatory region of B. sphaericus. Analysis of a bioW-lacZ translational fusion indicated that expression of the biotin operon is regulated by biotin and the B. subtilis birA gene. PMID:8763940
Homology among tet determinants in conjugative elements of streptococci.

PubMed Central

Smith, M D; Hazum, S; Guild, W R

1981-01-01

A mutation to tetracycline sensitivity in a resistant strain of Streptococcus pneumoniae was shown by several criteria to be due to a point mutation in the conjugative omega (cat-tet) element found in the chromosomes of strains derived from BM6001, a clinical strain resistant to tetracycline and chloramphenicol. Strains carrying the mutation were transformed back to tetracycline resistance with the high efficiency of a point marker by donor deoxyribonucleic acids from its ancestral strain and from nine other clinical isolates of pneumococcus and by deoxyribonucleic acids from group D Streptococcus faecalis and group B Streptococcus agalactiae strains that also carry conjugative tet elements in their chromosomes. It was not transformed to resistance by tet plasmid deoxyribonucleic acids from either gram-negative or gram-positive species, except for one that carried transposon Tn916, the conjugative tet element present in the chromosomes of some S. faecalis strains. The results showed that the tet determinants in conjugative elements of several streptococcal species share a high degree of deoxyribonucleic acid sequence homology and suggested that they differ from other tet genes. PMID:6270063
FASH: A web application for nucleotides sequence search.

PubMed

Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

2008-05-27

FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to BLAST/FASTA for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. FASH can be accessed at https://fash.bgu.ac.il:8443/fash/default.jsp (secured website).
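The general idea of seed-free matching with a Fourier transform can be illustrated with a minimal sketch. This is not the FASH implementation: the function names `fft` and `match_counts` are hypothetical, and the code only shows the standard trick of counting per-offset base matches by correlating indicator vectors through an FFT, with no contiguous seed required.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def match_counts(query, text):
    """Return, for every offset, how many query bases equal the text bases."""
    m, t = len(query), len(text)
    size = 1
    while size < m + t:  # pad so the circular convolution does not wrap around
        size *= 2
    totals = [0.0] * size
    for base in "ACGT":
        # Indicator vectors; the query is reversed so convolution acts as correlation.
        q = [1.0 if c == base else 0.0 for c in reversed(query)] + [0.0] * (size - m)
        x = [1.0 if c == base else 0.0 for c in text] + [0.0] * (size - t)
        product = [a * b for a, b in zip(fft(q), fft(x))]
        conv = fft(product, invert=True)
        for i in range(size):
            totals[i] += conv[i].real / size  # divide by size to normalise the inverse
    # Index m - 1 + offset of the convolution corresponds to alignment at `offset`.
    return [round(totals[m - 1 + offset]) for offset in range(t - m + 1)]
```

Calling `match_counts("ACGT", "TTACGTAAACGA")` gives one match count per offset; offsets with high counts are candidate similar regions even when the matching bases are not contiguous.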
Molecular cloning, sequence identification and tissue expression profile of three novel sheep (Ovis aries) genes - BCKDHA, NAGA and HEXA.

PubMed

Liu, G Y; Gao, S Z

2009-01-01

The complete coding sequences of three sheep genes (BCKDHA, NAGA and HEXA) were amplified using the reverse transcriptase polymerase chain reaction (RT-PCR), based on the conserved sequence information of the mouse or other mammals. The nucleotide sequences of these three genes revealed that the sheep BCKDHA gene encodes a protein of 313 amino acids which has high homology with the BCKDHA gene that encodes a protein of 447 amino acids that has high homology with the branched chain keto acid dehydrogenase E1, alpha polypeptide (BCKDHA) of five species: chimpanzee (93%), human (96%), crab-eating macaque (93%), bovine (98%) and mouse (91%). The sheep NAGA gene encodes a protein of 411 amino acids that has high homology with the alpha-N-acetylgalactosaminidase (NAGA) of five species: human (85%), bovine (94%), mouse (91%), rat (83%) and chicken (74%). The sheep HEXA gene encodes a protein of 529 amino acids that has high homology with the hexosaminidase A (HEXA) of five species: bovine (98%), human (84%), Bornean orangutan (84%), rat (80%) and mouse (81%). Finally, these three novel sheep genes were assigned to GeneIDs 100145857, 100145858 and 100145856. The phylogenetic tree analysis revealed that the sheep BCKDHA, NAGA, and HEXA all have closer genetic relationships to the BCKDHA, NAGA, and HEXA of bovine. Tissue expression profile analysis was also carried out and results revealed that sheep BCKDHA, NAGA and HEXA genes were differentially expressed in tissues including muscle, heart, liver, fat, kidney, lung, small and large intestine. Our experiment is the first to establish the primary foundation for further research on these three sheep genes.
A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.

PubMed

Fostier, Jan; Proost, Sebastian; Dhoedt, Bart; Saeys, Yvan; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas

2011-03-15

Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists. Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes. The algorithm is implemented as a part of the i-ADHoRe 3.0 package, available at http://bioinformatics.psb.ugent.be/software.

Isolation and characterization of rhamnose-binding lectins from eggs of steelhead trout (Oncorhynchus mykiss) homologous to low density lipoprotein receptor superfamily.

PubMed

Tateno, H; Saneyoshi, A; Ogawa, T; Muramoto, K; Kamiya, H; Saneyoshi, M

1998-07-24

Two L-rhamnose-binding lectins named STL1 and STL2 were isolated from eggs of steelhead trout (Oncorhynchus mykiss) by affinity chromatography and ion exchange chromatography. The apparent molecular masses of purified STL1 and STL2 were estimated to be 84 and 68 kDa, respectively, by gel filtration chromatography. Sodium dodecyl sulfate polyacrylamide gel electrophoresis and matrix-assisted laser desorption ionization time of flight mass spectrometry of these lectins revealed that STL1 was composed of a noncovalently linked trimer of 31.4-kDa subunits, and STL2 of a noncovalently linked trimer of 21.5-kDa subunits. The minimum concentrations of STL1, a major component, and STL2, a minor component, needed to agglutinate rabbit erythrocytes were 9 and 0.2 μg/ml, respectively. The most effective saccharide in the hemagglutination inhibition assay for both STL1 and STL2 was L-rhamnose. Saccharides possessing the same configuration of hydroxyl groups at C2 and C4 as that in L-rhamnose, such as L-arabinose and D-galactose, were also inhibitory. The amino acid sequence of STL2 was determined by analysis of peptides generated by digestion of the S-carboxamidomethylated protein with Achromobacter protease I or Staphylococcus aureus V8 protease. The STL2 subunit of 195 amino acid residues proved to have a unique polypeptide architecture; that is, it was composed of two tandemly repeated homologous domains (STL2-N and STL2-C) with 52% internal homology. These two domains showed a sequence homology to the subunit (105 amino acid residues) of D-galactoside-specific sea urchin (Anthocidaris crassispina) egg lectin (37% for STL2-N and 46% for STL2-C, respectively). The N terminus of the STL1 subunit was blocked with an acetyl group. However, a partial amino acid sequence of the subunit showed a sequence similarity to STL2. Moreover, STL2 also showed a sequence homology to the ligand binding domain of the vitellogenin receptor. We have also employed surface plasmon resonance biosensor
Hybridization and sequencing of nucleic acids using base pair mismatches

DOEpatents

Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

2001-01-01

Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.
Peak AAA fatty acid homolog contaminants present in the dietary supplement l-Tryptophan associated with the onset of eosinophilia-myalgia syndrome.

PubMed

Klarskov, Klaus; Gagnon, Hugo; Racine, Mathieu; Boudreault, Pierre-Luc; Normandin, Chad; Marsault, Eric; Gleich, Gerald J; Naylor, Stephen

2018-05-22

The eosinophilia-myalgia syndrome (EMS) outbreak that occurred in the USA and elsewhere in 1989 was caused by the ingestion of Showa Denko K.K. (SD) L-tryptophan (L-Trp). "Six compounds" detected in the L-Trp were reported as case-associated contaminants. Recently the final and most statistically significant contaminant, "Peak AAA", was structurally characterized. The "compound" was actually shown to be two structural isomers resulting from condensation reactions of L-Trp with fatty acids derived from the bacterial cell membrane. They were identified as the indole C-2 anteiso (AAA1-343) and linear (AAA2-343) aliphatic chain isomers. Based on those findings, we utilized a combination of on-line HPLC-electrospray ionization mass spectrometry (LC-MS), as well as both precursor and product ion tandem mass spectrometry (MS/MS) to facilitate identification of a homologous family of condensation products related to AAA1-343 and AAA2-343. We structurally characterized eight new AAA1-XXX/AAA2-XXX contaminants, where XXX represents the integer molecular ions of all the related homologs, differing by aliphatic chain length and isomer configuration. The contaminants were derived from the following fatty acids of the bacterial cell membrane: 5-methylheptanoic acid (anteiso-C8:0) for AAA1-315; n-octanoic acid (n-C8:0) for AAA2-315; 6-methyloctanoic acid (anteiso-C9:0) for AAA1-329; n-nonanoic acid (n-C9:0) for AAA2-329; 10-methyldodecanoic acid (anteiso-C13:0) for AAA1-385; n-tridecanoic acid (n-C13:0) for AAA2-385; 11-methyltridecanoic acid (anteiso-C14:0) for AAA1-399; and n-tetradecanoic acid (n-C14:0) for AAA2-399. The concentration levels for these contaminants were estimated to be 0.1-7.9 μg per 500 mg of an individual SD L-Trp tablet or capsule. The structural similarity of these homologs to case-related contaminants of Spanish Toxic Oil Syndrome (TOS) is discussed.
Mouse Vk gene classification by nucleic acid sequence similarity.

PubMed

Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

1989-01-01

Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
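A threshold-based family assignment of the kind described above can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: it presumes the sequences are already aligned to equal length, uses single-linkage grouping, and the names `percent_identity` and `group_into_families` are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical positions between two pre-aligned sequences, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def group_into_families(aligned, threshold=80.0):
    """Single-linkage grouping: link any two sequences with identity > threshold."""
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if percent_identity(aligned[first], aligned[second]) > threshold:
                parent[find(first)] = find(second)

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())
```

With single linkage, two members of the same family may themselves fall below the threshold if a third sequence bridges them, which is one way closely related families can blur into each other.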
Combined sequence and structure analysis of the fungal laccase family.

PubMed

Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

2003-08-20

Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This has resulted in identification of a set of four ungapped sequence regions, L1-L4, as the overall signature sequences that can be used to identify the laccases, distinguishing them within the broader class of multi-copper oxidases. The 12 amino acid residues in the enzymes serving as the copper ligands are housed within these four identified conserved regions, of which L2 and L4 conform to the earlier reported copper signature sequences of multi-copper oxidases while L1 and L3 are distinctive to the laccases. The mapping of regions L1-L4 onto the three-dimensional structure of the Coprinus cinereus laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4 at both the structure and the sequence levels suggest that the quasi C-2 symmetric active site conformational motif may have arisen from a structural duplication event that neither the sequence homology analysis nor the structure homology analysis alone would have unraveled. Although the sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal
3'-terminal sequence of a small round structured virus (SRSV) in Japan.

PubMed

Utagawa, E T; Takeda, N; Inouye, S; Kasuga, K; Yamazaki, S

1994-01-01

We determined the nucleotide sequence of about 1,000 bases from the 3'-terminus of a small round structured virus (SRSV), which caused a gastroenteritis outbreak in Chiba Prefecture, Japan, in 1987. The sequence was compared with the corresponding sequence region of Norwalk virus; it consisted of a part of the open reading frame 2 (ORF2), whole ORF3, and 3'-noncoding region (NCR). The 624-base-long ORF3 had sequence homology of 68% with the corresponding region of Norwalk virus. (The amino acid sequence homology was 74%.) The 94-base-long NCR had 65% homology with Norwalk virus. We then selected two consensus-sequence portions in the above sequence between Chiba and Norwalk viruses for primers in the reverse transcriptase-polymerase chain reaction (RT-PCR). Using this primer set, we detected 669-bp bands in agarose gel electrophoresis of RT-PCR products from feces containing Chiba or Norwalk viruses. Furthermore, in Southern hybridization with Chiba probes which were labeled with digoxigenin-dUTP in PCR, the bands of the two viruses were clearly stained under a low stringency condition. Since both Chiba and Norwalk viruses were detected by the above primer set although they are geographically and chronologically different viruses, our primer-pair may be useful for detection of a broad range of SRSVs which cause gastroenteritis in different areas.
Terminal region sequence variations in variola virus DNA.

PubMed

Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J

1996-07-15

Genome DNA terminal region sequences were determined for a Brazilian alastrim variola minor virus strain Garcia-1966 that was associated with a 0.8% case-fatality rate and African smallpox strains Congo-1970 and Somalia-1977 associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of ≥ 98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of ≥ 65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more related to each other than to alastrim virus. Alastrim virus contained only 10 of 70 proteins that were 100% identical to homologs in Asian strains, and 7 alastrim-specific proteins were noted.
A trait stacking system via intra-genomic homologous recombination.

PubMed

Kumar, Sandeep; Worden, Andrew; Novak, Stephen; Lee, Ryan; Petolino, Joseph F

2016-11-01

A gene targeting method has been developed that allows the conversion of 'breeding stacks', containing unlinked transgenes, into a 'molecular stack', thereby circumventing the breeding challenges associated with transgene segregation. A gene targeting method has been developed for converting two unlinked trait loci into a single locus transgene stack. The method utilizes intra-genomic homologous recombination (IGHR) between stably integrated target and donor loci which share sequence homology and nuclease cleavage sites whereby the donor contains a promoterless herbicide resistance transgene. Upon crossing with a zinc finger nuclease (ZFN)-expressing plant, double-strand breaks (DSB) are created in both the stably integrated target and donor loci. DSBs flanking the donor locus result in intra-genomic mobilization of a promoterless selectable marker-containing donor sequence, which can be utilized as a template for homology-directed repair of a concomitant DSB at the target locus resulting in a functional selectable marker via nuclease-mediated cassette exchange (NMCE). The method was successfully demonstrated in maize using a glyphosate tolerance gene as a donor whereby up to 3.3% of the resulting progeny embryos cultured on selection medium regenerated plants with the donor sequence integrated into the target locus. The process could be extended to multiple cycles of trait stacking by virtue of a unique intron sequence homology for NMCE between the target and the donor loci. This is the first report that describes NMCE via IGHR, thereby enabling trait stacking using conventional crossing.
An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics

PubMed Central

Du, Ruofei; Mercante, Donald; Fang, Zhide

2013-01-01

In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length coupled with next-generation sequencing technologies poses a barrier in this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of reads assigned decrease the accuracy of the functional profile dramatically. There is no method available to address this problem. We intended to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, we found that the proposed method is robust against read coverage and consistently outperforms the other E-value cut-off methods currently used in the literature. PMID:23516532
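The score cut-off step described above can be sketched in a few lines. This is a simplified assumption, not the authors' pipeline: hits are taken as `(read_id, cog_id, bit_score)` tuples already parsed from search output, each read is assigned to its best-scoring COG, and the function name `functional_profile` and the cut-off value used in any call are hypothetical.

```python
from collections import Counter


def functional_profile(hits, min_bit_score):
    """Count reads per COG, keeping each read's best hit if it passes the cut-off.

    hits: iterable of (read_id, cog_id, bit_score) tuples (assumed input layout).
    """
    best = {}
    for read_id, cog_id, score in hits:
        if score < min_bit_score:
            continue  # low-scoring hits are treated as unreliable and dropped
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (cog_id, score)
    return Counter(cog_id for cog_id, _ in best.values())
```

Families that only ever receive low-scoring hits disappear from the resulting profile, which is the intended effect of the filter; choosing the cut-off itself requires the score distributions discussed in the abstract.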
Cloning and sequence analysis of complementary DNA encoding an aberrantly rearranged human T-cell gamma chain.

PubMed Central

Dialynas, D P; Murre, C; Quertermous, T; Boss, J M; Leiden, J M; Seidman, J G; Strominger, J L

1986-01-01

Complementary DNA (cDNA) encoding a human T-cell gamma chain has been cloned and sequenced. At the junction of the variable and joining regions, there is an apparent deletion of two nucleotides in the human cDNA sequence relative to the murine gamma-chain cDNA sequence, resulting simultaneously in the generation of an in-frame stop codon and in a translational frameshift. For this reason, the sequence presented here encodes an aberrantly rearranged human T-cell gamma chain. There are several surprising differences between the deduced human and murine gamma-chain amino acid sequences. These include poor homology in the variable region, poor homology in a discrete segment of the constant region precisely bounded by the expected junctions of exon CII, and the presence in the human sequence of five potential sites for N-linked glycosylation. PMID:3458221
A benchmark testing ground for integrating homology modeling and protein docking.

PubMed

Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima

2017-01-01

Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. Template sets can be dynamically customized by specifying ranges in sequence similarity and in
Calcium-binding protein from mouse Ehrlich ascites-tumour cells is homologous to human calcyclin.

PubMed Central

Kuźnicki, J; Filipek, A; Hunziker, P E; Huber, S; Heizmann, C W

1989-01-01

A Ca2+-binding protein was purified from mouse Ehrlich ascites-tumour cells. The protein forms monomers and disulphide-linked dimers, which can be separated by reverse-phase h.p.l.c. A partial amino acid sequence analysis demonstrated that the protein has an EF-hand structure. A striking homology was found to rat and human calcyclin (a member of the S-100 protein family), which is possibly involved in cell-cycle regulation. PMID:2597136
Evidence of protein-free homology recognition in magnetic bead force-extension experiments

NASA Astrophysics Data System (ADS)

O'Lee, D. J.; Danilowicz, C.; Rochester, C.; Kornyshev, A. A.; Prentiss, M.

2016-07-01

Earlier theoretical studies have proposed that the homology-dependent pairing of large tracts of dsDNA may be due to physical interactions between homologous regions. Such interactions could contribute to the sequence-dependent pairing of chromosome regions that may occur in the presence or the absence of double-strand breaks. Several experiments have indicated the recognition of homologous sequences in pure electrolytic solutions without proteins. Here, we report single-molecule force experiments with a designed 60 kb long dsDNA construct; one end attached to a solid surface and the other end to a magnetic bead. The 60 kb constructs contain two 10 kb long homologous tracts oriented head to head, so that their sequences match if the two tracts fold on each other. The distance between the bead and the surface is measured as a function of the force applied to the bead. At low forces, the construct molecules extend substantially less than normal, control dsDNA, indicating the existence of preferential interaction between the homologous regions. Increasing the force causes a continuous, rather than abrupt, unfolding of the paired homologous regions. Simple semi-phenomenological models of the unfolding mechanics are proposed, and their predictions are compared with the data.
Evidence of protein-free homology recognition in magnetic bead force–extension experiments

PubMed Central

(O’) Lee, D. J.; Danilowicz, C.; Rochester, C.; Prentiss, M.

2016-01-01

Earlier theoretical studies have proposed that the homology-dependent pairing of large tracts of dsDNA may be due to physical interactions between homologous regions. Such interactions could contribute to the sequence-dependent pairing of chromosome regions that may occur in the presence or the absence of double-strand breaks. Several experiments have indicated the recognition of homologous sequences in pure electrolytic solutions without proteins. Here, we report single-molecule force experiments with a designed 60 kb long dsDNA construct; one end attached to a solid surface and the other end to a magnetic bead. The 60 kb constructs contain two 10 kb long homologous tracts oriented head to head, so that their sequences match if the two tracts fold on each other. The distance between the bead and the surface is measured as a function of the force applied to the bead. At low forces, the construct molecules extend substantially less than normal, control dsDNA, indicating the existence of preferential interaction between the homologous regions. Increasing the force causes a continuous, rather than abrupt, unfolding of the paired homologous regions. Simple semi-phenomenological models of the unfolding mechanics are proposed, and their predictions are compared with the data. PMID:27493568
Improve homology search sensitivity of PacBio data by correcting frameshifts.

PubMed

Du, Nan; Sun, Yanni

2016-09-01

Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts and may lead to only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with a much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors when compared with a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/.
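Why a single insertion or deletion is so damaging to protein-level homology search can be seen in a small sketch. This is not Frame-Pro, which relies on profile alignments; it only translates a toy DNA string with the standard genetic code before and after one inserted base. The sequence and the name `translate` are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in frame 0; trailing bases that do not fill a codon are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Toy coding sequence (hypothetical) and the same sequence with one inserted base.
gene = "ATGGCTGAAACCGGTAAATGA"
read_with_insertion = gene[:5] + "C" + gene[5:]
```

Here `translate(gene)` yields `MAETGK*`, while `translate(read_with_insertion)` yields `MA*NR*M`: everything downstream of the inserted base is read in the wrong frame, so a protein-level alignment loses almost all of its signal until the frameshift is corrected.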
DNA Sequence Polymorphism of the Lactate Dehydrogenase Gene from Iranian Plasmodium vivax and Plasmodium falciparum Isolates.

PubMed

Getacher Feleke, Daniel; Nateghpour, Mehdi; Motevalli Haghi, Afsaneh; Hajjaran, Homa; Farivar, Leila; Mohebali, Mehdi; Raoofian, Reza

2015-01-01

Parasite lactate dehydrogenase (pLDH) is extensively employed in malaria rapid diagnostic tests (RDTs). Moreover, it is a well-known drug target candidate. However, the genetic diversity of this gene might influence the performance of RDT kits and its drug target candidacy. This study aimed to determine the polymorphism of the pLDH gene from Iranian isolates of P. vivax and P. falciparum. Genomic DNA was extracted from whole blood of microscopically confirmed P. vivax and P. falciparum infected patients. The pLDH gene of P. falciparum and P. vivax was amplified using conventional PCR from 43 symptomatic malaria patients from Sistan and Baluchistan Province, Southeast Iran, from 2012 to 2013. Sequence analysis of 15 P. vivax LDH showed fourteen had 100% identity with P. vivax Sal-1 and Belem strains. Two nucleotide substitutions were detected, only one of which resulted in an amino acid change. Analysis of P. falciparum LDH sequences showed six of the seven sequences had 100% homology with P. falciparum 3D7 and Mzr-1. Moreover, PfLDH displayed three nucleotide changes that resulted in changing only one amino acid. PvLDH and PfLDH showed 75%-76% nucleotide and 90.4%-90.76% amino acid homology. The pLDH gene from Iranian P. falciparum and P. vivax isolates displayed 98.8-100% homology with 1-3 nucleotide substitutions. This indicated that the gene was relatively conserved. Additional studies are needed to determine whether this genetic variation can influence the performance of pLDH-based RDTs.
Homologous prominence non-radial eruptions: A case study

NASA Astrophysics Data System (ADS)

Duchlev, P.; Koleva, K.; Madjarska, M. S.; Dechev, M.

2016-10-01

The present study provides important details on homologous eruptions of a solar prominence that occurred in active region NOAA 10904 on 2006 August 22. We report on the pre-eruptive phase of the homologous feature as well as the kinematics and the morphology of a fourth in a series of prominence eruptions that is critical in defining the nature of the previous consecutive eruptions. The evolution of the overlying coronal field during homologous eruptions is discussed and a new observational criterion for homologous eruptions is provided. We find a distinctive sequence of three activation periods, each of them containing pre-eruptive precursors such as a brightening and enlarging of the prominence body followed by small surge-like ejections from its southern end observed in radio at 17 GHz. We analyse a fourth eruption that clearly indicates a full reformation of the prominence after the third eruption. The fourth eruption, although occurring 11 h later, has an identical morphology, the same angle of propagation with respect to the radial direction, as well as similar kinematic evolution as the previous three eruptions. We find an important feature of the homologous eruptive prominence sequence, namely the increase in maximum height with each consecutive eruption. The present analysis establishes that all four eruptions observed in Hα are of confined type, with the third eruption undergoing a thermal disappearance during its eruptive phase. We suggest that the observation of the same direction of the magnetic flux rope (MFR) ejections can be considered as an additional observational criterion for MFR homology. This observational indication for homologous eruptions is important, especially in the case of events of typical or poorly distinguishable morphology of eruptive solar phenomena.
Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

PubMed Central

Barve, Apurva; Ghaskadbi, Saroj; Ghaskadbi, Surendra

2013-01-01

Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla. PMID:24083246
New Insight Into the Diversity of SemiSWEET Sugar Transporters and the Homologs in Prokaryotes

PubMed Central

Jia, Baolei; Hao, Lujiang; Xuan, Yuan Hu; Jeon, Che Ok

2018-01-01

Sugars will eventually be exported transporters (SWEETs) and SemiSWEETs represent a family of sugar transporters in eukaryotes and prokaryotes, respectively. SWEETs contain seven transmembrane helices (TMHs), while SemiSWEETs contain three. The functions of SemiSWEETs are less studied. In this perspective article, we analyzed the diversity and conservation of SemiSWEETs and further proposed the possible functions. 1,922 SemiSWEET homologs were retrieved from the UniProt database, which is not proportional to the sequenced prokaryotic genomes. However, these proteins are very diverse in sequences and can be classified into 19 clusters when >50% sequence identity is required. Moreover, a gene context analysis indicated that several SemiSWEETs are located in the operons that are related to diverse carbohydrate metabolism. Several proteins with seven TMHs can be found in bacteria, and sequence alignment suggested that these proteins in bacteria may be formed by the duplication and fusion. Multiple sequence alignments showed that the amino acids for sugar translocation are still conserved and coevolved, although the sequences show diversity. Among them, the functions of a few amino acids are still not clear. These findings highlight the challenges that exist in SemiSWEETs and provide future researchers the foundation to explore these uncharted areas. PMID:29872447

RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins.

PubMed

Walia, Rasna R; Xue, Li C; Wilkins, Katherine; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

2014-01-01

Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence
An aureobasidin A resistance gene isolated from Aspergillus is a homolog of yeast AUR1, a gene responsible for inositol phosphorylceramide (IPC) synthase activity.

PubMed

Kuroda, M; Hashida-Okado, T; Yasumoto, R; Gomi, K; Kato, I; Takesako, K

1999-03-01

The AUR1 gene of Saccharomyces cerevisiae, mutations in which confer resistance to the antibiotic aureobasidin A, is necessary for inositol phosphorylceramide (IPC) synthase activity. We report the molecular cloning and characterization of the Aspergillus nidulans aurA gene, which is homologous to AUR1. A single point mutation in the aurA gene of A. nidulans confers a high level of resistance to aureobasidin A. The A. nidulans aurA gene was used to identify its homologs in other Aspergillus species, including A. fumigatus, A. niger, and A. oryzae. The deduced amino acid sequence of an aurA homolog from the pathogenic fungus A. fumigatus showed 87% identity to that of A. nidulans. The AurA proteins of A. nidulans and A. fumigatus shared common characteristics in primary structure, including sequence, hydropathy profile, and N-glycosylation sites, with their S. cerevisiae, Schizosaccharomyces pombe, and Candida albicans counterparts. These results suggest that the aureobasidin resistance gene is conserved evolutionarily in various fungi.
Mitochondrial Genome Sequence of the Legume Vicia faba

PubMed Central

Negruk, Valentine

2013-01-01

The number of plant mitochondrial genomes sequenced exceeds two dozen. However, for a detailed comparative study of different phylogenetic branches more plant mitochondrial genomes should be sequenced. This article presents sequencing data and comparative analysis of mitochondrial DNA (mtDNA) of the legume Vicia faba. The size of the V. faba circular mitochondrial master chromosome of cultivar Broad Windsor was estimated as 588,000 bp with a genome complexity of 387,745 bp and 52 conservative mitochondrial genes: 32 protein-encoding genes, 3 rRNA genes, and 17 tRNA genes. Six tRNA genes were highly homologous to chloroplast genome sequences. In addition to the 52 conservative genes, 114 unique open reading frames (ORFs) were found: 36 without significant homology to any known proteins, 29 with homology to the Medicago truncatula nuclear genome and to other plant mitochondrial ORFs, and 49 that were not homologous to M. truncatula but possessed sequences with significant homology to other plant mitochondrial or nuclear ORFs. In general, the unique ORFs revealed very low homology to known closely related legumes, but several sequence homologies were found between V. faba, Beta vulgaris, Nicotiana tabacum, Vitis vinifera, and even the monocots Oryza sativa and Zea mays. Most likely these ORFs arose independently during angiosperm evolution (Kubo and Mikami, 2007; Kubo and Newton, 2008). Computational analysis revealed that in total about 45% of the V. faba mtDNA sequence is homologous to the Medicago truncatula nuclear genome (more than to any sequenced plant mitochondrial genome), and 35% of this homology, ranging from a few dozen to 12,806 bp, is located on chromosome 1. Apparently, mitochondrial rrn5, rrn18, rps10, ATP synthase subunit alpha, cox2, and tRNA sequences are part of transcribed nuclear mosaic ORFs. PMID:23675376
Caught in the act: the lifetime of synaptic intermediates during the search for homology on DNA

PubMed Central

Mani, Adam; Braslavsky, Ido; Arbel-Goren, Rinat; Stavans, Joel

2010-01-01

Homologous recombination plays pivotal roles in DNA repair and in the generation of genetic diversity. To locate homologous target sequences at which strand exchange can occur within a timescale that a cell’s biology demands, a single-stranded DNA-recombinase complex must search among a large number of sequences on a genome by forming synapses with chromosomal segments of DNA. A key element in the search is the time it takes for the two sequences of DNA to be compared, i.e. the synapse lifetime. Here, we visualize for the first time fluorescently tagged individual synapses formed by RecA, a prokaryotic recombinase, and measure their lifetime as a function of synapse length and differences in sequence between the participating DNAs. Surprisingly, lifetimes can be ∼10 s long when the DNAs are fully heterologous, and much longer for partial homology, consistent with ensemble FRET measurements. Synapse lifetime increases rapidly as the length of a region of full homology at either the 3′- or 5′-ends of the invading single-stranded DNA increases above 30 bases. A few mismatches can dramatically reduce the lifetime of synapses formed with nearly homologous DNAs. These results suggest the need for facilitated homology search mechanisms to locate homology successfully within the timescales observed in vivo. PMID:20044347
Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

PubMed Central

Shih, Arthur Chun-Chieh; Lee, DT; Peng, Chin-Lin; Wu, Yu-Wei

2007-01-01

Background: When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences. Results: A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Different from the traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also demonstrates their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo also allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment selected by the user or by built-in automatic calculation. Conclusion: With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information of Phylo-mLogo can be found at URL . PMID:17319966
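The entropy-based variability measure mentioned in the abstract above can be illustrated with a minimal sketch. This is not Phylo-mLogo's own code: the function name `column_entropies` and the toy alignment are assumptions made for illustration, and gap characters are simply treated as one more symbol.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Return the Shannon entropy (in bits) of each column of an alignment.

    `alignment` is a list of equal-length strings (hypothetical input format).
    A fully conserved column scores 0.0; more variable columns score higher.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        # H = sum over symbols of p * log2(1 / p)
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACTA", "ACTT"]  # made-up data
    print(column_entropies(toy_alignment))
```

Computing the same quantity separately for the sequences of each clade would give the per-clade view that the abstract describes as local logos.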
The amino acid sequence of Staphylococcus aureus penicillinase.

PubMed Central

Ambler, R P

1975-01-01

The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078
Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.

PubMed

Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M

1991-02-15

The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library using a cDNA encoding a neutral isozyme of horseradish, Armoracia rusticana, peroxidase (HRP) as a probe, and two positive clones were isolated. From the comparison with the sequences of the HRP-encoding genes, we concluded that two clones contained peroxidase-encoding genes, and they were named prxCa and prxEa. Both genes consisted of four exons and three introns; the introns had consensus nucleotides, GT and AG, at the 5' and 3' ends, respectively. The lengths of each putative exon of the prxEa gene were the same as those of the HRP-basic-isozyme-encoding gene, prxC3, and coded for 349 amino acids (aa) with a sequence homology of 89% to that encoded by prxC3. The prxCa gene was very close to the HRP-neutral-isozyme-encoding gene, prxC1b, and coded for 354 aa with 91% homology to that encoded by prxC1b. The aa sequence homology was 64% between the two peroxidases encoded by prxCa and prxEa.
Homology analysis and cross-immunogenicity of OmpA from pathogenic Yersinia enterocolitica, Yersinia pseudotuberculosis and Yersinia pestis.

PubMed

Chen, Yuhuang; Duan, Ran; Li, Xu; Li, Kewei; Liang, Junrong; Liu, Chang; Qiu, Haiyan; Xiao, Yuchun; Jing, Huaiqi; Wang, Xin

2015-12-01

The outer membrane protein A (OmpA) is one of the intra-species conserved proteins with immunogenicity widely found in the family of Enterobacteriaceae. Here we first confirmed that OmpA is conserved in the three pathogenic Yersinia: Yersinia pestis, Yersinia pseudotuberculosis and pathogenic Yersinia enterocolitica, with high homology at the nucleotide level and at the amino acid sequence level. The ompA sequence identities among 262 Y. pestis strains, 134 Y. pseudotuberculosis strains and 219 pathogenic Y. enterocolitica strains are 100%, 98.8% and 97.7%, respectively. The main patterns of OmpA of pathogenic Yersinia are 86.2% and 88.8% identical at the nucleotide and amino acid sequence levels, respectively. Immunological analysis showed the immunogenicity of each OmpA and cross-immunogenicity of OmpA for pathogenic Yersinia, suggesting that OmpA may be a vaccine candidate for Y. pestis and other pathogenic Yersinia. Copyright © 2015 Elsevier Ltd. All rights reserved.
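Percent-identity figures like those quoted in the abstract above are computed from aligned sequence pairs. The sketch below shows one common convention, assumed here for illustration: columns containing a gap in either sequence are skipped, and matches are divided by the number of remaining columns. The abstract does not state which convention the authors used, and the function name and example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are ignored (one convention
    among several; other tools divide by alignment length instead).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Made-up aligned fragments: 6 matches over 7 ungapped columns.
    print(round(percent_identity("MKT-AYIA", "MKTLAYLA"), 1))
```

Because the choice of denominator changes the result, identity values from different studies are only comparable when the convention is the same.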
SANSparallel: interactive homology search against Uniprot

PubMed Central

Somervuo, Panu; Holm, Liisa

2015-01-01

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811
Using distances between Top-n-gram and residue pairs for protein remote homology detection.

PubMed

Liu, Bin; Xu, Jinghao; Zou, Quan; Xu, Ruifeng; Wang, Xiaolong; Chen, Qingcai

2014-01-01

Protein remote homology detection is one of the central problems in bioinformatics, which is important for both basic research and practical application. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve the state-of-the-art performance. Exploring feature vectors incorporating the position information of amino acids or other protein building blocks is a key step to improve the performance of the SVM-based methods. Two new methods for protein remote homology detection were proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method, in which the feature vector representation for a protein is based on the distances between residue pairs. SVM-DT is a profile-based method, which considers the distances between Top-n-gram pairs. A Top-n-gram can be viewed as a profile-based building block of proteins, which is calculated from the frequency profiles. These two methods are position-dependent approaches incorporating the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. Experimental results showed that these two new methods are very promising. Compared with the position-independent methods, the performance improvement is obvious. Furthermore, the proposed methods can also provide useful insights for studying the features of protein families. The better performance of the proposed methods demonstrates that the position-dependent approaches are efficient for protein remote homology detection. Another advantage of our methods arises from the explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp.
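The idea of a feature vector built from distances between residue pairs can be sketched as follows. This is only an illustration of the general idea, not the published SVM-DR definition: the function name, the `max_dist` parameter, the use of raw counts, and the handling of non-standard residues are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def distance_pair_features(seq, max_dist=3):
    """Count ordered residue pairs (a, b) that occur exactly d positions apart.

    Returns a flat list of length 400 * max_dist: the first 400 entries are
    counts for distance 1, the next 400 for distance 2, and so on. Pairs that
    involve a non-standard residue are ignored (an assumption of this sketch).
    """
    index = {pair: k for k, pair in enumerate(product(AMINO_ACIDS, repeat=2))}
    n_pairs = len(index)  # 20 * 20 = 400
    features = [0.0] * (n_pairs * max_dist)

    for d in range(1, max_dist + 1):
        for i in range(len(seq) - d):
            pair = (seq[i], seq[i + d])
            if pair in index:
                features[(d - 1) * n_pairs + index[pair]] += 1.0
    return features


if __name__ == "__main__":
    vec = distance_pair_features("ACAC", max_dist=2)  # toy sequence
    print(len(vec), sum(vec))
```

Vectors of this kind have a fixed length regardless of sequence length, so they could be passed to any standard SVM implementation; each dimension maps back to a specific residue pair and distance, which is the kind of explicit feature space the abstract refers to.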
Two amino acid residues confer different binding affinities of Abelson family kinase SRC homology 2 domains for phosphorylated cortactin.

PubMed

Gifford, Stacey M; Liu, Weizhi; Mader, Christopher C; Halo, Tiffany L; Machida, Kazuya; Boggon, Titus J; Koleske, Anthony J

2014-07-11

The closely related Abl family kinases, Arg and Abl, play important non-redundant roles in the regulation of cell morphogenesis and motility. Despite similar N-terminal sequences, Arg and Abl interact with different substrates and binding partners with varying affinities. This selectivity may be due to slight differences in amino acid sequence leading to differential interactions with target proteins. We report that the Arg Src homology (SH) 2 domain binds two specific phosphotyrosines on cortactin, a known Abl/Arg substrate, with over 10-fold higher affinity than the Abl SH2 domain. We show that this significant affinity difference is due to the substitution of arginine 161 and serine 187 in Abl to leucine 207 and threonine 233 in Arg, respectively. We constructed Abl SH2 domains with R161L and S187T mutations alone and in combination and find that these substitutions are sufficient to convert the low affinity Abl SH2 domain to a higher affinity "Arg-like" SH2 domain in binding to a phospho-cortactin peptide. We crystallized the Arg SH2 domain for structural comparison to existing crystal structures of the Abl SH2 domain. We show that these two residues are important determinants of Arg and Abl SH2 domain binding specificity. Finally, we expressed Arg containing an "Abl-like" low affinity mutant Arg SH2 domain (L207R/T233S) and find that this mutant, although properly localized to the cell periphery, does not support wild type levels of cell edge protrusion. Together, these observations indicate that these two amino acid positions confer different binding affinities and cellular functions on the distinct Abl family kinases. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
Induction of homologous recombination in Saccharomyces cerevisiae.

PubMed

Simon, J R; Moore, P D

1988-09-01

We have investigated the effects of UV irradiation of Saccharomyces cerevisiae in order to distinguish whether UV-induced recombination results from the induction of enzymes required for homologous recombination, or the production of substrate sites for recombination containing regions of DNA damage. We utilized split-dose experiments to investigate the induction of proteins required for survival, gene conversion, and mutation in a diploid strain of S. cerevisiae. We demonstrate that inducing doses of UV irradiation followed by a 6 h period of incubation render the cells resistant to challenge doses of UV irradiation. The effects of inducing and challenge doses of UV irradiation upon interchromosomal gene conversion and mutation are strictly additive. Using the yeast URA3 gene cloned in non-replicating single- and double-stranded plasmid vectors that integrate into chromosomal genes upon transformation, we show that UV irradiation of haploid yeast cells and homologous plasmid DNA sequences each stimulate homologous recombination approximately two-fold, and that these effects are additive. Non-specific DNA damage has little effect on the stimulation of homologous recombination, as shown by studies in which UV-irradiated heterologous DNA was included in transformation/recombination experiments. We further demonstrate that the effect of competing single- and double-stranded heterologous DNA sequences differs in UV-irradiated and unirradiated cells, suggesting an induction of recombinational machinery in UV-irradiated S. cerevisiae cells.
orthoFind Facilitates the Discovery of Homologous and Orthologous Proteins.

PubMed

Mier, Pablo; Andrade-Navarro, Miguel A; Pérez-Pulido, Antonio J

2015-01-01

Finding homologous and orthologous protein sequences is often the first step in evolutionary studies, annotation projects, and experiments of functional complementation. Despite all currently available computational tools, there is a requirement for easy-to-use tools that provide functional information. Here, a new web application called orthoFind is presented, which allows a quick search for homologous and orthologous proteins given one or more query sequences, allowing a recurrent and exhaustive search against reference proteomes, and being able to include user databases. It addresses the protein multidomain problem, searching for homologs with the same domain architecture, and gives a simple functional analysis of the results to help in the annotation process. orthoFind is easy to use and has been proven to provide accurate results with different datasets. Availability: http://www.bioinfocabd.upo.es/orthofind/.
The semaphorontic view of homology.

PubMed

Havstad, Joyce C; Assis, Leandro C S; Rieppel, Olivier

2015-11-01

The relation of homology is generally characterized as an identity relation, or alternatively as a correspondence relation, both of which are transitive. We use the example of the ontogenetic development and evolutionary origin of the gnathostome jaw to discuss identity and transitivity of the homology relation under the transformationist and emergentist paradigms respectively. Token identity and consequent transitivity of homology relations are shown to be requirements that are too strong to allow the origin of genuine evolutionary novelties. We consequently introduce the concept of compositional identity that is grounded in relations prevailing between parts (organs and organ systems) of a whole (organism). We recognize an ontogenetic identity of parts within a whole throughout the sequence of successive developmental stages of those parts: this is an intra-organismal character identity maintained throughout developmental trajectory. Correspondingly, we recognize a phylogenetic identity of homologous parts within two or more organisms of different species: this is an inter-species character identity maintained throughout evolutionary trajectory. These different dimensions of character identity--ontogenetic (through development) and phylogenetic (via shared evolutionary history)--break the transitivity of homology relations. Under the transformationist paradigm, the relation of homology reigns over the entire character (-state) transformation series, and thus encompasses the plesiomorphic as well as the apomorphic condition of form. In contrast, genuine evolutionary novelties originate not through transformation of ancestral characters (-states), but instead through deviating developmental trajectories that result in alternate characters. Under the emergentist paradigm, homology is thus synonymous with synapomorphy. © 2015 The Authors. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution Published by Wiley Periodicals, Inc.
Detection and isolation of nucleic acid sequences using competitive hybridization probes

DOEpatents

Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

1997-01-01

A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.
Detection and isolation of nucleic acid sequences using competitive hybridization probes

DOEpatents

Lucas, J.N.; Straume, T.; Bogen, K.T.

1997-04-01

A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.
Human homolog of the mouse sperm receptor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chamberlin, M.E.; Dean, J.

1990-08-01

The human zona pellucida, composed of three glycoproteins (ZP1, ZP2, and ZP3), forms an extracellular matrix that surrounds ovulated eggs and mediates species-specific fertilization. The genes that code for at least two of the zona proteins (ZP2 and ZP3) cross-hybridize with other mammalian DNA. The recently characterized mouse sperm receptor gene (Zp-3) was used to isolate its human homolog. The human homolog spans ~18.3 kilobase pairs (kbp) (compared to 8.6 kbp for the mouse gene) and contains eight exons, the sizes of which are strictly conserved between the two species. Four short (8-15 bp) sequences within the first 250 bp of the 5′ flanking region in the human Zp-3 homolog are also present upstream of mouse Zp-3. These elements may modulate oocyte-specific gene expression. By using the polymerase chain reaction, a full-length cDNA of human ZP3 was isolated from human ovarian poly(A)+ RNA and used to deduce the structure of human ZP3 mRNA. Certain features of the human and mouse ZP3 transcripts are conserved. Both have unusually short 5′ and 3′ untranslated regions, both contain a single open reading frame that is 74% identical, and both code for 424 amino acid polypeptides that are 67% the same. The similarity between the two proteins may define domains that are important in maintaining the structural integrity of the zona pellucida, while the differences may play a role in mediating the species-specific events of mammalian fertilization.
Tumor malignancy is engaged to prokaryotic homolog toolbox.

PubMed

Fernandes, Janaina; Guedes, Patrícia G; Lage, Celso Luiz S; Rodrigues, Juliany Cola F; Lage, Claudia de Alencar S

2012-04-01

Cancer cells display high proliferation rates and survival provided by high glycolysis, chemoresistance and radioresistance, metabolic features that appear to be activated with malignancy, and seemed to have arisen as early in evolution as in unicellular/prokaryotic organisms. Based on these assumptions, we hypothesize that aggressive phenotypes found in malignant cells may be related to acquired unicellular behavior, launched within a tumor when viral and prokaryotic homologs are overexpressed performing likely robust functions. The ensemble of these expressed viral and prokaryotic close homologs in the proteome of a tumor tissue gives them advantage over normal cells. To assess the hypothesis validity, sequences of human proteins involved in apoptosis, energetic metabolism, cell mobility and adhesion, chemo- and radio-resistance were aligned to homologs present in other life forms, excluding all eukaryotes, using PSI-BLAST, with further corroboration from data available in the literature. The analysis revealed that selected sequences of proteins involved in apoptosis and tumor suppression (as p53 and pRB) scored non-significant (E-value>0.001) with prokaryotic homologs; on the other hand, human proteins involved in cellular chemo- and radio-resistance scored highly significant with prokaryotic and viral homologs (as catalase, E-value=zero). We inferred that such upregulated and/or functionally activated proteins in aggressive malignant cells represent a toolbox of modern human homologs evolved from a similar key set that have granted survival of ancient prokaryotes against extremely harsh environments. According to what has been discussed along this analysis, high mutation rates usually hit hotspots in important conserved protein domains, allowing uncontrolled expansion of more resistant, death-evading malignant clones. That is the case of point mutations in key viral proteins affording viruses escape to chemotherapy, and human homologs of such retroviral
Porcine parvovirus: DNA sequence and genome organization.

PubMed

Ranz, A I; Manclús, J J; Díaz-Aroca, E; Casal, J I

1989-10-01

We have determined the nucleotide sequence of an almost full-length clone of porcine parvovirus (PPV). The sequence is 4973 nucleotides (nt) long. The 3' end of virion DNA shows a Y-shaped configuration homologous to rodent parvoviruses. The 5' end of virion DNA shows a repetition of 127 nt at the carboxy terminus of the capsid proteins. The overall organization of the PPV genome is similar to those of other autonomous parvoviruses. There are two large open reading frames (ORFs) that almost entirely cover the genome, both located in the same frame of the complementary strand. The left ORF encodes the non-structural protein NS1 and the right ORF encodes the capsid proteins (VP1, VP2 and VP3). Promoter analysis, location of splicing sites and putative amino acid sequences for the viral proteins show a high homology of PPV with feline panleukopenia virus and canine parvoviruses (FPV and CPV) and rodent parvovirus. Therefore we conclude that PPV is related to the Kilham rat virus (KRV) group of autonomous parvoviruses formed by KRV, minute virus of mice, Lu III, H-1, FPV and CPV.

Sequence divergence of the red and green visual pigments in great apes and humans.

PubMed Central

Deeb, S S; Jorgensen, A L; Battisti, L; Iwasaki, L; Motulsky, A G

1994-01-01

We have determined the coding sequences of red and green visual pigment genes of the chimpanzee, gorilla, and orangutan. The deduced amino acid sequences of these pigments are highly homologous to the equivalent human pigments. None of the amino acid differences occurred at sites that were previously shown to influence pigment absorption characteristics. Therefore, we predict the spectra of red and green pigments of the apes to have wavelengths of maximum absorption that differ by < 2 nm from the equivalent human pigments and that color vision in these nonhuman primates will be very similar, if not identical, to that in humans. A total of 14 within-species polymorphisms (6 involving silent substitutions) were observed in the coding sequences of the red and green pigment genes of the great apes. Remarkably, the polymorphisms at 6 of these sites had been observed in human populations, suggesting that they predated the evolution of higher primates. Alleles at polymorphic sites were often shared between the red and green pigment genes. The average synonymous rate of divergence of red from green sequences was approximately 1/10th that estimated for other proteins of higher primates, indicating the involvement of gene conversion in generating these polymorphisms. The high degree of homology and juxtaposition of these two genes on the X chromosome has promoted unequal recombination and/or gene conversion that led to sequence homogenization. However, natural selection operated to maintain the degree of separation in peak absorbance between the red and green pigments that resulted in optimal chromatic discrimination. This represents a unique case of molecular coevolution between two homologous genes that functionally interact at the behavioral level. PMID:8041777
Characterization and Nucleotide Sequence of CARB-6, a New Carbenicillin-Hydrolyzing β-Lactamase from Vibrio cholerae

PubMed Central

Choury, Danièle; Aubert, Gérald; Szajnert, Marie-France; Azibi, Kemal; Delpech, Marc; Paul, Gérard

1999-01-01

A clinical strain of Vibrio cholerae non-O1 non-O139 isolated in France produced a new β-lactamase with a pI of 5.35. The purified enzyme, with a molecular mass of 33,000 Da, was characterized. Its kinetic constants show it to be a carbenicillin-hydrolyzing enzyme comparable to the five previously reported CARB β-lactamases and to SAR-1, another carbenicillin-hydrolyzing β-lactamase that has a pI of 4.9 and that is produced by a V. cholerae strain from Tanzania. This β-lactamase is designated CARB-6, and the gene for CARB-6 could not be transferred to Escherichia coli K-12 by conjugation. The nucleotide sequence of the structural gene was determined by direct sequencing of PCR-generated fragments from plasmid DNA with four pairs of primers covering the whole sequence of the reference CARB-3 gene. The gene encodes a 288-amino-acid protein that shares 94% homology with the CARB-1, CARB-2, and CARB-3 enzymes, 93% homology with the Proteus mirabilis N29 enzyme, and 86.5% homology with the CARB-4 enzyme. The sequence of CARB-6 differs from those of CARB-3, CARB-2, CARB-1, N29, and CARB-4 at 15, 16, 17, 19, and 37 amino acid positions, respectively. All these mutations are located in the C-terminal region of the sequence and at the surface of the molecule, according to the crystal structure of the Staphylococcus aureus PC-1 β-lactamase. PMID:9925522
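The relationship between the number of differing positions and the reported percent homology in the CARB-6 record above can be illustrated with a short calculation. This is a minimal sketch, assuming each comparison spans all 288 residues with no gaps; the function name `percent_identity` is hypothetical, and the abstract's own percentages may have been computed over a different alignment length, so small discrepancies (for example for CARB-4) are expected.

```python
def percent_identity(length: int, differences: int) -> float:
    """Percent of identical positions for an ungapped comparison.

    Assumption: every one of `length` positions is compared, and
    `differences` of them carry a different residue.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= differences <= length:
        raise ValueError("differences must lie between 0 and length")
    return 100.0 * (length - differences) / length


# Values taken from the abstract: CARB-6 is 288 residues long and differs
# from each reference enzyme at the listed number of positions.
CARB6_LENGTH = 288
DIFFERENCES = {"CARB-3": 15, "CARB-2": 16, "CARB-1": 17, "N29": 19, "CARB-4": 37}

for enzyme, n_diff in DIFFERENCES.items():
    print(f"{enzyme}: {percent_identity(CARB6_LENGTH, n_diff):.1f}% identical")
```

Under this assumption the three closest enzymes come out at roughly 94 to 95%, in line with the 94% quoted in the abstract.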
Streptomyces griseus streptomycin phosphotransferase: expression of its gene in Escherichia coli and sequence homology with other antibiotic phosphotransferases and with eukaryotic protein kinases.

PubMed

Lim, C K; Smith, M C; Petty, J; Baumberg, S; Wootton, J C

1989-12-01

The aphD gene of Streptomyces griseus, encoding a streptomycin 6-phosphotransferase (SPH), was sub-cloned in the pBR322-based expression vector pRK9 (which contains the Serratia marcescens trp promoter) with selection for expression of streptomycin resistance in Escherichia coli. Two hybrid plasmids, pCKL631 and pCKL711, were isolated which conferred resistance. Both contained a approximately 2 kbp fragment already suspected to include aphD. The properties of in vitro deletion derivatives of these plasmids were consistent with the presumed location of aphD. In vitro deletion of a sequence including most of the trp promoter largely, but not quite completely, abolished the ability of the plasmid to confer streptomycin resistance, confirming that expression was indeed principally from the trp promoter. A polypeptide of approximately 34.5 kDa was present in minicells containing plasmids that conferred streptomycin resistance, but was absent when the plasmids contained in vitro deletions removing streptomycin resistance. Part of the fragment was sequenced and an open reading frame corresponding to aphD identified. A computer-assisted comparison of the deduced SPH sequence with those of other antibiotic phosphotransferases suggested a common structure A-B-C-D-E, where B and D were conserved between all sequences compared while A, C and E divided between the streptomycin and hygromycin B phosphotransferases on one hand and kanamycin/neomycin ones on the other. A composite sequence data base was searched for homologues to consensus matrices constructed from five approximately 12-residue subsequences within blocks B and D. For one subsequence, corresponding to the N-terminal portion of block D, those sequences from the database that yielded the highest homology scores comprised almost entirely either antibiotic phosphotransferases or eukaryotic protein kinases. Possible evolutionary implications of this homology, previously described by other groups, are discussed.
Phenolic acid esterases, coding sequences and methods

DOEpatents

Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

2002-01-01

Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.
Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

DOEpatents

Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

1998-01-01

A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.
The semaphorontic view of homology

PubMed Central

Assis, Leandro C.S.; Rieppel, Olivier

2015-01-01

ABSTRACT The relation of homology is generally characterized as an identity relation, or alternatively as a correspondence relation, both of which are transitive. We use the example of the ontogenetic development and evolutionary origin of the gnathostome jaw to discuss identity and transitivity of the homology relation under the transformationist and emergentist paradigms respectively. Token identity and consequent transitivity of homology relations are shown to be requirements that are too strong to allow the origin of genuine evolutionary novelties. We consequently introduce the concept of compositional identity that is grounded in relations prevailing between parts (organs and organ systems) of a whole (organism). We recognize an ontogenetic identity of parts within a whole throughout the sequence of successive developmental stages of those parts: this is an intra‐organismal character identity maintained throughout developmental trajectory. Correspondingly, we recognize a phylogenetic identity of homologous parts within two or more organisms of different species: this is an inter‐species character identity maintained throughout evolutionary trajectory. These different dimensions of character identity—ontogenetic (through development) and phylogenetic (via shared evolutionary history)—break the transitivity of homology relations. Under the transformationist paradigm, the relation of homology reigns over the entire character (‐state) transformation series, and thus encompasses the plesiomorphic as well as the apomorphic condition of form. In contrast, genuine evolutionary novelties originate not through transformation of ancestral characters (‐states), but instead through deviating developmental trajectories that result in alternate characters. Under the emergentist paradigm, homology is thus synonymous with synapomorphy. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 578–587, 2015. © 2015 The Authors. Journal of Experimental Zoology Part B: Molecular and
Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

DOEpatents

Lucas, J.N.; Straume, T.; Bogen, K.T.

1998-03-24

A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.
Homology modeling of Homo sapiens lipoic acid synthase: Substrate docking and insights on its binding mode.

PubMed

Krishnamoorthy, Ezhilarasi; Hassan, Sameer; Hanna, Luke Elizabeth; Padmalayam, Indira; Rajaram, Rama; Viswanathan, Vijay

2017-05-07

Lipoic acid synthase (LIAS) is an iron-sulfur cluster mitochondrial enzyme which catalyzes the final step in the de novo pathway for the biosynthesis of lipoic acid, a potent antioxidant. Recently there has been significant interest in its role in metabolic diseases, and deficiency in LIAS expression has been linked to conditions such as diabetes, atherosclerosis and neonatal-onset epilepsy, suggesting a strong inverse correlation between LIAS reduction and disease status. In this study we use a bioinformatics approach to predict its structure, which would be helpful for understanding its role. A homology model for LIAS protein was generated using the X-ray crystallographic structure from Thermosynechococcus elongatus BP-1 (PDB ID: 4U0P). The predicted structure has 93% of the residues in the most favoured region of the Ramachandran plot. The active site of LIAS protein was mapped and docked with S-Adenosyl Methionine (SAM) using GOLD software. The LIAS-SAM complex was further refined using molecular dynamics simulation within the subsite 1 and subsite 3 of the active site. To the best of our knowledge, this is the first study to report a reliable homology model of LIAS protein. This study will facilitate a better understanding of the mode of action of the enzyme-substrate complex for future studies in designing drugs that can target LIAS protein. Copyright © 2017 Elsevier Ltd. All rights reserved.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

PubMed

Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

2017-04-15

Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

PubMed Central

Sinclair, Robert M.; Ravantti, Janne J.

2017-01-01

ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids
Overcoming Sequence Misalignments with Weighted Structural Superposition

PubMed Central

Khazanov, Nickolay A.; Damm-Ganamet, Kelly L.; Quang, Daniel X.; Carlson, Heather A.

2012-01-01

An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD’s robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, SSM, CE, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structural-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. PMID:22733542
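The idea behind a Gaussian-weighted RMSD, as described in the record above, can be sketched in a few lines. This is a rough illustration only, not the authors' HwRMSD implementation: it assumes the two structures are already superposed and paired residue by residue, it uses a weight of the form exp(-d²/scale), and the `scale` constant and all function names are hypothetical choices made for the example. The real method also iterates the superposition and corrects the sequence alignment, which is not shown here.

```python
import math


def pair_distances(coords_a, coords_b):
    """Distance in Å between each paired point of two pre-superposed structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of paired points")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def standard_rmsd(distances):
    """Unweighted RMSD: every pair contributes equally."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gaussian_weighted_rmsd(distances, scale=5.0):
    """RMSD in which each pair is weighted by exp(-d^2 / scale).

    `scale` (in Å^2) is a hypothetical constant; close pairs get weights
    near 1, distant pairs get weights near 0.
    """
    weights = [math.exp(-d * d / scale) for d in distances]
    weighted_sum = sum(w * d * d for w, d in zip(weights, distances))
    return math.sqrt(weighted_sum / sum(weights))


def pairs_within(distances, cutoff):
    """Number of residue pairs placed within `cutoff` Å of each other."""
    return sum(1 for d in distances if d <= cutoff)


# Toy coordinates: three well-matched pairs and one outlier (a mobile loop, say).
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (2.0, 0.0, 0.5), (3.0, 0.0, 10.0)]

distances = pair_distances(structure_a, structure_b)
print("standard RMSD:", round(standard_rmsd(distances), 3))
print("weighted RMSD:", round(gaussian_weighted_rmsd(distances), 3))
print("pairs within 1 Å:", pairs_within(distances, 1.0))
```

In the toy data the single outlier dominates the standard RMSD, while the weighted value stays close to the 0.5 Å offset of the well-matched pairs, which is the behaviour that makes a weighted overlay less sensitive to misaligned or flexible regions.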
The limits of protein sequence comparison?

PubMed Central

Pearson, William R; Sierk, Michael L

2010-01-01

Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
Homology modeling and docking studies of a Δ9-fatty acid desaturase from a Cold-tolerant Pseudomonas sp. AMS8

PubMed Central

Garba, Lawal; Mohamad Yussoff, Mohamad Ariff; Abd Halim, Khairul Bariyyah; Ishak, Siti Nor Hasmah; Mohamad Ali, Mohd Shukuri; Oslan, Siti Nurbaya

2018-01-01

Membrane-bound fatty acid desaturases perform oxygenated desaturation reactions to insert double bonds within fatty acyl chains in regioselective and stereoselective manners. The Δ9-fatty acid desaturase strictly creates the first double bond between the C9 and C10 positions of most saturated substrates. As the three-dimensional structures of the bacterial membrane fatty acid desaturases are not available, relevant information about the enzymes is derived from their amino acid sequences, site-directed mutagenesis and domain swapping in similar membrane-bound desaturases. The cold-tolerant Pseudomonas sp. AMS8 was found to produce high amounts of monounsaturated fatty acids at low temperature. Subsequently, an active Δ9-fatty acid desaturase was isolated and functionally expressed in Escherichia coli. In this paper we report homology modeling and docking studies of a Δ9-fatty acid desaturase from a Cold-tolerant Pseudomonas sp. AMS8 for the first time to the best of our knowledge. The three-dimensional structure of the enzyme was built using MODELLER version 9.18 using a suitable template. The protein model contained the three conserved-histidine residues typical for all membrane-bound desaturase catalytic activity. The structure was subjected to energy minimization and checked for correctness using Ramachandran plots and ERRAT, which showed a good quality model of 91.6 and 65.0%, respectively. The protein model was used to perform MD simulation and docking of palmitic acid using CHARMM36 force field in GROMACS Version 5 and Autodock tool Version 4.2, respectively. The docking simulation with the lowest binding energy, −6.8 kcal/mol had a number of residues in close contact with the docked palmitic acid namely, Ile26, Tyr95, Val179, Gly180, Pro64, Glu203, His34, His206, His71, Arg182, Thr85, Lys98 and His177. Interestingly, among the binding residues are His34, His71 and His206 from the first, second, and third conserved histidine motif, respectively, which constitute
Sequence similarities and evolutionary relationships of microbial, plant and animal alpha-amylases.

PubMed

Janecek, S

1994-09-01

Amino acid sequence comparison of 37 alpha-amylases from microbial, plant and animal sources was performed to identify their mutual sequence similarities in addition to the five already described conserved regions. These sequence regions were examined from structure/function and evolutionary perspectives. An unrooted evolutionary tree of alpha-amylases was constructed on a subset of 55 residues from the alignment of sequence similarities along with conserved regions. The most important new information extracted from the tree was as follows: (a) the close evolutionary relationship of Alteromonas haloplanctis alpha-amylase (thermolabile enzyme from an antarctic psychrotroph) with the already known group of homologous alpha-amylases from streptomycetes, Thermomonospora curvata, insects and mammals, and (b) the remarkable 40.1% identity between starch-saccharifying Bacillus subtilis alpha-amylase and the enzyme from the ruminal bacterium Butyrivibrio fibrisolvens, an alpha-amylase with an unusually large polypeptide chain (943 residues in the mature enzyme). Due to a very high degree of similarity, the whole amino acid sequences of three groups of alpha-amylases, namely (a) fungi and yeasts, (b) plants, and (c) A. haloplanctis, streptomycetes, T. curvata, insects and mammals, were aligned independently and their unrooted distance trees were calculated using these alignments. Possible rooting of the trees was also discussed. Based on the knowledge of the location of the five disulfide bonds in the structure of pig pancreatic alpha-amylase, the possible disulfide bridges were established for each of these groups of homologous alpha-amylases.
Lion (Panthera leo) and cheetah (Acinonyx jubatus) IFN-gamma sequences.

PubMed

Maas, Miriam; Van Rhijn, Ildiko; Allsopp, Maria T E P; Rutten, Victor P M G

2010-04-15

Cloning and sequencing of the full length lion and cheetah interferon-gamma (IFN-gamma) transcript will enable the expression of the recombinant cytokine, to be used for production of monoclonal antibodies and to set up lion and cheetah-specific IFN-gamma ELISAs. These are relevant in blood-based diagnosis of bovine tuberculosis, an important threat to lions in the Kruger National Park. Alignment of nucleotide and amino acid sequences of lion and cheetah and that of domestic cats showed homologies of 97-100%. Copyright 2009 Elsevier B.V. All rights reserved.
Homology groups for particles on one-connected graphs

NASA Astrophysics Data System (ADS)

Maciążek, Tomasz; Sawicki, Adam

2017-06-01

We present a mathematical framework for describing the topology of configuration spaces for particles on one-connected graphs. In particular, we compute the homology groups over integers for different classes of one-connected graphs. Our approach is based on some fundamental combinatorial properties of the configuration spaces, Mayer-Vietoris sequences for different parts of configuration spaces, and some limited use of discrete Morse theory. As one of the results, we derive the closed-form formulae for ranks of the homology groups for indistinguishable particles on tree graphs. We also give a detailed discussion of the second homology group of the configuration space of both distinguishable and indistinguishable particles. Our motivation is the search for new kinds of quantum statistics.
Comparative sequence analysis of a region on human chromosome 13q14, frequently deleted in B-cell chronic lymphocytic leukemia, and its homologous region on mouse chromosome 14.

PubMed

Kapanadze, B; Makeeva, N; Corcoran, M; Jareborg, N; Hammarsund, M; Baranova, A; Zabarovsky, E; Vorontsova, O; Merup, M; Gahrton, G; Jansson, M; Yankovsky, N; Einhorn, S; Oscier, D; Grandér, D; Sangfelt, O

2000-12-15

Previous studies have indicated the presence of a putative tumor suppressor gene on human chromosome 13q14, commonly deleted in patients with B-cell chronic lymphocytic leukemia (B-CLL). We have recently identified a minimally deleted region encompassing parts of two adjacent genes, termed LEU1 and LEU2 (leukemia-associated genes 1 and 2), and several additional transcripts. In addition, 50 kb centromeric to this region we have identified another gene, LEU5/RFP2. To elucidate further the complex genomic organization of this region, we have identified, mapped, and sequenced the homologous region in the mouse. Fluorescence in situ hybridization analysis demonstrated that the region maps to mouse chromosome 14. The overall organization and gene order in this region were found to be highly conserved in the mouse. Sequence comparison between the human deletion hotspot region and its homologous mouse region revealed a high degree of sequence conservation with an overall score of 74%. However, our data also show that in terms of transcribed sequences, only two of those, human LEU2 and LEU5/RFP2, are clearly conserved, strengthening the case for these genes as putative candidate B-CLL tumor suppressor genes.
Cyclic azole-homologated peptides from Marine sponges.

PubMed

Molinski, Tadeusz F

2017-12-19

This review discusses the chemistry of cyclic azole-homologated peptides (AHPs) from the marine sponges, Theonella swinhoei, other Theonella species, Calyx spp. and Plakina jamaicensis. The origin, distribution of AHPs and molecular structure elucidations of AHPs are described followed by their biosynthesis, bioactivity, and synthetic efforts towards their total synthesis. Reports of partial and total synthesis of AHPs extend beyond peptide coupling reactions and include creative construction of the non-proteinogenic amino acid components, mainly the homologated heteroaromatic and α-keto-β-amino acids. A useful conclusion is drawn regarding AHPs: despite their rarity, exotic structures and the potent protease inhibitory properties of some members, their synthesis is under-developed and beckons solutions for outstanding problems towards their efficient assembly.
Clustering evolving proteins into homologous families.

PubMed

Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

2013-04-08

Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better
Ancient DNA sequence revealed by error-correcting codes.

PubMed

Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

2015-07-10

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.

Ancient DNA sequence revealed by error-correcting codes

PubMed Central

Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

2015-01-01

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme.

PubMed Central

Burke, W D; Calalang, C C; Eickbush, T H

1987-01-01

Two classes of DNA elements interrupt a fraction of the rRNA repeats of Bombyx mori. We have analyzed one class of these elements, which we have named R2, by genomic blotting and sequence analysis. These elements occupy approximately 9% of the rDNA units of B. mori and appear to be homologous to the type II rDNA insertions detected in Drosophila melanogaster. Approximately 25 copies of R2 exist within the B. mori genome, of which at least 20 are located at a precise location within otherwise typical rDNA units. Nucleotide sequence analysis has revealed that the 4.2-kilobase-pair R2 element has a single large open reading frame, occupying over 82% of the total length of the element. The central region of this 1,151-amino-acid open reading frame shows homology to the reverse transcriptase enzymes found in retroviruses and certain transposable elements. Amino acid homology of this region is highest to the mobile LINE-1 (L1) elements of mammals, followed by the mitochondrial type II introns of fungi, and the pol gene of retroviruses. Less homology exists with transposable elements of D. melanogaster and Saccharomyces cerevisiae. Two additional regions of sequence homology between L1 and R2 elements were also found outside the reverse transcriptase region. We suggest that the R2 elements are retrotransposons that are site specific in their insertion into the genome. Such mobility would enable these elements to occupy a small fraction of the rDNA units of B. mori despite their continual elimination from the rDNA locus by sequence turnover. PMID:2439905
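The "over 82%" figure in the abstract above can be checked with simple arithmetic. The sketch below assumes the element is exactly 4,200 bp (the abstract only gives the rounded value "4.2 kilobase pairs") and that the open reading frame ends in a single stop codon; the function name is illustrative only.

```python
# Back-of-the-envelope check of the ORF coverage quoted in the abstract.
ELEMENT_LENGTH_BP = 4200  # assumption: "4.2-kilobase-pair" taken as exactly 4,200 bp
ORF_AMINO_ACIDS = 1151    # length of the open reading frame given in the abstract


def orf_fraction(amino_acids: int, element_bp: int, include_stop: bool = True) -> float:
    """Return the fraction of the element covered by the open reading frame."""
    codons = amino_acids + (1 if include_stop else 0)  # optionally count the stop codon
    return 3 * codons / element_bp


fraction = orf_fraction(ORF_AMINO_ACIDS, ELEMENT_LENGTH_BP)
print(f"ORF covers about {fraction:.1%} of the element")
```

Under these assumptions the result lands slightly above 82%, in line with the abstract; the exact value depends on the true element length.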
Primary structure of rat cardiac beta-adrenergic and muscarinic cholinergic receptors obtained by automated DNA sequence analysis: further evidence for a multigene family.

PubMed Central

Gocayne, J; Robinson, D A; FitzGerald, M G; Chung, F Z; Kerlavage, A R; Lentes, K U; Lai, J; Wang, C D; Fraser, C M; Venter, J C

1987-01-01

Two cDNA clones, lambda RHM-MF and lambda RHB-DAR, encoding the muscarinic cholinergic receptor and the beta-adrenergic receptor, respectively, have been isolated from a rat heart cDNA library. The cDNA clones were characterized by restriction mapping and automated DNA sequence analysis utilizing fluorescent dye primers. The rat heart muscarinic receptor consists of 466 amino acids and has a calculated molecular weight of 51,543. The rat heart beta-adrenergic receptor consists of 418 amino acids and has a calculated molecular weight of 46,890. The two cardiac receptors have substantial amino acid homology (27.2% identity, 50.6% with favored substitutions). The rat cardiac beta receptor has 88.0% homology (92.5% with favored substitutions) with the human brain beta receptor and the rat cardiac muscarinic receptor has 94.6% homology (97.6% with favored substitutions) with the porcine cardiac muscarinic receptor. The muscarinic cholinergic and beta-adrenergic receptors appear to be as conserved as hemoglobin and cytochrome c but less conserved than histones and are clearly members of a multigene family. These data support our hypothesis, based upon biochemical and immunological evidence, that suggests considerable structural homology and evolutionary conservation between adrenergic and muscarinic cholinergic receptors. To our knowledge, this is the first report utilizing automated DNA sequence analysis to determine the structure of a gene. PMID:2825184
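The paired figures above ("identity" versus "with favored substitutions") can be illustrated with a minimal sketch. The residue groups below are a hypothetical classification chosen for illustration; the abstract does not say which substitutions the authors counted as favored, and the function names are not from the paper.

```python
# Hypothetical groups of "favored" (conservative) substitutions; an assumption,
# not the scheme used by the authors.
FAVORED_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def _group_of(residue):
    """Return the index of the group containing the residue, or None."""
    for index, group in enumerate(FAVORED_GROUPS):
        if residue in group:
            return index
    return None


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and percent identity-plus-favored over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # columns containing a gap are not scored
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count toward the relaxed score
        elif _group_of(a) is not None and _group_of(a) == _group_of(b):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy alignment, not receptor data.
print(identity_and_similarity("MKV-LD", "MRVALE"))
```

Whether gapped columns belong in the denominator is a design choice that shifts the reported percentages, which is one reason values from different papers are not directly comparable.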
Primary structure of rat cardiac beta-adrenergic and muscarinic cholinergic receptors obtained by automated DNA sequence analysis: further evidence for a multigene family.

PubMed

Gocayne, J; Robinson, D A; FitzGerald, M G; Chung, F Z; Kerlavage, A R; Lentes, K U; Lai, J; Wang, C D; Fraser, C M; Venter, J C

1987-12-01

Two cDNA clones, lambda RHM-MF and lambda RHB-DAR, encoding the muscarinic cholinergic receptor and the beta-adrenergic receptor, respectively, have been isolated from a rat heart cDNA library. The cDNA clones were characterized by restriction mapping and automated DNA sequence analysis utilizing fluorescent dye primers. The rat heart muscarinic receptor consists of 466 amino acids and has a calculated molecular weight of 51,543. The rat heart beta-adrenergic receptor consists of 418 amino acids and has a calculated molecular weight of 46,890. The two cardiac receptors have substantial amino acid homology (27.2% identity, 50.6% with favored substitutions). The rat cardiac beta receptor has 88.0% homology (92.5% with favored substitutions) with the human brain beta receptor and the rat cardiac muscarinic receptor has 94.6% homology (97.6% with favored substitutions) with the porcine cardiac muscarinic receptor. The muscarinic cholinergic and beta-adrenergic receptors appear to be as conserved as hemoglobin and cytochrome c but less conserved than histones and are clearly members of a multigene family. These data support our hypothesis, based upon biochemical and immunological evidence, that suggests considerable structural homology and evolutionary conservation between adrenergic and muscarinic cholinergic receptors. To our knowledge, this is the first report utilizing automated DNA sequence analysis to determine the structure of a gene.
The catalytic chain of human complement subcomponent C1r. Purification and N-terminal amino acid sequences of the major cyanogen bromide-cleavage fragments.

PubMed

Arlaud, G J; Gagnon, J; Porter, R R

1982-01-01

1. The a- and b-chains of reduced and alkylated human complement subcomponent C1r were separated by high-pressure gel-permeation chromatography and isolated in good yield and in pure form. 2. CNBr cleavage of C1r b-chain yielded eight major peptides, which were purified by gel filtration and high-pressure reversed-phase chromatography. As determined from the sum of their amino acid compositions, these peptides accounted for a minimum molecular weight of 28 000, close to the value 29 100 calculated from the whole b-chain. 3. N-Terminal sequence determinations of C1r b-chain and its CNBr-cleavage peptides allowed the identification of about two-thirds of the amino acids of C1r b-chain. From our results, and on the basis of homology with other serine proteinases, an alignment of the eight CNBr-cleavage peptides from C1r b-chain is proposed. 4. The residues forming the 'charge-relay' system of the active site of serine proteinases (His-57, Asp-102 and Ser-195 in the chymotrypsinogen numbering) are found in the corresponding regions of C1r b-chain, and the amino acid sequence around these residues has been determined. 5. The N-terminal sequence of C1r b-chain has been extended to residue 60 and reveals that C1r b-chain lacks the 'histidine loop', a disulphide bond that is present in all other known serine proteinases.
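Cyanogen bromide cleaves a polypeptide on the C-terminal side of methionine residues, so the fragment pattern described above can be predicted from a sequence. The sketch below assumes complete cleavage at every methionine and uses a toy sequence, not the C1r b-chain; the function name is illustrative.

```python
def cnbr_fragments(sequence: str) -> list[str]:
    """Split a protein sequence after every methionine (idealized CNBr digest)."""
    fragments = []
    start = 0
    for position, residue in enumerate(sequence):
        if residue == "M":
            # The methionine stays at the C-terminus of the released fragment.
            fragments.append(sequence[start:position + 1])
            start = position + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal fragment without methionine
    return fragments


toy = "AAMGGMKK"  # toy sequence for illustration only
print(cnbr_fragments(toy))
```

With complete cleavage, a chain with n internal methionines yields n + 1 fragments, which is the kind of bookkeeping used when summed peptide compositions are compared with the whole chain.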
Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

PubMed Central

Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

2016-01-01

Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389
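Residue coevolution is scored from pairs of columns in a multiple sequence alignment. As a rough illustration only, the sketch below computes mutual information between two columns; this is a simpler stand-in and not the coevolutionary method used by the authors, which the abstract does not specify. The column data are toy values.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information (bits) between two equally long alignment columns."""
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy columns: one residue per sequence, same sequence order in both columns.
covarying = column_mutual_information("AAVV", "DDEE")
independent = column_mutual_information("AAVV", "DEDE")
print(covarying, independent)
```

For interface prediction the two columns would come from different proteins in a paired alignment, where each row joins the interacting partners from one genome.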
Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone.

PubMed

Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

2016-12-27

Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
Homology of pendrin, sodium-iodide symporter and apical iodide transporter.

PubMed

Benvenga, Salvatore; Guarneri, Fabrizio

2018-06-01

We observed local homology between human pendrin and sodium/iodide symporter (NIS), that was absent in the NIS-homologous sodium/monocarboxylate transporter or apical iodide transporter (AIT) which, however, does not transport iodide. Thus, we analyzed the full proteins. They shared 63 identical and 66 similar residues (overall homology 14.4%, but 21% when omitting intervening sequences of 15 or more residues). Pendrin was more homologous to NIS (25%) than AIT (20%), particularly in the STAS domain (sulfate transporter and antisigma factor antagonist). Homology was concentrated in 11 segments, with 3/11 involving the STAS domain. In 9/11, homology was greater with NIS (45-58.3%) than with AIT (8.3-42.3%); in 4 of these 9 segments, homology was comparable to or greater than that between NIS and AIT (8.3-52.6%). Pendrin residues which are mutated in Pendred's syndrome are identical to those in the aligned position of NIS and AIT. Hypothyroidism-associated pendrin mutations almost always fall within 4/11 segments. These are the first data that show homology between pendrin and NIS, and topographic relationships between pendrin mutations and the hypothyroid phenotype of PDS.
New steroid 5alpha-reductase type I (SRD5A1) homologous sequences on human chromosomes 6 and 8.

PubMed

Eminović, I; Liović, M; Prezelj, J; Kocijancic, A; Rozman, D; Komel, R

2001-01-01

To date, two genes encoding 5alpha-reductase isoenzymes (type I and type II) and one type I pseudogene are known. The divergent localization of these genes, the still incompletely understood function of the encoded enzymes, and the perplexing results we obtained after sequencing SRD5A1 gene fragments PCR-amplified from genomic DNA led us to assume that, in addition to the known SRD5A1 gene, one or more different human genes coding for 5alpha-reductase type I may exist. Our research provides the first evidence for the existence of two new, previously unidentified SRD5A1-related sequences in the human genome. These sequences, which were localized to chromosomes 6 and 8, are highly homologous (>99%) to SRD5A1 and do not contain any of the deletions or insertions that are characteristic of the SRD5AP1 pseudogene. Our results imply that these sequences may be coding parts of as yet unknown, active SRD5A1 genes and/or of previously unidentified pseudogenes. These findings additionally support the data of Chen et al., who confirmed the existence of various SRD5A1 proteins in cultured human skin cells.
Nucleic acid sequence detection using multiplexed oligonucleotide PCR

DOEpatents

Nolan, John P [Santa Fe, NM]; White, P Scott [Los Alamos, NM]

2006-12-26

Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.
Cloning of Giardia lamblia heat shock protein HSP70 homologs: implications regarding origin of eukaryotic cells and of endoplasmic reticulum.

PubMed Central

Gupta, R S; Aitken, K; Falah, M; Singh, B

1994-01-01

The genes for two different 70-kDa heat shock protein (HSP70) homologs have been cloned and sequenced from the protozoan Giardia lamblia. On the basis of their sequence features, one of these genes corresponds to the cytoplasmic form of HSP70. The second gene, on the basis of its characteristic N-terminal hydrophobic signal sequence and C-terminal endoplasmic reticulum (ER) retention sequence (Lys-Asp-Glu-Leu), is the equivalent of ER-resident GRP78 or the Bip family of proteins. Phylogenetic trees based on HSP70 sequences show that G. lamblia homologs show the deepest divergence among eukaryotic species. The identification of a GRP78 or Bip homolog in G. lamblia strongly suggests the existence of ER in this ancient eukaryote. Detailed phylogenetic analyses of HSP70 sequences by boot-strap neighbor-joining and maximum-parsimony methods show that the cytoplasmic and ER homologs form distinct subfamilies that evolved from a common eukaryotic ancestor by gene duplication that occurred very early in the evolution of eukaryotic cells. It is postulated that because of the essential "molecular chaperone" function of these proteins in translocation of other proteins across membranes, duplication of their genes accompanied the evolution of ER or nucleus in the eukaryotic cell ancestor. The presence in all eukaryotic cytoplasmic HSP70 homologs (including the cognate, heat-induced, and ER forms) of a number of autapomorphic sequence signatures that are not present in any prokaryotic or organellar homologs provides strong evidence regarding the monophyletic nature of eukaryotic lineage. Further, all eukaryotic HSP70 homologs share in common with the Gram-negative group of eubacteria a number of sequence features that are not present in any archaebacterium or Gram-positive bacterium, indicating their evolution from this group of organisms. Some implications of these findings regarding the evolution of eukaryotic cells and ER are discussed. PMID:8159675
The cDNA sequence of mouse Pgp-1 and homology to human CD44 cell surface antigen and proteoglycan core/link proteins.

PubMed

Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T

1990-01-05

We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.
Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

PubMed Central

Thomsen, Martin Christen Frølund; Nielsen, Morten

2012-01-01

Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims at resolving these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
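The role of pseudo counts in a sequence logo can be sketched with a per-column information content calculation. The example below uses a plain Shannon measure with a flat pseudocount added to every amino acid; both choices are assumptions made for illustration, since the abstract does not describe the weighting or pseudo count schemes that Seq2Logo actually applies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column, pseudocount=0.0):
    """Shannon information content (bits) of one alignment column.

    `pseudocount` is a flat value added to the count of every amino acid
    (an illustrative assumption, not Seq2Logo's scheme).
    """
    counts = Counter(residue for residue in column if residue in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        return 0.0
    entropy = 0.0
    for amino_acid in AMINO_ACIDS:
        p = (counts[amino_acid] + pseudocount) / total
        if p > 0:
            entropy -= p * math.log2(p)
    # Maximum entropy for 20 amino acids minus the observed entropy.
    return math.log2(len(AMINO_ACIDS)) - entropy


few_observations = "AAA"  # toy column with only three sequences
print(column_information(few_observations))
print(column_information(few_observations, pseudocount=1.0))
```

With only three observations the raw column looks perfectly conserved; adding pseudo counts lowers the information content, reflecting how little evidence supports that conservation.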
Sequence of the fhuE outer-membrane receptor gene of Escherichia coli K12 and properties of mutants.

PubMed

Sauer, M; Hantke, K; Braun, V

1990-03-01

The fhuE gene of Escherichia coli codes for an outer-membrane receptor protein required for the uptake of iron(III) via coprogen, ferrioxamine B and rhodotorulic acid. The amino acid sequence, deduced from the nucleotide sequence, consisted of 729 residues. The mature form, composed of 693 residues, has a calculated molecular weight of 77,453, which agrees with the molecular weight of 76,000 determined by polyacrylamide gel electrophoresis. The FhuE protein contains four regions of homology with other TonB-dependent receptors. A valine to proline exchange in the 'TonB box' abolished transport activity. Phenotypic revertants with substitutions of arginine, glutamine, or leucine at the valine position exhibited increasing iron-coprogen transport rates. Point mutations resulting in the replacement of glycine (127) in the second homology region with either alanine, aspartate, valine, asparagine or histidine exhibited decreased transport rates (listed in descending order). A truncated FhuE protein lacking 24 amino acids at the C-terminal end was exported to the periplasm but failed to be inserted into the outer membrane.
Hydrophobic cluster analysis of G protein-coupled receptors: a powerful tool to derive structural and functional information from 2D-representation of protein sequences.

PubMed

Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A

1993-01-01

Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once the amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574, 1990). This consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA, the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors, for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.
New acute transforming feline retrovirus with fms homology specifies a C-terminally truncated version of the c-fms protein that is different from SM-feline sarcoma virus v-fms protein

DOE Office of Scientific and Technical Information (OSTI.GOV)

Besmer, P.; Lader, E.; George, P.C.

1986-10-01

The HZ5-feline sarcoma virus (FeSV) is a new acute transforming feline retrovirus which was isolated from a multicentric fibrosarcoma of a domestic cat. The HZ5-FeSV transforms fibroblasts in vitro and is replication defective. A biologically active integrated HZ5-FeSV provirus was molecularly cloned from cellular DNA of HZ5-FeSV-infected FRE-3A rat cells. The HZ5-FeSV has oncogene homology with the fms sequences of the SM-FeSV. The genome organization of the 8.6-kilobase HZ5-FeSV provirus is 5' Δgag-fms-Δpol-Δenv 3'. The HZ5- and SM-FeSVs display indistinguishable in vitro transformation characteristics, and the structures of the gag-fms transforming genes in the two viruses are very similar. In the HZ5-FeSV and the SM-FeSV, identical c-fms and feline leukemia virus p10 sequences form the 5' gag-fms junction. With regard to v-fms the two viruses are homologous up to 11 amino acids before the C terminus of the SM-FeSV v-fms protein. In HZ5-FeSV a segment of 362 nucleotides then follows before the 3' recombination site with feline leukemia virus pol. The new 3' v-fms sequence encodes 27 amino acids before reaching a TGA termination signal. The relationship of this sequence with the recently characterized human c-fms sequence has been examined. The 3' HZ5-FeSV v-fms sequence is homologous with 3' c-fms sequences. A frameshift mutation (11-base-pair deletion) was found in the C-terminal fms coding sequence of the HZ5-FeSV. As a result, the HZ5-FeSV v-fms protein is predicted to be a C-terminally truncated version of c-fms. This frameshift mutation may determine the oncogenic properties of v-fms in the HZ5-FeSV.
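An 11-base-pair deletion is not a multiple of three, so it shifts the reading frame and can bring a stop codon into frame early. The sketch below shows this on a made-up toy sequence using the standard genetic code; the sequence, the deletion position, and the helper names are illustrative assumptions and have nothing to do with the actual fms sequence.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_bases(dna, start, length):
    """Return the sequence with `length` bases removed beginning at `start`."""
    return dna[:start] + dna[start + length:]


# Toy coding sequence (hypothetical, for illustration only).
WILD_TYPE = "ATGGCTGAACTGAAAGGTCCGTTCGATAACTGGGCACGTGTTGAAAGCTAA"
MUTANT = delete_bases(WILD_TYPE, start=12, length=11)

print(translate(WILD_TYPE))
print(translate(MUTANT))
```

In this toy case the shifted frame hits a stop codon almost immediately, giving a shorter product with an altered C-terminus, which is the same kind of outcome the abstract predicts for the v-fms protein.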
Inverse statistical physics of protein sequences: a key issues review.

PubMed

Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

2018-03-01

In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
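The Boltzmann distribution referred to above is commonly written, in this line of work, as a Potts model over aligned sequences. The form below is a sketch in standard notation; the symbols (fields h, couplings J, empirical frequencies f) are assumptions introduced here and do not appear in the abstract.

```latex
% Sequence (a_1, ..., a_L) of aligned length L; a_i is the amino acid (or gap) at position i.
P(a_1,\dots,a_L) \;=\; \frac{1}{Z}\,
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big),
\qquad
Z \;=\; \sum_{a_1,\dots,a_L}
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big).

% Inverse problem: choose h and J so that the model reproduces the empirical
% one- and two-site frequencies of the alignment.
\sum_{a_k,\; k \ne i} P(a_1,\dots,a_L) = f_i(a_i),
\qquad
\sum_{a_k,\; k \ne i,j} P(a_1,\dots,a_L) = f_{ij}(a_i,a_j).
```

The "inverse" in the title refers to this direction of inference: the parameters are estimated from observed sequences rather than the sequence statistics being computed from known parameters.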
Organic acid-tolerant microorganisms and uses thereof for producing organic acids

DOEpatents

Pfleger, Brian Frederick; Begemann, Matthew Brett

2014-05-06

Organic acid-tolerant microorganisms and methods of using same. The organic acid-tolerant microorganisms comprise modifications that reduce or ablate AcsA activity or AcsA homolog activity. The modifications increase tolerance of the microorganisms to such organic acids as 3-hydroxypropionic acid (3HP), acrylic acid, and propionic acid. Further modifications to the microorganisms such as increasing expression of malonyl-CoA reductase and/or acetyl-CoA carboxylase provide or increase the ability of the microorganisms to produce 3HP. Methods of generating an organic acid with the modified microorganisms are provided. Methods of using acsA or homologs thereof as counter-selectable markers include replacing acsA or homologs thereof in cells with genes of interest and selecting for the cells comprising the genes of interest with amounts of organic acids effective to inhibit growth of cells harboring acsA or the homologs.
Molecular cloning in Arabidopsis thaliana of a new protein phosphatase 2C (PP2C) with homology to ABI1 and ABI2.

PubMed

Rodriguez, P L; Leube, M P; Grill, E

1998-11-01

We report the cloning of both the cDNA and the corresponding genomic sequence of a new PP2C from Arabidopsis thaliana, named AtP2C-HA (for homology to ABI1/ABI2). The AtP2C-HA cDNA contains an open reading frame of 1536 bp and encodes a putative protein of 511 amino acids with a predicted molecular mass of 55.7 kDa. The AtP2C-HA protein is composed of two domains, a C-terminal PP2C catalytic domain and a N-terminal extension of ca. 180 amino acid residues. The deduced amino acid sequence is 55% and 54% identical to ABI1 and ABI2, respectively. Comparison of the genomic structure of the ABI1, ABI2 and AtP2C-HA genes suggests that they belong to a multigene family. The expression of the AtP2C-HA gene is up-regulated by abscisic acid (ABA) treatment.

The 86-kilodalton antigen from Schistosoma mansoni is a heat-shock protein homologous to yeast HSP-90.

PubMed

Johnson, K S; Wells, K; Bock, J V; Nene, V; Taylor, D W; Cordingley, J S

1989-08-01

We report the sequence of a cDNA clone encoding an 86-kDa polypeptide antigen (p86) from Schistosoma mansoni. Fusion proteins made in Escherichia coli are recognized by human infection sera. The reading frame of this antigen is highly homologous to those of the large heat-shock proteins of Saccharomyces cerevisiae (HSP90) and Drosophila melanogaster (HSP83). mRNA encoding p86 increases in response to heat shock of adult worms, as does HSP70. Comparisons of the sequences of HSP70 and HSP83 homologues show that these two families of heat-shock proteins are not significantly related except for the last four amino acid residues, which are Glu-Glu-Val-Asp in every case. This sequence is not found at the carboxy terminus of any other protein in the current databases.
Nucleotide sequencing and serological evidence that the recently recognized deer tick virus is a genotype of Powassan virus.

PubMed

Beasley, D W; Suderman, M T; Holbrook, M R; Barrett, A D

2001-11-05

Deer tick virus (DTV) is a recently recognized North American virus isolated from Ixodes dammini ticks. Nucleotide sequencing of fragments of structural and non-structural protein genes suggested that this virus was most closely related to the tick-borne flavivirus Powassan (POW), which causes potentially fatal encephalitis in humans. To determine whether DTV represents a new and distinct member of the Flavivirus genus of the family Flaviviridae, we sequenced the structural protein genes and 5' and 3' non-coding regions of this virus. In addition, we compared the reactivity of DTV and POW in hemagglutination inhibition tests with a panel of polyclonal and monoclonal antisera, and performed cross-neutralization experiments using anti-DTV antisera. Nucleotide sequencing revealed a high degree of homology between DTV and POW at both nucleotide (>80% homology) and amino acid (>90% homology) levels, and the two viruses were indistinguishable in serological assays and mouse neuroinvasiveness. On the basis of these results, we suggest that DTV should be classified as a genotype of POW virus.
AlloRep: A Repository of Sequence, Structural and Mutagenesis Data for the LacI/GalR Transcription Regulators.

PubMed

Sousa, Filipa L; Parente, Daniel J; Shis, David L; Hessman, Jacob A; Chazelle, Allen; Bennett, Matthew R; Teichmann, Sarah A; Swint-Kruse, Liskin

2016-02-22

Protein families evolve functional variation by accumulating point mutations at functionally important amino acid positions. Homologs in the LacI/GalR family of transcription regulators have evolved to bind diverse DNA sequences and allosteric regulatory molecules. In addition to playing key roles in bacterial metabolism, these proteins have been widely used as a model family for benchmarking structural and functional prediction algorithms. We have collected manually curated sequence alignments for >3000 sequences, in vivo phenotypic and biochemical data for >5750 LacI/GalR mutational variants, and noncovalent residue contact networks for 65 LacI/GalR homolog structures. Using this rich data resource, we compared the noncovalent residue contact networks of the LacI/GalR subfamilies to design and experimentally validate an allosteric mutant of a synthetic LacI/GalR repressor for use in biotechnology. The AlloRep database (freely available at www.AlloRep.org) is a key resource for future evolutionary studies of LacI/GalR homologs and for benchmarking computational predictions of functional change. Copyright © 2015 Elsevier Ltd. All rights reserved.
HasF, a TolC-homolog of Serratia marcescens, is involved in energy-dependent efflux.

PubMed

Kumar, Ayush; Worobec, Elizabeth A

2005-06-01

A tolC-like gene (hasF) was identified upon scanning the incomplete database of the S. marcescens genome. This gene was amplified using PCR and cloned in the pUC18 vector to yield pUCHF. Sequencing of the S. marcescens tolC-like hasF gene and subsequent amino acid sequence prediction revealed approximately 80% amino acid homology with the Escherichia coli TolC. A tolC-deficient strain of E. coli (BL923) containing pUCHF/hasF was analyzed for susceptibility to fluoroquinolones (ciprofloxacin, norfloxacin, and ofloxacin), chloramphenicol, sodium dodecyl sulfate (SDS), and ethidium bromide. Antibiotic susceptibility assays of the E. coli tolC-deficient mutant BL923 demonstrated a 64-fold increase in resistance to SDS and ethidium bromide upon introduction of the S. marcescens tolC-like hasF gene. No change was observed for susceptibility to fluoroquinolones and chloramphenicol. Ethidium bromide accumulation assays performed using E. coli BL923:pUCHF established the role of the S. marcescens hasF gene product in proton gradient-dependent efflux.
A new acute transforming feline retrovirus with fms homology specifies a C-terminally truncated version of the c-fms protein that is different from SM-feline sarcoma virus v-fms protein.

PubMed Central

Besmer, P; Lader, E; George, P C; Bergold, P J; Qiu, F H; Zuckerman, E E; Hardy, W D

1986-01-01

The HZ5-feline sarcoma virus (FeSV) is a new acute transforming feline retrovirus which was isolated from a multicentric fibrosarcoma of a domestic cat. The HZ5-FeSV transforms fibroblasts in vitro and is replication defective. A biologically active integrated HZ5-FeSV provirus was molecularly cloned from cellular DNA of HZ5-FeSV-infected FRE-3A rat cells. The HZ5-FeSV has oncogene homology with the fms sequences of the SM-FeSV. The genome organization of the 8.6-kilobase HZ5-FeSV provirus is 5' delta gag-fms-delta pol-delta env 3'. The HZ5- and SM-FeSVs display indistinguishable in vitro transformation characteristics, and the structures of the gag-fms transforming genes in the two viruses are very similar. In the HZ5-FeSV and the SM-FeSV, identical c-fms and feline leukemia virus p10 sequences form the 5' gag-fms junction. With regard to v-fms the two viruses are homologous up to 11 amino acids before the C terminus of the SM-FeSV v-fms protein. In HZ5-FeSV a segment of 362 nucleotides then follows before the 3' recombination site with feline leukemia virus pol. The new 3' v-fms sequence encodes 27 amino acids before reaching a TGA termination signal. The relationship of this sequence with the recently characterized human c-fms sequence has been examined. The 3' HZ5-FeSV v-fms sequence is homologous with 3' c-fms sequences. A frameshift mutation (11-base-pair deletion) was found in the C-terminal fms coding sequence of the HZ5-FeSV. As a result, the HZ5-FeSV v-fms protein is predicted to be a C-terminally truncated version of c-fms. This frameshift mutation may determine the oncogenic properties of v-fms in the HZ5-FeSV. PMID:3018286
RAD25 (SSL2), the yeast homolog of the human xeroderma pigmentosum group B DNA repair gene, is essential for viability

DOE Office of Scientific and Technical Information (OSTI.GOV)

Park, E.; Prakash, L.; Guzder, S.N.

1992-12-01

Xeroderma pigmentosum (XP) patients are extremely sensitive to ultraviolet (UV) light and suffer from a high incidence of skin cancers, due to a defect in nucleotide excision repair. The disease is genetically heterogeneous, and seven complementation groups, A-G, have been identified. Homologs of human excision repair genes ERCC1, XPDC/ERCC2, and XPAC have been identified in the yeast Saccharomyces cerevisiae. Since no homolog of human XPBC/ERCC3 existed among the known yeast genes, we cloned the yeast homolog by using XPBC cDNA as a hybridization probe. The yeast homolog, RAD25 (SSL2), encodes a protein of 843 amino acids (Mr 95,356). The RAD25 (SSL2)- and XPBC-encoded proteins share 55% identical and 72% conserved amino acid residues, and the two proteins resemble one another in containing the conserved DNA helicase sequence motifs. A nonsense mutation at codon 799 that deletes the 45 C-terminal amino acid residues in RAD25 (SSL2) confers UV sensitivity. This mutation shows epistasis with genes in the excision repair group, whereas a synergistic increase in UV sensitivity occurs when it is combined with mutations in genes in other DNA repair pathways, indicating that RAD25 (SSL2) functions in excision repair but not in other repair pathways. We also show that RAD25 (SSL2) is an essential gene. A mutation of the Lys392 residue to arginine in the conserved Walker type A nucleotide-binding motif is lethal, suggesting an essential role of the putative RAD25 (SSL2) ATPase/DNA helicase activity in viability. 40 refs., 3 figs., 1 tab.
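
The truncation size quoted above follows from the protein length and the position of the nonsense codon: a stop at codon 799 of an 843-residue protein removes residues 799 through 843. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def residues_lost(protein_len: int, nonsense_codon: int) -> int:
    """Residues removed when codon `nonsense_codon` (1-based) becomes a stop."""
    if not 1 <= nonsense_codon <= protein_len:
        raise ValueError("codon position must lie within the protein")
    # The mutated codon itself and everything after it are no longer translated.
    return protein_len - nonsense_codon + 1


print(residues_lost(843, 799))  # 45
```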
WEB-server for search of a periodicity in amino acid and nucleotide sequences

NASA Astrophysics Data System (ADS)

Frenkel, F. E.; Skryabin, K. G.; Korotkov, E. V.

2017-12-01

A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The server is based on a new mathematical method for finding multiple alignments, which relies on optimization of position weight matrices and on two-dimensional dynamic programming. This approach allows the construction of multiple alignments of weakly similar amino acid and nucleotide sequences that have accumulated more than 1.5 substitutions per amino acid or nucleotide, without performing pairwise sequence comparisons. The article describes the principles of the web server's operation, presents two examples of analyzing amino acid and nucleotide sequences, and outlines the information that can be obtained using the web server.
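
To make the idea of a period-specific profile concrete, the sketch below folds a sequence into columns by position modulo a candidate period and scores how far each column's composition departs from the overall composition. This is a deliberately simple information-based score chosen for illustration; it is not the server's method, which the abstract describes as position weight matrix optimization with two-dimensional dynamic programming, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def period_profile(seq: str, period: int) -> list:
    """Count symbols in each of `period` columns (position index mod period)."""
    cols = [Counter() for _ in range(period)]
    for i, ch in enumerate(seq):
        cols[i % period][ch] += 1
    return cols


def period_information(seq: str, period: int) -> float:
    """Sum over columns of count-weighted log-odds versus overall composition.

    Larger values mean the columns are more biased, i.e. the sequence
    looks more periodic at this period length.
    """
    background = Counter(seq)
    n = len(seq)
    total = 0.0
    for col in period_profile(seq, period):
        m = sum(col.values())
        for ch, c in col.items():
            total += c * log2((c / m) / (background[ch] / n))
    return total


seq = "ACGT" * 10
scores = {p: period_information(seq, p) for p in range(2, 7)}
best = max(scores, key=scores.get)
print(best)  # 4
```

A score of this kind only detects perfect or near-perfect repeats without insertions or deletions, which is the limitation that alignment-based approaches such as the one described above are meant to address.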
Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

DOEpatents

Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

2000-01-01

A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.
Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L.]

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, D.A.; Zilinskas, B.A.

1991-08-01

The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a λgt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted Mr of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity) with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs including pea (10).
Whole genome analysis of CRISPR Cas9 sgRNA off-target homologies via an efficient computational algorithm.

PubMed

Zhou, Hong; Zhou, Michael; Li, Daisy; Manthey, Joseph; Lioutikova, Ekaterina; Wang, Hong; Zeng, Xiao

2017-11-17

The beauty and power of the genome editing mechanism, the CRISPR Cas9 endonuclease system, lies in the fact that it is RNA-programmable: Cas9 can be guided to any genomic locus complementary to a 20-nt RNA, the single guide RNA (sgRNA), to cleave double-stranded DNA, allowing the introduction of desired mutations. Unfortunately, it has been reported repeatedly that the sgRNA can also guide Cas9 to off-target sites where the DNA sequence is homologous to the sgRNA. Using the human genome and Streptococcus pyogenes Cas9 (SpCas9) as an example, this article mathematically analyzed the probabilities of off-target homologies of sgRNAs and found that for large genomes such as the human genome, potential off-target homologies are inevitable in sgRNA selection. A highly efficient computational algorithm was developed for whole-genome sgRNA design and off-target homology searches. By means of a dynamically constructed sequence-indexed database and a simplified sequence alignment method, this algorithm achieves very high efficiency while guaranteeing the identification of all existing potential off-target homologies. Via this algorithm, 1,876,775 sgRNAs were designed for the 19,153 human mRNA genes and only two sgRNAs were found to be free of off-target homology. By means of the novel and efficient sgRNA homology search algorithm introduced in this article, genome-wide sgRNA design and off-target analysis were conducted, and the results confirmed the mathematical analysis: for an sgRNA sequence, it is almost impossible to escape potential off-target homologies. Future innovations in CRISPR Cas9 gene editing technology need to focus on how to eliminate Cas9 off-target activity.
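
The intuition behind the probability argument can be reproduced with a back-of-the-envelope estimate. The sketch below is not the article's derivation: it assumes a genome of independent, equally frequent bases, counts both strands, ignores the PAM requirement, and treats a site as "homologous" if it differs from the 20-nt guide at no more than a chosen number of positions. The function name and the 3.1e9 bp genome size are illustrative assumptions.

```python
from math import comb


def expected_offtarget_sites(genome_bp: float,
                             guide_len: int = 20,
                             max_mismatches: int = 0,
                             both_strands: bool = True) -> float:
    """Expected number of random genomic sites within max_mismatches of a guide."""
    positions = genome_bp * (2 if both_strands else 1)
    # Number of distinct sequences within the mismatch budget:
    # choose i positions to differ, each with 3 alternative bases.
    neighbours = sum(comb(guide_len, i) * 3 ** i
                     for i in range(max_mismatches + 1))
    return positions * neighbours / 4 ** guide_len


for k in range(5):
    print(k, expected_offtarget_sites(3.1e9, max_mismatches=k))
```

Under these assumptions an exact 20-nt match is expected far less than once per genome, but the expectation rises to more than a hundred sites once three mismatches are tolerated, which is consistent in spirit with the finding that almost no sgRNA is free of potential off-target homology.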
Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space

PubMed Central

2014-01-01

Background: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported by analysis in this paper, that such maps preserve functional co-localization of the protein structure space. Methods: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. Conclusions: This work opens exciting venues in designing novel
Substitution of a single amino acid residue in the aromatic/arginine selectivity filter alters the transport profiles of tonoplast aquaporin homologs.

PubMed

Azad, Abul Kalam; Yoshikawa, Naoki; Ishikawa, Takahiro; Sawa, Yoshihiro; Shibata, Hitoshi

2012-01-01

Aquaporins are integral membrane proteins that facilitate the transport of water and some small solutes across cellular membranes. X-ray crystallography of aquaporins indicates that four amino acids constitute an aromatic/arginine (ar/R) pore constriction known as the selectivity filter. On the basis of these four amino acids, tonoplast aquaporins called tonoplast intrinsic proteins (TIPs) are divided into three groups in Arabidopsis. Herein, we describe the characterization of two group I TIP1s (TgTIP1;1 and TgTIP1;2) from tulip (Tulipa gesneriana). TgTIP1;1 and TgTIP1;2 have a novel isoleucine in loop E (LE2 position) of the ar/R filter; the residue at LE2 is a valine in all group I TIPs from model plants. The homologs showed mercury-sensitive water channel activity in a fast kinetics swelling assay upon heterologous expression in Pichia pastoris. Heterologous expression of both homologs promoted the growth of P. pastoris on ammonium or urea as sole sources of nitrogen and decreased growth and survival in the presence of H2O2. TgTIP1;1- and TgTIP1;2-mediated H2O2 conductance was demonstrated further by a fluorescence assay. Substitutions in the ar/R selectivity filter of TgTIP1;1 showed that mutants that mimicked the ar/R constriction of group I TIPs could conduct the same substrates that were transported by wild-type TgTIP1;1. In contrast, mutants that mimicked group II TIPs showed no evidence of urea or H2O2 conductance. These results suggest that the amino acid residue at the LE2 position is critical for the transport selectivity of the TIP homologs and that group I TIPs might have a broader spectrum of substrate selectivity than group II TIPs. Copyright © 2011 Elsevier B.V. All rights reserved.
Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

PubMed Central

Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

2002-01-01

Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate, indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames including the cehA gene were found in the 10-kb region, and sequencing analysis shows that the cehA gene is flanked by two copies of an insertion sequence-like sequence, suggesting that it forms part of a composite transposon. PMID:11872471
MolIDE: a homology modeling framework you can click with.

PubMed

Canutescu, Adrian A; Dunbrack, Roland L

2005-06-15

Molecular Integrated Development Environment (MolIDE) is an integrated application designed to provide homology modeling tools and protocols under a uniform, user-friendly graphical interface. Its main purpose is to combine the most frequent modeling steps in a semi-automatic, interactive way, guiding the user from the target protein sequence to the final three-dimensional protein structure. The typical basic homology modeling process is composed of building sequence profiles of the target sequence family, secondary structure prediction, sequence alignment with PDB structures, assisted alignment editing, side-chain prediction and loop building. All of these steps are available through a graphical user interface. MolIDE's user-friendly and streamlined interactive modeling protocol allows the user to focus on the important modeling questions, hiding from the user the raw data generation and conversion steps. MolIDE was designed from the ground up as an open-source, cross-platform, extensible framework. This allows developers to integrate additional third-party programs to MolIDE. http://dunbrack.fccc.edu/molide/molide.php rl_dunbrack@fccc.edu.
Sequence space and the ongoing expansion of the protein universe.

PubMed

Povolotskaya, Inna S; Kondrashov, Fyodor A

2010-06-17

The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
Investigating homology between proteins using energetic profiles.

PubMed

Wrabl, James O; Hilser, Vincent J

2010-03-26

Accumulated experimental observations demonstrate that protein stability is often preserved upon conservative point mutation. In contrast, less is known about the effects of large sequence or structure changes on the stability of a particular fold. Almost completely unknown is the degree to which stability of different regions of a protein is generally preserved throughout evolution. In this work, these questions are addressed through thermodynamic analysis of a large representative sample of protein fold space based on remote, yet accepted, homology. More than 3,000 proteins were computationally analyzed using the structural-thermodynamic algorithm COREX/BEST. Estimated position-specific stability (i.e., local Gibbs free energy of folding) and its component enthalpy and entropy were quantitatively compared between all proteins in the sample according to all-vs.-all pairwise structural alignment. It was discovered that the local stabilities of homologous pairs were significantly more correlated than those of non-homologous pairs, indicating that local stability was indeed generally conserved throughout evolution. However, the position-specific enthalpy and entropy underlying stability were less correlated, suggesting that the overall regional stability of a protein was more important than the thermodynamic mechanism utilized to achieve that stability. Finally, two different types of statistically exceptional evolutionary structure-thermodynamic relationships were noted. First, many homologous proteins contained regions of similar thermodynamics despite localized structure change, suggesting a thermodynamic mechanism enabling evolutionary fold change. Second, some homologous proteins with extremely similar structures nonetheless exhibited different local stabilities, a phenomenon previously observed experimentally in this laboratory. These two observations, in conjunction with the principal conclusion that homologous proteins generally conserved local stability, may
Identification and characterization of a triacylglycerol lipase in Arabidopsis homologous to mammalian acid lipases.

PubMed

El-Kouhen, Karim; Blangy, Stéphanie; Ortiz, Emilia; Gardies, Anne-Marie; Ferté, Natalie; Arondel, Vincent

2005-11-07

Triacylglycerol (TAG) lipases have been thoroughly characterized in mammals and microorganisms. By contrast, very little is known about plant TAG lipases. An Arabidopsis cDNA called AtLip1 (At2g15230), which exhibits strong homology to lysosomal acid lipase, was found to drive the synthesis of an active TAG lipase when expressed in the baculovirus system. The lipase had a maximal activity at pH 6 and the specific activity was estimated to be about 45 µmol min⁻¹ mg⁻¹ protein using triolein as a substrate. Knock-out mutant analysis showed no phenotype during germination, indicating that this enzyme is fully dispensable for TAG storage breakdown during germination. Northern blot analyses indicated that the transcript is present in all tissues tested.
Soil amino acid composition across a boreal forest successional sequence

Treesearch

Nancy R. Werdin-Pfisterer; Knut Kielland; Richard D. Boone

2009-01-01

Soil amino acids are important sources of organic nitrogen for plant nutrition, yet few studies have examined which amino acids are most prevalent in the soil. In this study, we examined the composition, concentration, and seasonal patterns of soil amino acids across a primary successional sequence encompassing a natural gradient of plant productivity and soil...
Preferential amino acid sequences in alumina-catalyzed peptide bond formation.

PubMed

Bujdák, J; Rode, B M

2002-05-21

The catalytic effect of activated alumina on amino acid condensation was investigated. The readiness of amino acids to form peptide sequences was estimated on the basis of the yield of dipeptides and was found to decrease in the order glycine (Gly), alanine (Ala), leucine (Leu), valine (Val), proline (Pro). For example, approximately 15% Gly was converted to the dipeptide (Gly2), 5% to cyclic anhydride (cyc(Gly2)) and small amounts of tri- (Gly3) and tetrapeptide (Gly4) were formed after 28 days. On the other hand, only trace amounts of Pro2 were formed from proline under the same conditions. Preferential formation of certain sequences was observed in the mixed reaction systems containing two amino acids. For example, almost ten times more Gly-Val than Val-Gly was formed in the Gly+Val reaction system. The preferred sequences can be explained on the basis of an inductive effect that side groups have on the nucleophilicity and electrophilicity, respectively, of the amino and carboxyl groups. A comparison with published data of amino acid reactions in other reaction systems revealed that the main trends of preferential sequence formation were the same as those described for the salt-induced peptide formation (SIPF) reaction. The results of this work and other previously published papers show that alumina and related mineral surfaces might have played a crucial role in the prebiotic formation of the first peptides on the primitive earth.
Complete Unique Genome Sequence, Expression Profile, and Salivary Gland Tissue Tropism of the Herpesvirus 7 Homolog in Pigtailed Macaques.

PubMed

Staheli, Jeannette P; Dyen, Michael R; Deutsch, Gail H; Basom, Ryan S; Fitzgibbon, Matthew P; Lewis, Patrick; Barcy, Serge

2016-08-01

Human herpesvirus 6A (HHV-6A), HHV-6B, and HHV-7 are classified as roseoloviruses and are highly prevalent in the human population. Roseolovirus reactivation in an immunocompromised host can cause severe pathologies. While the pathogenic potential of HHV-7 is unclear, it can reactivate HHV-6 from latency and thus contributes to severe pathological conditions associated with HHV-6. Because of the ubiquitous nature of roseoloviruses, their roles in such interactions and the resulting pathological consequences have been difficult to study. Furthermore, the lack of a relevant animal model for HHV-7 infection has hindered a better understanding of its contribution to roseolovirus-associated diseases. Using next-generation sequencing analysis, we characterized the unique genome of an uncultured novel pigtailed macaque roseolovirus. Detailed genomic analysis revealed the presence of gene homologs to all 84 known HHV-7 open reading frames. Phylogenetic analysis confirmed that the virus is a macaque homolog of HHV-7, which we have provisionally named Macaca nemestrina herpesvirus 7 (MneHV7). Using high-throughput RNA sequencing, we observed that the salivary gland tissue samples from nine different macaques had distinct MneHV7 gene expression patterns and that the overall number of viral transcripts correlated with viral loads in parotid gland tissue and saliva. Immunohistochemistry staining confirmed that, like HHV-7, MneHV7 exhibits a natural tropism for salivary gland ductal cells. We also observed staining for MneHV7 in peripheral nerve ganglia present in salivary gland tissues, suggesting that HHV-7 may also have a tropism for the peripheral nervous system. Our data demonstrate that MneHV7-infected macaques represent a relevant animal model that may help clarify the causality between roseolovirus reactivation and diseases. Human herpesvirus 6A (HHV-6A), HHV-6B, and HHV-7 are classified as roseoloviruses. We have recently discovered that pigtailed macaques are naturally

Complete Unique Genome Sequence, Expression Profile, and Salivary Gland Tissue Tropism of the Herpesvirus 7 Homolog in Pigtailed Macaques

PubMed Central

Staheli, Jeannette P.; Dyen, Michael R.; Deutsch, Gail H.; Basom, Ryan S.; Fitzgibbon, Matthew P.; Lewis, Patrick

2016-01-01

ABSTRACT Human herpesvirus 6A (HHV-6A), HHV-6B, and HHV-7 are classified as roseoloviruses and are highly prevalent in the human population. Roseolovirus reactivation in an immunocompromised host can cause severe pathologies. While the pathogenic potential of HHV-7 is unclear, it can reactivate HHV-6 from latency and thus contributes to severe pathological conditions associated with HHV-6. Because of the ubiquitous nature of roseoloviruses, their roles in such interactions and the resulting pathological consequences have been difficult to study. Furthermore, the lack of a relevant animal model for HHV-7 infection has hindered a better understanding of its contribution to roseolovirus-associated diseases. Using next-generation sequencing analysis, we characterized the unique genome of an uncultured novel pigtailed macaque roseolovirus. Detailed genomic analysis revealed the presence of gene homologs to all 84 known HHV-7 open reading frames. Phylogenetic analysis confirmed that the virus is a macaque homolog of HHV-7, which we have provisionally named Macaca nemestrina herpesvirus 7 (MneHV7). Using high-throughput RNA sequencing, we observed that the salivary gland tissue samples from nine different macaques had distinct MneHV7 gene expression patterns and that the overall number of viral transcripts correlated with viral loads in parotid gland tissue and saliva. Immunohistochemistry staining confirmed that, like HHV-7, MneHV7 exhibits a natural tropism for salivary gland ductal cells. We also observed staining for MneHV7 in peripheral nerve ganglia present in salivary gland tissues, suggesting that HHV-7 may also have a tropism for the peripheral nervous system. Our data demonstrate that MneHV7-infected macaques represent a relevant animal model that may help clarify the causality between roseolovirus reactivation and diseases. IMPORTANCE Human herpesvirus 6A (HHV-6A), HHV-6B, and HHV-7 are classified as roseoloviruses. We have recently discovered that pigtailed
37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...
37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...
Identification of MicroRNAs in Helicoverpa armigera and Spodoptera litura Based on Deep Sequencing and Homology Analysis

PubMed Central

Ge, Xie; Zhang, Yong; Jiang, Jianhao; Zhong, Yi; Yang, Xiaonan; Li, Zhiqian; Huang, Yongping; Tan, Anjiang

2013-01-01

The current identification of microRNAs (miRNAs) in insects is largely dependent on genome sequences. However, the lack of available genome sequences inhibits the identification of miRNAs in various insect species. In this study, we used a miRNA database of the silkworm Bombyx mori as a reference to identify miRNAs in Helicoverpa armigera and Spodoptera litura using deep sequencing and homology analysis. Because all three species belong to the Lepidoptera, the experiment produced reliable results. Our study identified 97 and 91 conserved miRNAs in H. armigera and S. litura, respectively. Using the genome of B. mori and BAC sequences of H. armigera as references, 1 novel miRNA and 8 novel miRNA candidates were identified in H. armigera, and 4 novel miRNA candidates were identified in S. litura. An evolutionary analysis revealed that most of the identified miRNAs were insect-specific, and more than 20 miRNAs were Lepidoptera-specific. The investigation of the expression patterns of miR-2a, miR-34, miR-2796-3p and miR-11 revealed their potential roles in insect development. miRNA target prediction revealed that conserved miRNA target sites exist in various genes in the 3 species. Conserved miRNA target sites for the Hsp90 gene among the 3 species were validated in the mammalian 293T cell line using a dual-luciferase reporter assay. Our study provides a new approach with which to identify miRNAs in insects lacking genome information and contributes to the functional analysis of insect miRNAs. PMID:23289012
Rapid Hypothesis Testing with Candida albicans through Gene Disruption with Short Homology Regions

PubMed Central

Wilson, R. Bryce; Davis, Dana; Mitchell, Aaron P.

1999-01-01

Disruption of newly identified genes in the pathogen Candida albicans is a vital step in determination of gene function. Several gene disruption methods described previously employ long regions of homology flanking a selectable marker. Here, we describe disruption of C. albicans genes with PCR products that have 50 to 60 bp of homology to a genomic sequence on each end of a selectable marker. We used the method to disrupt two known genes, ARG5 and ADE2, and two sequences newly identified through the Candida genome project, HRM101 and ENX3. HRM101 and ENX3 are homologous to genes in the conserved RIM101 (previously called RIM1) and PacC pathways of Saccharomyces cerevisiae and Aspergillus nidulans. We show that three independent hrm101/hrm101 mutants and two independent enx3/enx3 mutants are defective in filamentation on Spider medium. These observations argue that HRM101 and ENX3 sequences are indeed portions of genes and that the respective gene products have related functions. PMID:10074081
Universal sequence map (USM) of arbitrary discrete sequences

PubMed Central

2002-01-01

Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules. PMID:11895567
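As an illustration of the Chaos Game Representation (CGR) that the abstract above builds on, the following is a minimal sketch of the standard CGR iteration for DNA. It is not the authors' USM implementation. The corner assignment, the starting point (0.5, 0.5), and the function name `cgr_coordinates` are assumptions chosen for illustration. Each symbol moves the current point halfway toward that symbol's corner of the unit square, so two sequences that share a suffix of k symbols end within 2^-k of each other in each coordinate.

```python
from typing import Dict, List, Tuple

# Assumed corner assignment for the four DNA symbols (one common convention).
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_coordinates(seq: str) -> List[Tuple[float, float]]:
    """Return the CGR point reached after each symbol of seq."""
    x, y = 0.5, 0.5  # assumed starting point: centre of the unit square
    points: List[Tuple[float, float]] = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError for symbols other than A, C, G, T
        # Move halfway from the current point toward the symbol's corner.
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
        points.append((x, y))
    return points


# Two sequences sharing the 2-symbol suffix "GT" end closer than 2**-2 per axis.
end_a = cgr_coordinates("ACGT")[-1]
end_b = cgr_coordinates("TTGT")[-1]
suffix_distance = max(abs(end_a[0] - end_b[0]), abs(end_a[1] - end_b[1]))
```

The halving step is what ties map distance to shared context: every additional shared symbol halves the maximum possible separation between the two end points.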
Identification of a human src homology 2-containing protein-tyrosine-phosphatase: a putative homolog of Drosophila corkscrew.

PubMed Central

Freeman, R M; Plutzky, J; Neel, B G

1992-01-01

src homology 2 (SH2) domains direct binding to specific phosphotyrosyl proteins. Recently, SH2-containing protein-tyrosine-phosphatases (PTPs) were identified. Using degenerate oligonucleotides and the PCR, we have cloned a cDNA for an additional PTP, SH-PTP2, which contains two SH2 domains and is expressed ubiquitously. When expressed in Escherichia coli, SH-PTP2 displays tyrosine-specific phosphatase activity. Strong sequence similarity between SH-PTP2 and the Drosophila gene corkscrew (csw) and their similar patterns of expression suggest that SH-PTP2 is the human corkscrew homolog. Sequence comparisons between SH-PTP2, SH-PTP1, corkscrew, and other SH2-containing proteins suggest the existence of a subfamily of SH2 domains found specifically in PTPs, whereas comparison of the PTP domains of the SH2-containing PTPs with other tyrosine phosphatases suggests the existence of a subfamily of PTPs containing SH2 domains. Since corkscrew, a member of the terminal class signal transduction pathway, acts in concert with D-raf to positively transduce the signal generated by the receptor tyrosine kinase torso, these findings suggest several mechanisms by which SH-PTP2 may participate in mammalian signal transduction. PMID:1280823
[Cloning and sequence analysis of 55 K protein of egg drop syndrome virus].

PubMed

Zhu, L; Jin, Q; Zeng, L

1999-06-30

To understand the genomic structure of egg drop syndrome virus (EDSV), nucleic acid was extracted by routine methods from the weakly virulent EDSV strain AA-2, isolated from sick hens in China. A whole-genome library was constructed by digestion with Hind III, and the strand encoding the 55 K gene, located in the Hind III-A segment, was sequenced and analyzed. The open reading frame is 1,014 nt long and encodes a polypeptide of 337 amino acids with a molecular weight of 38,200. Analysis of the amino acid sequence revealed 25.5%-32.4% homology to the 55 K protein of human adenovirus types 2, 12, and 40, canine adenovirus, and group 1 fowl adenoviruses, whereas homology to the ovine adenovirus protein is 46.4%. The genomic structure of EDSV therefore shows some relationship to that of adenoviruses.
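The reported open reading frame and polypeptide lengths can be cross-checked with simple arithmetic. The sketch below assumes that the 1,014 nt figure includes the stop codon, which the abstract does not state explicitly; the function name `peptide_length` is hypothetical.

```python
def peptide_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    # The stop codon, if counted in the ORF length, encodes no residue.
    return codons - 1 if includes_stop else codons


residues = peptide_length(1014)          # 338 codons minus the stop codon
mean_residue_mass = 38200 / residues     # average mass per residue, in Da
```

Under that assumption the 1,014 nt frame gives 337 residues, matching the abstract, and the stated molecular weight works out to a little over 113 Da per residue.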
Evolutionary distance from human homologs reflects allergenicity of animal food proteins.

PubMed

Jenkins, John A; Breiteneder, Heimo; Mills, E N Clare

2007-12-01

In silico analysis of allergens can identify putative relationships among protein sequence, structure, and allergenic properties. Such systematic analysis reveals that most plant food allergens belong to a restricted number of protein superfamilies, with pollen allergens behaving similarly. We have investigated the structural relationships of animal food allergens and their evolutionary relatedness to human homologs to define how closely a protein must resemble a human counterpart to lose its allergenic potential. Profile-based sequence homology methods were used to classify animal food allergens into Pfam families, and in silico analyses of their evolutionary and structural relationships were performed. Animal food allergens could be classified into 3 main families--tropomyosins, EF-hand proteins, and caseins--along with 14 minor families each composed of 1 to 3 allergens. The evolutionary relationships of each of these allergen superfamilies showed that in general, proteins with a sequence identity to a human homolog above approximately 62% were rarely allergenic. Single substitutions in otherwise highly conserved regions containing IgE epitopes in EF-hand parvalbumins may modulate allergenicity. These data support the premise that certain protein structures are more allergenic than others. Contrasting with plant food allergens, animal allergens, such as the highly conserved tropomyosins, challenge the capability of the human immune system to discriminate between foreign and self-proteins. Such immune responses run close to becoming autoimmune responses. Exploiting the closeness between animal allergens and their human homologs in the development of recombinant allergens for immunotherapy will need to consider the potential for developing unanticipated autoimmune responses.
The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase.

PubMed Central

Freemont, P S; Dunbar, B; Fothergill-Gilmore, L A

1988-01-01

The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase, comprising 363 residues, was determined. The sequence was deduced by automated sequencing of CNBr-cleavage, o-iodosobenzoic acid-cleavage, trypsin-digest and staphylococcal-proteinase-digest fragments. Comparison of the sequence with other class I aldolase sequences shows that the mammalian muscle isoenzyme is one of the most highly conserved enzymes known, with only about 2% of the residues changing per 100 million years. Non-mammalian aldolases appear to be evolving at the same rate as other glycolytic enzymes, with about 4% of the residues changing per 100 million years. Secondary-structure predictions are analysed in an accompanying paper [Sawyer, Fothergill-Gilmore & Freemont (1988) Biochem. J. 249, 789-793]. PMID:3355497
Homology difference analysis of invasive mealybug species Phenacoccus solenopsis Tinsley in Southern China with COI gene sequence variability.

PubMed

Wu, F Z; Ma, J; Hu, X N; Zeng, L

2015-02-01

The mealybug species Phenacoccus solenopsis (P. solenopsis) has caused much agricultural damage since its recent invasion in China. However, the source of this invasion remains unclear. This study uses molecular methods to clarify the relationships among different populations of P. solenopsis from China, the USA, Pakistan, India, and Vietnam to determine the geographic origin of the introduction of this species into China. P. solenopsis samples were collected from 25 different locations in three provinces of Southern China. Samples from the USA, Pakistan, and Vietnam were also obtained. Part of the mitochondrial gene for cytochrome oxidase I (COI) was sequenced for each sample. Homologous DNA sequences of the samples from the USA and India were downloaded from GenBank. Two haplotypes were found in China. The first was from most samples from the Guangdong, Guangxi, and Hainan populations in the China and Pakistan groups, and the second from a few samples from the Guangdong, Guangxi, and Hainan populations in the China, Pakistan, India, and Vietnam groups. As shown in the maximum likelihood trees constructed using the COI sequences, these samples belonged to two clades. Phylogenetic analysis suggested that most P. solenopsis mealybugs in Southern China are probably closely related to populations in Pakistan. The variation, relationship, expansion, and probable geographic origin of P. solenopsis mealybugs in Southern China are also discussed.
Molecular cloning and nucleotide sequence of the alpha and beta subunits of allophycocyanin from the cyanelle genome of Cyanophora paradoxa.

PubMed Central

Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E

1985-01-01

The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916
Chromosome specific repetitive DNA sequences

DOEpatents

Moyzis, Robert K.; Meyne, Julianne

1991-01-01

A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members. This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

PubMed

Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

2012-04-01

The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with distinct amino acid compositions were retained. These sequences were analysed for different physical and chemical properties and subjected to superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding to identify the functional motif and the evolutionary relationships among them. The superfamily search for these tannases revealed the occurrence of the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389-469 and 482-523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two different clusters: one contains only bacteria and the other contains both fungi and bacteria, indicating some relationship between these different genera. In the second cluster, however, nearly all fungal species grouped together, which indicates sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3% of the studied tannase sequences, suggesting its participation in structure and enzymatic function.
Assessment of Homology Templates and an Anesthetic Binding Site within the γ-Aminobutyric Acid Receptor

PubMed Central

Bertaccini, Edward J.; Yoluk, Ozge; Lindahl, Erik R.; Trudell, James R.

2013-01-01

Background Anesthetics mediate portions of their activity via modulation of the γ-aminobutyric acid receptor (GABAaR). While its molecular structure remains unknown, significant progress has been made towards understanding its interactions with anesthetics via molecular modeling. Methods The structure of the torpedo acetylcholine receptor (nAChRα), the structures of the α4 and β2 subunits of the human nAChR, the structures of the eukaryotic glutamate-gated chloride channel (GluCl), and the prokaryotic pH sensing channels, from Gloeobacter violaceus and Erwinia chrysanthemi, were aligned with the SAlign and 3DMA algorithms. A multiple sequence alignment from these structures and those of the GABAaR was performed with ClustalW. The Modeler and Rosetta algorithms independently created three-dimensional constructs of the GABAaR from the GluCl template. The CDocker algorithm docked a congeneric series of propofol derivatives into the binding pocket and scored calculated binding affinities for correlation with known GABAaR potentiation EC50’s. Results Multiple structure alignments of templates revealed a clear consensus of residue locations relevant to anesthetic effects except for torpedo nAChR. Within the GABAaR models generated from GluCl, the residues notable for modulating anesthetic action within transmembrane segments 1, 2, and 3 converged on the intersubunit interface between alpha and beta subunits. Docking scores of a propofol derivative series into this binding site showed strong linear correlation with GABAaR potentiation EC50. Conclusion Consensus structural alignment based on homologous templates revealed an intersubunit anesthetic binding cavity within the transmembrane domain of the GABAaR, which showed correlation of ligand docking scores with experimentally measured GABAaR potentiation. PMID:23770602
Assessment of homology templates and an anesthetic binding site within the γ-aminobutyric acid receptor.

PubMed

Bertaccini, Edward J; Yoluk, Ozge; Lindahl, Erik R; Trudell, James R

2013-11-01

Anesthetics mediate portions of their activity via modulation of the γ-aminobutyric acid receptor (GABAaR). Although its molecular structure remains unknown, significant progress has been made toward understanding its interactions with anesthetics via molecular modeling. The structure of the torpedo acetylcholine receptor (nAChRα), the structures of the α4 and β2 subunits of the human nAChR, the structures of the eukaryotic glutamate-gated chloride channel (GluCl), and the prokaryotic pH-sensing channels, from Gloeobacter violaceus and Erwinia chrysanthemi, were aligned with the SAlign and 3DMA algorithms. A multiple sequence alignment from these structures and those of the GABAaR was performed with ClustalW. The Modeler and Rosetta algorithms independently created three-dimensional constructs of the GABAaR from the GluCl template. The CDocker algorithm docked a congeneric series of propofol derivatives into the binding pocket and scored calculated binding affinities for correlation with known GABAaR potentiation EC50s. Multiple structure alignments of templates revealed a clear consensus of residue locations relevant to anesthetic effects except for torpedo nAChR. Within the GABAaR models generated from GluCl, the residues notable for modulating anesthetic action within transmembrane segments 1, 2, and 3 converged on the intersubunit interface between α and β subunits. Docking scores of a propofol derivative series into this binding site showed strong linear correlation with GABAaR potentiation EC50. Consensus structural alignment based on homologous templates revealed an intersubunit anesthetic binding cavity within the transmembrane domain of the GABAaR, which showed a correlation of ligand docking scores with experimentally measured GABAaR potentiation.
Sequences Of Amino Acids For Human Serum Albumin

NASA Technical Reports Server (NTRS)

Carter, Daniel C.

1992-01-01

Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.
MetaGO: Predicting Gene Ontology of Non-homologous Proteins Through Low-Resolution Protein Structure Prediction and Protein-Protein Network Mapping.

PubMed

Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang

2018-03-10

Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotation methods. Detailed data analysis shows that the major advantage of MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
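For reference, an F-measure is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition. The abstract does not spell out how precision and recall were computed or averaged in the MetaGO benchmark, so the function name `f_measure` and the input values in the usage line are illustrative assumptions only, not values from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    return 2.0 * precision * recall / (precision + recall)


# Made-up example values: a predictor with perfect precision but 50% recall.
example = f_measure(1.0, 0.5)
```

Because the harmonic mean is dominated by the smaller of the two inputs, a predictor cannot reach a high F-measure by maximizing precision or recall alone.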
Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

PubMed Central

Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

2010-01-01

Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614
An ectromelia virus profilin homolog interacts with cellular tropomyosin and viral A-type inclusion protein.

PubMed

Butler-Cole, Christine; Wagner, Mary J; Da Silva, Melissa; Brown, Gordon D; Burke, Robert D; Upton, Chris

2007-07-24

Profilins are critical to cytoskeletal dynamics in eukaryotes; however, little is known about their viral counterparts. In this study, a poxviral profilin homolog, ectromelia virus strain Moscow gene 141 (ECTV-PH), was investigated by a variety of experimental and bioinformatics techniques to characterize its interactions with cellular and viral proteins. Profilin-like proteins are encoded by all orthopoxviruses sequenced to date, and share over 90% amino acid (aa) identity. Sequence comparisons show highest similarity to mammalian type 1 profilins; however, a conserved 3 aa deletion in mammalian type 3 and poxviral profilins suggests that these homologs may be more closely related. Structural analysis shows that ECTV-PH can be successfully modelled onto both the profilin 1 crystal structure and profilin 3 homology model, though few of the surface residues thought to be required for binding actin, poly(L-proline), and PIP2 are conserved. Immunoprecipitation and mass spectrometry identified two proteins that interact with ECTV-PH within infected cells: alpha-tropomyosin, a 38 kDa cellular actin-binding protein, and the 84 kDa product of vaccinia virus strain Western Reserve (VACV-WR) 148, which is the truncated VACV counterpart of the orthopoxvirus A-type inclusion (ATI) protein. Western and far-western blots demonstrated that the interaction with alpha-tropomyosin is direct, and immunofluorescence experiments suggest that ECTV-PH and alpha-tropomyosin may colocalize to structures that resemble actin tails and cellular protrusions. Sequence comparisons of the poxviral ATI proteins show that although full-length orthologs are only present in cowpox and ectromelia viruses, an ~ 700 aa truncated ATI protein is conserved in over 90% of sequenced orthopoxviruses. Immunofluorescence studies indicate that ECTV-PH localizes to cytoplasmic inclusion bodies formed by both truncated and full-length versions of the viral ATI protein. Furthermore, colocalization of ECTV-PH and

37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

Code of Federal Regulations, 2011 CFR

2011-07-01

... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...
Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping.

PubMed

Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A

2017-04-01

Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.
37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

Code of Federal Regulations, 2011 CFR

2011-07-01

... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...
37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

Code of Federal Regulations, 2013 CFR

2013-07-01

... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...
37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

Code of Federal Regulations, 2012 CFR

2012-07-01

... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...
37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

Code of Federal Regulations, 2010 CFR

2010-07-01

... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...
37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

Code of Federal Regulations, 2014 CFR

2014-07-01

... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...
Girardia dorotocephala transcriptome sequence, assembly, and validation through characterization of piwi homologs and stem cell progeny markers.

PubMed

Almazan, Eugene Matthew P; Lesko, Sydney L; Markey, Michael P; Rouhana, Labib

2018-01-15

Planarian flatworms are popular models for the study of regeneration and stem cell biology in vivo. Technical advances and increased availability of genetic information have fueled the discovery of molecules responsible for stem cell pluripotency and regeneration in flatworms. Unfortunately, most of the planarian research performed worldwide utilizes species that are not natural habitants of North America, which limits their availability to newcomer laboratories and impedes their distribution for educational activities. In order to circumvent these limitations and increase the genetic information available for comparative studies, we sequenced the transcriptome of Girardia dorotocephala, a planarian species pandemic and commercially available in North America. A total of 254,802,670 paired sequence reads were obtained from RNA extracted from intact individuals, regenerating fragments, as well as freshly excised auricles of a clonal line of G. dorotocephala (MA-C2), and used for de novo assembly of its transcriptome. The resulting transcriptome draft was validated through functional analysis of genetic markers of stem cells and their progeny in G. dorotocephala. Akin to orthologs in other planarian species, G. dorotocephala Piwi1 (GdPiwi1) was found to be a robust marker of the planarian stem cell population and GdPiwi2 an essential component for stem cell-driven regeneration. Identification of G. dorotocephala homologs of the early stem cell descendent marker PROG-1 revealed a family of lysine-rich proteins expressed during epithelial cell differentiation. Sequences from the MA-C2 transcriptome were found to be 98-99% identical to nucleotide sequences from G. dorotocephala populations with different chromosomal number, demonstrating strong conservation regardless of karyotype evolution. Altogether, this work establishes G. dorotocephala as a viable and accessible option for analysis of gene function in North America. Copyright © 2017 The Authors. Published by
Girardia dorotocephala transcriptome sequence, assembly, and validation through characterization of piwi homologs and stem cell progeny markers

PubMed Central

Almazan, Eugene Matthew P.; Lesko, Sydney L.; Markey, Michael P.; Rouhana, Labib

2017-01-01

Planarian flatworms are popular models for the study of regeneration and stem cell biology in vivo. Technical advances and increased availability of genetic information have fueled the discovery of molecules responsible for stem cell pluripotency and regeneration in flatworms. Unfortunately, most of the planarian research performed worldwide utilizes species that are not natural habitants of North America, which limits their availability to newcomer laboratories and impedes their distribution for educational activities. In order to circumvent these limitations and increase the genetic information available for comparative studies, we sequenced the transcriptome of Girardia dorotocephala, a planarian species pandemic and commercially available in North America. A total of 254,802,670 paired sequence reads were obtained from RNA extracted from intact individuals, regenerating fragments, as well as freshly excised auricles of a clonal line of G. dorotocephala (MA-C2), and used for de novo assembly of its transcriptome. The resulting transcriptome draft was validated through functional analysis of genetic markers of stem cells and their progeny in G. dorotocephala. Akin to orthologs in other planarian species, G. dorotocephala Piwi1 (GdPiwi1) was found to be a robust marker of the planarian stem cell population and GdPiwi2 an essential component for stem cell-driven regeneration. Identification of G. dorotocephala homologs of the early stem cell descendent marker PROG-1 revealed a family of lysine-rich proteins expressed during epithelial cell differentiation. Sequences from the MA-C2 transcriptome were found to be 98–99% identical to nucleotide sequences from G. dorotocephala populations with different chromosomal number, demonstrating strong conservation regardless of karyotype evolution. Altogether, this work establishes G. dorotocephala as a viable and accessible option for analysis of gene function in North America. PMID:28774726
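The "98–99% identical" figure above is a percent-identity statistic. As a minimal sketch of how such a number can be computed from a pairwise alignment, the hypothetical helper below counts matching positions over the aligned, non-gap columns. The function name, the gap symbol and the choice of denominator are assumptions made for illustration; the abstract does not say which convention the authors used, and other denominators (alignment length, shorter sequence length) are also in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must already be aligned (same length). This is an
    illustrative convention, not necessarily the one used in the study.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Nine of ten aligned nucleotides agree.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Gapped columns are ignored here, so an insertion in one sequence does not lower the score; a stricter convention would count them as mismatches.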
Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peters, J.; Peters, M.; Lottspeich, F.

1987-11-01

The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M_r estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

PubMed Central

Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

2010-01-01

Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
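The ORF criterion mentioned above (contigs with an ORF>300 bp) can be illustrated with a small scanner. This is a sketch under stated assumptions, not the pipeline used for the camel ESTs: it searches only the three forward reading frames, requires an ATG start and a standard stop codon, and the function name and return format are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 300):
    """Return (start, end) pairs of forward-strand ORFs longer than min_len bp.

    An ORF runs from the first ATG in a frame to the next in-frame stop
    codon, stop included; `end` is exclusive. The reverse strand is not
    searched in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > min_len:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# A 306-bp ORF (ATG + 100 alanine codons + TAA) preceded by two bases.
contig = "CC" + "ATG" + "GCT" * 100 + "TAA"
print(find_orfs(contig))
```

A production tool would also scan the reverse complement and handle partial ORFs at contig ends, which matter for ESTs that do not reach the start codon.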
A Comprehensive Strategy for Accurate Mutation Detection of the Highly Homologous PMS2.

PubMed

Li, Jianli; Dai, Hongzheng; Feng, Yanming; Tang, Jia; Chen, Stella; Tian, Xia; Gorman, Elizabeth; Schmitt, Eric S; Hansen, Terah A A; Wang, Jing; Plon, Sharon E; Zhang, Victor Wei; Wong, Lee-Jun C

2015-09-01

Germline mutations in the DNA mismatch repair gene PMS2 underlie the cancer susceptibility syndrome, Lynch syndrome. However, accurate molecular testing of PMS2 is complicated by a large number of highly homologous sequences. To establish a comprehensive approach for mutation detection of PMS2, we have designed a strategy combining targeted capture next-generation sequencing (NGS), multiplex ligation-dependent probe amplification, and long-range PCR followed by NGS to simultaneously detect point mutations and copy number changes of PMS2. Exonic deletions (E2 to E9, E5 to E9, E8, E10, E14, and E1 to E15), duplications (E11 to E12), and a nonsense mutation, p.S22*, were identified. Traditional multiplex ligation-dependent probe amplification and Sanger sequencing approaches cannot differentiate the origin of the exonic deletions in the 3' region when PMS2 and PMS2CL share identical sequences as a result of gene conversion. Our approach allows unambiguous identification of mutations in the active gene with a straightforward long-range-PCR/NGS method. Breakpoint analysis of multiple samples revealed that recurrent exon 14 deletions are mediated by homologous Alu sequences. Our comprehensive approach provides a reliable tool for accurate molecular analysis of genes containing multiple copies of highly homologous sequences and should improve PMS2 molecular analysis for patients with Lynch syndrome. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Finding similar nucleotide sequences using network BLAST searches.

PubMed

Ladunga, Istvan

2009-06-01

The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics due to its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PUBMED, structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge.
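The expected-frequency thresholds mentioned above rest on the Karlin-Altschul relation E = K·m·n·exp(−λS), where S is the raw alignment score and m and n are the query and database lengths. The sketch below shows that relation and its bit-score form. The λ and K values in the usage lines are illustrative placeholders, not parameters taken from the article or from any particular BLAST scoring scheme, and real BLAST applies effective-length corrections that are omitted here.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance hits with score >= raw_score: K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * raw_score)


def e_value_from_bits(bits: float, m: int, n: int) -> float:
    """Same quantity expressed through the bit score: m*n*2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Placeholder parameters, chosen only to exercise the formulas.
lam, k = 1.37, 0.71
m, n = 500, 1_000_000_000  # query length, database length (no length correction)
raw = 40
print(e_value(raw, m, n, lam, k))
print(e_value_from_bits(bit_score(raw, lam, k), m, n))
```

Both print statements evaluate the same quantity, which is why bit scores can be compared across searches that use different scoring systems.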
Restriction and Sequence Alterations Affect DNA Uptake Sequence-Dependent Transformation in Neisseria meningitidis

PubMed Central

Ambur, Ole Herman; Frye, Stephan A.; Nilsen, Mariann; Hovland, Eirik; Tønjum, Tone

2012-01-01

Transformation is a complex process that involves several interactions from the binding and uptake of naked DNA to homologous recombination. Some actions affect transformation favourably whereas others act to limit it. Here, meticulous manipulation of a single type of transforming DNA allowed for quantifying the impact of three different mediators of meningococcal transformation: NlaIV restriction, homologous recombination and the DNA Uptake Sequence (DUS). In the wildtype, an inverse relationship between the transformation frequency and the number of NlaIV restriction sites in DNA was observed when the transforming DNA harboured a heterologous region for selection (ermC) but not when the transforming DNA was homologous with only a single nucleotide heterology. The influence of homologous sequence in transforming DNA was further studied using plasmids with a small interruption or larger deletions in the recombinogenic region and these alterations were found to impair transformation frequency. In contrast, a particularly potent positive driver of DNA uptake in Neisseria sp. are short DUS in the transforming DNA. However, the molecular mechanism(s) responsible for DUS specificity remains unknown. Increasing the number of DUS in the transforming DNA was here shown to exert a positive effect on transformation. Furthermore, an influence of variable placement of DUS relative to the homologous region in the donor DNA was documented for the first time. No effect of altering the orientation of DUS was observed. These observations suggest that DUS is important at an early stage in the recognition of DNA, but does not exclude the existence of more than one level of DUS specificity in the sequence of events that constitute transformation. New knowledge on the positive and negative drivers of transformation may in a larger perspective illuminate both the mechanisms and the evolutionary role(s) of one of the most conserved mechanisms in nature: homologous recombination. PMID
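Two of the variables studied above, the number of NlaIV restriction sites and the number of DUS copies in the transforming DNA, can be counted directly from a sequence. The sketch below assumes the commonly cited NlaIV recognition site GGNNCC and the 10-nt DUS 5'-GCCGTCTGAA-3'; neither motif is stated in the abstract, so both should be checked against the paper before use. The function names are hypothetical.

```python
import re

# Assumed motifs (not given in the abstract; verify before relying on them).
NLAIV_SITE = "GG[ACGT]{2}CC"   # GGNNCC, palindromic, so one strand suffices
DUS = "GCCGTCTGAA"             # 10-nt DNA uptake sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlapping(pattern: str, seq: str) -> int:
    """Count matches, including overlapping ones, via a lookahead."""
    return sum(1 for _ in re.finditer(f"(?=({pattern}))", seq))


def count_motifs(dna: str) -> dict:
    dna = dna.upper()
    return {
        "nlaiv_sites": count_overlapping(NLAIV_SITE, dna),
        # DUS is not palindromic, so both orientations are counted.
        "dus": count_overlapping(DUS, dna)
        + count_overlapping(reverse_complement(DUS), dna),
    }


print(count_motifs("AAGGATCCTTGCCGTCTGAAGGCGCC"))
```

Counts like these only describe the donor DNA; the abstract reports that DUS placement relative to the homologous region also matters, which a simple tally does not capture.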
Directed alteration of Saccharomyces cerevisiae mitochondrial DNA by biolistic transformation and homologous recombination.

PubMed

Bonnefoy, Nathalie; Fox, Thomas D

2007-01-01

Saccharomyces cerevisiae is currently the only species in which genetic transformation of mitochondria can be used to generate a wide variety of defined alterations in mitochondrial deoxyribonucleic acid (mtDNA). DNA sequences can be delivered into yeast mitochondria by microprojectile bombardment (biolistic transformation) and subsequently incorporated into mtDNA by the highly active homologous recombination machinery present in the organelle. Although transformation frequencies are relatively low, the availability of strong mitochondrial selectable markers for the yeast system, both natural and synthetic, makes the isolation of transformants routine. The strategies and procedures reviewed here allow the researcher to insert defined mutations into endogenous mitochondrial genes and to insert new genes into mtDNA. These methods provide powerful in vivo tools for the study of mitochondrial biology.
MultiSeq: unifying sequence and structure data for evolutionary analysis

PubMed Central

Roberts, Elijah; Eargle, John; Wright, Dan; Luthey-Schulten, Zaida

2006-01-01

Background Since the publication of the first draft of the human genome in 2000, bioinformatic data have been accumulating at an overwhelming pace. Currently, more than 3 million sequences and 35 thousand structures of proteins and nucleic acids are available in public databases. Finding correlations in and between these data to answer critical research questions is extremely challenging. This problem needs to be approached from several directions: information science to organize and search the data; information visualization to assist in recognizing correlations; mathematics to formulate statistical inferences; and biology to analyze chemical and physical properties in terms of sequence and structure changes. Results Here we present MultiSeq, a unified bioinformatics analysis environment that allows one to organize, display, align and analyze both sequence and structure data for proteins and nucleic acids. While special emphasis is placed on analyzing the data within the framework of evolutionary biology, the environment is also flexible enough to accommodate other usage patterns. The evolutionary approach is supported by the use of predefined metadata, adherence to standard ontological mappings, and the ability for the user to adjust these classifications using an electronic notebook. MultiSeq contains a new algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of a homologous group of distantly related proteins. The method, based on the multidimensional QR factorization of multiple sequence and structure alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. Conclusion MultiSeq is a major extension of the Multiple Alignment tool that is provided as part of VMD, a structural visualization program for
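The ordering idea described above, ranking sequences so that the most linearly independent come first, can be illustrated with a column-pivoted Gram-Schmidt pass, which is the selection step of a pivoted QR factorization. This is a deliberately simplified stand-in: MultiSeq's method is a multidimensional QR factorization over sequence and structure alignments, whereas the sketch below one-hot encodes a gap-free toy alignment and works on plain vectors. All names are hypothetical.

```python
def one_hot(aligned_seqs):
    """Encode each aligned sequence as a flat 0/1 vector (position x residue)."""
    alphabet = sorted({ch for s in aligned_seqs for ch in s})
    index = {ch: k for k, ch in enumerate(alphabet)}
    vectors = []
    for s in aligned_seqs:
        v = [0.0] * (len(s) * len(alphabet))
        for pos, ch in enumerate(s):
            v[pos * len(alphabet) + index[ch]] = 1.0
        vectors.append(v)
    return vectors


def pivoted_order(vectors, tol=1e-12):
    """Order vectors by pivoted Gram-Schmidt: largest residual norm first.

    Vectors that are nearly linear combinations of earlier picks end up
    last, i.e. the result is sorted by increasing linear dependence.
    """
    residual = [list(v) for v in vectors]
    remaining = list(range(len(vectors)))
    order = []
    while remaining:
        norms = {j: sum(x * x for x in residual[j]) for j in remaining}
        best = max(remaining, key=lambda j: norms[j])
        order.append(best)
        remaining.remove(best)
        if norms[best] > tol:
            scale = norms[best] ** 0.5
            q = [x / scale for x in residual[best]]
            for j in remaining:  # remove the component along the chosen vector
                dot = sum(a * b for a, b in zip(q, residual[j]))
                residual[j] = [a - dot * b for a, b in zip(residual[j], q)]
    return order


seqs = ["ACDE", "ACDE", "WYWY"]  # sequences 0 and 1 are redundant
print(pivoted_order(one_hot(seqs)))
```

Truncating the returned order at the point where residual norms become small gives a non-redundant subset, which is the sense in which such a procedure yields a minimal basis set.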
Amino acid sequence of tyrosinase from Neurospora crassa.

PubMed Central

Lerch, K

1978-01-01

The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279
The primary structure of rat liver ribosomal protein L37. Homology with yeast and bacterial ribosomal proteins.

PubMed

Lin, A; McNally, J; Wool, I G

1983-09-10

The covalent structure of the rat liver 60 S ribosomal subunit protein L37 was determined. Twenty-four tryptic peptides were purified and the sequence of each was established; they accounted for all 111 residues of L37. The sequence of the first 30 residues of L37, obtained previously by automated Edman degradation of the intact protein, provided the alignment of the first 9 tryptic peptides. Three peptides (CN1, CN2, and CN3) were produced by cleavage of protein L37 with cyanogen bromide. The sequence of CN1 (65 residues) was established from the sequence of secondary peptides resulting from cleavage with trypsin and chymotrypsin. The sequence of CN1 in turn served to order tryptic peptides 1 through 14. The sequence of CN2 (15 residues) was determined entirely by a micromanual procedure and allowed the alignment of tryptic peptides 14 through 18. The sequence of the NH2-terminal 28 amino acids of CN3 (31 residues) was determined; in addition the complete sequences of the secondary tryptic and chymotryptic peptides were done. The sequence of CN3 provided the order of tryptic peptides 18 through 24. Thus the sequence of the three cyanogen bromide peptides also accounted for the 111 residues of protein L37. The carboxyl-terminal amino acids were identified after carboxypeptidase A treatment. There is a disulfide bridge between half-cystinyl residues at positions 40 and 69. Rat liver ribosomal protein L37 is homologous with yeast YP55 and with Escherichia coli L34. Moreover, there is a segment of 17 residues in rat L37 that occurs, albeit with modifications, in yeast YP55 and in E. coli S4, L20, and L34.
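The fragmentation strategy described above relies on two well-known cleavage specificities: cyanogen bromide cuts on the C-terminal side of methionine, and trypsin cuts on the C-terminal side of lysine and arginine, usually not when the next residue is proline. The following in silico digestion is a minimal sketch of those rules. The peptide is an invented example rather than the L37 sequence, and the function names are hypothetical.

```python
def cleave_after(seq: str, residues: str, skip_before: str = ""):
    """Split seq after every residue in `residues`.

    No cut is made when the following residue is in `skip_before`,
    and no cut is made after the final residue.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in residues and nxt and nxt not in skip_before:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


def trypsin(seq: str):
    return cleave_after(seq, "KR", skip_before="P")


def cyanogen_bromide(seq: str):
    return cleave_after(seq, "M")


peptide = "MAKRPLMGKW"  # invented example, not protein L37
print(trypsin(peptide))
print(cyanogen_bromide(peptide))
```

Overlaps between the two fragment sets are what allow the tryptic peptides to be ordered, as the abstract describes for CN1, CN2 and CN3.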
Molecular cloning and sequence analysis of full-length growth hormone cDNAs from six important economic fishes.

PubMed

Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang

2005-01-01

In this study, full-length cDNAs of the growth hormone (GH) gene were isolated from six economically important fishes: Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. This is the first time these GH sequences have been cloned, except for E. coioides GH. The lengths of the cDNAs are, respectively, 953 bp, 1,023 bp, 825 bp, 1,082 bp, 1,154 bp and 1,180 bp. Each sequence includes an ORF of about 600 bp that encodes a protein of about 200 amino acids: the S. kneri, E. coioides and M. albus GHs have 204 amino acids, S. asotus GH has 200 amino acids, and the M. anguillicaudatus and C. auratus gibelio GHs have 210 amino acids. A detailed sequence analysis of the six GHs together with many other fish GH sequences was then performed. All six sequences showed high homology to other sequences, especially to sequences within the same order, and many conserved residues were identified, most of them localized in five domains. The phylogenetic trees (MP and NJ) of many fish GH ORF sequences (including the six new ones), with Amia calva as the outgroup, were generally resolved and largely congruent with the morphology-based tree, although some incongruities were observed, suggesting that the GH ORF deserves more attention in teleostean phylogeny.
Better Understanding of Homologous Recombination through a 12-Week Laboratory Course for Undergraduates Majoring in Biotechnology

ERIC Educational Resources Information Center

Li, Ming; Shen, Xiaodong; Zhao, Yan; Hu, Xiaomei; Hu, Fuquan; Rao, Xiancai

2017-01-01

Homologous recombination, a central concept in biology, is defined as the exchange of DNA strands between two similar or identical nucleotide sequences. Unfortunately, undergraduate students majoring in biotechnology often experience difficulties in understanding the molecular basis of homologous recombination. In this study, we developed and…

Mapping neurofibromatosis 1 homologous loci by fluorescence in situ hybridization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Viskochil, D.; Breidenbach, H.H.; Cawthon, R.

Neurofibromatosis 1 maps to chromosome band 17q11.2 and the NF1 gene is comprised of 59 exons that span approximately 335 kb of genomic DNA. In order to further analyze the structure of NF1 from exons 2 through 27b, we isolated a number of cosmid and bacteriophage P-1 genomic clones using NF1-exon probes under high-stringency hybridization conditions. Using tagged, intron-based primers and DNA from various clones as a template, we PCR-amplified and sequenced individual NF1 exons. The exon sequences in PCR products from several genomic clones differed from the exon sequence derived from cloned NF1 cDNAs. Clones with variant sequences were mapped by fluorescence in situ hybridization under high-stringency conditions. Three clones mapped to chromosome band 15q11.2, one mapped to 14q11.2, one mapped to both 2q14.1-14.3 and 14q11.2, one mapped to 2q33-34, and one mapped to both 18q11.2 and 21q21. Even though some PCR-product sequences retained proper splice junctions and open reading frames, we have yet to identify cDNAs that correspond to the variant exon sequences. We are now sequencing clones that map to NF1-homologous loci in order to develop discriminating primer pairs for the exclusive amplification of NF1-specific sequences in our efforts to develop a comprehensive NF1 mutation screen using genomic DNA as template. The role that NF1-homologous sequences may play in neurofibromatosis 1 is not clear.
Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

PubMed Central

Krishnan, Neeraja M.

2017-01-01

Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the Kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357
Vital Roles of the Second DNA-binding Site of Rad52 Protein in Yeast Homologous Recombination*

PubMed Central

Arai, Naoto; Kagawa, Wataru; Saito, Kengo; Shingu, Yoshinori; Mikawa, Tsutomu; Kurumizaka, Hitoshi; Shibata, Takehiko

2011-01-01

RecA/Rad51 proteins are essential in homologous DNA recombination and catalyze the ATP-dependent formation of D-loops from a single-stranded DNA and an internal homologous sequence in a double-stranded DNA. RecA and Rad51 require a “recombination mediator” to overcome the interference imposed by the prior binding of single-stranded binding protein/replication protein A to the single-stranded DNA. Rad52 is the prototype of recombination mediators, and the human Rad52 protein has two distinct DNA-binding sites: the first site binds to single-stranded DNA, and the second site binds to either double- or single-stranded DNA. We previously showed that yeast Rad52 extensively stimulates Rad51-catalyzed D-loop formation even in the absence of replication protein A, by forming a 2:1 stoichiometric complex with Rad51. However, the precise roles of Rad52 and Rad51 within the complex are unknown. In the present study, we constructed yeast Rad52 mutants in which the amino acid residues corresponding to the second DNA-binding site of the human Rad52 protein were replaced with either alanine or aspartic acid. We found that the second DNA-binding site is important for the yeast Rad52 function in vivo. Rad51-Rad52 complexes consisting of these Rad52 mutants were defective in promoting the formation of D-loops, and the ability of the complex to associate with double-stranded DNA was specifically impaired. Our studies suggest that Rad52 within the complex associates with double-stranded DNA to assist Rad51-mediated homologous pairing. PMID:21454474
Homology modeling and prediction of the amino acid residues participating in the transfer of acetyl-CoA to arylalkylamine by the N-acetyltransferase from Chryseobacterium sp.

PubMed

Takenaka, Shinji; Ozeki, Takahiro; Tanaka, Kosei; Yoshida, Ken-Ichi

2017-11-01

To predict the amino acid residues playing important roles in acetyl-CoA and substrate binding and to study the acetyl group transfer mechanism of Chryseobacterium sp. 5-3B N-acetyltransferase (5-3B NatA). A 3-dimensional homology model of 5-3B NatA was constructed to compare the theoretical structure of this compound with the structures of previously reported proteins belonging to the bacterial GCN5 N-acetyltransferase family. Homology modeling of the 5-3B NatA structure and a characterization of the enzyme's kinetic parameters identified the essential amino acid residues involved in binding and acetyl-group transfer. 126 Leu, 132 Leu, and 135 Lys were implicated in the binding of phosphopantothenic acid, and 100 Tyr and 131 Lys in that of adenosyl biphosphate. The data supported the participation of 83 Glu and 133 Tyr in catalyzing acetyl-group transfer to L-2-phenylglycine. 5-3B NatA catalyzes the enantioselective N-acetylation of L-2-phenylglycine via a ternary complex comprising the enzyme, acetyl-CoA, and the substrate.
Nanopores and nucleic acids: prospects for ultrarapid sequencing

NASA Technical Reports Server (NTRS)

Deamer, D. W.; Akeson, M.

2000-01-01

DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.
Identification of the homolog of cell-counting factor in the cellular slime mold Dictyostelium discoideum.

PubMed

Okuwa, Takako; Katayama, Takahiro; Takano, Akinori; Yasukawa, Hiroo

2002-10-01

Genes for the cell-counting factors in Dictyostelium discoideum, countin and countin2, are considered to control the size of the multicellular structure of this organism. A novel gene, countin3, that is homologous to countin and countin2 genes (49 and 39% identity in amino acid sequence, respectively) was identified in the D. discoideum genome. The expression of countin3 was observed in the vegetatively growing cells, decreased in the aggregating stage, increased in the mid-developmental stage and decreased again in subsequent stages. This expression pattern is different from that of countin and countin2. The distinct expression kinetics of three genes suggests that they would have unique roles in size control of D. discoideum.
Sequencing BPS spectra

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar

In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.
Sequencing BPS spectra

DOE PAGES

Gukov, Sergei; Nawata, Satoshi; Saberi, Ingmar; ...

2016-03-02

In this article, we provide both a detailed study of color-dependence of link homologies, as realized in physics as certain spaces of BPS states, and a broad study of the behavior of BPS states in general. We consider how the spectrum of BPS states varies as continuous parameters of a theory are perturbed. This question can be posed in a wide variety of physical contexts, and we answer it by proposing that the relationship between unperturbed and perturbed BPS spectra is described by a spectral sequence. These general considerations unify previous applications of spectral sequence techniques to physics, and explain from a physical standpoint the appearance of many spectral sequences relating various link homology theories to one another. We also study structural properties of colored HOMFLY homology for links and evaluate Poincaré polynomials in numerous examples. Among these structural properties is a novel "sliding" property, which can be explained by using the (refined) modular S-matrix. This leads to the identification of modular transformations in Chern-Simons theory and 3d N = 2 theory via the 3d/3d correspondence. In conclusion, we introduce the notion of associated varieties as classical limits of recursion relations of colored superpolynomials of links, and study their properties.
PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

PubMed

García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

2010-11-01

PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder
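The abstract does not describe the extraction method itself, so the following is only a naive illustration of the general idea of pulling candidate nucleic acid sequences out of article text. It is not the PubDNA Finder algorithm. The function name, the 12-letter minimum length, and the uppercase-only alphabet are all assumptions made for this sketch.

```python
import re


def find_nucleotide_runs(text: str, min_len: int = 12) -> list:
    """Return uppercase runs of A/C/G/T/U at least min_len letters long.

    Hypothetical heuristic: the length threshold is there to avoid matching
    short ordinary words or abbreviations that happen to use these letters.
    """
    pattern = re.compile(r"\b[ACGTU]{%d,}\b" % min_len)
    return [match.group(0) for match in pattern.finditer(text)]


sample = "The primer 5'-ATGCGTACGTTAGC-3' was used; GATTACA is too short."
print(find_nucleotide_runs(sample))
```

A production extractor would also have to handle sequences split across lines, lowercase or spaced notation, and IUPAC ambiguity codes, none of which this sketch attempts.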
Expression of the Pasteurella haemolytica leukotoxin is inhibited by a locus that encodes an ATP-binding cassette homolog.

PubMed Central

Highlander, S K; Wickersham, E A; Garza, O; Weinstock, G M

1993-01-01

Multicopy and single-copy chromosomal fusions between the Pasteurella haemolytica leukotoxin regulatory region and the Escherichia coli beta-galactosidase gene have been constructed. These fusions were used as reporters to identify and isolate regulators of leukotoxin expression from a P. haemolytica cosmid library. A cosmid clone, which inhibited leukotoxin expression from multicopy and single-copy protein fusions, was isolated and found to contain the complete leukotoxin gene cluster plus additional upstream sequences. The locus responsible for inhibition of expression from leukotoxin-beta-galactosidase fusions was mapped within these upstream sequences, by transposon mutagenesis with Tn5, and its DNA sequence was determined. The inhibitory activity was found to be associated with a predicted 440-amino-acid reading frame (lapA) that lies within a four-gene arginine transport locus. LapA is predicted to be the nucleotide-binding component of this transport system and shares homology with the Clp family of proteases. PMID:8359916
Homologies between the amino acid sequences of some vertebrate peptide hormones and peptides isolated from invertebrate sources.

PubMed

De Loof, A; Schoofs, L

1990-01-01

1. The 4K-prothoracicotropic hormone (PTTH) or bombyxin and the melanization-reddish coloration hormone of the silkworm Bombyx mori resemble insulin and insulin-like growth factors. 2. The family of adipokinetic/red pigment concentrating hormones has some similarity with glucagon. 3. Members of the FMRFamide family are found in vertebrates as well as in invertebrates. 4. In Locusta, a molecule immunologically and biologically related to amphibian melanophore stimulating hormone has been partially characterized. 5. Enkephalins and enkephalin-related peptides occur in insects and other invertebrates. 6. Peptides belonging to the tachykinin family have been isolated from molluscan (Octopus) salivary glands and from insect nervous tissue (Locusta migratoria). 7. Invertebrate arginine-vasotocin homologs have been isolated from an insect (Locusta migratoria) and from a mollusc (Conus). 8. In Leucophaea, Locusta and Drosophila, peptides resembling those of the vertebrate gastrin/cholecystokinin family have been identified. 9. As the number of different neuro-/gut peptides with possible function(s) as hormone, neurotransmitter or neuromodulator is now estimated to be of the order of a few hundred, more similarities will probably show up in the near future.
Molecular association of normal alkanoic acids with their thallium(I) salts: a new homologous series of fatty acid metal soaps.

PubMed

Fernández-García, M; García, M V; Redondo, M I; Cheda, J A; Fernández-García, M; Westrum, E F; Fernández-Martín, F

1997-02-01

A new homologous series of thallium(I) hydrogen dialkanoates, fatty acid thallium soaps, from the dipropane up to the ditetradecane is reported for the first time. This association with 1:1 stoichiometry is the only one exhibited by the thallium derivatives. They have been prepared by solidification of molten mixtures with equimolar proportions of acid and corresponding neutral salt, though crystallization from an anhydrous ethanolic solution of the mixture has also been successful in obtaining pure compounds with the largest chain lengths. Vibrational spectroscopies clearly characterize these crystalline compounds as very strong hydrogen bonding systems. Assignments of active modes in proton and carbon nuclear magnetic resonance (NMR) spectrometry (in ethanol) and infrared (IR) and Raman spectra (in solid state) are reported. According to X-ray diffraction (XRD) they have monomolecular lamellar structures with the acyl chains arranged up and down to the cation/H-bond network in a methyl-to-methyl fashion, and vertically oriented to the basal plane. The acyl chains present all-trans conformation and alternating configuration (perpendicular orthorhombic subcell), like the beta'-phases of other kinds of lipids. Lamellar thickness is reported for the six room-temperature crystalline members. The molecular compounds present polymorphism, with one crystal/crystal transition at temperatures close to the peritectic melting. Phase transition thermodynamics are also given and discussed with respect to their acid and salt parents. Their incongruent melting involves nearly 90% of the total enthalpic increments of both constituents' melting processes, making these compounds potential thermal energy storage materials.
BLAST and FASTA similarity searching for multiple sequence alignment.

PubMed

Pearson, William R

2014-01-01

BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
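The reason a smaller database can make a search more sensitive follows from how an expectation value scales with search space. As a rough sketch, assuming the standard Karlin-Altschul relation between a normalized bit score S' and the expectation value, E = m * n * 2^(-S'), with m the query length and n the database length in residues (real programs apply edge-effect corrections to both lengths, which are ignored here). The function name and the numbers are illustrative only.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Simplifying assumption: raw query and database lengths are used without
    the effective-length corrections that BLAST and FASTA apply.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# The same 50-bit alignment scored against a large and a small database.
large_db = expect_value(50.0, query_len=300, db_len=10**10)
small_db = expect_value(50.0, query_len=300, db_len=10**7)
print(large_db, small_db)
```

Because E grows linearly with database size, the same alignment score is 1000 times more significant in the second search than in the first, whereas percent identity would be unchanged.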
Complete complementary DNA-derived amino acid sequence of canine cardiac phospholamban.

PubMed Central

Fujii, J; Ueno, A; Kitano, K; Tanaka, S; Kadoma, M; Tada, M

1987-01-01

Complementary DNA (cDNA) clones specific for phospholamban of sarcoplasmic reticulum membranes have been isolated from a canine cardiac cDNA library. The amino acid sequence deduced from the cDNA sequence indicates that phospholamban consists of 52 amino acid residues and lacks an amino-terminal signal sequence. The protein has an inferred mol wt 6,080 that is in agreement with its apparent monomeric mol wt 6,000, estimated previously by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Phospholamban contains two distinct domains, a hydrophilic region at the amino terminus (domain I) and a hydrophobic region at the carboxy terminus (domain II). We propose that domain I is localized at the cytoplasmic surface and offers phosphorylatable sites whereas domain II is anchored into the sarcoplasmic reticulum membrane. PMID:3793929
The nucleotide sequence of a segment of Trypanosoma brucei mitochondrial maxi-circle DNA that contains the gene for apocytochrome b and some unusual unassigned reading frames.

PubMed Central

Benne, R; De Vries, B F; Van den Burg, J; Klaver, B

1983-01-01

The nucleotide sequence of a 2.5-kb segment of the maxi-circle of Trypanosoma brucei mtDNA has been determined. The segment contains the gene for apocytochrome b, which displays about 25% homology at the amino acid level to the apocytochrome b gene from fungal and mammalian mtDNAs. Northern blot and S1 nuclease analyses have yielded accurate map positions of an RNA species in an area that coincides with the reading frame. The segment also contains two pairs of overlapping unassigned reading frames, which lack homology with any known mitochondrial gene or URF. The DNA sequence in these areas is AG-rich (70%), resulting in URFs with an unusually high level of glycine and charged amino acids (60%). They may not encode proteins, in spite of their size and the fact that abundant transcripts are mapped in these areas. PMID:6314266
Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins

PubMed Central

2014-01-01

Background: The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. Success of structure based drug discovery severely hinges on the availability of structures. Despite significant progress in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data driven homology based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures with advancing drug design attempts as one of the goals. Results: The Bhageerath-H web-server was validated on 75 CASP10 targets which showed TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. Conclusion: The Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment. PMID:25521245
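The two quality measures quoted in the results, Cα RMSD and TM-score, can both be written in terms of per-residue Cα distances between model and native structure. The sketch below assumes those distances have already been measured after one fixed superposition; the actual TM-score is the maximum over superpositions, so this gives only a lower bound. It uses the standard length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, and the floor of 0.5 Å on d0 for short targets is an assumption of this sketch. Function names are hypothetical.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation from per-residue distances (in Å)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score_fixed_superposition(distances, target_len):
    """TM-score for one given superposition (no search over superpositions).

    distances: Cα-Cα distances for the aligned residue pairs, in Å.
    target_len: number of residues in the native (target) structure.
    """
    if target_len <= 15:
        d0 = 0.5  # assumed floor; the cube-root formula is undefined here
    else:
        d0 = max(1.24 * (target_len - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_len


# Toy example: a 100-residue target with every aligned pair 2 Å apart.
toy = [2.0] * 100
print(rmsd(toy), tm_score_fixed_superposition(toy, 100))
```

Unlike RMSD, each residue contributes at most 1/L to the TM-score, so a few badly placed residues cannot dominate the result; that is why the abstract reports the two measures separately.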
Primary structure and functional characterization of a Drosophila dopamine receptor with high homology to human D1/5 receptors.

PubMed

Gotzes, F; Balfanz, S; Baumann, A

1994-01-01

Members of the superfamily of G-protein coupled receptors share significant similarities in sequence and transmembrane architecture. We have isolated a Drosophila homologue of the mammalian dopamine receptor family using a low stringency hybridization approach. The deduced amino acid sequence is approximately 70% homologous to the human D1/D5 receptors. When expressed in HEK 293 cells, the Drosophila receptor stimulates cAMP production in response to dopamine application. This effect was mimicked by SKF 38393, a specific D1 receptor agonist, but inhibited by dopaminergic antagonists such as butaclamol and flupentixol. In situ hybridization revealed that the Drosophila dopamine receptor is highly expressed in the somata of the optic lobes. This suggests that the receptor might be involved in the processing of visual information and/or visual learning in invertebrates.
Cloning and expression of Bartonella henselae sucB gene encoding an immunogenic dihydrolipoamide succinyltransferase homologous protein.

PubMed

Kabeya, Hidenori; Maruyama, Soichi; Hirano, Kouji; Mikami, Takeshi

2003-01-01

Immunoscreening of a ZAP genomic library of Bartonella henselae strain Houston-1 expressed in Escherichia coli resulted in the isolation of a clone containing 3.5 kb BamHI genomic DNA fragment. This 3.5 kb DNA fragment was found to contain a sequence of a gene encoding a protein with significant homology to the dihydrolipoamide succinyltransferase of Brucella melitensis (sucB). Subsequent cloning and DNA sequence analysis revealed that the deduced amino acid sequence from the cloned gene showed 66.5% identity to SucB protein of B. melitensis, and 43.4 and 47.2% identities to those of Coxiella burnetii and E. coli, respectively. The gene was expressed as a His-Nus A-tagged fusion protein. The recombinant SucB protein (rSucB) was shown to be an immunoreactive protein of about 115 kDa by Western blot analysis with sera from B. henselae-immunized mice. Therefore the rSucB may be a candidate antigen for a specific serological diagnosis of B. henselae infection.
Evolutionary profiles from the QR factorization of multiple sequence alignments

PubMed Central

Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida

2005-01-01

We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270
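The idea of ordering sequences by increasing linear dependence can be illustrated with an ordinary column-pivoted QR factorization. The following is a simplified sketch, not the multidimensional QR of the paper: each aligned sequence is one-hot encoded as a vector (an assumed encoding), and a greedy Gram-Schmidt procedure repeatedly picks the sequence with the largest component orthogonal to those already chosen. All names and the toy alignment are hypothetical.

```python
def one_hot(seq, alphabet):
    """Encode an aligned sequence as a flat 0/1 vector (assumed encoding)."""
    vec = []
    for ch in seq:
        vec.extend(1.0 if ch == a else 0.0 for a in alphabet)
    return vec


def pivot_order(vectors, tol=1e-9):
    """Greedy column-pivoted Gram-Schmidt.

    Returns indices ordered from most linearly independent to most redundant.
    """
    residuals = [list(v) for v in vectors]
    remaining = list(range(len(vectors)))
    order = []
    while remaining:
        norms = {i: sum(x * x for x in residuals[i]) for i in remaining}
        best = max(remaining, key=lambda i: norms[i])
        if norms[best] <= tol:
            order.extend(remaining)  # the rest are spanned by earlier picks
            break
        order.append(best)
        remaining.remove(best)
        scale = norms[best] ** 0.5
        q = [x / scale for x in residuals[best]]
        for i in remaining:
            # Remove the component along the newly chosen direction.
            proj = sum(a * b for a, b in zip(residuals[i], q))
            residuals[i] = [a - proj * b for a, b in zip(residuals[i], q)]
    return order


alphabet = "ACDEFHWY"
toy_alignment = ["ACDE", "ACDE", "WYFH"]  # made-up sequences
print(pivot_order([one_hot(s, alphabet) for s in toy_alignment]))
```

In this toy case the duplicate sequence is ranked last, which mirrors how a non-redundant basis set would be chosen by truncating the ordering.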
Metagenomic ventures into outer sequence space.

PubMed

Dutilh, Bas E

Sequencing DNA or RNA directly from the environment often results in many sequencing reads that have no homologs in the database. These are referred to as "unknowns," and reflect the vast unexplored microbial sequence space of our biosphere, also known as "biological dark matter." However, unknowns also exist because metagenomic datasets are not optimally mined. There is a pressure on researchers to publish and move on, and the unknown sequences are often left for what they are, and conclusions drawn based on reads with annotated homologs. This can cause abundant and widespread genomes to be overlooked, such as the recently discovered human gut bacteriophage crAssphage. The unknowns may be enriched for bacteriophage sequences, the most abundant and genetically diverse component of the biosphere and of sequence space. However, it remains an open question, what is the actual size of biological sequence space? The de novo assembly of shotgun metagenomes is the most powerful tool to address this question.

The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

PubMed Central

Haggarty, N W; Dunbar, B; Fothergill, L A

1983-01-01

The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important for the activity of the glycolytic mutase are conserved in the erythrocyte diphosphoglycerate mutase. PMID:6313356
Tocopherol and tocotrienol homologs in parenteral lipid emulsions

PubMed Central

Xu, Zhidong; Harvey, Kevin A; Pavlina, Thomas M; Zaloga, Gary P; Siddiqui, Rafat A

2015-01-01

Parenteral lipid emulsions, which are made of oils from plant and fish sources, contain different types of tocopherols and tocotrienols (vitamin E homologs). The amount and types of vitamin E homologs in various lipid emulsions vary considerably and are not completely known. The objective of this analysis was to develop a quantitative method to determine levels of all vitamin E homologs in various lipid emulsions. An HPLC system was used to measure vitamin E homologs using a Pinnacle DB Silica normal phase column and an isocratic, n-hexane:1,4 dioxane (98:2) mobile phase. An optimized protocol was used to report vitamin E homolog concentrations in soybean oil-based (Intralipid®, Ivelip®, Lipofundin® N, Liposyn® III, and Liposyn® II), medium- and long-chain fatty acid-based (Lipofundin®, MCT and Structolipid®), olive oil-based (ClinOleic®), and fish oil-based (Omegaven®) and mixture of these oils-based (SMOFlipid®, Lipidem®) commercial parenteral lipid emulsions. Total content of all vitamin E homologs varied greatly between different emulsions, ranging from 57.9 to 383.9 µg/mL. Tocopherols (α, β, γ, δ) were the predominant vitamin E homologs for all emulsions, with tocotrienol content < 0.3%. In all of the soybean emulsions, except for Lipofundin® N, the predominant vitamin E homolog was γ-tocopherol, which ranged from 57–156 µg/mL. ClinOleic® predominantly contained α-tocopherol (32 µg/mL), whereas α-tocopherol content in Omegaven® was higher than most of the other lipid emulsions (230 µg/mL). Practical applications: The information on the types and quantity of vitamin E homologs in various lipid emulsions will be extremely useful to physicians and healthcare personnel in selecting appropriate lipid emulsions that are exclusively used in patients with inadequate gastrointestinal function, including hospitalized and critically ill patients. Some emulsions may require vitamin E supplementation in order to meet minimal human requirements.
DETERMINATE and LATE FLOWERING are two TERMINAL FLOWER1/CENTRORADIALIS homologs that control two distinct phases of flowering initiation and development in pea.

PubMed

Foucher, Fabrice; Morin, Julie; Courtiade, Juliette; Cadioux, Sandrine; Ellis, Noel; Banfield, Mark J; Rameau, Catherine

2003-11-01

Genes in the TERMINAL FLOWER1 (TFL1)/CENTRORADIALIS family are important key regulatory genes involved in the control of flowering time and floral architecture in several different plant species. To understand the functions of TFL1 homologs in pea, we isolated three TFL1 homologs, which we have designated PsTFL1a, PsTFL1b, and PsTFL1c. By genetic mapping and sequencing of mutant alleles, we demonstrate that PsTFL1a corresponds to the DETERMINATE (DET) gene and PsTFL1c corresponds to the LATE FLOWERING (LF) gene. DET acts to maintain the indeterminacy of the apical meristem during flowering, and consistent with this role, DET expression is limited to the shoot apex after floral initiation. LF delays the induction of flowering by lengthening the vegetative phase, and allelic variation at the LF locus is an important component of natural variation for flowering time in pea. The most severe class of alleles flowers early and carries either a deletion of the entire PsTFL1c gene or an amino acid substitution. Other natural and induced alleles for LF, with an intermediate flowering time phenotype, present no changes in the PsTFL1c amino acid sequence but affect LF transcript level in the shoot apex: low LF transcript levels are correlated with early flowering, and high LF transcript levels are correlated with late flowering. Thus, different TFL1 homologs control two distinct aspects of plant development in pea, whereas a single gene, TFL1, performs both functions in Arabidopsis. These results show that different species have evolved different strategies to control key developmental transitions and also that the genetic basis for natural variation in flowering time may differ among plant species.
Analysis of eight out genes in a cluster required for pectic enzyme secretion by Erwinia chrysanthemi: sequence comparison with secretion genes from other gram-negative bacteria.

PubMed Central

Lindeberg, M; Collmer, A

1992-01-01

Many extracellular proteins produced by Erwinia chrysanthemi require the out gene products for transport across the outer membrane. In a previous report (S. Y. He, M. Lindeberg, A. K. Chatterjee, and A. Collmer, Proc. Natl. Acad. Sci. USA 88:1079-1083, 1991) cosmid pCPP2006, sufficient for secretion of Erwinia chrysanthemi extracellular proteins by Escherichia coli, was partially sequenced, revealing four out genes sharing high homology with pulH through pulK from Klebsiella oxytoca. The nucleotide sequence of eight additional out genes reveals homology with pulC through pulG, pulL, pulM, pulO, and other genes involved in secretion by various gram-negative bacteria. Although signal sequences and hydrophobic regions are generally conserved between Pul and Out proteins, four out genes contain unique inserts, a pulN homolog is not present, and outO appears to be transcribed separately from outC through outM. The sequenced region was subcloned, and an additional 7.6-kb region upstream was identified as being required for secretion in E. coli. out gene homologs were found on Erwinia carotovora cosmid clone pAKC651 but were not detected in E. coli. The outC-through-outM operon is weakly induced by polygalacturonic acid and strongly expressed in the early stationary phase. The out and pul genes are highly similar in sequence, hydropathic properties, and overall arrangement but differ in both transcriptional organization and the nature of their induction. PMID:1429461
Biological activity of cannabichromene, its homologs and isomers.

PubMed

Turner, C E; Elsohly, M A

1981-01-01

Cannabichromene (CBC) is one of four major cannabinoids in Cannabis sativa L. and is the second most abundant cannabinoid in drug-type cannabis. Cannabichromene and some of its homologs, analogs, and isomers were evaluated for antiinflammatory, antibacterial, and antifungal activity. Antiinflammatory activity was evaluated by the carrageenan-induced rat paw edema and the erythrocyte membrane stabilization method. In both tests, CBC was superior to phenylbutazone. Antibacterial activity of CBC and its isomers and homologs was evaluated using gram-positive, gram-negative, and acid-fast bacteria. Antifungal activity was evaluated using yeast-like and filamentous fungi and a dermatophyte. Antibacterial activity was strong, and the antifungal activity was mild to moderate.
Divergence and evolution of homologous regions of Bombyx mori nuclear polyhedrosis virus.

PubMed Central

Majima, K; Kobara, R; Maeda, S

1993-01-01

Homologous regions (hrs) (hr1, hr2-left, hr2-right, hr3, hr4-left, hr4-right, and hr5) similar to those found in the Autographa californica nuclear polyhedrosis virus (AcNPV) genome were found in the Bombyx mori NPV (BmNPV) genome. The BmNPV hrs contained two to eight repeats of a homologous nucleotide sequence which were on average about 75 bp long. All of these homologous sequence repeats contained a 26-bp-long palindrome motif with an EcoRI or EcoRI-like site at its core. The consensus sequence of the BmNPV hrs showed 95% conservation with respect to those found in AcNPV. Nucleotide sequence analysis indicated that hr2-left and hr2-right of BmNPV evolved from an ancestor similar to hr2 of AcNPV by inversion, cleavage, and ligation. The polarities of the BmNPV and AcNPV hrs were conserved except for that of hr4-left. Within hr4-right of BmNPV, four repeats of a previously undescribed palindrome motif were found. Bmhr5D, a BmNPV mutant which lacked hr5, replicated at a rate similar to that of wild-type BmNPV in BmN cells and silkworm larvae, indicating that hr5 was not essential for viral replication. After ten passages of Bmhr5D in BmN cells, no detectable changes in its genome were observed by restriction endonuclease analysis. The evolution and divergence of the BmNPV genome are also discussed. PMID:8230471
Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

NASA Astrophysics Data System (ADS)

Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

2014-03-01

Tunneling microscopy and spectroscopy have been used extensively in the physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.
Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

PubMed

Maulik, Ujjwal; Sarkar, Anasua

2013-01-01

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences, globally reducing inter-cluster walks. When combined with the corrections based on a modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. Similar performance improvement is also found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. sarkar@labri.fr.
Amino acid sequence of the Amur tiger prion protein.

PubMed

Wu, Changde; Pang, Wanyong; Zhao, Deming

2006-10-01

Prion diseases are fatal neurodegenerative disorders in humans and animals associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that the polymorphisms within the open reading frame (ORF) of PrP are associated with the susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threonine, respectively. The tPrnp amino acid sequence is similar to house cat (Felis catus) and sheep, but differs significantly from other two cat Prnp sequences that were previously deposited in GenBank.
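The link between the nucleotide positions of the four polymorphisms and the residue numbers 171 and 204 is simple arithmetic. As a worked sketch, assuming positions are numbered from 1 at the first base of the ORF (an assumption, though it is consistent with the residue numbers reported), each position maps to a codon number and a position within that codon. The function name is hypothetical.

```python
def codon_position(nt_pos: int):
    """Map a 1-based ORF nucleotide position to (codon number, base within codon)."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


for snp in ("T423C", "A501G", "C511A", "A610G"):
    position = int(snp[1:-1])  # strip the reference and variant bases
    codon, base = codon_position(position)
    print(f"{snp}: codon {codon}, base {base} of the codon")
```

Under this numbering, positions 511 and 610 are first bases of codons 171 and 204, while 423 and 501 fall on third (wobble) bases of codons 141 and 167, which fits the report that only the former two substitutions change the amino acid.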
Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species

NASA Technical Reports Server (NTRS)

Haney, P. J.; Badger, J. H.; Buldak, G. L.; Reich, C. I.; Woese, C. R.; Olsen, G. J.

1999-01-01

The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 degrees C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most correlated with the proteins of the thermophile include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with all trends applying to 83-92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the case of the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.
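A preferred direction of replacement for a given amino acid pair can be tested by comparing how often the change is seen in each direction. The abstract does not say which statistical test was used, so the following is an assumed approach: a two-sided exact binomial (sign) test against the null hypothesis that both directions are equally likely. The function name and the counts are made up for illustration.

```python
from math import comb


def two_sided_sign_test(forward: int, reverse: int) -> float:
    """Exact two-sided binomial P value for forward vs. reverse counts, p = 0.5."""
    n = forward + reverse
    if n == 0:
        return 1.0
    k = max(forward, reverse)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical counts for one pair, e.g. mesophile residue X replaced by
# thermophile residue Y 40 times versus 15 times the other way round.
print(two_sided_sign_test(40, 15))
```

With many amino acid pairs tested at once, a correction for multiple comparisons would normally be applied on top of the per-pair P values; that step is omitted from this sketch.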
Non-homologous isofunctional enzymes: a systematic analysis of alternative solutions in enzyme evolution.

PubMed

Omelchenko, Marina V; Galperin, Michael Y; Wolf, Yuri I; Koonin, Eugene V

2010-04-30

Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins. We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress. These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
Partial characterization of the lettuce infectious yellows virus genomic RNAs, identification of the coat protein gene and comparison of its amino acid sequence with those of other filamentous RNA plant viruses.

PubMed

Klaassen, V A; Boeshore, M; Dolja, V V; Falk, B W

1994-07-01

Purified virions of lettuce infectious yellows virus (LIYV), a tentative member of the closterovirus group, contained two RNAs of approximately 8500 and 7300 nucleotides (RNAs 1 and 2 respectively) and a single coat protein species with M(r) of approximately 28,000. LIYV-infected plants contained multiple dsRNAs. The two largest were the correct size for the replicative forms of LIYV virion RNAs 1 and 2. To assess the relationships between LIYV RNAs 1 and 2, cDNAs corresponding to the virion RNAs were cloned. Northern blot hybridization analysis showed no detectable sequence homology between these RNAs. A partial amino acid sequence obtained from purified LIYV coat protein was found to align in the most upstream of four complete open reading frames (ORFs) identified in a LIYV RNA 2 cDNA clone. The identity of this ORF was confirmed as the LIYV coat protein gene by immunological analysis of the gene product expressed in vitro and in Escherichia coli. Computer analysis of the LIYV coat protein amino acid sequence indicated that it belongs to a large family of proteins forming filamentous capsids of RNA plant viruses. The LIYV coat protein appears to be most closely related to the coat proteins of two closteroviruses, beet yellows virus and citrus tristeza virus.
[Cloning and bioinformatics analysis of abscisic acid 8'-hydroxylase from Pseudostellariae Radix].

PubMed

Li, Jun; Long, Deng-Kai; Zhou, Tao; Ding, Ling; Zheng, Wei; Jiang, Wei-Ke

2016-07-01

Abscisic acid 8'-hydroxylase is one of the key enzymes in the metabolism of abscisic acid (ABA). Seven members of the abscisic acid 8'-hydroxylase family were identified from Pseudostellaria heterophylla transcriptome sequencing results by sequence homology. The expression profiles of these genes were analyzed using the transcriptome data. The coding sequence of ABA8ox1 was cloned and analyzed with bioinformatics tools. The full-length cDNA of ABA8ox1 was 1,401 bp, with 480 encoded amino acids. The predicted isoelectric point (pI) and relative molecular mass (MW) were 8.55 and 53 kDa, respectively. Transmembrane structure analysis showed that there were 21 amino acids inside and 445 amino acids outside the membrane. High transcript levels were detected in root bark and fibrous roots. Multiple alignment and phylogenetic analysis both showed that ABA8ox1 had high similarity with the CYP707As from other plants, especially with AtCYP707A1 and AtCYP707A3 in Arabidopsis thaliana. These results lay a foundation for studying the molecular mechanisms of tuberous root expansion and the response to adversity stress. Copyright© by the Chinese Pharmaceutical Association.
Gene Discovery through Genomic Sequencing of Brucella abortus

PubMed Central

Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

2001-01-01

Brucella abortus is the etiological agent of brucellosis, a disease that affects cattle and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10(-5)) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979
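The expect-value cutoff used above to call a match significant can be sketched as a simple filter. This is only an illustration: it assumes hits are available in the 12-column tab-separated layout produced by current BLAST+ tools (E-value in the 11th column), which is not necessarily what the 2001 study used, and the function name and sample identifiers are hypothetical.

```python
def significant_queries(tabular_text, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit below the E-value cutoff.

    Assumes 12 tab-separated columns per line, with the query ID in column 1
    and the E-value in column 11 (BLAST+ tabular layout; an assumption here).
    """
    hits = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            hits.add(query_id)
    return hits


# Invented example rows; IDs and scores are placeholders, not data from the study
sample = "\n".join([
    "gss001\tsubjA\t85.0\t120\t18\t0\t1\t120\t5\t124\t2e-30\t130",
    "gss002\tsubjB\t40.0\t60\t36\t0\t1\t60\t9\t68\t0.5\t25",
    "gss003\tsubjC\t70.0\t90\t27\t0\t1\t90\t3\t92\t1e-5\t60",
])
print(sorted(significant_queries(sample)))
```

The comparison is strict, mirroring the "less than" in the abstract, so a hit with an E-value exactly equal to the cutoff is not counted.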
Isolation and characterization of an AGAMOUS homolog from Fraxinus pennsylvanica

Treesearch

Ningxia Du; Paula M. Pijut

2010-01-01

An AGAMOUS homolog (FpAG) was isolated from green ash (Fraxinus pennsylvanica) using a reverse transcriptase polymerase chain reaction method. Southern blot analysis indicated that FpAG was present as a single-copy sequence in the genome of green ash. RNA accumulated in the reproductive tissues (female...
Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

DOEpatents

Studier, F. William

1995-04-18

Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.
Asymmetric synthesis of α-amino acids via homologation of Ni(II) complexes of glycine Schiff bases. Part 2: aldol, Mannich addition reactions, deracemization and (S) to (R) interconversion of α-amino acids.

PubMed

Sorochinsky, Alexander E; Aceña, José Luis; Moriwaki, Hiroki; Sato, Tatsunori; Soloshonok, Vadim

2013-11-01

This review provides a comprehensive treatment of literature data dealing with asymmetric synthesis of α-amino-β-hydroxy and α,β-diamino acids via homologation of chiral Ni(II) complexes of glycine Schiff bases using aldol and Mannich-type reactions. These reactions proceed with synthetically useful chemical yields and thermodynamically controlled stereoselectivity and allow direct introduction of two stereogenic centers in a single operation with predictable stereochemical outcome. Furthermore, a new application of Ni(II) complexes of α-amino acid Schiff bases to the deracemization of racemic α-amino acids and to (S) to (R) interconversion, which provides additional synthetic opportunities for the preparation of enantiomerically pure α-amino acids, is also reviewed. The origin of the observed diastereo-/enantioselectivity in the aldol, Mannich-type and deracemization reactions, and the generality and limitations of these methodologies, are critically discussed.
Description of durancin TW-49M, a novel enterocin B-homologous bacteriocin in carrot-isolated Enterococcus durans QU 49.

PubMed

Hu, C-B; Zendo, T; Nakayama, J; Sonomoto, K

2008-09-01

To characterize the novel bacteriocin produced by Enterococcus durans. Enterococcus durans QU 49 was isolated from carrot and expressed bactericidal activity over 20-43 degrees C. Bacteriocins were purified to homogeneity using a three-step purification method; one of them, termed durancin TW-49M, was an enterocin B-homologous peptide with most identical residues occurring in the N-terminus. Durancin TW-49M was more tolerant of acidic than of alkaline conditions. DNA sequencing analysis revealed that durancin TW-49M was translated as a prepeptide of the double-glycine type. Durancin TW-49M and enterocin B expressed similar antimicrobial spectra, in which no significant variation due to the diversity in their C-termini was observed. Durancin TW-49M, a novel nonpediocin-like class II bacteriocin, was characterized at the amino acid and genetic levels. The diverse C-terminal parts of durancin TW-49M and enterocin B are unlikely to be the region that determines target cell specificity. This is the first comprehensive study of a novel bacteriocin produced by Ent. durans. The high homology between the N-terminal halves of durancin TW-49M and enterocin B makes them suitable for studying the structure-function relationship of bacteriocins and their immunity proteins.
Alkyl phosphonic acids and sulfonic acids in the Murchison meteorite

NASA Technical Reports Server (NTRS)

Cooper, George W.; Onwo, Wilfred M.; Cronin, John R.

1992-01-01

Homologous series of alkyl phosphonic acids and alkyl sulfonic acids, along with inorganic orthophosphate and sulfate, are identified in water extracts of the Murchison meteorite after conversion to their t-butyl dimethylsilyl derivatives. The methyl, ethyl, propyl, and butyl compounds are observed in both series. Five of the eight possible alkyl phosphonic acids and seven of the eight possible alkyl sulfonic acids through C4 are identified. Abundances decrease with increasing carbon number, as observed for other homologous series indigenous to Murchison. Concentrations range downward from approximately 380 nmol/gram in the alkyl sulfonic acid series, and from 9 nmol/gram in the alkyl phosphonic acid series.

Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

NASA Technical Reports Server (NTRS)

Gatlin, L. L.

1974-01-01

Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
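As a rough illustration of the composition-based part of this analysis, the sketch below computes a first-order Shannon redundancy, R = 1 - H/log2(20), from single-residue frequencies. It is an assumption-laden simplification: it ignores the pair-frequency term and the Monte Carlo comparison with random sequences mentioned in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """First-order Shannon entropy (bits per residue) from residue frequencies."""
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def first_order_redundancy(seq, alphabet_size=20):
    """R = 1 - H / H_max, where H_max = log2(alphabet_size).

    R is 0 when all residues are equally frequent and 1 when a single
    residue type fills the whole chain.
    """
    h_max = math.log2(alphabet_size)
    return 1.0 - shannon_entropy(seq) / h_max


print(first_order_redundancy("ACDEFGHIKLMNPQRSTVWY"))  # each residue once
print(first_order_redundancy("AAAAAAAAAA"))            # a single residue type
```

Short chains cannot sample all 20 residues evenly, so their redundancy is inflated for reasons unrelated to biology, which is consistent with the chain-length constraint noted in the abstract.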
Recombination–deletion between homologous cassettes in retrovirus is suppressed via a strategy of degenerate codon substitution

PubMed Central

Im, Eung Jun; Bais, Anthony J; Yang, Wen; Ma, Qiangzhong; Guo, Xiuyang; Sepe, Steven M; Junghans, Richard P

2014-01-01

Transduction and expression procedures in gene therapy protocols may optimally transfer more than a single gene to correct a defect and/or transmit new functions to recipient cells or organisms. This may be accomplished by transduction with two (or more) vectors, or, more efficiently, in a single vector. Occasionally, it may be useful to coexpress homologous genes or chimeric proteins with regions of shared homology. Retroviridae include the dominant vector systems for gene transfer (e.g., gamma-retro and lentiviruses) and are capable of such multigene expression. However, these same viruses are known for efficient recombination–deletion when domains are duplicated within the viral genome. This problem can be averted by resorting to two-vector strategies (two-chain two-vector), but at a penalty to cost, convenience, and efficiency. Employing a chimeric antigen receptor system as an example, we confirm that coexpression of two genes with homologous domains in a single gamma-retroviral vector (two-chain single-vector) leads to recombination–deletion between repeated sequences, excising the equivalent of one of the chimeric antigen receptors. Here, we show that a degenerate codon substitution strategy in the two-chain single-vector format efficiently suppressed intravector deletional loss with rescue of balanced gene coexpression by minimizing sequence homology between repeated domains and preserving the final protein sequence. PMID:25419532
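The idea of degenerate codon substitution can be sketched as follows: each codon in one copy of a repeated domain is swapped for a synonymous codon that differs at as many nucleotide positions as possible, so that nucleotide identity drops while the encoded protein stays the same. This greedy rule, the function names, and the toy sequence are assumptions for illustration; the abstract does not give the authors' selection rule, and a practical design would also weigh codon usage.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(dna):
    """Translate an in-frame, uppercase DNA string into one-letter amino acids."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Replace each codon with the synonymous codon most different from it."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        # Met and Trp have a single codon, so they are returned unchanged
        out.append(max(synonyms, key=lambda c: hamming(c, codon)))
    return "".join(out)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return 1 - hamming(a, b) / len(a)


original = "ATGCTGTCTAAAGGT"  # toy sequence, not taken from the paper
recoded = recode(original)
print(recoded, translate(recoded) == translate(original), identity(original, recoded))
```

In this sketch the protein is preserved by construction, because every replacement is drawn from the set of codons for the same amino acid.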
Isolation of complementary DNA clones encoding pathogenesis-related proteins P and Q, two acidic chitinases from tobacco.

PubMed Central

Payne, G; Ahl, P; Moyer, M; Harper, A; Beck, J; Meins, F; Ryals, J

1990-01-01

Complementary DNA clones encoding two isoforms of the acidic endochitinase (chitinase, EC 3.2.1.14) from tobacco were isolated. Comparison of amino acid sequences deduced from the cDNA clones and the sequence of peptides derived from purified proteins shows that these clones encode the pathogenesis-related proteins PR-P and PR-Q. The cDNA inserts were not homologous to either the bacterial form of chitinase or the form from cucumber but shared significant homology to the basic form of chitinase from tobacco and bean. The acidic isoforms of tobacco chitinase did not contain the amino-terminal, cysteine-rich "hevein" domain found in the basic isoforms, indicating that this domain, which binds chitin, is not essential for chitinolytic activity. The accumulation of mRNA for the pathogenesis-related proteins PR-1, PR-R, PR-P, and PR-Q in Xanthi.nc tobacco leaves following infection with tobacco mosaic virus was measured by primer extension. The results indicate that the induction of these proteins during the local necrotic lesion response to the virus is coordinated at the mRNA level. PMID:2296608
Analysis of the DNA sequence of a 15,500 bp fragment near the left telomere of chromosome XV from Saccharomyces cerevisiae reveals a putative sugar transporter, a carboxypeptidase homologue and two new open reading frames.

PubMed

Gamo, F J; Lafuente, M J; Casamayor, A; Ariño, J; Aldea, M; Casas, C; Herrero, E; Gancedo, C

1996-06-15

We report the sequence of a 15.5 kb DNA segment located near the left telomere of chromosome XV of Saccharomyces cerevisiae. The sequence contains nine open reading frames (ORFs) longer than 300 bp. Three of them are internal to other ones. One corresponds to the gene LGT3 that encodes a putative sugar transporter. Three adjacent ORFs were separated by two stop codons in frame. These ORFs presented homology with the gene CPS1 that encodes carboxypeptidase S. The stop codons were not found in the same sequence derived from another yeast strain. Two other ORFs without significant homology in databases were also found. One of them, O0420, is very rich in serine and threonine and presents a series of repeated or similar amino acid stretches along the sequence.
Homologous and heterologous recombination between adenovirus vector DNA and chromosomal DNA.

PubMed

Stephen, Sam Laurel; Sivanandam, Vijayshankar Ganesh; Kochanek, Stefan

2008-11-01

Adenovirus vector DNA is perceived to remain episomal following gene transfer. We quantitatively and qualitatively analysed recombination between high capacity adenoviral vector (HC-AdV) and chromosomal DNA following gene transfer in vitro. We studied homologous and heterologous recombination with a single HC-AdV carrying (i) a large genomic HPRT fragment with the HPRT CHICAGO mutation causing translational stop upon homologous recombination with the HPRT locus and (ii) a selection marker to allow for clonal selection in the event of heterologous recombination. We analysed the sequences at the junctions between vector and chromosomal DNA. In primary cells and in cell lines, the frequency of homologous recombination ranged from 2 x 10(-5) to 1.6 x 10(-6). Heterologous recombination occurred at rates between 5.5 x 10(-3) and 1.1 x 10(-4). HC-AdV DNA integrated via the termini mostly as intact molecules. Analysis of the junction sequences indicated vector integration in a relatively random manner without an obvious preference for particular chromosomal regions, but with a preference for integration into genes. Integration into protooncogenes or tumor suppressor genes was not observed. Patchy homologies between vector termini and chromosomal DNA were found at the site of integration. Although the majority of integrations had occurred without causing mutations in the chromosomal DNA, cases of nucleotide substitutions and insertions were observed. In several cases, deletions of even relatively large chromosomal regions were likely. These results extend previous information on the integration patterns of adenovirus vector DNA and contribute to a risk-benefit assessment of adenovirus-mediated gene transfer.
Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

PubMed Central

Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

2010-01-01

Olive (Olea europaea L.) is an important source of edible oil that originated in the Near East region. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs for discovering novel genes and investigating the function of unknown olive genes. A total of 3840 randomly selected colonies from both libraries were sequenced for EST collection. The 2228 readable sequences for olive leaf and 1506 sequences for olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs showed no homology to the database, 2024 ESTs had homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI GenBank. Unique gene sequences for 635 ESTs were identified by over 80% homology to genes of known function in other species; these had not previously been described in the Olea family. Only 3.1% of the total ESTs showed similarity to olive sequences already in the NCBI database. The EST data and consensus sequences generated here were submitted to NCBI as a valuable resource for functional genomic studies of olive. PMID:21197085
Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

PubMed

Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

2017-04-01

Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties; however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N-terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower than that of the enzyme from Curcuma aromatica, a relative of Z. officinale.
Protein location prediction using atomic composition and global features of the amino acid sequence

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

2010-01-22

The subcellular location of a protein is valuable information for determining its function, screening for drug candidates, vaccine design, annotation of gene products, and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition, is effectively used with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100, 82.47, and 88.81 for the self-consistency test, jackknife test, and independent data test, respectively. Our results provide evidence that prediction based on the biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.
Microorganisms for producing organic acids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pfleger, Brian Frederick; Begemann, Matthew Brett

Organic acid-producing microorganisms and methods of using same. The organic acid-producing microorganisms comprise modifications that reduce or ablate AcsA activity or AcsA homolog activity. The modifications increase tolerance of the microorganisms to such organic acids as 3-hydroxypropionic acid, acrylic acid, propionic acid, lactic acid, and others. Further modifications to the microorganisms increase production of such organic acids as 3-hydroxypropionic acid, lactate, and others. Methods of producing such organic acids as 3-hydroxypropionic acid, lactate, and others with the modified microorganisms are provided. Methods of using acsA or homologs thereof as counter-selectable markers are also provided.
Microorganisms for producing organic acids

DOEpatents

Pfleger, Brian Frederick; Begemann, Matthew Brett

2014-09-30

Organic acid-producing microorganisms and methods of using same. The organic acid-producing microorganisms comprise modifications that reduce or ablate AcsA activity or AcsA homolog activity. The modifications increase tolerance of the microorganisms to such organic acids as 3-hydroxypropionic acid, acrylic acid, propionic acid, lactic acid, and others. Further modifications to the microorganisms increase production of such organic acids as 3-hydroxypropionic acid, lactate, and others. Methods of producing such organic acids as 3-hydroxypropionic acid, lactate, and others with the modified microorganisms are provided. Methods of using acsA or homologs thereof as counter-selectable markers are also provided.
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels

PubMed Central

Maulik, Ujjwal; Sarkar, Anasua

2013-01-01

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences, globally reducing inter-cluster walks. When combined with the corrections based on a modified symmetry-based proximity norm that deemphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Myers, G.; Foley, B.; Korber, B.

1997-04-01

This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.
All five host-range variants of Xanthomonas citri carry one pthA homolog with 17.5 repeats that determines pathogenicity on citrus, but none determine host-range variation.

PubMed

Al-Saadi, Abdulwahid; Reddy, Joseph D; Duan, Yong P; Brunings, Asha M; Yuan, Qiaoping; Gabriel, Dean W

2007-08-01

Citrus canker disease is caused by five groups of Xanthomonas citri strains that are distinguished primarily by host range: three from Asia (A, A*, and A(w)) and two that form a phylogenetically distinct clade and originated in South America (B and C). Every X. citri strain carries multiple DNA fragments that hybridize with pthA, which is essential for the pathogenicity of wide-host-range X. citri group A strain 3213. DNA fragments that hybridized with pthA were cloned from a representative strain from all five groups. Each strain carried one and only one pthA homolog that functionally complemented a knockout mutation of pthA in 3213. Every complementing homolog was of identical size to pthA and carried 17.5 nearly identical, direct tandem repeats, including three new genes from narrow-host-range groups C (pthC), A(w) (pthAW), and A* (pthA*). Every noncomplementing paralog was of a different size; one of these was sequenced from group A* (pthA*-2) and was found to have an intact promoter and full-length reading frame but with 15.5 repeats. None of the complementing homologs nor any of the noncomplementing paralogs conferred avirulence to 3213 on grapefruit or suppressed avirulence of a group A* strain on grapefruit. A knockout mutation of pthC in a group C strain resulted in loss of pathogenicity on lime, but the strain was unaffected in ability to elicit an HR on grapefruit. This pthC- mutant was fully complemented by pthA, pthB, or pthC. Analysis of the predicted amino-acid sequences of all functional pthA homologs and nonfunctional paralogs indicated that the specific sequence of the 17th repeat may be essential for pathogenicity of X. citri on citrus.
Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

PubMed

Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

2016-10-01

Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
The complete nucleotide sequence of RNA 3 of a peach isolate of Prunus necrotic ringspot virus.

PubMed

Hammond, R W; Crosslin, J M

1995-04-01

The complete nucleotide sequence of RNA 3 of the PE-5 peach isolate of Prunus necrotic ringspot ilarvirus (PNRSV) was obtained from cloned cDNA. The RNA sequence is 1941 nucleotides and contains two open reading frames (ORFs). ORF 1 consisted of 284 amino acids with a calculated molecular weight of 31,729 Da and ORF 2 contained 224 amino acids with a calculated molecular weight of 25,018 Da. ORF 2 corresponds to the coat protein gene. Expression of ORF 2 engineered into a pTrcHis vector in Escherichia coli results in a fusion polypeptide of approximately 28 kDa which cross-reacts with PNRSV polyclonal antiserum. Analysis of the coat protein amino acid sequence reveals a putative "zinc-finger" domain at the amino-terminal portion of the protein. Two tetranucleotide AUGC motifs occur in the 3'-UTR of the RNA and may function in coat protein binding and genome activation. ORF 1 homologies to other ilarviruses and alfalfa mosaic virus are confined to limited regions of conserved amino acids. The translated amino acid sequence of the coat protein gene shows 92% similarity to one isolate of apple mosaic virus, a closely related member of the ilarvirus group of plant viruses, but only 66% similarity to the amino acid sequence of the coat protein gene of a second isolate. These relationships are also reflected at the nucleotide sequence level. These results in one instance confirm the close similarities observed at the biophysical and serological levels between these two viruses, but on the other hand call into question the nomenclature used to describe these viruses.
A draft sequence of the rice genome (Oryza sativa L. ssp. indica).

PubMed

Yu, Jun; Hu, Songnian; Wang, Jun; Wong, Gane Ka-Shu; Li, Songgang; Liu, Bin; Deng, Yajun; Dai, Li; Zhou, Yan; Zhang, Xiuqing; Cao, Mengliang; Liu, Jing; Sun, Jiandong; Tang, Jiabin; Chen, Yanjiong; Huang, Xiaobing; Lin, Wei; Ye, Chen; Tong, Wei; Cong, Lijuan; Geng, Jianing; Han, Yujun; Li, Lin; Li, Wei; Hu, Guangqiang; Huang, Xiangang; Li, Wenjie; Li, Jian; Liu, Zhanwei; Li, Long; Liu, Jianping; Qi, Qiuhui; Liu, Jinsong; Li, Li; Li, Tao; Wang, Xuegang; Lu, Hong; Wu, Tingting; Zhu, Miao; Ni, Peixiang; Han, Hua; Dong, Wei; Ren, Xiaoyu; Feng, Xiaoli; Cui, Peng; Li, Xianran; Wang, Hao; Xu, Xin; Zhai, Wenxue; Xu, Zhao; Zhang, Jinsong; He, Sijie; Zhang, Jianguo; Xu, Jichen; Zhang, Kunlin; Zheng, Xianwu; Dong, Jianhai; Zeng, Wanyong; Tao, Lin; Ye, Jia; Tan, Jun; Ren, Xide; Chen, Xuewei; He, Jun; Liu, Daofeng; Tian, Wei; Tian, Chaoguang; Xia, Hongai; Bao, Qiyu; Li, Gang; Gao, Hui; Cao, Ting; Wang, Juan; Zhao, Wenming; Li, Ping; Chen, Wei; Wang, Xudong; Zhang, Yong; Hu, Jianfei; Wang, Jing; Liu, Song; Yang, Jian; Zhang, Guangyu; Xiong, Yuqing; Li, Zhijie; Mao, Long; Zhou, Chengshu; Zhu, Zhen; Chen, Runsheng; Hao, Bailin; Zheng, Weimou; Chen, Shouyi; Guo, Wei; Li, Guojie; Liu, Siqi; Tao, Ming; Wang, Jian; Zhu, Lihuang; Yuan, Longping; Yang, Huanming

2002-04-05

We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.
"De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

PubMed

Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

2015-03-01

Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' with great scientific and economic importance. Substantial deviations from the published amino acid sequence (Uniprot Q54181) were found: 46 additional amino acids at the N-terminus, including a so-called "His-tag", as well as partial N-terminal α-N-gluconoylation and α-N-phosphogluconoylation. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to identify the expression vector pET-28b from Novagen, with the Xho I restriction enzyme cleavage site, as the most likely vector used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.
Resolution of model Holliday junctions by yeast endonuclease: effect of DNA structure and sequence.

PubMed Central

Parsons, C A; Murchie, A I; Lilley, D M; West, S C

1989-01-01

The resolution of Holliday junctions in DNA involves specific cleavage at or close to the site of the junction. A nuclease from Saccharomyces cerevisiae cleaves model Holliday junctions in vitro by the introduction of nicks in regions of duplex DNA adjacent to the crossover point. In previous studies [Parsons and West (1988) Cell, 52, 621-629] it was shown that cleavage occurred within homologous arm sequences with precise symmetry across the junction. In contrast, junctions with heterologous arm sequences were cleaved asymmetrically. In this work, we have studied the effect of sequence changes and base modification upon the site of cleavage. It is shown that the specificity of cleavage is unchanged provided that perfect homology is maintained between opposing arm sequences. However, in the absence of homology, cleavage depends upon sequence context and is affected by minor changes such as base modification. These data support the proposed mechanism for cleavage of a Holliday junction, which requires homologous alignment of arm sequences in an enzyme-DNA complex as a prerequisite for symmetrical cleavage by the yeast endonuclease. PMID:2653810
Correlation between fibroin amino acid sequence and physical silk properties.

PubMed

Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

2003-09-12

The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet.
Msh2 Blocks an Alternative Mechanism for Non-Homologous Tail Removal during Single-Strand Annealing in Saccharomyces cerevisiae

PubMed Central

Manthey, Glenn M.; Naik, Nilan; Bailis, Adam M.

2009-01-01

Chromosomal translocations are frequently observed in cells exposed to agents that cause DNA double-strand breaks (DSBs), such as ionizing radiation and chemotherapeutic drugs, and are often associated with tumors in mammals. Recently, translocation formation in the budding yeast, Saccharomyces cerevisiae, has been found to occur at high frequencies following the creation of multiple DSBs adjacent to repetitive sequences on non-homologous chromosomes. The genetic control of translocation formation and the chromosome complements of the clones that contain translocations suggest that translocation formation occurs by single-strand annealing (SSA). Among the factors important for translocation formation by SSA is the central mismatch repair (MMR) and homologous recombination (HR) factor, Msh2. Here we describe the effects of several msh2 missense mutations on translocation formation that suggest that Msh2 has separable functions in stabilizing annealed single strands, and removing non-homologous sequences from their ends. Additionally, interactions between the msh2 alleles and a null allele of RAD1, which encodes a subunit of a nuclease critical for the removal of non-homologous tails suggest that Msh2 blocks an alternative mechanism for removing these sequences. These results suggest that Msh2 plays multiple roles in the formation of chromosomal translocations following acute levels of DNA damage. PMID:19834615

Molecular Cloning and Sequence Analysis of a Phenylalanine Ammonia-Lyase Gene from Dendrobium

PubMed Central

Cai, Yongping; Lin, Yi

2013-01-01

In this study, a phenylalanine ammonia-lyase (PAL) gene was cloned from Dendrobium candidum using homology cloning and RACE. The full-length PAL cDNA of D. candidum (designated Dc-PAL1, GenBank No. JQ765748) has 2,458 bp and contains a complete open reading frame (ORF) of 2,142 bp, which encodes 713 amino acid residues. The amino acid sequence of DcPAL1 has more than 80% sequence identity with the PAL proteins of other plants, as indicated by multiple alignments. The dominant sites and catalytic active sites, which are similar to those found in PAL proteins of Arabidopsis thaliana and Nicotiana tabacum, are also found in DcPAL1. Phylogenetic tree analysis revealed that DcPAL1 is more closely related to PALs from Orchidaceae plants than to those of other plants. The differential expression patterns of PAL in protocorm-like body, leaf, stem, and root suggest that the PAL gene performs multiple physiological functions in Dendrobium candidum. PMID:23638048
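The ORF length and residue count quoted in this record can be cross-checked with simple codon arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the reported 2,142 bp ORF includes the terminating stop codon, which the abstract does not state explicitly.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumption (not stated in the abstract): by default the ORF length
    counts the stop codon, which does not encode a residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 2,142 bp ORF reported for Dc-PAL1
dc_pal1_residues = orf_protein_length(2142)
print(dc_pal1_residues)  # 713
```

Under that assumption the 714 codons give 713 residues, in agreement with the figure reported in the abstract.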
Sequence of rat alpha- and gamma-casein mRNAs: evolutionary comparison of the calcium-dependent rat casein multigene family.

PubMed Central

Hobbs, A A; Rosen, J M

1982-01-01

The complete sequences of rat alpha- and gamma-casein mRNAs have been determined. The 1402-nucleotide alpha- and 864-nucleotide gamma-casein mRNAs both encode 15 amino acid signal peptides and mature proteins of 269 and 164 residues, respectively. Considerable homology between the 5' non-coding regions, and the regions encoding the signal peptides and the phosphorylation sites, in these mRNAs as compared to several other rodent casein mRNAs, was observed. Significant homology was also detected between rat alpha- and bovine alpha s1-casein. Comparison of the rodent and bovine sequences suggests that the caseins evolved at about the time of the appearance of the primitive mammals. This may have occurred by intragenic duplication of a nucleotide sequence encoding a primitive phosphorylation site, -(Ser)n-Glu-Glu-, and intergenic duplication resulting in the small casein multigene family. A unique feature of the rat alpha-casein sequence is an insertion in the coding region containing 10 repeated elements of 18 nucleotides each. This insertion appears to have occurred 7-12 million years ago, just prior to the divergence of rat and mouse. PMID:6298707
Peroxisomal Pex11 is a pore-forming protein homologous to TRPM channels.

PubMed

Mindthoff, Sabrina; Grunau, Silke; Steinfort, Laura L; Girzalsky, Wolfgang; Hiltunen, J Kalervo; Erdmann, Ralf; Antonenkov, Vasily D

2016-02-01

More than 30 proteins (Pex proteins) are known to participate in the biogenesis of peroxisomes, ubiquitous oxidative organelles involved in lipid and ROS metabolism. The Pex11 family of homologous proteins is responsible for division and proliferation of peroxisomes. We show that yeast Pex11 is a pore-forming protein sharing sequence similarity with TRPM cation-selective channels. The Pex11 channel with a conductance of Λ=4.1 nS in 1.0M KCl is moderately cation-selective (PK(+)/PCl(-)=1.85) and resistant to voltage-dependent closing. The estimated size of the channel's pore (r~0.6 nm) supports the notion that Pex11 conducts solutes with molecular mass below 300-400 Da. We localized the channel's selectivity determining sequence. Overexpression of Pex11 resulted in acceleration of fatty acid β-oxidation in intact cells but not in the corresponding lysates. The β-oxidation was affected in cells by expression of the Pex11 protein carrying point mutations in the selectivity determining sequence. These data suggest that the Pex11-dependent transmembrane traffic of metabolites may be a rate-limiting step in the β-oxidation of fatty acids. This conclusion was corroborated by analysis of the rate of β-oxidation in yeast strains expressing Pex11 with mutations mimicking constitutively phosphorylated (S165D, S167D) or unphosphorylated (S165A, S167A) protein. The results suggest that phosphorylation of Pex11 is a mechanism that can control the peroxisomal β-oxidation rate. Our results disclose an unexpected function of Pex11 as a non-selective channel responsible for transfer of metabolites across the peroxisomal membrane. The data indicate that peroxins may be involved in peroxisomal metabolic processes in addition to their role in peroxisome biogenesis. Copyright © 2015 Elsevier B.V. All rights reserved.
Complete Genome Sequences of Porcine Epidemic Diarrhea Virus Strains JSLS-1/2015 and JS-2/2015 Isolated from China.

PubMed

Tao, Jie; Li, Benqiang; Zhang, Chunling; Liu, Huili

2016-11-10

Two porcine epidemic diarrhea virus (PEDV) strains, JSLS-1/2015 and JS-2/2015, were isolated from piglets with watery diarrhea in South China. Both genomic sequences were highly homologous to that of the attenuated DR13 strain. Furthermore, JSLS-1/2015 contains a 24-amino-acid deletion in open reading frame 1b, which was first reported in PEDV isolates. Copyright © 2016 Tao et al.
Improved γ-linolenic acid production in Mucor circinelloides by homologous overexpressing of delta-12 and delta-6 desaturases.

PubMed

Zhang, Yao; Luan, Xiao; Zhang, Huaiyuan; Garre, Victoriano; Song, Yuanda; Ratledge, Colin

2017-06-21

γ-Linolenic acid (GLA) is important because of its nutritional value and medicinal applications. Although the biosynthetic pathways of some plant and microbial GLA have been deciphered, current understanding of the correlation between desaturases and GLA synthesis in oleaginous fungi is incomplete. In previous work, we found that a large amount of oleic acid (OA) had not been converted to linoleic acid (LA) or GLA in Mucor circinelloides CBS 277.49, which may be due to inadequate activities of the delta-12 or delta-6 desaturases, thus leading to the accumulation of OA and LA. It is therefore necessary to explore the main contributing factor in the process of GLA biosynthesis in M. circinelloides. To enhance GLA production in M. circinelloides, homologous overexpression of delta-12 and two delta-6 desaturases (named delta-6-1 and delta-6-2, respectively) was analyzed. When delta-6 desaturase was overexpressed in M. circinelloides, up to 43% GLA was produced in the total fatty acids, and the yield of GLA reached 180 mg/l, which were, respectively, 38 and 33% higher than in the control strain. These findings revealed that delta-6 desaturase (especially delta-6-1 desaturase) plays an important role in GLA synthesis by M. circinelloides. The strain overexpressing delta-6-1 desaturase may have potential application in microbial GLA production.
Biosynthesis and expression of ependymin homologous sequences in zebrafish brain.

PubMed

Sterrer, S; Königstorfer, A; Hoffmann, W

1990-01-01

Ependymins are unique, brain specific glycoproteins, which are major constituents of the cerebrospinal fluid. Originally, they were discovered in goldfish and are thought to be involved in synaptic plasticity. In the present study two transcripts were characterized in Brachydanio rerio originating from a single gene possibly by alternative splicing. These transcripts differ only in the length of their 3'-non-coding-regions and the encoded protein shares 90 and 88% homology with the two corresponding goldfish proteins, respectively. In situ hybridization revealed the expression of ependymins exclusively in the leptomeninx including its invaginations but not at all in the ependymal layer surrounding the ventricles. An initial developmental profile showed that ependymins first appear before hatching, i.e. between 48 and 72 h postfertilization.
[Study on the genetic difference of SEO type Hantaviruses].

PubMed

Zhang, X; Zhou, S; Wang, H; Hu, J; Guan, Z; Liu, H

2000-10-01

The aim was to determine the genetic types of the Hantaviruses carried by rodents in Beijing and the differences between them, and to further explore the source of infection. Hantavirus RNA was isolated from the lungs of rodents that were captured in Beijing and were positive for Hantavirus antigens by frozen sectioning and immunofluorescence assay; the RNA was reverse-transcribed and amplified by PCR with Hantavirus-specific primers. Five PCR amplification products were obtained and sequenced, yielding 300 bp of sequence data from the M segment (nucleotides 2003-2302 according to the cDNA of the Seoul 8039 strain). Nucleotide sequence homology showed that they were sequences of SEO-type Hantavirus. Compared with SEO-type Hantavirus, the nucleotide sequence homology of these samples was more than 94%, while the amino acid sequence homology was more than 98%. When compared with HNT-type Hantavirus, the nucleotide sequence homology was less than 72% and the amino acid sequence homology was less than 81%. As in other SEO-type Hantaviruses, their nucleotide sequences and deduced amino acid sequences were highly conserved. Phylogenetic tree analysis showed that the five viruses could be divided into at least 4 branches. It is quite likely that at least two subtypes of SEO virus, comprising 4 branches, were circulating in Beijing.
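The percentage figures in this record are pairwise comparisons over an aligned fragment. A minimal sketch of how such a value can be computed is given below; it is not the authors' method, the function name and the two toy sequences are hypothetical, and skipping gap columns is only one of several common conventions for percent identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped, which is
    an assumed convention; other definitions count gaps as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Length of the sequenced M-segment window, nucleotides 2003 to 2302 inclusive
fragment_length = 2302 - 2003 + 1

# Hypothetical 20-nt toy alignment with a single mismatch (not real data)
seq_a = "ATGGCATTAGCCGTACGTTA"
seq_b = "ATGGCGTTAGCCGTACGTTA"
print(fragment_length, percent_identity(seq_a, seq_b))  # 300 95.0
```

On a 300 bp fragment, a nucleotide identity above 94% corresponds to at most 17 differing positions under this gap-free definition.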
Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

PubMed

Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

2016-11-01

Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is
Multiple copies of a bile acid-inducible gene in Eubacterium sp. strain VPI 12708.

PubMed Central

Gopal-Srivastava, R; Mallonee, D H; White, W B; Hylemon, P B

1990-01-01

Eubacterium sp. strain VPI 12708 is an anaerobic intestinal bacterium which possesses inducible bile acid 7-dehydroxylation activity. Several new polypeptides are produced in this strain following induction with cholic acid. Genes coding for two copies of a bile acid-inducible 27,000-dalton polypeptide (baiA1 and baiA2) have been previously cloned and sequenced. We now report on a gene coding for a third copy of this 27,000-dalton polypeptide (baiA3). The baiA3 gene has been cloned in lambda DASH on an 11.2-kilobase DNA fragment from a partial Sau3A digest of the Eubacterium DNA. DNA sequence analysis of the baiA3 gene revealed 100% homology with the baiA1 gene within the coding region of the 27,000-dalton polypeptides. The baiA2 gene shares 81% sequence identity with the other two genes at the nucleotide level. The flanking nucleotide sequences associated with the baiA1 and baiA3 genes are identical for 930 bases in the 5' direction from the initiation codon and for at least 325 bases in the 3' direction from the stop codon, including the putative promoter regions for the genes. An additional open reading frame (occupying from 621 to 648 bases, depending on the correct start codon) was found in the identical 5' regions associated with the baiA1 and baiA3 clones. The 5' sequence 930 bases upstream from the baiA1 and baiA3 genes was totally divergent. The baiA2 gene, which is part of a large bile acid-inducible operon, showed no homology with the other two genes either in the 5' or 3' direction from the polypeptide coding region, except for a 15-base-pair presumed ribosome-binding site in the 5' region. These studies strongly suggest that a gene duplication (baiA1 and baiA3) has occurred and is stably maintained in this bacterium. PMID:2376563
Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

PubMed

Pietrowski, D; Förster, M

2000-01-01

The complete cDNA sequence of a ribonuclease k6 gene of Bos taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acid exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).
A novel archaeal alanine dehydrogenase homologous to ornithine cyclodeaminase and mu-crystallin.

PubMed

Schröder, Imke; Vadas, Alexander; Johnson, Eric; Lim, Sierin; Monbouquette, Harold G

2004-11-01

A novel alanine dehydrogenase (AlaDH) showing no significant amino acid sequence homology with previously known bacterial AlaDHs was purified to homogeneity from the soluble fraction of the hyperthermophilic archaeon Archaeoglobus fulgidus. AlaDH catalyzed the reversible, NAD+-dependent deamination of L-alanine to pyruvate and NH4+. NADP(H) did not serve as a coenzyme. The enzyme is a homodimer of 35 kDa per subunit. The Km values for L-alanine, NAD+, pyruvate, NADH, and NH4+ were estimated at 0.71, 0.60, 0.16, 0.02, and 17.3 mM, respectively. The A. fulgidus enzyme exhibited its highest activity at about 82 degrees C (203 U/mg for reductive amination of pyruvate) yet still retained 30% of its maximum activity at 25 degrees C. The thermostability of A. fulgidus AlaDH was increased by more than 10-fold by 1.5 M KCl to a half-life of 55 h at 90 degrees C. At 25 degrees C in the presence of this salt solution, the enzyme was approximately 100% stable for more than 3 months. Closely related A. fulgidus AlaDH homologues were found in other archaea. On the basis of its amino acid sequence, A. fulgidus AlaDH is a member of the ornithine cyclodeaminase-mu-crystallin family of enzymes. Similar to the mu-crystallins, A. fulgidus AlaDH did not exhibit any ornithine cyclodeaminase activity. The recombinant human mu-crystallin was assayed for AlaDH activity, but no activity was detected. The novel A. fulgidus gene encoding AlaDH, AF1665, is designated ala.
Nucleotide sequence of the 3' terminal region of lettuce mosaic potyvirus RNA shows a Gln/Val dipeptide at the cleavage site between the polymerase and the coat protein.

PubMed

Dinant, S; Lot, H; Albouy, J; Kuziak, C; Meyer, M; Astier-Manifacier, S

1991-01-01

DNA complementary to the 3' terminal 1651 nucleotides of the genome of the common strain of lettuce mosaic virus (LMV-O) has been cloned and sequenced. Microsequencing of the N-terminus enabled localization of the coat protein gene in this sequence. It showed also that the LMV coat protein coding region is at the 3' end of the genome, and that the coat protein is processed from a larger protein by cleavage at an unusual Q/V dipeptide between the polymerase and the coat protein. This is the first report of such a site for cleavage of a potyvirus polyprotein, where only Q/A, Q/S, and Q/G cleavage sites have been reported. The LMV coat protein gene encodes a 278 amino acid polypeptide with a calculated Mr of 31,171 and is flanked by a region which has a high degree of homology with the putative polymerase and a 3' untranslated region of 211 nucleotides in length. Percentage of homology with the coat protein of other potyviruses confirms that LMV is a distinct member of this group. Moreover, amino acid homologies noticed with the coat protein of potexvirus, bymovirus, and carlavirus elongated plant viruses suggest a functional significance for the conserved domains.
Changing partners: moving from non-homologous to homologous centromere pairing in meiosis

PubMed Central

Stewart, Mara N.; Dawson, Dean S.

2010-01-01

Reports of centromere pairing in early meiotic cells have appeared sporadically over the past thirty years. Recent experiments demonstrate that early centromere pairing occurs between non-homologous centromeres. As meiosis proceeds, centromeres change partners, becoming arranged in homologous pairs. Investigations of these later centromere pairs indicate that paired homologous centromeres are actively associated rather than positioned passively, side-by-side. Meiotic centromere pairing has been observed in organisms as diverse as mice, wheat and yeast, indicating that non-homologous centromere pairing in early meiosis and active homologous centromere pairing in later meiosis might be themes in meiotic chromosome behavior. Moreover, such pairing could have previously unrecognized roles in mediating chromosome organization or architecture that impact meiotic segregation fidelity. PMID:18804891
Human alpha beta hydrolase domain containing protein 11 and its yeast homolog are lipid hydrolases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arya, Madhuri; Srinivasan, Malathi; Rajasekharan, Ram

Proteins of the mammalian alpha/beta hydrolase domain (ABHD) family have emerged as key regulators of lipid metabolism and are found to be associated with human diseases. Human α/β-hydrolase domain containing protein 11 (ABHD11) has recently been predicted as a potential biomarker for human lung adenocarcinoma. In silico analyses of the ABHD11 protein sequence revealed the presence of a conserved lipase motif GXSXG. However, the role of ABHD11 in lipid metabolism is not known. To understand the biological function of ABHD11, we heterologously expressed the human ABHD11 in budding yeast, Saccharomyces cerevisiae. In vivo [{sup 14}C]acetate labeling of cellular lipids in yeast cells overexpressing ABHD11 showed a decrease in triacylglycerol content. Overexpression of ABHD11 also alters the molecular species of triacylglycerol in yeast. Similar activity was observed in its yeast homolog, Ygr031w. The role of the conserved lipase motif in the hydrolase activity was proven by the mutation of all conserved amino acid residues of the GXSXG motif. Collectively, our results demonstrate that human ABHD11 and its yeast homolog YGR031W have a pivotal role in lipid metabolism. - Highlights: • Overexpression of ABHD11 protein and its yeast homolog Ygr031w causes a reduction in triacylglycerol levels in yeast. • The reduction in triacylglycerol is due to the presence of the lipase motif GXSXG. • Overexpression of ABHD11 and Ygr031w alters the molecular species of triacylglycerol.
The VP35 and VP40 proteins of filoviruses. Homology between Marburg and Ebola viruses.

PubMed

Bukreyev, A A; Volchkov, V E; Blinov, V M; Netesov, S V

1993-05-03

The fragments of genomic RNA sequences of Marburg (MBG) and Ebola (EBO) viruses are reported. These fragments were found to encode the VP35 and VP40 proteins. The canonic sequences were revealed before and after each open reading frame. It is suggested that these sequences are mRNA extremities and at the same time the regulatory elements for mRNA transcription. Homology between the MBG and EBO proteins was discovered.
Characterization of a highly conserved human homolog to the chicken neural cell surface protein Bravo/Nr-CAM that maps to chromosome band 7q31

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lane, R.P.; Vielmetter, J.; Dreyer, W.J.

1996-08-01

The neuronal cell adhesion molecule Bravo/Nr-CAM is a cell surface protein of the immunoglobulin (Ig) superfamily and is closely related to the L1/NgCAM and neurofascin molecules, all of which contain six immunoglobulin domains, five fibronectin repeats, a transmembrane region, and an intracellular domain. Chicken Bravo/Nr-CAM has been shown to interact with other cell surface molecules of the Ig superfamily and has been implicated in specific pathfinding roles of axonal growth cones in the developing nervous system. We now report the characterization of cDNA clones encoding the human Bravo/Nr-CAM protein, which, like its chicken homolog, is composed of six V-like Ig domains and five fibronectin type III repeats. The human Bravo/Nr-CAM homolog also contains a transmembrane and intracellular domain, both of which are 100% conserved at the amino acid level compared to its chicken homolog. Overall, the human Bravo/Nr-CAM homolog is 82% identical to the chicken Bravo/Nr-CAM amino acid sequence. Independent cDNAs encoding four different isoforms were also identified, all of which contain alternatively spliced variants around the fifth fibronectin type III repeat, including one isoform that had been previously identified for chicken Bravo/Nr-CAM. Northern blot analysis reveals one mRNA species of approximately 7.0 kb in adult human brain tissue. Fluorescence in situ hybridization maps the gene for human Bravo/Nr-CAM to human chromosome 7q31.1-q31.2. This chromosomal locus has been previously identified as containing a tumor suppressor candidate gene commonly deleted in certain human cancer tissues. 38 refs., 5 figs.
Isolation, Characterization, Molecular Gene Cloning, and Sequencing of a Novel Phytase from Bacillus subtilis

PubMed Central

Kerovuo, Janne; Lauraeus, Marko; Nurminen, Päivi; Kalkkinen, Nisse; Apajalahti, Juha

1998-01-01

The Bacillus subtilis strain VTT E-68013 was chosen for purification and characterization of its excreted phytase. Purified enzyme had maximal phytase activity at pH 7 and 55°C. Isolated enzyme required calcium for its activity and/or stability and was readily inhibited by EDTA. The enzyme proved to be highly specific since, of the substrates tested, only phytate, ADP, and ATP were hydrolyzed (100, 75, and 50% of the relative activity, respectively). The phytase gene (phyC) was cloned from the B. subtilis VTT E-68013 genomic library. The deduced amino acid sequence (383 residues) showed no homology to the sequences of other phytases nor to those of any known phosphatases. PhyC did not have the conserved RHGXRXP sequence found in the active site of known phytases, and therefore PhyC appears not to be a member of the phytase subfamily of histidine acid phosphatases but a novel enzyme having phytase activity. Due to its pH profile and optimum, it could be an interesting candidate for feed applications. PMID:9603817
Genome sequences of a mouse-avirulent and a mouse-virulent strain of Ross River virus.

PubMed

Faragher, S G; Meek, A D; Rice, C M; Dalgarno, L

1988-04-01

The nucleotide sequence of the genomic RNA of a mouse-avirulent strain of Ross River virus, RRV NB5092 (isolated in 1969), has been determined and the corresponding sequence for the prototype mouse-virulent strain, RRV T48 (isolated in 1959), has been completed. The RRV NB5092 genome is approximately 11,674 nucleotides in length, compared with 11,853 nucleotides for RRV T48. RRV NB5092 and RRV T48 have the same genome organization. For both viruses an untranslated region of 80 nucleotides at the 5' end of the genome is followed by a 7440-nucleotide open reading frame which is interrupted after 5586 nucleotides by a single opal termination codon. By homology with other alphaviruses, the 5586-nucleotide open reading frame encodes the nonstructural proteins nsP1, nsP2, and nsP3; a fourth nonstructural protein, nsP4, is produced by read-through of the opal codon. The RRV nonstructural proteins show strong homology with the corresponding proteins of Sindbis virus and Semliki Forest virus in terms of size, net charge, and hydropathy characteristics. However, homology is not uniform between or within the proteins; nsP1, nsP2, and nsP4 contain extended domains which are highly conserved between alphaviruses, while the C-terminal region of nsP3 shows little conservation in sequence or length between alphaviruses. An untranslated "junction" region of 44 nucleotides (for RRV NB5092) or 47 nucleotides (for RRV T48) separates the nonstructural and structural protein coding regions. The structural proteins (capsid-E3-E2-6K-E1) are translated from an open reading frame of 3762 nucleotides which is followed by a 3'-untranslated region of approximately 348 nucleotides (for RRV NB5092) or 524 nucleotides (for RRV T48). Excluding deletions and insertions, the genomes of RRV NB5092 and RRV T48 differ at 284 nucleotides, representing a sequence divergence of 2.38%. Sequence deletions or insertions were found only in the noncoding regions and include a 173-nucleotide deletion in the 3
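As a consistency check on the figures quoted in this abstract, the segment lengths can be summed and compared with the reported genome sizes. The sketch below uses only numbers from the abstract; the function and variable names are illustrative, and it assumes the quoted segments are contiguous and non-overlapping, which the abstract does not state explicitly.

```python
def genome_length(utr5, nonstructural_orf, junction, structural_orf, utr3):
    """Sum the quoted segment lengths (nucleotides), assuming no overlaps."""
    return utr5 + nonstructural_orf + junction + structural_orf + utr3


# Segment lengths as quoted in the abstract (the 3' UTR values are approximate).
nb5092 = genome_length(80, 7440, 44, 3762, 348)
t48 = genome_length(80, 7440, 47, 3762, 524)

print(nb5092, t48)

# Open reading frame lengths expressed in codons (each is divisible by 3).
for name, nt in [("nsP1-nsP3", 5586), ("full nonstructural", 7440), ("structural", 3762)]:
    print(name, nt // 3, "codons")
```

Under that assumption the sums reproduce the reported totals of 11,674 and 11,853 nucleotides.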
Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

NASA Astrophysics Data System (ADS)

McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

2016-05-01

Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.
Complete genome sequence analysis of a duck circovirus from Guangxi pockmark ducks.

PubMed

Xie, Liji; Xie, Zhixun; Zhao, Guangyuan; Liu, Jiabo; Pang, Yaoshan; Deng, Xianwen; Xie, Zhiqin; Fan, Qing

2012-12-01

We report here the complete genomic sequence of a novel duck circovirus (DuCV) strain, GX1104, isolated from Guangxi pockmark ducks in Guangxi, China. The whole nucleotide sequence had the highest homology (97.2%) with the sequence of strain TC/2002 (GenBank accession number AY394721.1) and had a low homology (76.8% to 78.6%) with the sequences of other strains isolated from China, Germany, and the United States. This report will help to understand the epidemiology and molecular characteristics of Guangxi pockmark duck circovirus in southern China.
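Homology percentages such as the 97.2% quoted here are typically computed as percent identity over an alignment. The abstract does not say which tool or denominator was used, so the following is only a minimal sketch under one common convention: identical positions divided by aligned positions, skipping columns that contain a gap. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are ignored; other conventions
    (for example, dividing by the full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical): 7 of 8 ungapped columns are identical.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Because the result depends on how gaps and alignment length are treated, values from different studies are only comparable when the same convention is used.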
Molecular characterization of long direct repeat (LDR) sequences expressing a stable mRNA encoding for a 35-amino-acid cell-killing peptide and a cis-encoded small antisense RNA in Escherichia coli.

PubMed

Kawano, Mitsuoki; Oshima, Taku; Kasai, Hiroaki; Mori, Hirotada

2002-07-01

Genome sequence analyses of Escherichia coli K-12 revealed four copies of long repetitive elements. These sequences are designated as long direct repeat (LDR) sequences. Three of the repeats (LDR-A, -B, -C), each approximately 500 bp in length, are located as tandem repeats at 27.4 min on the genetic map. Another copy (LDR-D), 450 bp in length and nearly identical to LDR-A, -B and -C, is located at 79.7 min, a position that is directly opposite the position of LDR-A, -B and -C. In this study, we demonstrate that LDR-D encodes a 35-amino-acid peptide, LdrD, the overexpression of which causes rapid cell killing and nucleoid condensation of the host cell. Northern blot and primer extension analysis showed constitutive transcription of a stable mRNA (approximately 370 nucleotides) encoding LdrD and an unstable cis-encoded antisense RNA (approximately 60 nucleotides), which functions as a trans-acting regulator of ldrD translation. We propose that LDR encodes a toxin-antitoxin module. LDR-homologous sequences are not present on any known plasmids but are conserved in Salmonella and other enterobacterial species.
Cloning, sequencing, and expression of the Zymomonas mobilis phosphoglycerate mutase gene (pgm) in Escherichia coli.

PubMed Central

Yomano, L P; Scopes, R K; Ingram, L O

1993-01-01

Phosphoglycerate mutase is an essential glycolytic enzyme for Zymomonas mobilis, catalyzing the reversible interconversion of 3-phosphoglycerate and 2-phosphoglycerate. The pgm gene encoding this enzyme was cloned on a 5.2-kbp DNA fragment and expressed in Escherichia coli. Recombinants were identified by using antibodies directed against purified Z. mobilis phosphoglycerate mutase. The pgm gene contains a canonical ribosome-binding site, a biased pattern of codon usage, a long upstream untranslated region, and four promoters which share sequence homology. Interestingly, adhA and a D-specific 2-hydroxyacid dehydrogenase were found on the same DNA fragment and appear to form a cluster of genes which function in central metabolism. The translated sequence for Z. mobilis pgm was in full agreement with the 40 N-terminal amino acid residues determined by protein sequencing. The primary structure of the translated sequence is highly conserved (52 to 60% identity with other phosphoglycerate mutases) and also shares extensive homology with bisphosphoglycerate mutases (51 to 59% identity). Since Southern blots indicated the presence of only a single copy of pgm in the Z. mobilis chromosome, it is likely that the cloned pgm gene functions to provide both activities. Z. mobilis phosphoglycerate mutase is unusual in that it lacks the flexible tail and lysines at the carboxy terminus which are present in the enzyme isolated from all other organisms examined. PMID:8320209
Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives

PubMed Central

Vyas, V. K.; Ukawala, R. D.; Ghate, M.; Chintha, C.

2012-01-01

A major goal of structural biology involves the formation of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Therefore, an understanding of protein-ligand interactions will be very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With the increase in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. Recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, detecting distant homologues, modeling loops and side chains, and detecting errors in a model, have contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure and describes current developments in this field, with successful applications at the different stages of drug design and discovery. PMID:23204616
UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

PubMed

Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

2016-01-04

The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Monitoring Replication Protein A (RPA) dynamics in homologous recombination through site-specific incorporation of non-canonical amino acids.

PubMed

Pokhrel, Nilisha; Origanti, Sofia; Davenport, Eric Parker; Gandhi, Disha; Kaniecki, Kyle; Mehl, Ryan A; Greene, Eric C; Dockendorff, Chris; Antony, Edwin

2017-09-19

An essential coordinator of all DNA metabolic processes is Replication Protein A (RPA). RPA orchestrates these processes by binding to single-stranded DNA (ssDNA) and interacting with several other DNA binding proteins. Determining the real-time kinetics of single players such as RPA in the presence of multiple DNA processors to better understand the associated mechanistic events is technically challenging. To overcome this hurdle, we utilized non-canonical amino acids and bio-orthogonal chemistry to site-specifically incorporate a chemical fluorophore onto a single subunit of heterotrimeric RPA. Upon binding to ssDNA, this fluorescent RPA (RPAf) generates a quantifiable change in fluorescence, thus serving as a reporter of its dynamics on DNA in the presence of multiple other DNA binding proteins. Using RPAf, we describe the kinetics of facilitated self-exchange and exchange by Rad51 and mediator proteins during various stages in homologous recombination. RPAf is widely applicable to investigate its mechanism of action in processes such as DNA replication, repair and telomere maintenance. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Multiple homologous genes knockout (KO) by CRISPR/Cas9 system in rabbit.

PubMed

Liu, Huan; Sui, Tingting; Liu, Di; Liu, Tingjun; Chen, Mao; Deng, Jichao; Xu, Yuanyuan; Li, Zhanjun

2018-03-20

The CRISPR/Cas9 system is a highly efficient and convenient genome editing tool, which has been widely used for single or multiple gene mutation in a variety of organisms. Disruption of multiple homologous genes, which have similar DNA sequences and gene function, is required for the study of the desired phenotype. In this study, to test whether the CRISPR/Cas9 system works on the mutation of multiple homologous genes, a single guide RNA (sgRNA) targeting three fucosyltransferase-encoding genes (FUT1, FUT2 and SEC1) was designed. As expected, triple gene mutation of FUT1, FUT2 and SEC1 could be achieved simultaneously via a sgRNA-mediated CRISPR/Cas9 system. In addition, significantly reduced serum fucosyltransferase enzyme activity was observed in the triple gene mutation rabbits. Thus, we provide the first evidence that knockout (KO) of multiple homologous genes can be achieved efficiently by a sgRNA-mediated CRISPR/Cas9 system in mammals, which could facilitate genotype-to-phenotype studies of homologous genes in the future. Copyright © 2018 Elsevier B.V. All rights reserved.
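The idea of one sgRNA targeting several homologous genes can be sketched as a search for a 20-nucleotide protospacer, followed by an NGG PAM, that occurs in every target sequence. This is an illustration only: it scans the forward strand, ignores mismatch tolerance and off-target scoring, and uses hypothetical toy sequences rather than FUT1, FUT2 or SEC1. The abstract does not describe how the authors chose their guide.

```python
import re

# 20-nt protospacer immediately followed by an NGG PAM (forward strand only).
# The lookahead allows overlapping candidates to be found.
PROTOSPACER = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def protospacers(seq):
    """All 20-mers in seq that are directly followed by an NGG PAM."""
    return {m.group(1) for m in PROTOSPACER.finditer(seq.upper())}


def shared_protospacers(sequences):
    """Protospacers present, with a PAM, in every input sequence."""
    return set.intersection(*(protospacers(s) for s in sequences))


# Hypothetical toy "homologous genes" sharing one conserved 20-mer.
guide = "GACCTGAAGTTCATCTGCAC"
gene_a = "TTT" + guide + "TGG" + "AAA"
gene_b = "CC" + guide + "AGG" + "TT"
gene_c = "A" + guide + "CGG"

print(shared_protospacers([gene_a, gene_b, gene_c]))
```

A practical design would also scan the reverse complement and check candidate guides against the rest of the genome.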
Homology modeling reveals the structural background of the striking difference in thermal stability between two related [NiFe]hydrogenases.

PubMed

Szilágyi, András; Kovács, Kornél L; Rákhely, Gábor; Závodszky, Péter

2002-02-01

Hydrogenases are redox metalloenzymes in bacteria that catalyze the uptake or production of molecular hydrogen. Two homologous nickel-iron hydrogenases, HupSL and HydSL from the photosynthetic purple sulfur bacterium Thiocapsa roseopersicina, differ substantially in their thermal stabilities despite the high sequence similarity between them. The optimum temperature of HydSL activity is estimated to be at least 50 degrees C higher than that of HupSL. In this work, homology models of both proteins were constructed and analyzed for a number of structural properties. The comparison of the models reveals that the higher stability of HydSL can be attributed to increased inter-subunit electrostatic interactions: the homology models reliably predict that HydSL contains at least five more inter-subunit ion pairs than HupSL. The subunit interface of HydSL is more polar than that of HupSL, and it contains a few extra inter-subunit hydrogen bonds. A more optimized cavity system and amino acid replacements resulting in increased conformational rigidity may also contribute to the higher stability of HydSL. The results are in accord with the general observation that with increasing temperature, the role of electrostatic interactions in protein stability increases. Electronic supplementary material to this paper can be obtained by using the Springer Link server located at http://dx.doi.org/10.1007/s00894-001-0071-8.
Trans-Homolog Interactions Facilitating Paramutation in Maize

PubMed Central

2015-01-01

Paramutations represent locus-specific trans-homolog interactions affecting the heritable silencing properties of endogenous alleles. Although examples of paramutation are well studied in maize (Zea mays), the responsible mechanisms remain unclear. Genetic analyses indicate roles for plant-specific DNA-dependent RNA polymerases that generate small RNAs, and current working models hypothesize that these small RNAs direct heritable changes at sequences often acting as transcriptional enhancers. Several studies have defined specific sequences that mediate paramutation behaviors, and recent results identify a diversity of DNA-dependent RNA polymerase complexes operating in maize. Other reports ascribe broader roles for some of these complexes in normal genome function. This review highlights recent research to understand the molecular mechanisms of paramutation and examines evidence relevant to small RNA-based modes of transgenerational epigenetic inheritance. PMID:26149572
37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

Code of Federal Regulations, 2014 CFR

2014-07-01

...” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...
37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

Code of Federal Regulations, 2013 CFR

2013-07-01

...” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...
37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

Code of Federal Regulations, 2012 CFR

2012-07-01

...” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...
Cloning and characterization of the major histone H2A genes completes the cloning and sequencing of known histone genes of Tetrahymena thermophila.

PubMed Central

Liu, X; Gorovsky, M A

1996-01-01

A truncated cDNA clone encoding Tetrahymena thermophila histone H2A2 was isolated using synthetic degenerate oligonucleotide probes derived from H2A protein sequences of Tetrahymena pyriformis. The cDNA clone was used as a homologous probe to isolate a truncated genomic clone encoding H2A1. The remaining regions of the genes for H2A1 (HTA1) and H2A2 (HTA2) were then isolated using inverse PCR on circularized genomic DNA fragments. These partial clones were assembled into intact HTA1 and HTA2 clones. Nucleotide sequences of the two genes were highly homologous within the coding region but not in the noncoding regions. Comparison of the deduced amino acid sequences with protein sequences of T. pyriformis H2As showed only two and three differences, respectively, in a total of 137 amino acids for H2A1 and 132 amino acids for H2A2, indicating the two genes arose before the divergence of these two species. The HTA2 gene contains a TAA triplet within the coding region, encoding a glutamine residue. In contrast with the T. thermophila HHO and HTA3 genes, no introns were identified within the two genes. The 5'- and 3'-ends of the histone H2A mRNAs were determined by RNase protection and by PCR mapping using RACE and RLM-RACE methods. Both genes encode polyadenylated mRNAs and are highly expressed in vegetatively growing cells but only weakly expressed in starved cultures. With the inclusion of these two genes, T. thermophila is the first organism whose entire complement of known core and linker histones, including replication-dependent and basal variants, has been cloned and sequenced. PMID:8760889
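The in-frame TAA encoding glutamine reflects the ciliate nuclear genetic code, in which TAA and TAG are read as glutamine and TGA is the only stop codon. The sketch below contrasts the standard and ciliate readings of the same DNA. The table construction and function names are illustrative, and the short input sequence is a toy example, not part of HTA2.

```python
import itertools

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), STANDARD_AA)
}

# Ciliate nuclear code: TAA and TAG encode glutamine (Q); TGA remains a stop.
CILIATE = dict(STANDARD, TAA="Q", TAG="Q")


def translate(dna, table):
    """Translate complete codons of dna (reading frame 0) using the given table."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


toy = "ATGTAACAATGA"  # hypothetical sequence with an internal TAA
print(translate(toy, STANDARD))
print(translate(toy, CILIATE))
```

Translating Tetrahymena coding sequences with the standard table would therefore truncate the predicted protein at the first in-frame TAA or TAG.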
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

PubMed

Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

2016-11-01

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
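The kind of null model described in this abstract can be illustrated with a simplified sketch: for a fixed protein sequence, each codon is drawn from its synonymous set with weight exp(λ × number of G/C bases), which is the exponential form a maximum-entropy distribution takes under a GC constraint. This is not the NullSeq package or its API. The function names and the `gc_bias` parameter are assumptions made for illustration, and the sketch does not solve for the λ that hits a specified GC target.

```python
import itertools
import math
import random

# Standard genetic code in the conventional T, C, A, G codon order.
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), STANDARD_AA)
}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(codon):
    return sum(base in "GC" for base in codon)


def random_coding_sequence(protein, gc_bias=0.0, rng=None):
    """Sample a coding sequence for protein; gc_bias > 0 favors GC-rich codons."""
    rng = rng or random.Random()
    codons = []
    for aa in protein.upper():
        options = SYNONYMS[aa]
        weights = [math.exp(gc_bias * gc_count(c)) for c in options]
        codons.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(codons)


def translate(dna):
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_fraction(dna):
    return sum(base in "GC" for base in dna) / len(dna)


protein = "MKTAYIAKQRLLSG"  # hypothetical toy protein
low = random_coding_sequence(protein, gc_bias=-2.0, rng=random.Random(1))
high = random_coding_sequence(protein, gc_bias=2.0, rng=random.Random(1))
print(low, round(gc_fraction(low), 2))
print(high, round(gc_fraction(high), 2))
```

With `gc_bias=0.0` every synonymous codon is equally likely. In a fuller implementation, λ would be tuned numerically until the expected GC content matches the desired value.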
Non-Homologous End Joining and Homology Directed DNA Repair Frequency of Double-Stranded Breaks Introduced by Genome Editing Reagents.

PubMed

Zaboikin, Michail; Zaboikina, Tatiana; Freter, Carl; Srinivasakumar, Narasimhachar

2017-01-01

Genome editing using transcription-activator like effector nucleases or RNA guided nucleases allows one to precisely engineer desired changes within a given target sequence. The genome editing reagents introduce double stranded breaks (DSBs) at the target site which can then undergo DNA repair by non-homologous end joining (NHEJ) or homology directed recombination (HDR) when a template DNA molecule is available. NHEJ repair results in indel mutations at the target site. As PCR amplified products from mutant target regions are likely to exhibit different melting profiles than PCR products amplified from wild type target region, we designed a high resolution melting analysis (HRMA) for rapid identification of efficient genome editing reagents. We also designed TaqMan assays using probes situated across the cut site to discriminate wild type from mutant sequences present after genome editing. The experiments revealed that the sensitivity of the assays to detect NHEJ-mediated DNA repair could be enhanced by selection of transfected cells to reduce the contribution of unmodified genomic DNA from untransfected cells to the DNA melting profile. The presence of donor template DNA lacking the target sequence at the time of genome editing further enhanced the sensitivity of the assays for detection of mutant DNA molecules by excluding the wild-type sequences modified by HDR. A second TaqMan probe that bound to an adjacent site, outside of the primary target cut site, was used to directly determine the contribution of HDR to DNA repair in the presence of the donor template sequence. The TaqMan qPCR assay, designed to measure the contribution of NHEJ and HDR in DNA repair, corroborated the results from HRMA. The data indicated that genome editing reagents can produce DSBs at high efficiency in HEK293T cells but a significant proportion of these are likely masked by reversion to wild type as a result of HDR. Supplying a donor plasmid to provide a template for HDR (that
[Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

PubMed

Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

2017-01-04

The aims were to explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide a genetic background for metabolic engineering. The complete genome of A. pullulans CCTCC M2012223 was sequenced on the Illumina HiSeq high-throughput sequencing platform. Fragment assembly, gene prediction, functional annotation, and GO/COG clustering were then performed and compared with those of five other A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30756831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the largest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathways were highly conserved in all six A. pullulans varieties. Although A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum are closely related, some point mutations and insertions occurred in protein sequences involved in melanin biosynthesis. The genome information of A. pullulans CCTCC M2012223 was annotated, and genes involved in the melanin, pullulan and polymalic acid pathways were compared, which provides a theoretical basis for genetic modification of metabolic pathways in A. pullulans.
Characterization of Group V Dubnium Homologs on DGA Extraction Chromatography Resin from Nitric and Hydrofluoric Acid Matrices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Despotopulos, J D; Sudowe, R

2012-02-21

behaving somewhere between Nb and Pa. Much more recent studies have examined the properties of Db from HNO{sub 3}/HF matrices, and suggest Db forms complexes similar to those of Pa. Very little experimental work into the behavior of element 114 has been performed. Thermochromatography experiments of three atoms of element 114 indicate that element 114 is at least as volatile as Hg, At, and element 112. Lead was shown to deposit on gold at temperatures about 1000 C higher than the atoms of element 114. Results indicate a substantially increased stability of element 114. No liquid phase studies of element 114 or its homologs (Pb, Sn, Ge) or pseudo-homologs (Hg, Cd) have been performed. Theoretical predictions indicate that element 114 should have a much more stable +2 oxidation state and neutral state than Pb, which would result in element 114 being less reactive and less metallic than Pb. The relativistic effects on the 7p{sub 1/2} electrons are predicted to cause a diagonal relationship to be introduced into the periodic table. Therefore, 114{sup 2+} is expected to behave as if it were somewhere between Hg{sup 2+}, Cd{sup 2+}, and Pb{sup 2+}. In this work two commercially available extraction chromatography resins are evaluated, one for the separation of Db homologs and pseudo-homologs from each other as well as from potential interfering elements such as Group IV Rf homologs and actinides, and the other for separation of element 114 homologs. One resin, Eichrom's DGA resin, contains a N,N,N',N'-tetra-n-octyldiglycolamide extractant, which separates analytes based on both size and charge characteristics of the solvated metal species, coated on an inert support. The DGA resin was examined for Db chemical systems, and shows a high degree of selectivity for tri-, tetra-, and hexavalent metal ions in multiple acid matrices with fast kinetics. The other resin, Eichrom's Pb resin, contains a di-t-butylcyclohexano 18-crown-6 extractant with isodecanol solvent, which
History of retinoic acid receptors.

PubMed

Benbrook, Doris M; Chambon, Pierre; Rochette-Egly, Cécile; Asson-Batres, Mary Ann

2014-01-01

The discovery of retinoic acid receptors arose from research into how vitamins are essential for life. Early studies indicated that Vitamin A was metabolized into an active factor, retinoic acid (RA), which regulates RNA and protein expression in cells. Each step forward in our understanding of retinoic acid in human health was accomplished by the development and application of new technologies. Development of cDNA cloning techniques and discovery of nuclear receptors for steroid hormones provided the basis for identification of two classes of retinoic acid receptors, RARs and RXRs, each of which has three isoforms, α, β and γ. DNA manipulation and crystallographic studies revealed that the receptors contain discrete functional domains responsible for binding to DNA, ligands and cofactors. Ligand binding was shown to induce conformational changes in the receptors that cause release of corepressors and recruitment of coactivators to create functional complexes that are bound to consensus promoter DNA sequences called retinoic acid response elements (RAREs) and that cause opening of chromatin and transcription of adjacent genes. Homologous recombination technology allowed the development of mice lacking expression of retinoic acid receptors, individually or in various combinations, which demonstrated that the receptors exhibit vital, but redundant, functions in fetal development and in vision, reproduction, and other functions required for maintenance of adult life. More recent advancements in sequencing and proteomic technologies reveal the complexity of retinoic acid receptor involvement in cellular function through regulation of gene expression and kinase activity. Future directions will require systems biology approaches to decipher how these integrated networks affect human stem cells, health, and disease.
Next generation sequencing identifies mutations in Atonal homolog 7 (ATOH7) in families with global eye developmental defects

PubMed Central

Khan, Kamron; Logan, Clare V.; McKibbin, Martin; Sheridan, Eamonn; Elçioglu, Nursel H.; Yenice, Ozlem; Parry, David A.; Fernandez-Fuentes, Narcis; Abdelhamed, Zakia I.A.; Al-Maskari, Ahmed; Poulter, James A.; Mohamed, Moin D.; Carr, Ian M.; Morgan, Joanne E.; Jafri, Hussain; Raashid, Yasmin; Taylor, Graham R.; Johnson, Colin A.; Inglehearn, Chris F.; Toomes, Carmel; Ali, Manir

2012-01-01

The atonal homolog 7 (ATOH7) gene encodes a transcription factor involved in determining the fate of retinal progenitor cells and is particularly required for optic nerve and ganglion cell development. Using a combination of autozygosity mapping and next generation sequencing, we have identified homozygous mutations in this gene, p.E49V and p.P18RfsX69, in two consanguineous families diagnosed with multiple ocular developmental defects, including severe vitreoretinal dysplasia, optic nerve hypoplasia, persistent fetal vasculature, microphthalmia, congenital cataracts, microcornea, corneal opacity and nystagmus. Most of these clinical features overlap with defects in the Norrin/β-catenin signalling pathway that is characterized by dysgenesis of the retinal and hyaloid vasculature. Our findings document Mendelian mutations within ATOH7 and imply a role for this molecule in the development of structures at the front as well as the back of the eye. This work also provides further insights into the function of ATOH7, especially its importance in retinal vascular development and hyaloid regression. PMID:22068589

Homologous and Homologous like Microwave Solar Radio Bursts

NASA Astrophysics Data System (ADS)

Trevisan, R. H.; Sawant, H. S.; Kalman, B.; Gesztelyi, L.

1990-11-01

ABSTRACT. Solar radio observations at 1.6 GHz were carried out in July 1985 using the 13.7 m diameter Itapetinga antenna with a time resolution of 3 ms. Homologous bursts, with a total duration of about a couple of seconds and repeating at intervals of some seconds, were observed in association with homologous Hα flares. These Hα flares had periodicities of about 40 min. The observed long periodicities were attributed to oscillation of prominences, and the short periods were attributed to removal of plasma from the field interaction zone. "Homologous-like" bursts were also observed. These are double-peak bursts with the same time profile repeating in time. In addition, the ratio of the total duration of the bursts to the time difference between the burst peaks remains constant. Morphological studies of these bursts are presented. Key words: SUN-BURSTS - SUN-FLARE
Recently published protein sequences. I.

NASA Technical Reports Server (NTRS)

Jukes, T. H.; Holmquist, R.

1972-01-01

Some polypeptide sequences that have been published in the 1972 scientific literature are listed. Only selected sequences are included. The compilation has two objectives: to assemble current information in the periods between the publication of more comprehensive compilations, and to encourage the use of data that do not include arrangements of unsequenced peptides for 'maximum homology'.
Homologous recombination within the capsid gene of porcine circovirus type 2 subgroup viruses via natural co-infection

USDA-ARS's Scientific Manuscript database

Several studies had reported homologous recombination between porcine circovirus type 2 (PCV2)-group 1 (Gp1) and -group 2 (Gp2) viruses. Interestingly, the recombination events described thus far mapped either within the Rep gene sequences or the sequences flanking the Rep gene region. Previously, ...
GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank.

PubMed

You, Ronghui; Zhang, Zihan; Xiong, Yi; Sun, Fengzhu; Mamitsuka, Hiroshi; Zhu, Shanfeng

2018-03-07

Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only <1% of more than 70 million proteins in UniProtKB have experimental GO annotations, implying a strong need for automated function prediction (AFP) of proteins. AFP is a hard multilabel classification problem because a single protein can carry a varying number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, but they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins that are already annotated. Thus the vital and challenging problem now is how to develop a method for SAFP, particularly for difficult proteins. The key to such a method is to extract not only homology information but also diverse, deep-rooted information/evidence from sequence inputs and integrate them into a predictor in a manner that is both effective and efficient. We propose GOLabeler, which integrates five component classifiers, trained from different features, including GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties, etc., in the framework of learning to rank (LTR), a paradigm of machine learning, especially powerful for multilabel classification. The empirical results obtained by examining GOLabeler extensively and thoroughly by using large-scale datasets revealed numerous favorable aspects of GOLabeler, including significant performance advantage over state-of-the-art AFP methods. http://datamining-iip.fudan.edu.cn/golabeler. zhusf@fudan.edu.cn. Supplementary data are available at Bioinformatics online.
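One of the feature types named in this abstract, the amino acid trigram, can be illustrated with a short sketch. This is not GOLabeler's code: the function names are hypothetical, and the choice of overlapping windows and relative-frequency normalization is an assumption made for illustration only.

```python
from collections import Counter


def trigram_counts(seq: str) -> Counter:
    """Count overlapping amino acid trigrams in a protein sequence.

    Hypothetical helper for illustration; not taken from GOLabeler.
    """
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def trigram_frequencies(seq: str) -> dict:
    """Convert trigram counts to relative frequencies (assumed normalization)."""
    counts = trigram_counts(seq)
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(trigram_frequencies("MKVMKV"))
```

A vector of such frequencies over all possible trigrams is one plausible way to turn a raw sequence into fixed-length input for a classifier.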
Amino Acid Properties Conserved in Molecular Evolution

PubMed Central

Rudnicki, Witold R.; Mroczek, Teresa; Cudek, Paweł

2014-01-01

That amino acid properties are responsible for the way protein molecules evolve is natural and is also reasonably well supported both by the structure of the genetic code and, to a large extent, by the experimental measures of the amino acid similarity. Nevertheless, there remains a significant gap between observed similarity matrices and their reconstructions from amino acid properties. Therefore, we introduce a simple theoretical model of amino acid similarity matrices, which allows splitting the matrix into two parts – one that depends only on mutabilities of amino acids and another that depends on pairwise similarities between them. Then the new synthetic amino acid properties are derived from the pairwise similarities and used to reconstruct similarity matrices covering a wide range of information entropies. Our model allows us to explain up to 94% of the variability in the BLOSUM family of amino acid similarity matrices in terms of amino acid properties. The new properties derived from amino acid similarity matrices correlate highly with properties known to be important for molecular evolution such as hydrophobicity, size, shape and charge of amino acids. This result closes the gap in our understanding of the influence of amino acids on evolution at the molecular level. The method was applied to a single family of similarity matrices often used in general sequence homology searches, but it is general and can also be used for more specific matrices. The new synthetic properties can be used in analyses of protein sequences in various biological applications. PMID:24967708
Complete amino acid sequence of ananain and a comparison with stem bromelain and other plant cysteine proteases.

PubMed Central

Lee, K L; Albee, K L; Bernasconi, R J; Edmunds, T

1997-01-01

The amino acid sequences of ananain (EC 3.4.22.31) and stem bromelain (EC 3.4.22.32), two cysteine proteases from pineapple stem, are similar, yet ananain and stem bromelain possess distinct specificities towards synthetic peptide substrates and different reactivities towards the cysteine protease inhibitors E-64 and chicken egg white cystatin. We present here the complete amino acid sequence of ananain and compare it with the reported sequences of pineapple stem bromelain, papain and chymopapain from papaya and actinidin from kiwifruit. Ananain comprises 216 residues with a theoretical mass of 23464 Da. This primary structure includes a sequence insert between residues 170 and 174 not present in stem bromelain or papain and a hydrophobic series of amino acids adjacent to His-157. It is possible that these sequence differences contribute to the different substrate and inhibitor specificities exhibited by ananain and stem bromelain. PMID:9355753
Vba2p, a vacuolar membrane protein involved in basic amino acid transport in Schizosaccharomyces pombe.

PubMed

Sugimoto, Naoko; Iwaki, Tomoko; Chardwiriyapreecha, Soracom; Shimazu, Masamitsu; Sekito, Takayuki; Takegawa, Kaoru; Kakinuma, Yoshimi

2010-01-01

A recent study filling the gap in the genome sequence in the left arm of chromosome 2 of Schizosaccharomyces pombe revealed a homolog of budding yeast Vba2p, a vacuolar transporter of basic amino acids. GFP-tagged Vba2p in fission yeast was localized to the vacuolar membrane. Upon disruption of vba2, the uptake of several amino acids, including lysine, histidine, and arginine, was impaired. A transient increase in lysine uptake under nitrogen starvation was lowered by this mutation. These findings suggest that Vba2p is involved in basic amino acid transport in S. pombe under diverse conditions.
Arabidopsis Glutamate Receptor Homolog3.5 Modulates Cytosolic Ca2+ Level to Counteract Effect of Abscisic Acid in Seed Germination

PubMed Central

Kong, Dongdong; Ju, Chuanli; Parihar, Aisha; Kim, So; Cho, Daeshik; Kwak, June M.

2015-01-01

Seed germination is a critical step in a plant’s life cycle that allows successful propagation and is therefore strictly controlled by endogenous and environmental signals. However, the molecular mechanisms underlying germination control remain elusive. Here, we report that the Arabidopsis (Arabidopsis thaliana) glutamate receptor homolog3.5 (AtGLR3.5) is predominantly expressed in germinating seeds and increases cytosolic Ca2+ concentration that counteracts the effect of abscisic acid (ABA) to promote germination. Repression of AtGLR3.5 impairs cytosolic Ca2+ concentration elevation, significantly delays germination, and enhances ABA sensitivity in seeds, whereas overexpression of AtGLR3.5 results in earlier germination and reduced seed sensitivity to ABA. Furthermore, we show that Ca2+ suppresses the expression of ABSCISIC ACID INSENSITIVE4 (ABI4), a key transcription factor involved in ABA response in seeds, and that ABI4 plays a fundamental role in modulation of Ca2+-dependent germination. Taken together, our results provide molecular genetic evidence that AtGLR3.5-mediated Ca2+ influx stimulates seed germination by antagonizing the inhibitory effects of ABA through suppression of ABI4. These findings establish, to our knowledge, a new and pivotal role of the plant glutamate receptor homolog and Ca2+ signaling in germination control and uncover the orchestrated modulation of the AtGLR3.5-mediated Ca2+ signal and ABA signaling via ABI4 to fine-tune the crucial developmental process, germination, in Arabidopsis. PMID:25681329
Cloning and Characterization of an Outer Membrane Protein of Vibrio vulnificus Required for Heme Utilization: Regulation of Expression and Determination of the Gene Sequence

PubMed Central

Litwin, Christine M.; Byrne, Burke L.

1998-01-01

Vibrio vulnificus is a halophilic, marine pathogen that has been associated with septicemia and serious wound infections in patients with iron overload and preexisting liver disease. For V. vulnificus, the ability to acquire iron from the host has been shown to correlate with virulence. V. vulnificus is able to use host iron sources such as hemoglobin and heme. We previously constructed a fur mutant of V. vulnificus which constitutively expresses at least two iron-regulated outer membrane proteins, of 72 and 77 kDa. The N-terminal amino acid sequence of the 77-kDa protein purified from the V. vulnificus fur mutant had 67% homology with the first 15 amino acids of the mature protein of the Vibrio cholerae heme receptor, HutA. In this report, we describe the cloning, DNA sequence, mutagenesis, and analysis of transcriptional regulation of the structural gene for HupA, the heme receptor of V. vulnificus. DNA sequencing of hupA demonstrated a single open reading frame of 712 amino acids that was 50% identical and 66% similar to the sequence of V. cholerae HutA and similar to those of other TonB-dependent outer membrane receptors. Primer extension analysis localized one promoter for the V. vulnificus hupA gene. Analysis of the promoter region of V. vulnificus hupA showed a sequence homologous to the consensus Fur box. Northern blot analysis showed that the transcript was strongly regulated by iron. An internal deletion in the V. vulnificus hupA gene, done by using marker exchange, resulted in the loss of expression of the 77-kDa protein and the loss of the ability to use hemin or hemoglobin as a source of iron. The hupA deletion mutant of V. vulnificus will be helpful in future studies of the role of heme iron in V. vulnificus pathogenesis. PMID:9632577
Cloning and characterization of a Candida albicans gene homologous to fructose-1,6-bisphosphatase genes.

PubMed

De la Rosa, J M; Ruíz, T; Rodríguez, L

2000-12-01

By sequencing of the DNA adjacent to the Candida albicans SEC61 gene, an open reading frame encoding a polypeptide of 331 amino acids was found. The predicted protein showed a strong homology with the fructose-1,6-bisphosphatase [FbPase] from other organisms, and conserved regions included the catalytic motif found in all known FbPases. Although the cloned gene did not complement the growth failure of a Saccharomyces cerevisiae fbp1 mutant in media with gluconeogenic carbon sources, it was transcribed in the transformants in a fashion that indicates a partial repression by glucose. A similar control on the transcription of this gene and on FbPase activity was found in wild-type C. albicans, where the cloned gene (CaFBP1) was shown to be localized in a single chromosomal locus in the genome.
NHE3 in an ancestral vertebrate: primary sequence, distribution, localization, and function in gills.

PubMed

Choe, Keith P; Kato, Akira; Hirose, Shigehisa; Plata, Consuelo; Sindic, Aleksandra; Romero, Michael F; Claiborne, J B; Evans, David H

2005-11-01

In mammals, the Na+/H+ exchanger 3 (NHE3) is expressed with Na+/K+-ATPase in renal proximal tubules, where it secretes H+ and absorbs Na+ to maintain blood pH and volume. In elasmobranchs (sharks, skates, and stingrays), the gills are the dominant site of pH and osmoregulation. This study was conducted to determine whether epithelial NHE homologs exist in elasmobranchs and, if so, to localize their expression in gills and determine whether their expression is altered by environmental salinity or hypercapnia. Degenerate primers and RT-PCR were used to deduce partial sequences of mammalian NHE2 and NHE3 homologs from the gills of the euryhaline Atlantic stingray (Dasyatis sabina). Real-time PCR was then used to demonstrate that mRNA expression of the NHE3 homolog increased when stingrays were transferred to low salinities but not during hypercapnia. Expression of the NHE2 homolog did not change with either treatment. Rapid amplification of cDNA ends was then used to deduce the complete sequence of a putative NHE3. The 2,744-base pair cDNA includes a 2,511-base pair coding region for a protein that is 70% identical to human NHE3 (SLC9A3). Antisera generated against the carboxyl tail of the putative stingray NHE3 labeled the apical membranes of Na+/K+-ATPase-rich epithelial cells, and acclimation to freshwater caused a redistribution of labeling in the gills. This study provides the first NHE3 cloned from an elasmobranch and is the first to demonstrate an increase in gill NHE3 expression during acclimation to low salinities, suggesting that NHE3 can absorb Na+ from ion-poor environments.
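The lengths quoted in this abstract can be checked with simple arithmetic: each amino acid is encoded by three nucleotides, so a cDNA of N base pairs can encode at most N/3 residues. The sketch below uses hypothetical helper names and assumes a single uninterrupted reading frame.

```python
def max_encoded_residues(cdna_bp: int) -> int:
    """Upper bound on protein length for a cDNA of the given size (3 nt per codon)."""
    return cdna_bp // 3


def codons_in(coding_bp: int) -> int:
    """Number of whole codons in a coding region; raises if not a multiple of 3."""
    if coding_bp % 3 != 0:
        raise ValueError("coding region length is not a multiple of 3")
    return coding_bp // 3


if __name__ == "__main__":
    print(max_encoded_residues(2744))  # ceiling for the 2,744-bp cDNA
    print(codons_in(2511))             # codons in a 2,511-nt coding region
```

For a 2,744-bp cDNA the ceiling is 914 residues, and 2,511 is an exact multiple of three (837 codons), which is consistent with 2,511 being a nucleotide count rather than a residue count.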
Discovery of Escherichia coli CRISPR sequences in an undergraduate laboratory.

PubMed

Militello, Kevin T; Lazatin, Justine C

2017-05-01

Clustered regularly interspaced short palindromic repeats (CRISPRs) represent a novel type of adaptive immune system found in eubacteria and archaebacteria. CRISPRs have recently generated a lot of attention due to their unique ability to catalog foreign nucleic acids, their ability to destroy foreign nucleic acids in a mechanism that shares some similarity to RNA interference, and the ability to utilize reconstituted CRISPR systems for genome editing in numerous organisms. In order to introduce CRISPR biology into an undergraduate upper-level laboratory, a five-week set of exercises was designed to allow students to examine the CRISPR status of uncharacterized Escherichia coli strains and to allow the discovery of new repeats and spacers. Students started the project by isolating genomic DNA from E. coli and amplifying the iap CRISPR locus using the polymerase chain reaction (PCR). The PCR products were analyzed by Sanger DNA sequencing, and the sequences were examined for the presence of CRISPR repeat sequences. The regions between the repeats, the spacers, were extracted and analyzed with BLASTN searches. Overall, CRISPR loci were sequenced from several previously uncharacterized E. coli strains and one E. coli K-12 strain. Sanger DNA sequencing resulted in the discovery of 36 spacer sequences and their corresponding surrounding repeat sequences. Five of the spacers were homologous to foreign (non-E. coli) DNA. Assessment of the laboratory indicates that improvements were made in the ability of students to answer questions relating to the structure and function of CRISPRs. Future directions of the laboratory are presented and discussed. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(3):262-269, 2017.
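The spacer-extraction step described in this abstract (taking the regions between consecutive repeats) can be sketched as follows. This is an illustrative simplification, not the course's protocol: the function name is hypothetical, the repeat is assumed to be known in advance, and only exact matches are found, whereas real CRISPR repeats can vary slightly between copies.

```python
def extract_spacers(locus: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Hypothetical helper; assumes exact, non-overlapping repeat matches.
    """
    locus, repeat = locus.upper(), repeat.upper()
    positions = []
    start = locus.find(repeat)
    while start != -1:
        positions.append(start)
        start = locus.find(repeat, start + len(repeat))
    # A spacer runs from the end of one repeat to the start of the next.
    return [locus[left + len(repeat):right]
            for left, right in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Toy sequences for illustration only; not real E. coli CRISPR data.
    toy_repeat = "GGTTT"
    toy_locus = "AA" + toy_repeat + "ACGTAC" + toy_repeat + "TTGACA" + toy_repeat + "CC"
    print(extract_spacers(toy_locus, toy_repeat))
```

Each extracted spacer could then be submitted to a similarity search such as BLASTN, as the abstract describes.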
The organisation and interviral homologies of genes at the 3' end of tobacco rattle virus RNA1

PubMed Central

Boccara, Martine; Hamilton, William D. O.; Baulcombe, David C.

1986-01-01

The RNA1 of tobacco rattle virus (TRV) has been cloned as cDNA and the nucleotide sequence determined of 2 kb from the 3'-terminal region. The sequence contains three long open reading frames. One of these starts 5' of the cDNA and probably corresponds to the carboxy-terminal sequence of a 170-K protein encoded on RNA1. The deduced protein sequence from this reading frame shows homology with the putative replicases of tobacco mosaic virus (TMV) and tricornaviruses. The location of the second open reading frame, which encodes a 29-K polypeptide, was shown by Northern blot analysis to coincide with a 1.6-kb subgenomic RNA. The validity of this reading frame was confirmed by showing that the cDNA extending over this region could be transcribed and translated in vitro to produce a polypeptide of the predicted size which co-migrates in electrophoresis with a translation product of authentic viral RNA. The sequence of this 29-K polypeptide showed homology with two regions in the 30-K protein of TMV. This homology includes positions in the TMV 30-K protein where mutations have been identified which affect the transport of virus between cells. The third open reading frame encodes a potential 16-K protein and was shown by Northern blot hybridisation to be contained within the region of a 0.7-kb subgenomic RNA which is found in cellular RNA of infected cells but not virus particles. The many similarities between TRV and TMV in viral morphology, gene organisation and sequence suggest that these two viral groups may share a common viral ancestor. PMID:16453668
Localization, cloning, and sequence determination of the conjugative plasmid ColB2 pilin gene.

PubMed Central

Finlay, B B; Frost, L S; Paranchych, W

1984-01-01

ColB2 is a colicin-producing, 96-kilobase plasmid which encodes a conjugative system that is similar, but not identical, to that of F. A restriction map of this plasmid was generated, and DNA homology studies between F and ColB2 plasmids revealed homology only between their transfer operons. The ColB2 transfer operon and ColB2 pilin gene were localized on this restriction map. The gene encoding ColB2 pilin, traA, was cloned and sequenced. The pilin protein of ColB2 is identical to that of F, except at the amino terminus, where Ala-Gln of ColB2 pilin corresponds to Ala-Gly-Ser-Ser of F pilin. This is due to a 6-base-pair deletion in the ColB2 pilin gene. Biochemical studies on tryptic peptides derived from ColB2 pilin demonstrate the location of this gene to be correct. There is a putative signal peptidase cleavage site after the sequence Ala-Met-Ala, giving a signal peptide of 51 amino acids and a mature pilin protein of 68 amino acids (7,000 daltons). The amino terminus is blocked, probably with an acetyl group. A chimera containing the ColB2 pilin gene was able to complement an F traA mutant, demonstrating that the pilus assembly proteins of F can utilize the ColB2 pilin protein to form a pilus. PMID:6090427
CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

PubMed

Hazes, Bart

2014-02-28

Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
Transcriptional transitions in Nicotiana benthamiana leaves upon induction of oil synthesis by WRINKLED1 homologs from diverse species and tissues.

PubMed

Grimberg, Åsa; Carlsson, Anders S; Marttila, Salla; Bhalerao, Rishikesh; Hofvander, Per

2015-08-08

Carbon accumulation and remobilization are essential mechanisms in plants to ensure energy transfer between plant tissues with different functions or metabolic needs and to support new generations. Knowledge about the regulation of carbon allocation into oil (triacylglycerol) in plant storage tissue can be of great economic and environmental importance for developing new high-yielding oil crops. Here, the effect on global gene expression as well as on physiological changes in leaves transiently expressing five homologs of the transcription factor WRINKLED1 (WRI1) originating from diverse species and tissues; Arabidopsis thaliana and potato (Solanum tuberosum) seed embryo, poplar (Populus trichocarpa) stem cambium, oat (Avena sativa) grain endosperm, and nutsedge (Cyperus esculentus) tuber parenchyma, were studied by agroinfiltration in Nicotiana benthamiana. All WRI1 homologs induced oil accumulation when expressed in leaf tissue. Transcriptome sequencing revealed that all homologs induced the same general patterns with a drastic shift in gene expression profiles of leaves from that of a typical source tissue to a source-limited sink-like tissue: Transcripts encoding enzymes for plastid uptake and metabolism of phosphoenolpyruvate, fatty acid and oil biosynthesis were up-regulated, as were also transcripts encoding starch degradation. Transcripts encoding enzymes in photosynthesis and starch synthesis were instead down-regulated. Moreover, transcripts representing fatty acid degradation were up-regulated indicating that fatty acids might be degraded to feed the increased need to channel carbons into fatty acid synthesis creating a futile cycle. RT-qPCR analysis of leaves expressing Arabidopsis WRI1 showed the temporal trends of transcripts selected as 'markers' for key metabolic pathways one to five days after agroinfiltration. Chlorophyll fluorescence measurements of leaves expressing Arabidopsis WRI1 showed a significant decrease in photosynthesis, even though
Sequence of a second gene encoding bovine submaxillary mucin: implication for mucin heterogeneity and cloning.

PubMed

Jiang, W; Woitach, J T; Gupta, D; Bhavanandan, V P

1998-10-20

Secreted epithelial mucins are extremely large and heterogeneous glycoproteins. We report the 5 kilobase DNA sequence of a second gene, BSM2, which encodes bovine submaxillary mucin. The determined nucleotide and deduced amino acid sequences of BSM2 are 95.2% and 92.2% identical, respectively, to those of the previously described BSM1 gene isolated from the same cow. Further, the five predicted protein domains of the two genes are 100%, 94%, 93%, 77%, and 88% identical. Based on the above results, we propose that expression of multiple homologous core proteins from a single animal is a factor in generating diversity of saccharides in mucins and in providing resistance of the molecules to proteolysis. In addition, this work raises several important issues in mucin cloning such as assembling sequences from seemingly overlapping clones and deducing consensus sequences for nearly identical tandem repeats. Copyright 1998 Academic Press.
Cloning and sequencing the genes encoding goldfish and carp ependymin.

PubMed

Adams, D S; Shashoua, V E

1994-04-20

Ependymins (EPNs) are brain glycoproteins thought to function in optic nerve regeneration and long-term memory consolidation. To date, epn genes have been characterized in two orders of teleost fish. In this study, polymerase chain reactions (PCR) were used to amplify the complete 1.6-kb epn genes, gf-I and cc-I, from genomic DNA of Cypriniformes, goldfish and carp, respectively. Amplified bands were cloned and sequenced. Each gene consists of six exons and five introns. The exon portion of gf-I encodes a predicted 215-amino-acid (aa) protein previously characterized as GF-I, while cc-I encodes a predicted 215-aa protein 95% homologous to GF-I.
Homologs of the Xenopus developmental gene DG42 are present in zebrafish and mouse and are involved in the synthesis of Nod-like chitin oligosaccharides during early embryogenesis.

PubMed

Semino, C E; Specht, C A; Raimondi, A; Robbins, P W

1996-05-14

The Xenopus developmental gene DG42 is expressed during early embryonic development, between the midblastula and neurulation stages. The deduced protein sequence of Xenopus DG42 shows similarity to Rhizobium Nod C, Streptococcus Has A, and fungal chitin synthases. Previously, we found that the DG42 protein made in an in vitro transcription/translation system catalyzed synthesis of an array of chitin oligosaccharides. Here we show that cell extracts from early Xenopus and zebrafish embryos also synthesize chitooligosaccharides. cDNA fragments homologous to DG42 from zebrafish and mouse were also cloned and sequenced. Expression of these homologs was similar to that described for Xenopus based on Northern and Western blot analysis. The Xenopus anti-DG42 antibody recognized a 63-kDa protein in extracts from zebrafish embryos that followed a similar developmental expression pattern to that previously described for Xenopus. The chitin oligosaccharide synthase activity found in extracts was inactivated by a specific DG42 antibody; synthesis of hyaluronic acid (HA) was not affected under the conditions tested. Other experiments demonstrate that expression of DG42 under plasmid control in mouse 3T3 cells gives rise to chitooligosaccharide synthase activity without an increase in HA synthase level. A possible relationship between our results and those of other investigators, which show stimulation of HA synthesis by DG42 in mammalian cell culture systems, is provided by structural analyses to be published elsewhere that suggest that chitin oligosaccharides are present at the reducing ends of HA chains. Since in at least one vertebrate system hyaluronic acid formation can be inhibited by a pure chitinase, it seems possible that chitin oligosaccharides serve as primers for hyaluronic acid synthesis.
Small tandemly repeated DNA sequences of higher plants likely originate from a tRNA gene ancestor.

PubMed Central

Benslimane, A A; Dron, M; Hartmann, C; Rode, A

1986-01-01

Several monomers (177 bp) of a tandemly arranged repetitive nuclear DNA sequence of Brassica oleracea have been cloned and sequenced. They share up to 95% homology with one another and up to 80% with other satellite DNA sequences of Cruciferae, suggesting a common ancestor. Both strands of these monomers show more than 50% homology with many tRNA genes; the best homologies have been obtained with Lys and His yeast mitochondrial tRNA genes (64% and 60%, respectively). These results suggest that small tandemly repeated DNA sequences of plants may have evolved from a tRNA gene ancestor. These tandem repeats have probably arisen via a process involving reverse transcription of polymerase III RNA intermediates, as is the case for interspersed DNA sequences of mammals. A model is proposed to explain the formation of such small tandemly repeated DNA sequences. PMID:3774553

ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

PubMed

Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

2012-09-08

The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
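The sequence statistics named in the abstract above (GC content and codon frequencies) can be illustrated with a minimal sketch. This is not ANCAC's implementation; the function names `gc_content` and `codon_frequencies` are hypothetical, and the input is assumed to be a plain in-frame coding sequence.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each complete codon, reading from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}
```

Applied per genome or per functional category, such per-sequence values could then be averaged and compared, which is the kind of species- or function-specific comparison the abstract describes.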
Homology of vanadium oxide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vasyutinskii, N.A.

1987-05-01

The authors examine the homology of vanadium oxide and note that data on the existence of phases and homogeneity limits in the V-O system are very contradictory. A graphical illustration shows the homologous series of vanadium oxides. The predominant part of the discrete formations in the system V-O is characterized by integral stoichiometry and forms six homologous series. It is found that homologous series of vanadium oxides are not only a basis for systematization of such oxides, but also may serve as a means for predicting the composition of new phases, limits of homogeneity, their structure, and properties.
Cloning and Sequencing of Defective Particles Derived from the Autonomous Parvovirus Minute Virus of Mice for the Construction of Vectors with Minimal cis-Acting Sequences

PubMed Central

Clément, Nathalie; Avalosse, Bernard; El Bakkouri, Karim; Velu, Thierry; Brandenburger, Annick

2001-01-01

The production of wild-type-free stocks of recombinant parvovirus minute virus of mice [MVM(p)] is difficult due to the presence of homologous sequences in vector and helper genomes that cannot easily be eliminated from the overlapping coding sequences. We have therefore cloned and sequenced spontaneously occurring defective particles of MVM(p) with very small genomes to identify the minimal cis-acting sequences required for DNA amplification and virus production. One of them has lost all capsid-coding sequences but is still able to replicate in permissive cells when nonstructural proteins are provided in trans by a helper plasmid. Vectors derived from this particle produce stocks with no detectable wild-type MVM after cotransfection with new, matched, helper plasmids that present no homology downstream from the transgene. PMID:11152501
Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

PubMed Central

Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

2007-01-01

We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688
Cloning and sequence of the gene encoding a cefotaxime-hydrolyzing class A beta-lactamase isolated from Escherichia coli.

PubMed Central

Ishii, Y; Ohno, A; Taguchi, H; Imajo, S; Ishiguro, M; Matsuzawa, H

1995-01-01

Escherichia coli TUH12191, which is resistant to piperacillin, cefazolin, cefotiam, ceftizoxime, cefuzonam, and aztreonam but is susceptible to cefoxitin, latamoxef, flomoxef, and imipenem, was isolated from the urine of a patient treated with beta-lactam antibiotics. The beta-lactamase (Toho-1) purified from the bacteria had a pI of 7.8, had a molecular weight of about 29,000, and hydrolyzed beta-lactam antibiotics such as penicillin G, ampicillin, oxacillin, carbenicillin, piperacillin, cephalothin, cefoxitin, cefotaxime, ceftazidime, and aztreonam. Toho-1 was markedly inhibited by beta-lactamase inhibitors such as clavulanic acid and tazobactam. Resistance to beta-lactams, streptomycin, spectinomycin, sulfamethoxazole, and trimethoprim was transferred by conjugational transfer from E. coli TUH12191 to E. coli ML4903, and the transferred plasmid was about 58 kbp, belonging to incompatibility group M. The cefotaxime resistance gene for Toho-1 was subcloned from the 58-kbp plasmid by transformation of E. coli MV1184. The sequence of the gene for Toho-1 was determined, and the open reading frame of the gene consisted of 873 or 876 bases (initial sequence, ATGATG). The nucleotide sequence of the gene (DDBJ accession number D37830) was found to be about 73% homologous to the sequence of the gene encoding a class A beta-lactamase produced by Klebsiella oxytoca E23004. According to the amino acid sequence deduced from the DNA sequence, the precursor consisted of 290 or 291 amino acid residues, which contained amino acid motifs common to class A beta-lactamases (70SXXK, 130SDN, and 234KTG). Toho-1 was about 83% homologous to the beta-lactamase mediated by the chromosome of K. oxytoca D488 and the beta-lactamase mediated by the plasmid of E. coli MEN-1. Therefore, the newly isolated beta-lactamase Toho-1 produced by E. coli TUH12191 is similar to beta-lactamases produced by K. oxytoca D488, K. oxytoca E23004, and E. coli MEN-1 rather than to mutants of TEM or SHV enzymes
Somatic association of telocentric chromosomes carrying homologous centromeres in common wheat.

PubMed

Mello-Sampayo, T

1973-01-01

Measurements of distances between telocentric chromosomes, either homologous or representing the opposite arms of a metacentric chromosome (complementary telocentrics), were made at metaphase in root tip cells of common wheat carrying two homologous pairs of complementary telocentrics of chromosome 1 B or 6 B (double ditelosomic 1 B or 6 B). The aim was to elucidate the relative locations of the telocentric chromosomes within the cell. The data obtained strongly suggest that all four telocentrics of chromosome 1 B or 6 B are spacially and simultaneously co-associated. In plants carrying two complementary (6 B (S) and 6 B (L)) and a non-related (5 B (L)) telocentric, only the complementary chromosomes were found to be somatically associated. It is thought, therefore, that the somatic association of chromosomes may involve more than two chromosomes in the same association and, since complementary telocentrics are as much associated as homologous, that the homology between centromeres (probably the only homologous region that exists between complementary telocentrics) is a very important condition for somatic association of chromosomes. The spacial arrangement of chromosomes was studied at anaphase and prophase and the polar orientation of chromosomes at prophase was found to resemble anaphase orientation. This was taken as good evidence for the maintenance of the chromosome arrangement - the Rabl orientation - and of the peripheral location of the centromere and its association with the nuclear membrane. Within this general arrangement homologous telocentric chromosomes were frequently seen to have their centromeres associated or directed towards each other. The role of the centromere in somatic association as a spindle fibre attachment and chromosome binder is discussed. It is suggested that for non-homologous chromosomes to become associated in root tips, the only requirement needed should be the homology of centromeres such as exists between complementary
The HMMER Web Server for Protein Sequence Similarity Search.

PubMed

Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

2017-12-08

Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.
Evidence of Divergent Amino Acid Usage in Comparative Analyses of R5- and X4-Associated HIV-1 Vpr Sequences

PubMed Central

Antell, Gregory C.; Zhong, Wen; Kercher, Katherine; Passic, Shendra; Williams, Jean; Liu, Yucheng; James, Tony; Jacobson, Jeffrey M.; Szep, Zsofia

2017-01-01

Vpr is an HIV-1 accessory protein that plays numerous roles during viral replication, and some of which are cell type dependent. To test the hypothesis that HIV-1 tropism extends beyond the envelope into the vpr gene, studies were performed to identify the associations between coreceptor usage and Vpr variation in HIV-1-infected patients. Colinear HIV-1 Env-V3 and Vpr amino acid sequences were obtained from the LANL HIV-1 sequence database and from well-suppressed patients in the Drexel/Temple Medicine CNS AIDS Research and Eradication Study (CARES) Cohort. Genotypic classification of Env-V3 sequences as X4 (CXCR4-utilizing) or R5 (CCR5-utilizing) was used to group colinear Vpr sequences. To reveal the sequences associated with a specific coreceptor usage genotype, Vpr amino acid sequences were assessed for amino acid diversity and Jensen-Shannon divergence between the two groups. Five amino acid alphabets were used to comprehensively examine the impact of amino acid substitutions involving side chains with similar physiochemical properties. Positions 36, 37, 41, 89, and 96 of Vpr were characterized by statistically significant divergence across multiple alphabets when X4 and R5 sequence groups were compared. In addition, consensus amino acid switches were found at positions 37 and 41 in comparisons of the R5 and X4 sequence populations. These results suggest an evolutionary link between Vpr and gp120 in HIV-1-infected patients. PMID:28620613
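The per-position Jensen-Shannon divergence mentioned in the abstract above can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: it uses only the plain 20-letter amino acid alphabet (the study used five alphabets and tested for statistical significance), and the names `column_distribution` and `jensen_shannon_divergence` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(residues, alphabet=AMINO_ACIDS):
    """Relative residue frequencies at one alignment position."""
    counts = Counter(r for r in residues if r in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues from the alphabet in this column")
    return [counts[a] / total for a in alphabet]


def jensen_shannon_divergence(p, q):
    """JS divergence in bits (base-2 logarithm); ranges from 0 to 1."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For one alignment position, the residues observed in the R5 group and in the X4 group would each be passed to `column_distribution`, and the two resulting distributions compared with `jensen_shannon_divergence`; identical columns give 0 and columns with no shared residues give 1.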
Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

DOE Office of Scientific and Technical Information (OSTI.GOV)

Myers, G.; Korber, B.; Wain-Hobson, S.

1993-12-31

This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.
Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

PubMed

Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

2018-01-01

We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
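The feature the abstract above calls most informative, stop codons inside the presumed translation of a homologous DNA region, can be illustrated with a short sketch. It is not Spurio's code: there is no tblastn search and no Gaussian process model here, the standard genetic code is assumed, and the names `translate` and `internal_stop_count` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def internal_stop_count(dna: str, frame: int = 0) -> int:
    """Count stop codons ('*'), ignoring any at the very end of the frame."""
    return translate(dna, frame).rstrip("*").count("*")
```

A region whose translation aligns to the query protein yet contains several internal stops would, on the abstract's reasoning, count as evidence that the query is a spurious prediction.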
Structural insights into the anti-HIV activity of the Oscillatoria agardhii agglutinin homolog lectin family.

PubMed

Koharudin, Leonardus M I; Kollipara, Sireesha; Aiken, Christopher; Gronenborn, Angela M

2012-09-28

Oscillatoria agardhii agglutinin homolog (OAAH) proteins belong to a recently discovered lectin family. All members contain a sequence repeat of ~66 amino acids, with the number of repeats varying among different family members. Apart from data for the founding member OAA, neither three-dimensional structures, information about carbohydrate binding specificities, nor antiviral activity data have been available up to now for any other members of the OAAH family. To elucidate the structural basis for the antiviral mechanism of OAAHs, we determined the crystal structures of Pseudomonas fluorescens and Myxococcus xanthus lectins. Both proteins exhibit the same fold, resembling the founding family member, OAA, with minor differences in loop conformations. Carbohydrate binding studies by NMR and x-ray structures of glycan-lectin complexes reveal that the number of sugar binding sites corresponds to the number of sequence repeats in each protein. As for OAA, tight and specific binding to α3,α6-mannopentaose was observed. All the OAAH proteins described here exhibit potent anti-HIV activity at comparable levels. Altogether, our results provide structural details of the protein-carbohydrate interaction for this novel lectin family and insights into the molecular basis of their HIV inactivation properties.
Molecular evolution of an Avirulence Homolog (Avh) gene subfamily in Phytophthora ramorum

Treesearch

Erica M. Goss; Caroline M. Press; Niklaus J. Grünwald

2008-01-01

Pathogen effectors can serve a virulence function on behalf of the pathogen or trigger a rapid defense response in resistant hosts. Sequencing of the Phytophthora ramorum genome and subsequent analysis identified a diverse superfamily of approximately 350 genes that are homologous to the four known avirulence genes in plant pathogenic oomycetes and...
Complete nucleotide sequences of the coat protein messenger RNAs of brome mosaic virus and cowpea chlorotic mottle virus.

PubMed Central

Dasgupta, R; Kaesberg, P

1982-01-01

The nucleotide sequences of the subgenomic coat protein messengers (RNA4's) of two related bromoviruses, brome mosaic virus (BMV) and cowpea chlorotic mottle virus (CCMV), have been determined by direct RNA and cDNA sequencing without cloning. BMV RNA4 is 876 b long, including a 5' noncoding region of nine nucleotides and a 3' noncoding region of 300 nucleotides. CCMV RNA4 is 824 b long, including a 5' noncoding region of 10 nucleotides and a 3' noncoding region of 244 nucleotides. The encoded coat proteins are similar in length (188 amino acids for BMV and 189 amino acids for CCMV) and display about 70% homology in their amino acid sequences. Length difference between the two RNAs is due mostly to a single deletion, in CCMV with respect to BMV, of about 57 b immediately following the coding region. Allowing for this deletion the RNAs are indicate that mutations leading to divergence were constrained in the coding region primarily by the requirement of maintaining a favorable coat protein structure and in the 3' noncoding region primarily by the requirement of maintaining a favorable RNA spatial configuration. PMID:6895941
Bacteriophage T5 encodes a homolog of the eukaryotic transcription coactivator PC4 implicated in recombination-dependent DNA replication.

PubMed

Steigemann, Birthe; Schulz, Annina; Werten, Sebastiaan

2013-11-15

The RNA polymerase II cofactor PC4 globally regulates transcription of protein-encoding genes through interactions with unwinding DNA, the basal transcription machinery and transcription activators. Here, we report the surprising identification of PC4 homologs in all sequenced representatives of the T5 family of bacteriophages, as well as in an archaeon and seven phyla of eubacteria. We have solved the crystal structure of the full-length T5 protein at 1.9Å, revealing a striking resemblance to the characteristic single-stranded DNA (ssDNA)-binding core domain of PC4. Intriguing novel structural features include a potential regulatory region at the N-terminus and a C-terminal extension of the homodimerisation interface. The genome organisation of T5-related bacteriophages points at involvement of the PC4 homolog in recombination-dependent DNA replication, strongly suggesting that the protein corresponds to the hitherto elusive replicative ssDNA-binding protein of the T5 family. Our findings imply that PC4-like factors intervene in multiple unwinding-related processes by acting as versatile modifiers of nucleic acid conformation and raise the possibility that the eukaryotic transcription coactivator derives from ancestral DNA replication, recombination and repair factors. © 2013.
Overexpression of the homologous lanosterol synthase gene in ganoderic acid biosynthesis in Ganoderma lingzhi.

PubMed

Zhang, De-Huai; Li, Na; Yu, Xuya; Zhao, Peng; Li, Tao; Xu, Jun-Wei

2017-02-01

Ganoderic acids (GAs) in Ganoderma lingzhi exhibit anticancer and antimetastatic activities. GA yields can be potentially improved by manipulating G. lingzhi through genetic engineering. In this study, a putative lanosterol synthase (LS) gene was cloned and overexpressed in G. lingzhi. Results showed that its overexpression (OE) increased the ganoderic acid (GA) content and the accumulation of lanosterol and ergosterol in a submerged G. lingzhi culture. The maximum contents of GA-O, GA-Mk, GA-T, GA-S, GA-Mf, and GA-Me in transgenic strains were 46.6 ± 4.8, 24.3 ± 3.5, 69.8 ± 8.2, 28.9 ± 1.4, 15.4 ± 1.2, and 26.7 ± 3.1 μg/100 mg dry weight, respectively, these values being 6.1-, 2.2-, 3.2-, 4.8-, 2.0-, and 1.9-times higher than those in wild-type strains. In addition, accumulated amounts of lanosterol and ergosterol in transgenic strains were 2.3 and 1.4-fold higher than those in the control strains, respectively. The transcription level of LS was also increased by more than five times in the presence of the G. lingzhi glyceraldehyde-3-phosphate dehydrogenase gene promoter, whereas transcription levels of 3-hydroxy-3-methylglutaryl coenzyme A enzyme and squalene synthase did not change significantly in transgenic strains. This study demonstrated that OE of the homologous LS gene can enhance lanosterol accumulation. A large precursor supply promotes GA biosynthesis. Copyright © 2016 Elsevier Ltd. All rights reserved.
GCPred: a web tool for guanylyl cyclase functional centre prediction from amino acid sequence.

PubMed

Xu, Nuo; Fu, Dongfang; Li, Shiang; Wang, Yuxuan; Wong, Aloysius

2018-06-15

GCPred is a webserver for the prediction of guanylyl cyclase (GC) functional centres from amino acid sequence. GCs are enzymes that generate the signalling molecule cyclic guanosine 3', 5'-monophosphate from guanosine-5'-triphosphate. A novel class of GC centres (GCCs) has been identified in complex plant proteins. Using currently available experimental data, GCPred is created to automate and facilitate the identification of similar GCCs. The server features GCC values that consider in its calculation, the physicochemical properties of amino acids constituting the GCC and the conserved amino acids within the centre. From user input amino acid sequence, the server returns a table of GCC values and graphs depicting deviations from mean values. The utility of this server is demonstrated using plant proteins and the human interleukin-1 receptor-associated kinase family of proteins as example. The GCPred server is available at http://gcpred.com. Supplementary data are available at Bioinformatics online.
Integrated proteomics, genomics, metabolomics approaches reveal oxalic acid as pathogenicity factor in Tilletia indica inciting Karnal bunt disease of wheat.

PubMed

Pandey, Vishakha; Singh, Manoj; Pandey, Dinesh; Kumar, Anil

2018-05-18

Tilletia indica incites Karnal bunt (KB) disease in wheat. To date, no KB-resistant wheat cultivar has been developed, owing to the lack of biomarkers related to pathogenicity/virulence for screening resistant wheat genotypes. The present study was carried out to compare the proteomes of highly virulent (TiK) and weakly virulent (TiP) T. indica isolates. Twenty-one protein spots consistently observed as up-regulated/differential in the TiK proteome were selected for identification by MALDI-TOF/TOF. The identified sequences showed homology with fungal proteins that play essential roles in plant infection and pathogen survival, including stress response, adhesion, fungal penetration, invasion, colonization, degradation of the host cell wall, and signal transduction. These results were integrated with the T. indica genome sequence to identify homologs of candidate pathogenicity/virulence-related proteins. One protein identified in the TiK isolate was malate dehydrogenase, which converts malate to oxaloacetate, a precursor of oxalic acid. Oxalic acid is a key pathogenicity factor in phytopathogenic fungi. These results were validated by GC-MS-based metabolic profiling of the T. indica isolates, in which oxalic acid was detected exclusively in the TiK isolate. Thus, integrated omics approaches lead to the identification of pathogenicity/virulence factor(s) that would provide insights into the pathogenic mechanisms of fungi and aid in devising effective disease management strategies.
Deletion of the Clostridium thermocellum recA gene reveals that it is required for thermophilic plasmid replication but not plasmid integration at homologous DNA sequences.

PubMed

Groom, Joseph; Chung, Daehwan; Kim, Sun-Ki; Guss, Adam; Westpheling, Janet

2018-05-28

A limitation to the engineering of cellulolytic thermophiles is the availability of functional, thermostable (≥ 60 °C) replicating plasmid vectors for rapid expression and testing of genes that provide improved or novel fuel molecule production pathways. A series of plasmid vectors for genetic manipulation of the cellulolytic thermophile Caldicellulosiruptor bescii has recently been extended to Clostridium thermocellum, another cellulolytic thermophile that very efficiently solubilizes plant biomass and produces ethanol. While the C. bescii pBAS2 replicon on these plasmids is thermostable, the use of homologous promoters, signal sequences and genes led to undesired integration into the bacterial chromosome, a result also observed with less thermostable replicating vectors. In an attempt to overcome undesired plasmid integration in C. thermocellum, a deletion of recA was constructed. As expected, C. thermocellum ∆recA showed impaired growth in chemically defined medium and an increased susceptibility to UV damage. Interestingly, we also found that recA is required for replication of the C. bescii thermophilic plasmid pBAS2 in C. thermocellum, but it is not required for replication of plasmid pNW33N. In addition, the C. thermocellum recA mutant retained the ability to integrate homologous DNA into the C. thermocellum chromosome. These data indicate that recA can be required for replication of certain plasmids, and that a recA-independent mechanism exists for the integration of homologous DNA into the C. thermocellum chromosome. Understanding thermophilic plasmid replication is not only important for engineering of these cellulolytic thermophiles, but also for developing genetic systems in similar new potentially useful non-model organisms.
Deletion of the Clostridium thermocellum recA Gene Reveals that it is Required for Thermophilic Plasmid Replication but not Plasmid Integration at Homologous DNA Sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chung, Daehwan; Groom, Joseph; Kim, Sun-Ki

A limitation to the engineering of cellulolytic thermophiles is the availability of functional, thermostable (≥ 60 °C) replicating plasmid vectors for rapid expression and testing of genes that provide improved or novel fuel molecule production pathways. A series of plasmid vectors for genetic manipulation of the cellulolytic thermophile Caldicellulosiruptor bescii has recently been extended to Clostridium thermocellum, another cellulolytic thermophile that very efficiently solubilizes plant biomass and produces ethanol. While the C. bescii pBAS2 replicon on these plasmids is thermostable, the use of homologous promoters, signal sequences and genes led to undesired integration into the bacterial chromosome, a result also observed with less thermostable replicating vectors. In an attempt to overcome undesired plasmid integration in C. thermocellum, a deletion of recA was constructed. As expected, C. thermocellum ∆recA showed impaired growth in chemically defined medium and an increased susceptibility to UV damage. Interestingly, we also found that recA is required for replication of the C. bescii thermophilic plasmid pBAS2 in C. thermocellum, but it is not required for replication of plasmid pNW33N. In addition, the C. thermocellum recA mutant retained the ability to integrate homologous DNA into the C. thermocellum chromosome. These data indicate that recA can be required for replication of certain plasmids, and that a recA-independent mechanism exists for the integration of homologous DNA into the C. thermocellum chromosome. Understanding thermophilic plasmid replication is not only important for engineering of these cellulolytic thermophiles, but also for developing genetic systems in similar new potentially useful non-model organisms.
External and semi-internal controls for PCR amplification of homologous sequences in mixed templates.

PubMed

Kalle, Elena; Gulevich, Alexander; Rensing, Christopher

2013-11-01

In a mixed template, the presence of homologous target DNA sequences creates environments that almost inevitably give rise to artifacts and biases during PCR. Heteroduplexes, chimeras, and skewed template-to-product ratios are the exclusive attributes of mixed template PCR and never occur in a single template assay. Yet, multi-template PCR has been used without appropriate attention to quality control and assay validation, in spite of the fact that such practice diminishes the reliability of results. External and internal amplification controls became obligatory elements of good laboratory practice in different PCR assays. We propose the inclusion of an analogous approach as a quality control system for multi-template PCR applications. The amplification controls must take into account the characteristics of multi-template PCR and be able to effectively monitor particular assay performance. This study demonstrated the efficiency of a model mixed template as an adequate external amplification control for a particular PCR application. The conditions of multi-template PCR do not allow implementation of a classic internal control; therefore we developed a convenient semi-internal control as an acceptable alternative. In order to evaluate the effects of inhibitors, a model multi-template mix was amplified in a mixture with DNAse-treated sample. Semi-internal control allowed establishment of intervals for robust PCR performance for different samples, thus enabling correct comparison of the samples. The complexity of the external and semi-internal amplification controls must be comparable with the assumed complexity of the samples. We also emphasize that amplification controls should be applied in multi-template PCR regardless of the post-assay method used to analyze products. © 2013 Elsevier B.V. All rights reserved.

Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

NASA Astrophysics Data System (ADS)

Shi, Jinming

In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult rice leaves after seed space flight were detected by methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on the Shenzhou-6 spaceship of China. Adult leaves of space-treated rice, from 8 plants chosen randomly and 2 plants with phenotypic mutations, were used for AFLP and MSAP analysis. Polymorphisms in both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants, and mutants were 1.3%, 3.1%, and 11%, respectively. For AFLP analysis, the average polymorphic frequencies were 1.4%, 2.9%, and 8%, respectively. In total, 27 and 22 polymorphic fragments from the MSAP and AFLP analyses, respectively, were cloned and sequenced. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses were different.
Homology Modeling of Class A G Protein-Coupled Receptors

PubMed Central

Costanzi, Stefano

2012-01-01

G protein-coupled receptors (GPCRs) are a large superfamily of membrane bound signaling proteins that hold great pharmaceutical interest. Since experimentally elucidated structures are available only for a very limited number of receptors, homology modeling has become a widespread technique for the construction of GPCR models intended to study the structure-function relationships of the receptors and aid the discovery and development of ligands capable of modulating their activity. Through this chapter, various aspects involved in the construction of homology models of the serpentine domain of the largest class of GPCRs, known as class A or rhodopsin family, are illustrated. In particular, the chapter provides suggestions, guidelines and critical thoughts on some of the most crucial aspects of GPCR modeling, including: collection of candidate templates and a structure-based alignment of their sequences; identification and alignment of the transmembrane helices of the query receptor to the corresponding domains of the candidate templates; selection of one or more template receptors; election of homology or de novo modeling for the construction of specific extracellular and intracellular domains; construction of the three-dimensional models, with special consideration to extracellular regions, disulfide bridges, and interhelical cavity; validation of the models through controlled virtual screening experiments. PMID:22323225
GCView: the genomic context viewer for protein homology searches

PubMed Central

Grin, Iwan; Linke, Dirk

2011-01-01

Genomic neighborhood can provide important insights into evolution and function of a protein or gene. When looking at operons, changes in operon structure and composition can only be revealed by looking at the operon as a whole. To facilitate the analysis of the genomic context of a query in multiple organisms we have developed Genomic Context Viewer (GCView). GCView accepts results from one or multiple protein homology searches such as BLASTp as input. For each hit, the neighboring protein-coding genes are extracted, the regions of homology are labeled for each input and the results are presented as a clear, interactive graphical output. It is also possible to add more searches to iteratively refine the output. GCView groups outputs by the hits for different proteins. This allows for easy comparison of different operon compositions and structures. The tool is embedded in the framework of the Bioinformatics Toolkit of the Max-Planck Institute for Developmental Biology (MPI Toolkit). Job results from the homology search tools inside the MPI Toolkit can be forwarded to GCView and results can be subsequently analyzed by sequence analysis tools. Results are stored online, allowing for later reinspection. GCView is freely available at http://toolkit.tuebingen.mpg.de/gcview. PMID:21609955
Efficient Detection of Copy Number Mutations in PMS2 Exons with a Close Homolog.

PubMed

Herman, Daniel S; Smith, Christina; Liu, Chang; Vaughn, Cecily P; Palaniappan, Selvi; Pritchard, Colin C; Shirts, Brian H

2018-07-01

Detection of 3' PMS2 copy-number mutations that cause Lynch syndrome is difficult because of highly homologous pseudogenes. To improve the accuracy and efficiency of clinical screening for these mutations, we developed a new method to analyze standard capture-based, next-generation sequencing data to identify deletions and duplications in PMS2 exons 9 to 15. The approach captures sequences using PMS2 targets, maps sequences randomly among regions with equal mapping quality, counts reads aligned to homologous exons and introns, and flags read count ratios outside of empirically derived reference ranges. The method was trained on 1352 samples, including 8 known positives, and tested on 719 samples, including 17 known positives. Clinical implementation of the first version of this method detected new mutations in the training (N = 7) and test (N = 2) sets that had not been identified by our initial clinical testing pipeline. The described final method showed complete sensitivity in both sample sets and false-positive rates of 5% (training) and 7% (test), dramatically decreasing the number of cases needing additional mutation evaluation. This approach leveraged the differences between gene and pseudogene to distinguish between PMS2 and PMS2CL copy-number mutations. These methods enable efficient and sensitive Lynch syndrome screening for 3' PMS2 copy-number mutations and may be applied similarly to other genomic regions with highly homologous pseudogenes. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
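The final flagging step described in this abstract (compare read count ratios against reference ranges) can be illustrated with a minimal sketch. The abstract does not say which counts form each ratio or what the ranges are, so the normalisation against a single control region, the region names, the counts, and the range limits below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    """Inclusive bounds for an acceptable read count ratio (hypothetical values)."""
    low: float
    high: float


def flag_ratios(read_counts, control_region, reference_ranges):
    """Return {region: ratio} for regions whose ratio falls outside its range.

    Assumption: each ratio is the region's read count divided by the read
    count of one control region. The published method may normalise differently.
    """
    control = read_counts[control_region]
    if control <= 0:
        raise ValueError("control region has no reads")
    flagged = {}
    for region, ref in reference_ranges.items():
        ratio = read_counts[region] / control
        if not (ref.low <= ratio <= ref.high):
            flagged[region] = ratio
    return flagged


# Made-up counts for illustration only.
counts = {"exon9": 100, "exon12": 52, "exon13": 98, "control": 100}
ranges = {
    "exon9": ReferenceRange(0.8, 1.2),
    "exon12": ReferenceRange(0.8, 1.2),
    "exon13": ReferenceRange(0.8, 1.2),
}
flagged = flag_ratios(counts, "control", ranges)
print(flagged)
```

In this toy input only the region with roughly half the expected reads is flagged, which is the pattern a heterozygous deletion would be expected to produce under these assumptions.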
DSAP: deep-sequencing small RNA analysis pipeline.

PubMed

Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

2010-07-01

DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
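Steps (i) and (ii) above, cleanup and clustering of a tab-delimited tag/count input, can be sketched as follows. This is an illustration under stated assumptions, not DSAP's implementation: the adaptor string is a hypothetical placeholder, trimming at the first adaptor occurrence is one simple strategy, and discarding tags made of a single repeated base is only one possible reading of "poly-A/T/C/G/N" removal.

```python
from collections import Counter


def trim_adaptor(read, adaptor):
    """Cut the read at the first occurrence of the adaptor, if present."""
    idx = read.find(adaptor)
    return read if idx == -1 else read[:idx]


def is_homopolymer(read):
    """True for empty reads and reads made of one repeated base (e.g. AAAA, NNNN)."""
    return len(set(read)) <= 1


def cleanup_and_cluster(lines, adaptor, min_len=1):
    """Clean 'tag<TAB>copies' lines and sum copies per unique cleaned sequence."""
    clusters = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        tag, copies = line.split("\t")
        cleaned = trim_adaptor(tag.upper(), adaptor)
        if len(cleaned) < min_len or is_homopolymer(cleaned):
            continue
        clusters[cleaned] += int(copies)
    return clusters


# Hypothetical adaptor and made-up reads for illustration.
example_adaptor = "TCGTATGC"
example_lines = [
    "TGAGGTAGTAGGTTCGTATGC\t10",  # tag followed by adaptor
    "TGAGGTAGTAGGT\t5",           # same tag without adaptor
    "AAAAAAAAAA\t7",              # poly-A, discarded
    "NNNNNN\t2",                  # poly-N, discarded
]
result = cleanup_and_cluster(example_lines, example_adaptor)
print(dict(result))
```

The two reads that differ only by the trailing adaptor collapse into one cluster whose count is the sum of their copy numbers; the matching steps (iii) and (iv) would then operate on these unique clusters.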
Prokaryotic Caspase Homologs: Phylogenetic Patterns and Functional Characteristics Reveal Considerable Diversity

PubMed Central

Asplund-Samuelsson, Johannes; Bergman, Birgitta; Larsson, John

2012-01-01

Caspases accomplish initiation and execution of apoptosis, a programmed cell death process specific to metazoans. The existence of prokaryotic caspase homologs, termed metacaspases, has been known for slightly more than a decade. Despite their potential connection to the evolution of programmed cell death in eukaryotes, the phylogenetic distribution and functions of these prokaryotic metacaspase sequences are largely uncharted, while a few experiments imply involvement in programmed cell death. Aiming at providing a more detailed picture of prokaryotic caspase homologs, we applied a computational approach based on Hidden Markov Model search profiles to identify and functionally characterize putative metacaspases in bacterial and archaeal genomes. Out of the total of 1463 analyzed genomes, merely 267 (18%) were identified to contain putative metacaspases, but their taxonomic distribution included most prokaryotic phyla and a few archaea (Euryarchaeota). Metacaspases were particularly abundant in Alphaproteobacteria, Deltaproteobacteria and Cyanobacteria, which harbor many morphologically and developmentally complex organisms, and a distinct correlation was found between abundance and phenotypic complexity in Cyanobacteria. Notably, Bacillus subtilis and Escherichia coli, known to undergo genetically regulated autolysis, lacked metacaspases. Pfam domain architecture analysis combined with operon identification revealed rich and varied configurations among the metacaspase sequences. These imply roles in programmed cell death, but also e.g. in signaling, various enzymatic activities and protein modification. Together our data show a wide and scattered distribution of caspase homologs in prokaryotes with structurally and functionally diverse sub-groups, and with a potentially intriguing evolutionary role. These features will help delineate future characterizations of death pathways in prokaryotes. PMID:23185476
Four Trypanosoma brucei fatty acyl-CoA synthetases: fatty acid specificity of the recombinant proteins.

PubMed Central

Jiang, D W; Englund, P T

2001-01-01

As part of our investigation of fatty acid metabolism in Trypanosoma brucei, we have expressed four acyl-CoA synthetase (TbACS) genes in Escherichia coli. The recombinant proteins, with His-tags on their C-termini, were purified to near homogeneity using nickel-chelate affinity chromatography. Although these enzymes are highly homologous, they have distinct specificities for fatty acid chain length. TbACS1 prefers saturated fatty acids in the range C(11:0) to C(14:0) and TbACS2 prefers shorter fatty acids, mainly C(10:0). TbACS3 and 4, which have 95% sequence identity, have similar specificities, favouring fatty acids between C(14:0) and C(17:0). In addition, TbACS1, 3 and 4 function well with a variety of unsaturated fatty acids. PMID:11535136
Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

PubMed Central

Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

1988-01-01

Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. PMID:3052437
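The third-position G or C bias quoted above is the fraction of codons whose third base is G or C. A minimal sketch of that calculation follows; the short sequence is made up for illustration and is not the Thermus thermophilus gene, and the abstract does not say whether the stop codon was included in its count.

```python
def gc3_fraction(orf):
    """Fraction of codons in an in-frame ORF whose third base is G or C."""
    seq = orf.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("ORF length must be a non-zero multiple of 3")
    third_bases = seq[2::3]  # every third base, starting at codon position 3
    gc = sum(base in "GC" for base in third_bases)
    return gc / len(third_bases)


# Made-up ORF: ATG GCC GAG CTG TAA -> third bases G, C, G, G, A
toy_orf = "ATGGCCGAGCTGTAA"
print(f"GC3 = {gc3_fraction(toy_orf):.1%}")
```

Whether the stop codon is counted changes the value slightly for short sequences, so the convention should be stated when such figures are compared.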
Genetic Homologies Among Streptomyces violaceoruber Strains

PubMed Central

Monson, A. M.; Bradley, S. G.; Enquist, L. W.; Cruces, Griselda

1969-01-01

Most of the genetic studies on streptomycetes have been done with cultures erroneously designated as Streptomyces coelicolor. To determine whether these cultures are genetically homologous with the S. violaceoruber nominifer, their deoxyribonucleic acids (DNA) were analyzed, and selected pairs of mutants were crossed. The four cultures used in genetic studies, and called S. coelicolor in the literature, were found to constitute a genospecies, based upon DNA hybridization and recombination tests. In addition, DNA from Actinopycnidium caeruleum formed extensive duplexes with S. violaceoruber DNA. S. violaceoruber cultures and A. caeruleum were distinctly different from the S. coelicolor nominifer. PMID:5370275
Draft genome sequence of the silver pomfret fish, Pampus argenteus.

PubMed

AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

2016-01-01

Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.
The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nylund, Stian; Karlsen, Marius; Nylund, Are

2008-03-30

The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.
Reevaluation of the Reliability and Usefulness of the Somatic Homologous Recombination Reporter Lines

PubMed Central

Ülker, Bekir; Hommelsheim, Carl Maximilian; Berson, Tobias; Thomas, Stefan; Chandrasekar, Balakumaran; Olcay, Ahmet Can; Berendzen, Kenneth Wayne; Frantzeskakis, Lamprinos

2012-01-01

A widely used approach for assessing genome instability in plants makes use of somatic homologous recombination (SHR) reporter lines. Here, we review the published characteristics and uses of SHR lines. We found a lack of detailed information on these lines and a lack of sufficient evidence that they report only homologous recombination. We postulate that instead of SHR, these lines might be reporting a number of alternative stress-induced stochastic events known to occur at transcriptional, posttranscriptional, and posttranslational levels. We conclude that the reliability and usefulness of the somatic homologous recombination reporter lines require revision. Thus, more detailed information about these reporter lines is needed before they can be used with confidence to measure genome instability, including the complete sequences of SHR constructs, the genomic location of reporter genes and, importantly, molecular evidence that reconstituted gene expression in these lines is indeed a result of somatic recombination. PMID:23144181
Object-oriented Persistent Homology

PubMed Central

Wang, Bao; Wei, Guo-Wei

2015-01-01

Persistent homology provides a new approach for the topological simplification of big data via measuring the life time of intrinsic topological features in a filtration process and has found its success in scientific and engineering applications. However, such a success is essentially limited to qualitative data classification and analysis. Indeed, persistent homology has rarely been employed for quantitative modeling and prediction. Additionally, the present persistent homology is a passive tool, rather than a proactive technique, for classification and analysis. In this work, we outline a general protocol to construct object-oriented persistent homology methods. By means of differential geometry theory of surfaces, we construct an objective functional, namely, a surface free energy defined on the data of interest. The minimization of the objective functional leads to a Laplace-Beltrami operator which generates a multiscale representation of the initial data and offers an objective oriented filtration process. The resulting differential geometry based object-oriented persistent homology is able to preserve desirable geometric features in the evolutionary filtration and enhances the corresponding topological persistence. The cubical complex based homology algorithm is employed in the present work to be compatible with the Cartesian representation of the Laplace-Beltrami flow. The proposed Laplace-Beltrami flow based persistent homology method is extensively validated. The consistence between Laplace-Beltrami flow based filtration and Euclidean distance based filtration is confirmed on the Vietoris-Rips complex for a large amount of numerical tests. The convergence and reliability of the present Laplace-Beltrami flow based cubical complex filtration approach are analyzed over various spatial and temporal mesh sizes. The Laplace-Beltrami flow based persistent homology approach is utilized to study the intrinsic topology of proteins and fullerene molecules. Based on a
Human mRNA polyadenylate binding protein: evolutionary conservation of a nucleic acid binding motif.

PubMed Central

Grange, T; de Sa, C M; Oddos, J; Pictet, R

1987-01-01

We have isolated a full-length cDNA coding for the human poly(A) binding protein. The cDNA-derived 73 kd basic translation product has the same Mr, isoelectric point and peptidic map as the poly(A) binding protein. DNA sequence analysis reveals a 70,244 dalton protein. The N terminal part, highly homologous to the yeast poly(A) binding protein, is sufficient for poly(A) binding activity. This domain consists of a four-fold repeated unit of approximately 80 amino acids present in other nucleic acid binding proteins. In the C terminal part there is, as in the yeast protein, a sequence of approximately 150 amino acids, rich in proline, alanine and glutamine which together account for 48% of the residues. A 2.9 kb mRNA corresponding to this cDNA has been detected in several vertebrate cell types and in Drosophila melanogaster at every developmental stage including oogenesis. PMID:2885805
The colocalization transition of homologous chromosomes at meiosis

NASA Astrophysics Data System (ADS)

Nicodemi, Mario; Panning, Barbara; Prisco, Antonella

2008-06-01

Meiosis is the specialized cell division required in sexual reproduction. During its early stages, in the mother cell nucleus, homologous chromosomes recognize each other and colocalize in a crucial step that remains one of the most mysterious of meiosis. Starting from recent discoveries on the system molecular components and interactions, we discuss a statistical mechanics model of chromosome early pairing. Binding molecules mediate long-distance interaction of special DNA recognition sequences and, if their concentration exceeds a critical threshold, they induce a spontaneous colocalization transition of chromosomes, otherwise independently diffusing.
Complete cDNA sequence of SAP-like pentraxin from Limulus polyphemus: implications for pentraxin evolution.

PubMed

Tharia, Hazel A; Shrive, Annette K; Mills, John D; Arme, Chris; Williams, Gwyn T; Greenhough, Trevor J

2002-02-22

The serum amyloid P component (SAP)-like pentraxin Limulus polyphemus SAP is a recently discovered, distinct pentraxin species, of known structure, which does not bind phosphocholine and whose N-terminal sequence has been shown to differ markedly from the highly conserved N terminus of all other known horseshoe crab pentraxins. The complete cDNA sequence of Limulus SAP, and the derived amino acid sequence, the first invertebrate SAP-like pentraxin sequence, have been determined. Two sequences were identified that differed only in the length of the 3' untranslated region. Limulus SAP is synthesised as a precursor protein of 234 amino acid residues, the first 17 residues encoding a signal peptide that is absent from the mature protein. Phylogenetic analysis clusters Limulus SAP pentraxin with the horseshoe crab C-reactive proteins (CRPs) rather than the mammalian SAPs, which are clustered with mammalian CRPs. The deduced amino acid sequence shares 22% identity with both human SAP and CRP, which are 51% identical, and 31-35% with horseshoe crab CRPs. These analyses indicate that gene duplication of CRP (or SAP), followed by sequence divergence and the evolution of CRP and/or SAP function, occurred independently along the chordate and arthropod evolutionary lines rather than in a common ancestor. They further indicate that the CRP/SAP gene duplication event in Limulus occurred before both the emergence of the Limulus CRP variants and the mammalian CRP/SAP gene duplication. Limulus SAP, which does not exhibit the CRP characteristic of calcium-dependent binding to phosphocholine, is established as a pentraxin species distinct from all other known horseshoe crab pentraxins that exist in many variant forms sharing a high level of sequence homology. Copyright 2002 Elsevier Science Ltd.
Transcription of telomeric DNA leads to high levels of homologous recombination and t-loops.

PubMed

Kar, Anirban; Willcox, Smaranda; Griffith, Jack D

2016-11-02

The formation of DNA loops at chromosome ends (t-loops) and the transcription of telomeres producing G-rich RNA (TERRA) represent two central features of telomeres. To explore a possible link between them we employed artificial human telomeres containing long arrays of TTAGGG repeats flanked by the T7 or T3 promoters. Transcription of these DNAs generates a high frequency of t-loops within individual molecules and homologous recombination events between different DNAs at their telomeric sequences. T-loop formation does not require a single strand overhang, arguing that both terminal strands insert into the preceding duplex. The loops are very stable and some RNase H resistant TERRA remains at the t-loop, likely adding to their stability. Transcription of DNAs containing TTAGTG or TGAGTG repeats showed greatly reduced loop formation. While in the cell multiple pathways may lead to t-loop formation, the pathway revealed here does not depend on the shelterins but rather on the unique character of telomeric DNA when it is opened for transcription. Hence, telomeric sequences may have evolved to facilitate their ability to loop back on themselves. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

PubMed Central

Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting

2016-01-01

ABSTRACT Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181
Cloning and Sequence Analysis of Vibrio halioticoli Genes Encoding Three Types of Polyguluronate Lyase.

PubMed

Sugimura; Sawabe; Ezura

2000-01-01

The alginate lyase-coding genes of Vibrio halioticoli IAM 14596(T), which was isolated from the gut of the abalone Haliotis discus hannai, were cloned using plasmid vector pUC 18, and expressed in Escherichia coli. Three alginate lyase-positive clones, pVHB, pVHC, and pVHE, were obtained, and all clones expressed the enzyme activity specific for polyguluronate. Three genes, alyVG1, alyVG2, and alyVG3, encoding polyguluronate lyase were sequenced: alyVG1 from pVHB was composed of a 1056-bp open reading frame (ORF) encoding 352 amino acid residues; alyVG2 gene from pVHC was composed of a 993-bp ORF encoding 331 amino acid residues; and alyVG3 gene from pVHE was composed of a 705-bp ORF encoding 235 amino acid residues. Comparison of nucleotide and deduced amino acid sequences among AlyVG1, AlyVG2, and AlyVG3 revealed low homologies. The identity value between AlyVG1 and AlyVG2 was 18.7%, and that between AlyVG2 and AlyVG3 was 17.0%. A higher identity value (26.0%) was observed between AlyVG1 and AlyVG3. Sequence comparison among known polyguluronate lyases including AlyVG1, AlyVG2, and AlyVG3 also did not reveal an identical region in these sequences. However, AlyVG1 showed the highest identity value (36.2%) and the highest similarity (73.3%) to AlyA from Klebsiella pneumoniae. A consensus region comprising nine amino acids (YFKAGXYXQ) in the carboxy-terminal region previously reported by Mallisard and colleagues was observed only in AlyVG1 and AlyVG2.
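Identity values such as those quoted above are computed from a pairwise alignment. The abstract does not state the alignment program or the denominator used, so the sketch below assumes one common convention: identical residues divided by the number of alignment columns in which neither sequence has a gap. The aligned strings are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (full alignment length, shorter sequence length) give
    different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 4 identical residues out of 5 ungapped columns.
toy_a = "MKT-AY"
toy_b = "MKSQAY"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because the denominator is a matter of convention, identity figures from different studies are only comparable when the same convention was used.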
Functional Genomics Analysis of Singapore Grouper Iridovirus: Complete Sequence Determination and Proteomic Analysis

PubMed Central

Song, Wen Jun; Qin, Qi Wei; Qiu, Jin; Huang, Can Hua; Wang, Fan; Hew, Choy Leong

2004-01-01

Here we report the complete genome sequence of Singapore grouper iridovirus (SGIV). Sequencing of the random shotgun and restriction endonuclease genomic libraries showed that the entire SGIV genome consists of 140,131 nucleotide bp. One hundred sixty-two open reading frames (ORFs) from the sense and antisense DNA strands, coding for lengths varying from 41 to 1,268 amino acids, were identified. Computer-assisted analyses of the deduced amino acid sequences revealed that 77 of the ORFs exhibited homologies to known virus genes, 23 of which matched functional iridovirus proteins. Forty-two putative conserved domains or signatures were detected in the National Center for Biotechnology Information CD-Search database and PROSITE database. An assortment of enzyme activities involved in DNA replication, transcription, nucleotide metabolism, cell signaling, etc., were identified. Viruses were cultured on a cell line derived from the embryonated egg of the grouper Epinephelus tauvina, isolated, and purified by sucrose gradient ultracentrifugation. The protein extract from the purified virions was analyzed by polyacrylamide gel electrophoresis followed by in-gel digestion of protein bands. Matrix-assisted laser desorption ionization-time of flight mass spectrometry and database searching led to identification of 26 proteins. Twenty of these represented novel or previously unidentified genes, which were further confirmed by reverse transcription-PCR (RT-PCR) and DNA sequencing of their respective RT-PCR products. PMID:15507645

DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

PubMed Central

Palzkill, T G; Oliver, S G; Newlon, C S

1986-01-01

Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036
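The search described above, finding sites that match at least 10 of 11 bases of a core consensus, can be sketched as a sliding-window scan. The abstract does not spell out the consensus, so the pattern used in the example (`WTTTATRTTTW`, written with IUPAC ambiguity codes) is an assumption standing in for the previously reported core consensus; the function scans only the strand it is given, and the test sequences are made up.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def near_matches(seq, consensus, min_matches):
    """Return (start, window, score) for windows matching >= min_matches positions."""
    seq = seq.upper()
    consensus = consensus.upper()
    k = len(consensus)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        score = sum(base in IUPAC[code] for base, code in zip(window, consensus))
        if score >= min_matches:
            hits.append((start, window, score))
    return hits


# Assumed stand-in for the 11 bp core consensus (not stated in the abstract).
core = "WTTTATRTTTW"
toy_seq = "GGCATTTATGTTTACCG"  # made-up sequence with one perfect site
print(near_matches(toy_seq, core, min_matches=10))
```

To cover both strands, the same scan would be repeated on the reverse complement of the sequence.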
Identification of a transformer homolog in the acorn worm, Saccoglossus kowalevskii, and analysis of its activity in insect cells.

PubMed

Suzuki, Masataka G; Tochigi, Mayuko; Sakaguchi, Honami; Aoki, Fugaku; Miyamoto, Norio

2015-06-01

The transformer (tra) gene is an intermediate component of the sex determination hierarchy in many insect species. The homolog of tra is also found in two branchiopod crustacean species but is not known outside arthropods. We have isolated a tra homolog in the acorn worm, Saccoglossus kowalevskii, which is a hemichordate belonging to the deuterostome superphylum. The full-length complementary DNA (cDNA) of the S. kowalevskii tra homolog (Sktra) has a 3786-bp open reading frame that encodes a 1261-amino acid sequence including a TRA-CAM domain and an arginine/serine (RS)-rich domain, both of which are characteristic of TRA orthologs. Reverse transcription PCR (RT-PCR) analyses demonstrated that Sktra showed no differences in expression patterns between testes and ovaries, but its expression level was approximately 7.5-fold higher in the testes than in the ovaries. TRA, together with the protein product of the transformer-2 (tra-2) gene, assembles on doublesex (dsx) pre-messenger RNA (mRNA) via the cis-regulatory element, enhancing female-specific splicing of dsx in Drosophila. To understand functional conservation of the SkTRA protein as a dsx-splicing activator, we investigated whether SkTRA is capable of inducing female-specific splicing of the Drosophila dsx. Ectopic expression of Sktra cDNA in insect cultured cells did not induce the female-specific splicing of dsx. On the other hand, forced expression of Sktra-2 (a tra-2 homolog of S. kowalevskii) was able to induce the female-specific dsx splicing. These results demonstrate that the function as a dsx-splicing activator is not conserved in SkTRA even though SkTRA-2 is capable of functionally replacing the Drosophila TRA-2. We have also found a tra homolog in an echinoderm genome. This study provides the first evidence that tra is conserved not only in arthropods but also in basal species of deuterostomes.
Molecular identification and partial sequence analysis of an aryl hydrocarbon receptor from beluga (Delphinapterus leucas)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jensen, B.A.; Hahn, M.E.

1995-12-31

The aryl hydrocarbon receptor (AhR) mediates the effects of many common and potentially toxic organic hydrocarbons, including some polychlorinated biphenyls and dioxins. Since small cetaceans often inhabit industrially polluted coastal waters, comparison of the molecular structure and function of this protein in cetaceans with other marine and mammalian species is important for evaluating the sensitivity of cetaceans to these pollutants. An AhR protein has been identified in beluga liver by photoaffinity labeling. In the present study, the authors sought to clone and sequence an AhR cDNA from beluga as a prelude to studying its structure and function. Using reverse-transcription polymerase chain reaction (RT-PCR) and degenerate primers, a 515 base pair fragment was amplified, cloned and sequenced, revealing homology to the PAS domain (ligand binding and dimerization region) of AhRs from terrestrial mammals. This portion of the putative beluga AhR has 82% amino acid and 81% nucleotide sequence identity to the mouse AhR, and 63% amino acid and 64% nucleotide sequence identity to an AhR from the marine fish Fundulus heteroclitus. A beluga cDNA library was synthesized and is currently being screened with the PCR-generated fragment to obtain the complete coding sequence. This is the first molecular evidence of AhR presence in cetaceans.
MHD simulations of homologous and cannibalistic coronal mass ejections

NASA Astrophysics Data System (ADS)

Fan, Yuhong; Chatterjee, Piyali

2014-06-01

We present magneto-hydrodynamic simulations of the development of a homologous sequence of coronal mass ejections (CMEs) and demonstrate their so-called cannibalistic behavior. These CMEs originate from the repeated formations and partial eruptions of kink unstable flux ropes as a result of the continued emergence of a twisted flux rope across the lower boundary into a pre-existing coronal potential arcade field. The simulations show that a CME erupting into the open magnetic field created by a preceding CME has a higher speed, and therefore tends to be cannibalistic, catching up and merging with the preceding one into a single fast CME. All the CMEs attained speeds of about 1000 km/s as they exit the domain. The reformation of a twisted flux rope after each CME eruption during the sustained flux emergence can naturally explain the X-ray observations of repeated reformations of sigmoids and “sigmoid-under-cusp” configurations at a low-coronal source of homologous CMEs.
Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins.

PubMed

Gupta, Radhey S

2012-11-01

The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contains two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs, in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited a close relationship, provide strong evidence that these two groups have incurred lateral gene transfers. Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been
Nucleotide sequence and regulatory studies of VGF, a nervous system-specific mRNA that is rapidly and relatively selectively induced by nerve growth factor.

PubMed

Salton, S R

1991-09-01

A nervous system-specific mRNA that is rapidly induced in PC12 cells to a greater extent by nerve growth factor (NGF) than by epidermal growth factor treatment has been cloned. The polypeptide deduced from the nucleic acid sequence of the NGF33.1 cDNA clone contains regions of amino acid sequence identity with that predicted by the cDNA clone VGF, and further analysis suggests that both NGF33.1 and VGF cDNA clones very likely correspond to the same mRNA (VGF). In this report both the nucleic acid sequence that corresponds to VGF mRNA and the polypeptide predicted by the NGF33.1 cDNA clone are presented. Genomic Southern analysis and database comparison did not detect additional sequences with high homology to the VGF gene. Induction of VGF mRNA by depolarization and phorbol 12-myristate 13-acetate treatment was greater than by serum stimulation or protein kinase A pathway activation. These studies suggest that VGF mRNA is induced to the greatest extent by NGF treatment and that VGF is one of the most rapidly regulated neuronal mRNAs identified in PC12 cells.
High frequency of phylogenetically diverse reductive dehalogenase-homologous genes in deep subseafloor sedimentary metagenomes

PubMed Central

Kawai, Mikihiko; Futagami, Taiki; Toyoda, Atsushi; Takaki, Yoshihiro; Nishi, Shinro; Hori, Sayaka; Arai, Wataru; Tsubouchi, Taishi; Morono, Yuki; Uchiyama, Ikuo; Ito, Takehiko; Fujiyama, Asao; Inagaki, Fumio; Takami, Hideto

2014-01-01

Marine subsurface sediments on the Pacific margin harbor diverse microbial communities even at depths of several hundred meters below the seafloor (mbsf) or more. Previous PCR-based molecular analysis showed the presence of diverse reductive dehalogenase gene (rdhA) homologs in marine subsurface sediment, suggesting that anaerobic respiration of organohalides is one of the possible energy-yielding pathways in the organic-rich sedimentary habitat. However, primer-independent molecular characterization of rdhA has remained to be demonstrated. Here, we studied the diversity and frequency of rdhA homologs by metagenomic analysis of five different depth horizons (0.8, 5.1, 18.6, 48.5, and 107.0 mbsf) at Site C9001 off the Shimokita Peninsula of Japan. From all metagenomic pools, remarkably diverse rdhA-homologous sequences, some of which are affiliated with novel clusters, were observed with high frequency. As a comparison, we also examined the frequency of dissimilatory sulfite reductase genes (dsrAB), key functional genes for microbial sulfate reduction. The dsrAB genes were also widely observed in the metagenomic pools, although their frequency was generally lower than that of rdhA-homologous genes. The phylogenetic composition of rdhA-homologous genes was similar among the five depth horizons. Our metagenomic data revealed that subseafloor rdhA homologs are more diverse than previously identified from PCR-based molecular studies. Spatial distribution of similar rdhA homologs across wide depositional ages indicates that the heterotrophic metabolic processes mediated by the genes can be ecologically important, functioning in the organic-rich subseafloor sedimentary biosphere. PMID:24624126
37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

Code of Federal Regulations, 2013 CFR

2013-07-01

... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...
37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

Code of Federal Regulations, 2010 CFR

2010-07-01

... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...
37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

Code of Federal Regulations, 2012 CFR

2012-07-01

... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...
Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

PubMed

Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

2016-06-01

Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp and 11,683 predicted protein-coding sequences. The data have been deposited at DDBJ/EMBL/GenBank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids.
A Sabin 2-related poliovirus recombinant contains a homologous sequence of human enterovirus species C in the viral polymerase coding region.

PubMed

Zhang, Yong; Zhang, Fan; Zhu, Shuangli; Chen, Li; Yan, Dongmei; Wang, Dongyan; Tang, Ruiyan; Zhu, Hui; Hou, Xiaohui; An, Hongqiu; Zhang, Hong; Xu, Wenbo

2010-02-01

A type 2 vaccine-related poliovirus (strain CHN3024), differing from the Sabin 2 strain by 0.44% in the VP1 coding region, was isolated from a patient with vaccine-associated paralytic poliomyelitis. Sequences downstream of nucleotide position 6735 (3D(pol) coding region) were derived from an unidentified sequence; no close match for a potential parent was found, but the sequence could be classified phylogenetically within non-polio human enterovirus species C (HEV-C). The virus differed antigenically from the parental Sabin strain, having an amino acid substitution in neutralizing antigenic site 1. The similarity between CHN3024 and Sabin 2 sequences suggests that the recombination was recent; this is supported by the estimation that the initiating OPV dose was given only 36-75 days before sampling. The patient's clinical manifestations, intratypic differentiation examination, and whole-genome sequencing showed that this recombinant exhibited characteristics of neurovirulent vaccine-derived polioviruses (VDPV), which may thus pose a potential threat to a polio-free world.
Better understanding of homologous recombination through a 12-week laboratory course for undergraduates majoring in biotechnology.

PubMed

Li, Ming; Shen, Xiaodong; Zhao, Yan; Hu, Xiaomei; Hu, Fuquan; Rao, Xiancai

2017-07-08

Homologous recombination, a central concept in biology, is defined as the exchange of DNA strands between two similar or identical nucleotide sequences. Unfortunately, undergraduate students majoring in biotechnology often experience difficulties in understanding the molecular basis of homologous recombination. In this study, we developed and implemented a 12-week laboratory course for biotechnology undergraduates in which gene targeting in Streptococcus suis was used to facilitate their understanding of the basic concept and process of homologous recombination. Students worked in teams of two to select a gene of interest and create a knockout mutant using methods that relied on homologous recombination. By integrating abstract knowledge and practice in the process of scientific research, students gained hands-on experience in molecular biology techniques while learning about the principle and process of homologous recombination. The learning outcomes and survey-based assessment demonstrated that students substantially enhanced their understanding of how homologous recombination could be used to study gene function. Overall, the course was very effective in helping biotechnology undergraduates learn the theory and application of homologous recombination, while also yielding positive effects in developing confidence and scientific skills for future work in research. © 2017 by The International Union of Biochemistry and Molecular Biology, 45(4):329-335, 2017.
ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

PubMed Central

2012-01-01

Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid sequence and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills. PMID:22958836
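The quantities named in the record above (GC-content and codon frequencies) are simple to compute for a single coding sequence. The sketch below is only an illustration of those two statistics in plain Python; it is not ANCAC's implementation and does not use the NUCOCOG data, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


# Hypothetical toy coding sequence: ATG GCC GCC
toy = "ATGGCCGCC"
print(gc_content(toy))         # 7 of 9 bases are G or C
print(codon_frequencies(toy))  # ATG once, GCC twice
```

A tool such as the one described would aggregate statistics like these over all members of a COG and a chosen set of genomes; that aggregation step is not shown here.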
Differential recognition of the ORF2 region in a complete genome sequence of porcine circovirus type 2 (PCV2) isolated from boar bone marrow in Korea.

PubMed

Kweon, Chang-Hee; Nguyen, Lien Thi Kim; Yoo, Mi-Sun; Kang, Seung-Won

2015-09-15

Porcine circovirus type 2 (PCV2) is the causative agent of post-weaning multisystemic wasting syndrome (PMWS) in swine. Here, a phylogenetic tree was constructed using PCV2 nucleotide sequences derived from the bone marrow of Korean boar and previously reported PCV2 sequences isolated from various countries. PCV2 from Korean boar bone marrow (KC188796) was classified into the group containing PCV2a-Canada and other PCV2 strains from Korea. While the ORF1 region of the PCV2 genome was highly conserved, ORF2 (the capsid protein coding region) was relatively variable. The nucleotide sequences for bone marrow-derived PCV2 were 93.4-99.0% homologous to the other reference sequences. The deduced amino acid sequences for the ORF1 and ORF2 coding regions were 97.4-99.3% and 84.5-97.4% homologous with the other reference strains, respectively, indicating that KC188796 did not differ markedly from the other PCV2 strains. Phylogenetic analysis demonstrated that bone marrow-derived PCV2 was highly similar to PCV2a from Canada and may be related to persistent PCV2 infections in swine. Copyright © 2015 Elsevier B.V. All rights reserved.
DockRank: Ranking docked conformations using partner-specific sequence homology-based protein interface prediction

PubMed Central

Xue, Li C.; Jordan, Rafael A.; EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

2015-01-01

Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein–protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving Success Rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/. PMID:23873600
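The two evaluation measures defined in the record above can be written down directly. The sketch below follows the wording of the abstract, with two assumptions that the abstract does not spell out: each docking case is represented as a best-first ranked list of booleans marking near-native conformations, and Hit Rate is computed per case. The function names and toy data are hypothetical and this is not the DockRank code.

```python
def success_rate(cases: list[list[bool]], m: int) -> float:
    """Percentage of cases with at least one near-native conformation in the top m.

    Each case is a ranked list (best score first); True marks a near-native
    conformation. This representation is an assumption made for illustration.
    """
    if not cases or m <= 0:
        raise ValueError("need at least one case and m > 0")
    successes = sum(1 for ranked in cases if any(ranked[:m]))
    return 100.0 * successes / len(cases)


def hit_rate(ranked: list[bool], m: int) -> float:
    """Percentage of a case's near-native conformations that fall in the top m."""
    if m <= 0:
        raise ValueError("m must be positive")
    total_near_native = sum(ranked)
    if total_near_native == 0:
        raise ValueError("case contains no near-native conformations")
    return 100.0 * sum(ranked[:m]) / total_near_native


# Hypothetical toy data: three docking cases, four ranked conformations each.
cases = [
    [False, True, False, True],
    [False, False, False, True],
    [True, False, False, False],
]
print(success_rate(cases, m=2))  # two of the three cases have a hit in the top 2
print(hit_rate(cases[0], m=2))   # one of two near-native conformations is in the top 2
```

Success Rate rewards finding any near-native conformation early, while Hit Rate rewards ranking many of them early, which is why the abstract reports both.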
DockRank: ranking docked conformations using partner-specific sequence homology-based protein interface prediction.

PubMed

Xue, Li C; Jordan, Rafael A; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

2014-02-01

Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein-protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving Success Rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/. Copyright © 2013 Wiley Periodicals, Inc.
A zinc finger domain gene in the lizard, Calotes versicolor, shows extensive homology with the mammalian ZFX and is expressed embryonically.

PubMed

Ganesh, S; Choudhary, B; Raman, R

1998-01-01

A 590-bp long zinc finger domain DNA fragment has been isolated by polymerase chain reaction from the lizard, Calotes versicolor, employing the primers used for amplifying the zinc finger domain of the human Y-chromosomal gene, ZFY. Cloned in pUC18, the fragment, called CvZfa, was sequenced and its expression during development was studied. At the nucleotide and amino acid levels, CvZfa shows 83% and 90% identity, respectively, with the human ZFY, but its extent of homology is greater with the human ZFX (86% at the nucleotide and 92% at the amino acid level) and the ZFY-like genes of turtle and chick. Similarly, its homology with the mouse Zfx and Zfa is much greater than that with Zfy-1 and Zfy-2. It appears that the mammalian ZFX (Zfx) evolved from reptilian ancestors with a considerable degree of conservation, but the ZFX to ZFY divergence within the class Mammalia was more rapid. The CvZfa transcripts were seen in all the embryonic stages from which RNA was analysed. Whole-mount in situ hybridization with the posteriorly placed mesonephros and the gonadal primordia of 10 to 25 day old embryos showed signal selectively in the mesonephros of the 20 and 25 day embryos. There was no signal in the genital ridge. Thus CvZfa may not have a direct role in gonadogenesis of C. versicolor, but the possibility of its inductive role in the formation of the adreno-gonadal axis through the mesonephros cannot be discounted.
[Molecular cloning of the DNA sequence of activin beta A subunit gene mature peptides from panda and related species and its application in the research of phylogeny and taxonomy].

PubMed

Wang, Xiao-Jing; Wang, Xiao-Xing; Wang, Ya-Jun; Wang, Xi-Zhong; He, Guang-Xin; Chen, Hong-Wei; Fei, Li-Song

2002-09-01

Activin, a member of the transforming growth factor-beta (TGF beta) superfamily of proteins and receptors, is known to have broad-ranging effects in animals. The mature peptide of the beta A subunit of this gene, one of the most highly conserved sequences, can elevate the basal secretion of follicle-stimulating hormone (FSH) in the pituitary, and FSH is pivotal to an organism's reproduction. Impaired reproduction is one of the main factors driving the giant panda toward extinction. The sequence encoding the Activin beta A subunit mature peptide was successfully amplified from the genomic DNA of the giant panda, red panda and Malayan sun bear by polymerase chain reaction (PCR) with a pair of degenerate primers. The PCR products were cloned into the vector pBlueScript+ in Escherichia coli. Sequence analysis of the Activin beta A subunit mature peptide shows that the gene segment has the same length (359 bp) in all three species and contains no intron. The sequence encodes a peptide of 119 amino acid residues. The homology comparison demonstrates 93.9% DNA homology and 99% amino acid homology among the three species. Both the GenBank BLAST search results and the restriction enzyme maps reveal that the sequences of the Activin beta A subunit mature peptide of different species have been highly conserved during evolution. Phylogenetic analysis was performed with the PHYLIP software package. A consistent phylogenetic tree was obtained with three different methods. The outcome of the analysis accords with the view that the giant panda is more closely related to the Malayan sun bear than to the red panda. The giant panda should be grouped into the bear family (Ursidae) with the Malayan sun bear. The red panda would be better placed in a family of its own (red panda family) because of the great differences between the red panda and the bears (Ursidae).
