Sample records for kda sequence analysis

  1. NADH:ubiquinone oxidoreductase from bovine heart mitochondria. cDNA sequences of the import precursors of the nuclear-encoded 39 kDa and 42 kDa subunits.

    PubMed Central

    Fearnley, I M; Finel, M; Skehel, J M; Walker, J E

    1991-01-01

    The 39 kDa and 42 kDa subunits of NADH:ubiquinone oxidoreductase from bovine heart mitochondria are nuclear-coded components of the hydrophobic protein fraction of the enzyme. Their amino acid sequences have been deduced from the sequences of overlapping cDNA clones. These clones were amplified from total bovine heart cDNA by means of the polymerase chain reaction, with the use of complex mixtures of oligonucleotide primers based upon fragments of protein sequence determined at the N-termini of the proteins and at internal sites. The protein sequences of the 39 kDa and 42 kDa subunits are 345 and 320 amino acid residues long, respectively, and their calculated molecular masses are 39,115 Da and 36,693 Da. Both proteins are predominantly hydrophilic, but each contains one or two hydrophobic segments that could possibly be folded into transmembrane alpha-helices. The bovine 39 kDa protein sequence is related to that of a 40 kDa subunit of complex I from Neurospora crassa mitochondria; otherwise, it is not significantly related to any known sequence, including redox proteins and two polypeptides involved in the import of proteins into mitochondria, known as the mitochondrial processing peptidase and the processing-enhancing protein. Therefore the functions of the 39 kDa and 42 kDa subunits of complex I are unknown. The mitochondrial gene product ND4, a hydrophobic component of complex I with an apparent molecular mass of about 39 kDa, has been identified in preparations of the enzyme. This subunit stains faintly with Coomassie Blue dye, and in many gel systems it is not resolved from the nuclear-coded 36 kDa subunit. PMID:1832859
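
    Calculated masses such as the 39,115 Da and 36,693 Da above are obtained by summing residue masses over a deduced sequence. The Python sketch below is illustrative only: it assumes average (not monoisotopic) residue masses, an unmodified chain with free termini, and the 20 standard one-letter codes; the names AVG_RESIDUE_MASS and average_mass are hypothetical.

```python
# Average residue masses (Da): each amino acid's average mass minus one water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once to account for the free N- and C-termini


def average_mass(sequence: str) -> float:
    """Return the average molecular mass (Da) of an unmodified polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Free glycine should come out near its textbook mass of ~75.07 Da.
    print(round(average_mass("G"), 2))  # prints 75.07
```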

  2. Sequencing Larger Intact Proteins (30-70 kDa) with Activated Ion Electron Transfer Dissociation

    NASA Astrophysics Data System (ADS)

    Riley, Nicholas M.; Westphall, Michael S.; Coon, Joshua J.

    2018-01-01

    The analysis of intact proteins via mass spectrometry can offer several benefits to proteome characterization, although the majority of top-down experiments focus on proteoforms in a relatively low mass range (<30 kDa). Recent studies have focused on improving the analysis of larger intact proteins (up to 75 kDa), but they have also highlighted several challenges to be addressed. One major hurdle is the efficient dissociation of larger protein ions, which often do not yield extensive fragmentation via conventional tandem MS methods. Here we describe the first application of activated ion electron transfer dissociation (AI-ETD) to proteins in the 30-70 kDa range. AI-ETD leverages infrared photo-activation concurrent with ETD reactions to improve sequence-informative product ion generation. This method generates more product ions and greater sequence coverage than conventional ETD, higher-energy collisional dissociation (HCD), and ETD combined with supplemental HCD activation (EThcD). Importantly, AI-ETD provides the most thorough protein characterization for every precursor ion charge state investigated in this study, making it suitable as a universal fragmentation method in top-down experiments. Additionally, we highlight several acquisition strategies that can benefit characterization of larger proteins with AI-ETD, including combination of spectra from multiple ETD reaction times for a given precursor ion, multiple spectral acquisitions of the same precursor ion, and combination of spectra from two different dissociation methods (e.g., AI-ETD and HCD). In all, AI-ETD shows great promise as a method for dissociating larger intact protein ions as top-down proteomics continues to advance into larger mass ranges.
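
    In top-down work, sequence coverage is often expressed as the fraction of inter-residue bonds explained by at least one fragment ion. A minimal sketch, assuming coverage is counted from observed c and z ion numbers only (the study may use a different definition or include other ion types); bond_coverage is a hypothetical helper, not part of any named software.

```python
def bond_coverage(length, c_ions, z_ions):
    """Fraction of the (length - 1) inter-residue bonds explained by >= 1 fragment.

    c_ions: ion numbers i of observed c_i fragments (i residues from the N-terminus)
    z_ions: ion numbers j of observed z_j fragments (j residues from the C-terminus)
    """
    if length < 2:
        raise ValueError("need at least two residues")
    n_bonds = length - 1
    cleaved = set()
    for i in c_ions:
        if 1 <= i <= n_bonds:
            cleaved.add(i)            # c_i arises from the bond after residue i
    for j in z_ions:
        if 1 <= j <= n_bonds:
            cleaved.add(length - j)   # z_j arises from the bond after residue length - j
    return len(cleaved) / n_bonds
```

    Combining spectra (for example, several ETD reaction times for one precursor) then amounts to pooling the ion lists before calling the function, e.g. bond_coverage(n, c_run1 + c_run2, z_run1 + z_run2); bonds seen in both runs are counted once.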

  3. Cloning and sequencing of a gene encoding the 69-kDa extracellular chitinase of Janthinobacterium lividum.

    PubMed

    Gleave, A P; Taylor, R K; Morris, B A; Greenwood, D R

    1995-09-15

    Janthinobacterium lividum secretes a major 56-kDa chitinase and a minor 69-kDa chitinase. A chitinase gene was defined on a 3-kb fragment of clone pRKT10, by virtue of fluorescent colonies in the presence of 4-methylumbelliferyl-beta-D-N,N',N"-chitotrioside. Nucleotide sequencing revealed a 1998-bp open reading frame with the potential to encode a 69,716-Da protein with amino acid sequences similar to those in other chitinases, suggesting that it encodes the minor chitinase (Chi69). Chitinase activity of Escherichia coli (pRKT10) lysates was detected mainly in the periplasmic fraction, and immunoblotting detected a 70-kDa protein in this fraction. Chi69 has an N-terminal secretory leader peptide preceding two probable chitin-binding domains and a catalytic domain. These functional domains are separated by linker regions of proline-threonine repeats. Amino acid sequencing of cyanogen bromide cleavage-derived peptides from the major 56-kDa chitinase suggested that Chi69 may be a precursor of Chi56. In addition, an N-terminally truncated version of Chi69 retained chitinase activity, as expected if in vivo processing of Chi69 generates Chi56.
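
    Locating an open reading frame like the 1998-bp one reported here can be illustrated with a simple scan for an in-frame ATG-to-stop stretch. This sketch only searches the three forward frames, uses the standard stop codons, and ignores alternative start codons and the reverse strand; longest_orf is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and the interval includes the
    stop codon. Returns None if no complete ORF is found.
    """
    seq = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3                    # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None                     # look for the next ATG in this frame
    return best
```

    Because the stop codon is counted in the interval, the number of encoded residues is (end - start) / 3 - 1.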

  4. Cloning, expression and activation of a truncated 92-kDa gelatinase minienzyme.

    PubMed

    Kröger, M; Tschesche, H

    1997-09-01

    The matrix metalloproteinases (MMPs) are a family of highly homologous zinc endopeptidases that degrade extracellular matrix components. Human 92-kDa gelatinase (MMP-9) is one of the MMPs that cleave native collagen type IV. As a basis for structural investigations, the short form (catalytic domain, amino acid residues 113-450) of the 92-kDa gelatinase cDNA was cloned and expressed in E. coli as a minienzyme. By a combination of reverse transcription (RT) and polymerase chain reaction (PCR), the truncated 92-kDa gelatinase cDNA was amplified from the corresponding mRNA derived from ovarian carcinoma cells. The cDNA fragment obtained was cloned in E. coli and sequenced. With the exception of one nucleotide inversion at position 745 (gt→tg), the cDNA sequence was identical to the previously reported nucleotide sequence of the 92-kDa gelatinase. The protein was expressed in E. coli using the vector pET-12b. The recombinant protein accumulated in inclusion bodies and was extracted from them as a 38 kDa species by solubilization in 8 M urea. The product was purified by affinity chromatography and gel filtration. Amino-terminal sequence analysis confirmed its identity with the catalytic domain of 92-kDa gelatinase. The recombinant protein was refolded in the presence of Ca2+ and Zn2+ and yielded an active minienzyme with gelatinolytic activity. Like the full-length 92-kDa gelatinase, it degrades the native substrate collagen type IV and the synthetic substrate Mca-Pro-Leu-Gly-Leu-Dpa-Ala-Arg-NH2 x AcOH. The catalytic activity could be inhibited by the specific MMP inhibitors TIMP-1 and TIMP-2.

  5. Purification and sequence analysis of two rat tissue inhibitors of metalloproteinases

    NASA Technical Reports Server (NTRS)

    Roswit, W. T.; McCourt, D. W.; Partridge, N. C.; Jeffrey, J. J.

    1992-01-01

    Two protein inhibitors of metalloproteinases (TIMP) were isolated from medium conditioned by the clonal rat osteosarcoma line UMR 106-01. Initial purification of both a 30-kDa inhibitor and a 20-kDa inhibitor was accomplished using heparin-Sepharose chromatography with dextran sulfate elution, followed by DEAE-Sepharose and CM-Sepharose chromatography. Purification of the 20-kDa inhibitor to homogeneity was completed with reverse-phase high-performance liquid chromatography. The 20-kDa inhibitor was identified as rat TIMP-2. The 30-kDa inhibitor, although not purified to homogeneity, was identified as rat TIMP-1. Amino-terminal amino acid sequence analysis of the 30-kDa inhibitor demonstrated 86% identity to human TIMP-1 over the first 22 amino acids, while the sequence of the 20-kDa inhibitor was identical to that of human TIMP-2 for the first 22 residues. Treatment with peptide:N-glycosidase F indicated that the 30-kDa rat inhibitor is glycosylated, while the 20-kDa inhibitor is apparently unglycosylated. Inhibition of both rat and human interstitial collagenase by rat TIMP-2 was stoichiometric, with a 1:1 molar ratio required for complete inhibition. Exposure of UMR 106-01 cells to 10⁻⁷ M parathyroid hormone increased total inhibitor production by approximately 40% over basal levels.
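
    The 86% identity over the first 22 residues corresponds to 19 identical positions (19/22 ≈ 86.4%). A minimal sketch for ungapped, pre-aligned N-terminal sequences; percent_identity is a hypothetical helper, and comparisons of longer sequences would normally use a gapped alignment instead.

```python
from typing import Optional


def percent_identity(a: str, b: str, length: Optional[int] = None) -> float:
    """Percent identity of two ungapped, pre-aligned sequences.

    Compares positions 0..length-1; by default, the length of the shorter sequence.
    """
    n = min(len(a), len(b)) if length is None else length
    if n <= 0 or n > min(len(a), len(b)):
        raise ValueError("comparison length out of range")
    matches = sum(1 for x, y in zip(a[:n], b[:n]) if x == y)
    return 100.0 * matches / n
```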

  6. Draft Genome Sequences of Two Bacillus thuringiensis Strains and Characterization of a Putative 41.9-kDa Insecticidal Toxin

    PubMed Central

    Palma, Leopoldo; Muñoz, Delia; Berry, Colin; Murillo, Jesús; Caballero, Primitivo

    2014-01-01

    In this work, we report the genome sequencing of two Bacillus thuringiensis strains using Illumina next-generation sequencing technology (NGS). Strain Hu4-2, toxic to many lepidopteran pest species and to some mosquitoes, encoded genes for two insecticidal crystal (Cry) proteins, cry1Ia and cry9Ea, and a vegetative insecticidal protein (Vip) gene, vip3Ca2. Strain Leapi01 contained genes coding for seven Cry proteins (cry1Aa, cry1Ca, cry1Da, cry2Ab, cry9Ea and two cry1Ia gene variants) and a vip3 gene (vip3Aa10). A putative novel insecticidal protein gene, 1143 bp long, was found in both strains, and the two sequences exhibited 100% nucleotide identity. The predicted protein showed 57% and 100% pairwise identity to protein sequence 72 from a patented Bt strain (US8318900) and to a putative 41.9-kDa insecticidal toxin from Bacillus cereus, respectively. The 41.9-kDa protein, carrying a C-terminal 6×His tag, was expressed in Escherichia coli and tested for the first time against four lepidopteran species (Mamestra brassicae, Ostrinia nubilalis, Spodoptera frugiperda and S. littoralis) and the green peach aphid Myzus persicae at doses as high as 4.8 µg/cm² and 1.5 mg/mL, respectively. At these protein concentrations, the recombinant 41.9-kDa protein caused no mortality or symptoms of impaired growth in any of the insects tested, suggesting that these species are outside the protein’s target range or that the protein may not, in fact, be toxic. While the polymerase chain reaction has allowed a significant increase in the number of Bt insecticidal genes characterized to date, novel NGS technologies promise much faster, cheaper and more efficient screening of Bt pesticidal proteins. PMID:24784323

  7. Partial De Novo Sequencing and Unusual CID Fragmentation of a 7 kDa, Disulfide-Bridged Toxin

    NASA Astrophysics Data System (ADS)

    Medzihradszky, Katalin F.; Bohlen, Christopher J.

    2012-05-01

    A 7 kDa toxin isolated from the venom of the Texas coral snake (Micrurus tener tener) was subjected to collision-induced dissociation (CID) and electron-transfer dissociation (ETD) analyses both before and after reduction at low pH. Manual and automated approaches to de novo sequencing are compared in detail. Manual de novo sequencing, combining high-accuracy CID and ETD data with an acid-related cleavage, yielded the N-terminal half of the sequence from the reduced species. The intact polypeptide, containing 3 disulfide bridges, produced a series of unusual fragments in ion trap CID experiments: abundant internal amino acid losses were detected, and one of the disulfide-linkage positions could be determined from fragments formed by the cleavage of two bonds. Internal and c-type fragments were also observed.
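
    Assigning c-type fragments from ETD data starts from a theoretical c-ion ladder. This sketch assumes monoisotopic residue masses, unmodified residues and a reduced (disulfide-free) linear chain; c_ion_ladder and MONO_RESIDUE_MASS are hypothetical names. For the intact toxin, each disulfide bridge removes two hydrogen atoms (about 2.016 Da), which this sketch does not model.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
NH3 = 17.026549    # monoisotopic mass of ammonia (Da)


def c_ion_ladder(sequence: str, charge: int = 1) -> list:
    """m/z of c1..c(n-1) for an unmodified linear peptide at one charge state."""
    if charge < 1:
        raise ValueError("charge must be >= 1")
    ladder = []
    residue_sum = 0.0
    for aa in sequence.upper()[:-1]:  # c_n would be the whole molecule, so stop at n-1
        residue_sum += MONO_RESIDUE_MASS[aa]
        neutral = residue_sum + NH3   # neutral c fragment = N-terminal residues + NH3
        ladder.append((neutral + charge * PROTON) / charge)
    return ladder
```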

  8. Nucleotide sequence and phylogenetic analysis of Cucurbit yellow stunting disorder virus RNA 2.

    PubMed

    Livieratos, Ioannis C; Coutts, Robert H A

    2002-06-01

    The complete nucleotide sequence of Cucurbit yellow stunting disorder virus (CYSDV) RNA 2, a whitefly (Bemisia tabaci)-transmitted closterovirus with a bipartite genome, is reported. CYSDV RNA 2 is 7,281 nucleotides long and contains the closterovirus hallmark gene array, arranged similarly to that of the prototype member of the genus Crinivirus, Lettuce infectious yellows virus (LIYV). CYSDV RNA 2 contains open reading frames (ORFs) potentially encoding, in the 5' to 3' direction, proteins of 5 kDa (ORF 1; hydrophobic protein), 62 kDa (ORF 2; heat shock protein 70 homolog, HSP70h), 59 kDa (ORF 3; protein of unknown function), 9 kDa (ORF 4; protein of unknown function), 28.5 kDa (ORF 5; coat protein, CP), 53 kDa (ORF 6; minor coat protein, CPm), and 26.5 kDa (ORF 7; protein of unknown function). Pairwise comparisons of CYSDV RNA 2-encoded proteins (HSP70h, p59 and CPm) among the closteroviruses showed that CYSDV is closely related to LIYV. Phylogenetic analysis based on the amino acid sequence of HSP70h indicated that CYSDV clusters with other members of the genus Crinivirus and is related to Little cherry virus-1 (LChV-1), but is distinct from the aphid- or mealybug-transmitted closteroviruses.

  9. Biofortification of soybean meal: immunological properties of the 27 kDa γ-zein.

    PubMed

    Krishnan, Hari B; Jang, Sungchan; Kim, Won-Seok; Kerley, Monty S; Oliver, Melvin J; Trick, Harold N

    2011-02-23

    Legumes, including soybeans (Glycine max), are deficient in sulfur-containing amino acids, which are required for the optimal growth of monogastric animals. This deficiency can be overcome by expressing heterologous proteins rich in sulfur-containing amino acids in soybean seeds. A maize 27 kDa γ-zein, a cysteine-rich protein, has been successfully expressed in several crops including soybean, barley, and alfalfa with the intent to biofortify these crops for animal feed. Previous work has shown that the maize 27 kDa zein can withstand digestion by pepsin and elicit an immunogenic response in young pigs. Using sera from patients who tested positive by ImmunoCAP assay for elevated IgE to maize proteins, specific IgE binding to the 27 kDa γ-zein is demonstrated. Bioinformatic analysis using full-length and 80-amino-acid sliding-window FASTA searches identified significant sequence homology between the 27 kDa γ-zein and several known allergens. Immunoblot analysis using human serum that cross-reacts with maize seed proteins also revealed specific IgE binding to the 27 kDa γ-zein in soybean seed protein extracts containing the 27 kDa zein. This study demonstrates for the first time the allergenic potential of the 27 kDa γ-zein and the potential of this protein to limit livestock performance when used in soybeans that serve as a biofortified feed supplement.
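
    The 80-residue sliding-window screen can be sketched as comparing every 80-residue window of the query with every window of a known allergen. This is a simplification under stated assumptions: FASTA performs gapped local alignment, whereas this sketch uses ungapped comparisons; the 35% threshold is assumed from the commonly cited FAO/WHO Codex screening rule (more than 35% identity over 80 residues), since the study's exact settings are not given; windows, max_window_identity and flags_allergen are hypothetical names.

```python
def windows(seq: str, size: int) -> list:
    """All contiguous windows of the given size (empty if seq is shorter)."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def max_window_identity(query: str, target: str, size: int = 80) -> float:
    """Highest ungapped percent identity between any pair of size-residue windows."""
    best = 0.0
    for qw in windows(query, size):
        for tw in windows(target, size):
            matches = sum(a == b for a, b in zip(qw, tw))
            best = max(best, 100.0 * matches / size)
    return best


def flags_allergen(query: str, target: str, size: int = 80, threshold: float = 35.0) -> bool:
    """True if any window pair exceeds the (assumed) identity threshold."""
    return max_window_identity(query, target, size) > threshold
```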

  10. Molecular cloning and sequence analysis of the gene coding for the 57-kDa soluble antigen of the salmonid fish pathogen Renibacterium salmoninarum

    USGS Publications Warehouse

    Chien, Maw-Sheng; Gilbert, Teresa L.; Huang, Chienjin; Landolt, Marsha L.; O'Hara, Patrick J.; Winton, James R.

    1992-01-01

    The complete sequence coding for the 57-kDa major soluble antigen of the salmonid fish pathogen Renibacterium salmoninarum was determined. The gene contained an open reading frame of 1671 nucleotides coding for a protein of 557 amino acids with a calculated Mr of 57,190. The first 26 amino acids constituted a signal peptide. The deduced sequence for amino acid residues 27–61 agreed with the 35 N-terminal amino acid residues determined by microsequencing, suggesting that the protein is synthesized as a 557-amino-acid precursor and processed to produce a mature protein of Mr 54,505. Two regions of the protein contained imperfect direct repeats: the first contained two copies of an 81-residue repeat, and the second contained five copies of an unrelated 25-residue repeat. Also, a perfect inverted repeat (including three in-frame UAA stop codons) was observed at the carboxyl-terminal end of the gene.
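
    A perfect inverted repeat can be located by searching for a stretch that is followed, after a short loop, by its own reverse complement. A minimal sketch over a DNA string; the stem length and maximum loop size are user-chosen assumptions, and reverse_complement and find_inverted_repeats are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int, max_loop: int) -> list:
    """Find perfect inverted repeats with arms of length `stem`.

    Returns (left_start, right_start) pairs (0-based) where the right arm is the
    reverse complement of the left arm and the loop between them is 0..max_loop bases.
    """
    s = seq.upper()
    hits = []
    for i in range(len(s) - 2 * stem + 1):
        target = reverse_complement(s[i:i + stem])
        for loop in range(max_loop + 1):
            j = i + stem + loop
            if j + stem > len(s):
                break
            if s[j:j + stem] == target:
                hits.append((i, j))
    return hits
```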

  11. The 29-kDa proteins phosphorylated in thrombin-activated human platelets are forms of the estrogen receptor-related 27-kDa heat shock protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mendelsohn, M.E.; Yan Zhu; O'Neill, S.

    Thrombin plays a critical role in platelet activation, hemostasis, and thrombosis. Cellular activation by thrombin leads to the phosphorylation of multiple proteins, most of which are unidentified. The authors have characterized several 29-kDa proteins that are rapidly phosphorylated following exposure of intact human platelets to thrombin. A murine monoclonal antibody raised to an unidentified estrogen receptor-related 29-kDa protein selectively recognized these proteins as well as a more basic, unphosphorylated 27-kDa protein. Cellular activation by thrombin led to a marked shift in the proportion of protein from the 27-kDa unphosphorylated form to the 29-kDa phosphoprotein species. Using this antibody, they isolated and sequenced a human cDNA clone encoding a protein identical to the mammalian 27-kDa heat shock protein (HSP27), a protein of uncertain function that is known to be phosphorylated to several forms and to be transcriptionally induced by estrogen. Immunoprecipitation studies confirmed that the 29-kDa proteins are phosphorylated forms of HSP27. Thus, the estrogen receptor-related protein is HSP27, and the three major 29-kDa proteins phosphorylated in thrombin-activated platelets are forms of HSP27. These data suggest a role for HSP27 in the signal transduction events of platelet activation.

  12. Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

    PubMed

    Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

    2007-06-01

    The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA (accession no. AF305964). The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding a 239-amino-acid precursor of approximately 27 kDa. Analysis of the amino acid sequence revealed the conserved cysteine, histidine and asparagine residues, the occluding loop pattern, the hemoglobinase motif and the oxyanion-hole glutamine characteristic of cathepsin B-like proteases (CBL). Comparison of the predicted amino acid sequences showed that the protein shared 33.5-58.7% identity with cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA is a member of the papain family.
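
    An ORF of 717 bp corresponds to 239 codons (717 / 3). Translating such an ORF with the standard genetic code can be sketched as follows; the 64-letter string is the standard code with bases in TCAG order, while CODON_TABLE and translate are hypothetical names, and mapping unrecognized codons to X is an assumption of this sketch.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence (DNA or RNA) in frame 0.

    Stops at the first stop codon when to_stop is True; otherwise stops are
    written as '*'. Unrecognized or ambiguous codons become 'X'.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```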

  13. A relevant IgE-reactive 28-kDa protein identified from Salsola kali pollen extract by proteomics is a natural degradation product of an integral 47-kDa polygalacturonase.

    PubMed

    Mas, Salvador; Oeo-Santos, Carmen; Cuesta-Herranz, Javier; Díaz-Perales, Araceli; Colás, Carlos; Fernández, Javier; Barber, Domingo; Rodríguez, Rosalía; de Los Ríos, Vivian; Barderas, Rodrigo; Villalba, Mayte

    2017-08-01

    A highly prevalent IgE-binding protein band of 28 kDa is observed when Salsola kali pollen extract is incubated with individual sera from Amaranthaceae pollen-sensitized patients. By an immunoproteomic analysis of S. kali pollen extract, we identified this protein band as an allergenic polygalacturonase enzyme. The allergen, named Sal k 6, exhibits a pI of 7.14 and a molecular mass of 39,554.2 Da. It shows similarities to Platanaceae, Poaceae, and Cupressaceae allergenic polygalacturonases. The cDNA-encoding sequence was subcloned into the pET41b vector and produced in bacteria as a His-tag fusion recombinant protein. The far-UV CD spectrum showed that rSal k 6 was folded. Immunostaining of the S. kali pollen protein extract with a rSal k 6-specific pAb and LC-MS/MS proteomic analyses confirmed the co-existence of the 28 kDa band together with an allergenic band of about 47 kDa in the pollen extract. Therefore, the 28 kDa band was assigned as a natural degradation product of the 47 kDa integral polygalacturonase. IgE-binding inhibition of S. kali pollen extract using rSal k 6 as the inhibitor showed that signals directed to both protein bands of 28 and 47 kDa were completely abrogated. The average prevalence of rSal k 6 among the three populations analyzed was 30%, with values correlating well with the levels of grains/m³ of Amaranthaceae pollen. Sal k 6 shares IgE epitopes with Oleaceae members (Fraxinus excelsior, Olea europaea and Syringa vulgaris), with IgE-inhibition values ranging from 20% to 60%. No IgE inhibition was observed with plant-derived food extracts. Copyright © 2017 Elsevier B.V. All rights reserved.
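
    A sequence-predicted pI such as the 7.14 quoted for Sal k 6 is the pH at which the Henderson-Hasselbalch net charge of the chain is zero. A sketch using bisection; the pKa table is an illustrative assumption (published tools use different tables and so report slightly different pIs), and net_charge and isoelectric_point are hypothetical names.

```python
# Illustrative pKa values (ASSUMED; tools differ in the table they use).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a peptide at the given pH."""
    seq = seq.upper()
    pos = [N_TERM_PKA] + [POSITIVE_PKA[a] for a in seq if a in POSITIVE_PKA]
    neg = [C_TERM_PKA] + [NEGATIVE_PKA[a] for a in seq if a in NEGATIVE_PKA]
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in pos)
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in neg)
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid   # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

    Bisection is used because the net charge decreases monotonically with pH, so exactly one zero crossing exists. Note that fusion tags, such as the His tag on the recombinant protein, change the predicted pI.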

  14. Translocation of an 89-kDa periplasmic protein is associated with Holospora infection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Iwatani, Koichi; Dohra, Hideo; Lang, B. Franz

    2005-12-02

    The symbiotic bacterium Holospora obtusa infects the macronucleus of the ciliate Paramecium caudatum. After ingestion by its host, an infectious form of Holospora with an electron-translucent tip passes through the host digestive vacuole and penetrates the macronuclear envelope with this tip. To investigate the underlying molecular mechanism of this process, we raised a monoclonal antibody against the tip-specific 89-kDa protein, partially sequenced this protein, and identified the corresponding complete gene. The deduced protein sequence carries two actin-binding motifs. Indirect immunofluorescence microscopy shows that during escape from the host digestive vacuole, the 89-kDa protein translocates from the inside to the outside of the tip. When the bacterium invades the macronucleus, the 89-kDa protein is left behind at the entry point of the nuclear envelope. Transmission electron microscopy shows the formation of fine fibrous structures that co-localize with the antibody-labeled regions of the bacterium. Our findings suggest that the 89-kDa protein plays a role in Holospora's escape from the host digestive vacuole, its migration through the host cytoplasm, and its invasion of the macronucleus.

  15. Complete genomic sequence of a Tobacco rattle virus isolate from Michigan-grown potatoes.

    PubMed

    Crosslin, James M; Hamm, Philip B; Kirk, William W; Hammond, Rosemarie W

    2010-04-01

    Tobacco rattle virus (TRV) causes stem mottle on potato leaves and necrotic arcs and rings in potato tubers, known as corky ringspot disease. Recently, TRV was reported in Michigan potato tubers cv. FL1879 exhibiting corky ringspot disease. Sequence analysis of the RNA-1-encoded 16-kDa gene of the Michigan isolate, designated MI-1, revealed homology to TRV isolates from Florida and Washington. Here, we report the complete genomic sequence of RNA-1 (6,791 nt) and RNA-2 (3,685 nt) of TRV MI-1. RNA-1 is predicted to contain four open reading frames, and the genome structure and phylogenetic analyses of the RNA-1 nucleotide sequence revealed significant homologies to the known sequences of other TRV-1 isolates. The relationships based on the full-length nucleotide sequence differed from those based on the 16-kDa gene encoded on genomic RNA-1 and reflect sequence variation within a 20-25-residue region of the 16-kDa protein. MI-1 RNA-2 is predicted to contain three ORFs, encoding the coat protein (CP), a 37.6-kDa protein (ORF 2b), and a 33.6-kDa protein (ORF 2c). In addition, it contains a region of similarity to the 3' terminus of RNA-1, including a truncated portion of the 16-kDa cistron. Phylogenetic analysis of RNA-2, based on a comparison of nucleotide sequences with other members of the genus Tobravirus, indicates that TRV MI-1 and other North American isolates cluster as a distinct group. TRV MI-1 is only the second North American isolate for which a complete genome sequence is available, and it is distinct from the North American isolate TRV ORY. The relationship of the TRV MI-1 isolate to other tobravirus isolates is discussed.

  16. Isolation and characterization of cDNA clones for carrot extensin and a proline-rich 33-kDa protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, J.; Varner, J.E.

    1985-07-01

    Extensins are hydroxyproline-rich glycoproteins associated with most dicotyledonous plant cell walls. To isolate cDNA clones encoding extensin, the authors first isolated poly(A) RNA from carrot root tissue and then translated the RNA in vitro in the presence of tritiated leucine or proline. A 33-kDa peptide was identified in the translation products as a putative extensin precursor. From a cDNA library constructed with poly(A) RNA from wounded carrots, one cDNA clone (pDC5) was identified that specifically hybridized to poly(A) RNA encoding this 33-kDa peptide. They isolated three cDNA clones (pDC11, pDC12, and pDC16) from another cDNA library using pDC5 as a probe. DNA sequence data, RNA hybridization analysis, and hybrid-released in vitro translation indicate that cDNA clone pDC11 encodes extensin and that cDNA clones pDC12 and pDC16 encode the 33-kDa peptide, whose identity and function are as yet unknown. The assumption that the 33-kDa peptide was an extensin precursor was invalid. RNA hybridization analysis showed that the RNAs corresponding to both clone types accumulate upon wounding.

  17. Nano-LC FTICR tandem mass spectrometry for top-down proteomics: routine baseline unit mass resolution of whole cell lysate proteins up to 72 kDa.

    PubMed

    Tipton, Jeremiah D; Tran, John C; Catherman, Adam D; Ahlf, Dorothy R; Durbin, Kenneth R; Lee, Ji Eun; Kellie, John F; Kelleher, Neil L; Hendrickson, Christopher L; Marshall, Alan G

    2012-03-06

    Current high-throughput top-down proteomic platforms routinely identify proteins smaller than 25 kDa using 4-D separations. This short communication reports the application of technological developments from the past few years that improve protein identification and characterization for masses greater than 25 kDa. Advances in separation science, especially nanoliquid chromatography (nLC) prior to mass spectrometry (MS) analysis, have allowed more proteins to be identified. A goal of high-throughput top-down proteomics is to extend the mass range for routine nLC MS analysis up to 80 kDa, because gene sequence analysis predicts that ~70% of the human proteome consists of proteins smaller than 80 kDa. Normally, proteins larger than 50 kDa are identified and characterized by top-down proteomics through fraction collection and direct infusion, at relatively low throughput. Other MS-based techniques also provide top-down protein characterization, but at low resolution for intact mass measurement. Here, we present analysis of standard (up to 78 kDa) and whole cell lysate proteins by nano-LC electrospray ionization (ESI) Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS). The separation platform reduced the complexity of the protein matrix so that, at 14.5 T, proteins from whole cell lysate up to 72 kDa are baseline mass resolved on a nano-LC chromatographic time scale. Further, the results document routine identification of proteins at improved throughput based on accurate mass measurement (less than 10 ppm mass error) of precursor and fragment ions for proteins up to 50 kDa.
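
    The "less than 10 ppm mass error" criterion is a relative error, ppm = (measured - theoretical) / theoretical × 10⁶; at 50 kDa, 10 ppm corresponds to 0.5 Da. A minimal sketch; ppm_error and within_tolerance are hypothetical names, and the 10 ppm default simply mirrors the threshold quoted above.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def within_tolerance(measured: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """True if the absolute mass error is below tol_ppm."""
    return abs(ppm_error(measured, theoretical)) < tol_ppm
```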

  18. Isolation and initial structural characterization of a 27 kDa protein from Zingiber officinale

    NASA Astrophysics Data System (ADS)

    Rasheed, Saima; Malik, Shoaib Ahmad; Falke, Sven; Arslan, Ali; Fazel, Ramin; Schlüter, Hartmut; Betzel, Christian; Choudhary, M. Iqbal

    2018-03-01

    Zingiber officinale Roscoe (ginger) is a widely used traditional medicinal plant, applied for ailments such as arthritis, constipation, and hypertension. This article describes the isolation and characterization of a previously unknown protein from ginger rhizomes using ion exchange, affinity, and size-exclusion chromatography, small angle X-ray scattering (SAXS), and mass spectrometry. One-dimensional Coomassie-stained SDS-PAGE under non-reducing conditions showed one band corresponding to approx. 27 kDa. Dynamic light scattering (DLS) analysis of the protein solution indicated that the purified protein was monodisperse and monomeric. Circular dichroism (CD) spectroscopy strongly indicated a β-sheet-rich protein with disordered regions. MALDI-TOF-MS and LC-MS/MS analysis identified a 27.29 kDa protein, with 32.13% and 25.34% sequence coverage against Zingipain-1 and Zingipain-2, respectively. The monomeric state and molecular weight were verified by SAXS. An elongated ab initio model was calculated from the scattering intensity distribution.
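
    The abstract does not say how the SAXS data were processed; a standard first check of size and monodispersity is the Guinier approximation, ln I(q) ≈ ln I(0) - q²Rg²/3, which is commonly restricted to q·Rg ≲ 1.3 for globular particles. A sketch of that fit by ordinary least squares; guinier_fit is a hypothetical name, and the caller is assumed to have already selected the low-q points.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I(q) = ln I(0) - (Rg**2 / 3) * q**2 by least squares.

    q: scattering-vector magnitudes (e.g. 1/Å) already restricted to low q;
    intensity: matching positive I(q) values.
    Returns (Rg, I0, max(q) * Rg) so the caller can check the q*Rg limit.
    """
    xs = [qi ** 2 for qi in q]
    ys = [math.log(i) for i in intensity]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("intensity does not decay with q**2; not a Guinier region")
    rg = math.sqrt(-3.0 * slope)       # slope = -Rg**2 / 3
    return rg, math.exp(intercept), max(q) * rg
```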

  19. Biochemical characterization of the 49 kDa penicillin-binding protein of Mycobacterium smegmatis.

    PubMed Central

    Mukherjee, T; Basu, D; Mahapatra, S; Goffin, C; van Beeumen, J; Basu, J

    1996-01-01

    The 49 kDa penicillin-binding protein (PBP) of Mycobacterium smegmatis catalyses the hydrolysis of the peptide or S-ester bond of carbonyl donors R1-CONH-CHR2-COX-CHR2-COO- (where X is NH or S). In the presence of a suitable amino acceptor, the reaction partitions between the transpeptidation and hydrolysis pathways, with the amino acceptor behaving as a simple alternative nucleophile at the level of the acyl-enzyme. On the basis of its N-terminal sequence similarity, the 49 kDa PBP belongs to the class of monofunctional low-molecular-mass PBPs. An immunologically related protein of Mr 52,000 is present in M. tuberculosis. The 49 kDa PBP is sensitive to amoxycillin, imipenem, flomoxef and cefoxitin. PMID:8947487

  20. Molecular characterization of a 40 kDa OmpC-like porin from Serratia marcescens.

    PubMed

    Hutsul, J A; Worobec, E

    1994-02-01

    An oligonucleotide that encodes the N-terminal portion of a 41 kDa porin of Serratia marcescens was used to probe S. marcescens UOC-51 genomic DNA. An 11 kb EcoRI fragment that hybridized with the oligonucleotide was subcloned into Escherichia coli, examined for expression, and sequenced. The product expressed by the cloned gene was 40 kDa. The nucleotide sequence has an ORF of 1.13 kb. When the deduced amino acid sequence was aligned and compared with other enterobacterial porins, the cloned S. marcescens porin most closely resembled E. coli OmpC. Although we did not detect osmoregulation or thermoregulation of any porins in S. marcescens UOC-51, sequences analogous to the binding regions of the E. coli osmoregulator OmpR are present upstream of the cloned gene. We examined the regulation of the S. marcescens porin in E. coli and found that its expression increased in a high-salt environment. A micF gene, whose transcript inhibits synthesis of OmpF by hybridizing with the ompF transcript, was also found upstream of the S. marcescens ompC. An alignment with the E. coli micF gene revealed that the functional region of the S. marcescens micF gene is conserved. Based on these results, we have determined that S. marcescens UOC-51 produces a 40 kDa porin similar to the E. coli OmpC porin.

  1. Agarose and Polyacrylamide Gel Electrophoresis Methods for Molecular Mass Analysis of 5–500 kDa Hyaluronan

    PubMed Central

    Bhilocha, Shardul; Amin, Ripal; Pandya, Monika; Yuan, Han; Tank, Mihir; LoBello, Jaclyn; Shytuhina, Anastasia; Wang, Wenlan; Wisniewski, Hans-Georg; de la Motte, Carol; Cowman, Mary K.

    2011-01-01

    Agarose and polyacrylamide gel electrophoresis systems for the molecular mass-dependent separation of hyaluronan (HA) in the size range of approximately 5–500 kDa have been investigated. For agarose-based systems, the suitability of different agarose types, agarose concentrations, and buffer systems was determined. Using chemoenzymatically synthesized HA standards of low polydispersity, the molecular mass range over which HA mobility was linear in the logarithm of molecular mass was determined for each gel composition. Excellent linear calibration was obtained for HA molecular masses as low as approximately 9 kDa in agarose gels. For higher-resolution separation, and for extension to molecular masses as low as approximately 5 kDa, gradient polyacrylamide gels were superior. Densitometric scanning of stained gels allowed analysis of the range of molecular masses present in a sample and calculation of weight-average and number-average values. The methods were validated for polydisperse HA samples with viscosity-average molecular masses of 112, 59, 37, and 22 kDa, at sample loads of 0.5 µg (for polyacrylamide) to 2.5 µg (for agarose). Use of the methods for electrophoretic mobility shift assays was demonstrated for binding of the HA-binding region of aggrecan (recombinant human aggrecan G1-IGD-G2 domains) to a 150 kDa HA standard. PMID:21684248
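
    The calibration and averaging steps described above can be sketched as: fit mobility against log10(M) for the standards, invert the line for unknown bands, and compute Mw and Mn from a densitometric profile. Assumptions: stain intensity in each slice is taken as proportional to the HA mass in that slice (the abstract does not state the densitometry model), and fit_calibration, mass_from_mobility and mass_averages are hypothetical names.

```python
import math


def fit_calibration(standards):
    """Least-squares fit of mobility = slope * log10(M) + intercept.

    standards: (molecular_mass_kDa, mobility) pairs for low-polydispersity HA.
    """
    xs = [math.log10(m) for m, _ in standards]
    ys = [mobility for _, mobility in standards]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two standards")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mass_from_mobility(mobility, slope, intercept):
    """Invert the calibration line to estimate molecular mass (kDa)."""
    return 10 ** ((mobility - intercept) / slope)


def mass_averages(profile):
    """Weight- and number-average masses (Mw, Mn) from a densitometric profile.

    profile: (mass_kDa, intensity) pairs; intensity is ASSUMED proportional to
    the mass of HA in that slice, i.e. a weight fraction.
    """
    total_w = sum(w for _, w in profile)
    mw = sum(w * m for m, w in profile) / total_w
    mn = total_w / sum(w / m for m, w in profile)
    return mw, mn
```

    With weight fractions as input, Mw is a weight-weighted mean and Mn a weight-weighted harmonic mean, so Mw ≥ Mn for any profile, and the ratio Mw/Mn gives a polydispersity index.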

  2. Effect of 14-kDa and 47-kDa protein molecules of aged garlic extract on peritoneal macrophages.

    PubMed

    Daneshmandi, Saeed; Hajimoradi, Monire; Ahmadabad, Hasan Namdar; Hassan, Zuhair Mohammad; Roudbary, Maryam; Ghazanfari, Tooba

    2011-03-01

    Garlic (Allium sativum), traditionally used as a spice worldwide, has many applications and is claimed to have beneficial effects in several health conditions, such as tumors and atherosclerosis. Garlic is also an immunomodulator, and its different components are responsible for different properties. The present work aimed to assess the effect of garlic protein fractions on peritoneal macrophages. The 14-kDa and 47-kDa protein fractions of garlic were purified. Mouse peritoneal macrophages were collected by lavage, cultured in a microtiter plate, and exposed to different concentrations of the garlic proteins. An MTT assay was performed to evaluate macrophage viability. Nitric oxide (NO) in the macrophage culture supernatants was measured with the Griess reagent. The cytotoxicity of the culture supernatants was also tested on the WEHI-164 fibrosarcoma cell line as a tumor necrosis factor-α bioassay. In the MTT assay, neither the 14-kDa nor the 47-kDa protein fraction had a significant effect on the stimulated macrophages (P > 0.05). Both fractions significantly suppressed NO production by macrophages (P = 0.007 and P = 0.003, respectively). Cytotoxicity of the macrophage supernatants toward WEHI-164 fibrosarcoma cells was not affected by the garlic protein fractions (P = 0.066 for the 14-kDa and P = 0.085 for the 47-kDa fraction). According to our findings, the 14-kDa and 47-kDa fractions of aged garlic extract can suppress NO production by macrophages, which may be a biological advantage. These molecules had no cytotoxic effect on macrophages and did not increase the tumoricidal activity of macrophages.

  3. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    PubMed

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).
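
    Similarity percentages such as the 18-26% minima above depend on how matching positions are counted. The study's exact scoring is not described here. As a hedged sketch, percent identity, the simplest similarity measure, can be computed from a pre-computed pairwise alignment. The gap-exclusion convention and the name `percent_identity` are assumptions. Similarity scores that also credit conservative substitutions will give higher values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the denominator.
    This is only one convention; others divide by the full alignment length
    or by the length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        matches += x == y  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```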

  4. Bombyx mori nucleopolyhedrovirus orf25 encodes a 30 kDa late protein in the infection cycle.

    PubMed

    Wang, Haiyan; Chen, Keping; Guo, Zhongjian; Yao, Qin

    2008-02-01

    The Bombyx mori nucleopolyhedrovirus (BmNPV) orf25 gene was characterized for the first time. The coding sequence of Bm25 was amplified and subcloned into the prokaryotic expression vector pGEX-4T-2 to produce a glutathione S-transferase (GST)-tagged fusion protein in BL21 (DE3) cells. The GST-Bm25 fusion protein was expressed efficiently after induction with IPTG. The purified fusion protein was used to immunize New Zealand white rabbits to prepare a polyclonal antibody. Temporal expression analysis with this polyclonal antibody against the GST-Bm25 fusion protein revealed a 30-kDa protein, detected from 24 hours post-infection (h p.i.) onward. The Bm25 transcript was detected by RT-PCR at 18-72 h p.i. In conclusion, the available data suggest that Bm25 encodes a 30 kDa protein expressed in the late stage of the infection cycle.

  5. The analysis of Arabidopsis thaliana overexpressing a 14 kDa self-folding protein [abstract]

    USDA-ARS's Scientific Manuscript database

    A recent study in banana identified a 14 kDa protein that has been hypothesized to regulate the nucleation and growth of the needle-shaped calcium oxalate crystals that accumulate within the tissues of this plant. To gain further insight into the functional role of this 14 kDa prote...

  6. Identification of bovine sperm acrosomal proteins that interact with a 32-kDa acrosomal matrix protein.

    PubMed

    Nagdas, Subir K; Smith, Linda; Medina-Ortiz, Ilza; Hernandez-Encarnacion, Luisa; Raychoudhury, Samir

    2016-03-01

    Mammalian fertilization is accomplished by the interaction between sperm and egg. Previous studies from this laboratory have identified a stable acrosomal matrix assembly from the bovine sperm acrosome termed the outer acrosomal membrane-matrix complex (OMC). This stable matrix assembly exhibits precise binding activity for acrosin and N-acetylglucosaminidase. A highly purified OMC fraction comprises three major (54, 50, and 45 kDa) and several minor (38-19 kDa) polypeptides. The set of minor polypeptides (38-19 kDa) termed "OMCrpf polypeptides" is selectively solubilized by high-pH extraction (pH 10.5), while the three major polypeptides (55, 50, and 45 kDa) remain insoluble. Proteomic identification of the OMC32 polypeptide (32 kDa polypeptide isolated from high-pH soluble fraction of OMC) yielded two peptides that matched the NCBI database sequence of acrosin-binding protein. Anti-OMC32 recognized an antigenically related family of polypeptides (OMCrpf polypeptides) in the 38-19-kDa range with isoelectric points ranging between 4.0 and 5.1. Other than glycohydrolases, OMC32 may also be complexed to other acrosomal proteins. The present study was undertaken to identify and localize the OMC32 binding polypeptides and to elucidate the potential role of the acrosomal protein complex in sperm function. OMC32 affinity chromatography of a detergent-soluble fraction of bovine cauda sperm acrosome followed by mass spectrometry-based identification of bound proteins identified acrosin, lactadherin, SPACA3, and IZUMO1. Co-immunoprecipitation analysis also demonstrated the interaction of OMC32 with acrosin, lactadherin, SPACA3, and IZUMO1. Our immunofluorescence studies revealed the presence of SPACA3 and lactadherin over the apical segment, whereas IZUMO1 is localized over the equatorial segment of Triton X-100 permeabilized cauda sperm. Immunoblot analysis showed that a significant portion of SPACA3 was released after the lysophosphatidylcholine (LPC)-induced acrosome

  7. Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

    PubMed Central

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513

  8. The 21.5-kDa isoform of myelin basic protein has a non-traditional PY-nuclear-localization signal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Graham S.T.; Seymour, Lauren V.; Boggs, Joan M.

    2012-06-15

    Highlights:
    ► The full-length 21.5-kDa MBP isoform is translocated to the nucleus.
    ► We hypothesized that the exon-II-encoded sequence contained the NLS.
    ► We mutated this sequence in RFP-tagged constructs and transfected N19 cells.
    ► Abolition of two key positively charged residues resulted in loss of nuclear trafficking.
    ► The 21.5-kDa isoform of classic MBP contains a non-traditional PY-NLS.

    Abstract: The predominant 18.5-kDa classic myelin basic protein (MBP) is mainly responsible for compaction of the myelin sheath in the central nervous system. It is also multifunctional: it interacts with Ca²⁺-calmodulin, actin, tubulin, and SH3 domains, and it can tether these proteins to a lipid membrane in vitro. The full-length 21.5-kDa MBP isoform has an additional 26 residues encoded by exon II of the classic gene, which cause it to be trafficked to the nucleus of oligodendrocytes (OLGs). We performed site-directed mutagenesis of selected residues within this segment in red fluorescent protein (RFP)-tagged constructs. These constructs were transfected into the immortalized N19-OLG cell line, and protein localization was viewed by epifluorescence microscopy. We found that 21.5-kDa MBP contains two non-traditional PY-nuclear-localization signals. Arginine and lysine residues within these motifs were involved in subcellular trafficking of the protein to the nucleus, where it may have functional roles during myelinogenesis.

  9. Maize 27 kDa gamma-zein is a potential allergen for early weaned pigs.

    PubMed

    Krishnan, Hari B; Kerley, Monty S; Allee, Gary L; Jang, Sungchan; Kim, Won-Seok; Fu, Chunjiang J

    2010-06-23

    Soybean and maize are extensively used in animal feed, primarily in poultry, swine, and cattle diets. Soybean meal can affect pig performance in the first few weeks following weaning and elicit specific antibodies in weaned piglets. Though maize is a major component of pig feed, it is not known if any of the maize proteins can elicit immunological response in young pigs. In this study, we have identified a prominent 27 kDa protein from maize as an immunodominant protein in young pigs. This protein, like some known allergens, exhibited resistance to pepsin digestion in vitro. Several lines of evidence identify the immunodominant 27 kDa protein as a gamma-zein, a maize seed storage protein. First, sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) analysis of different solubility classes of maize seed proteins revealed the presence of an abundant 27 kDa protein in the prolamin (zein) fraction. Antibodies raised against the purified maize 27 kDa gamma-zein also reacted against the same protein recognized by the young pig serum. Additionally, matrix-assisted laser desorption ionization time-of-flight mass spectrometry analysis of the peptides generated by trypsin digestion of the immunodominant 27 kDa protein showed significant homology to the maize 27 kDa gamma-zein. Since eliminating the allergenic protein will have a great impact on the nutritive value of the maize meal and expand its use in the livestock industry, it will be highly desirable to develop maize cultivars completely lacking the 27 kDa allergenic protein.
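
    Identification by MALDI-TOF peptide mass fingerprinting, as used above, compares observed tryptic peptide masses with those predicted from candidate sequences. The first step of that prediction is an in silico digest, sketched below. The sketch assumes a simplified cleavage rule: cut after K or R unless the next residue is P. It models no missed cleavages, and its name is hypothetical. Real search engines apply more detailed rules.

```python
def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence into tryptic peptides.

    Simplified rule (an assumption of this sketch): cleave C-terminal to
    K or R unless the following residue is P; no missed cleavages.
    """
    seq = protein.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without K/R
    return peptides
```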

  10. Influence of 120 kDa Pyruvate:Ferredoxin Oxidoreductase on Pathogenicity of Trichomonas vaginalis.

    PubMed

    Song, Hyun-Ouk

    2016-02-01

    Trichomonas vaginalis is a flagellated protozoan parasite that commonly infects the lower genital tract in women and men. Iron is a known nutrient for the growth of various pathogens and has also been reported to be involved in the establishment of trichomoniasis. However, the exact mechanism has not been clarified. In this study, the author investigated whether the 120 kDa protein of T. vaginalis is involved in the pathogenicity of trichomonads. The 120 kDa protein was identified as pyruvate:ferredoxin oxidoreductase (PFOR) by MALDI-TOF-MS peptide analysis, and antibodies against it were prepared in rabbits. Pretreatment of T. vaginalis with anti-120 kDa antibody (Ab) decreased its proliferation and its adherence to vaginal epithelial cells (MS74). Subcutaneous abscesses in mice injected with anti-120 kDa Ab-treated T. vaginalis were smaller than those in mice infected with untreated T. vaginalis. Collectively, these results suggest that the 120 kDa protein expressed in response to iron is involved in proliferation, adhesion to host cells, and abscess formation, and may thereby influence the pathogenicity of T. vaginalis.

  11. Protein sequence analysis, cloning, and expression of flammutoxin, a pore-forming cytolysin from Flammulina velutipes. Maturation of dimeric precursor to monomeric active form by carboxyl-terminal truncation.

    PubMed

    Tomita, Toshio; Mizumachi, Yoshihiro; Chong, Kang; Ogawa, Kanako; Konishi, Norihide; Sugawara-Tomita, Noriko; Dohmae, Naoshi; Hashimoto, Yohichi; Takio, Koji

    2004-12-24

    Flammutoxin (FTX), a 31-kDa pore-forming cytolysin from Flammulina velutipes, is specifically expressed during fruiting body formation. We cloned and expressed the cDNA encoding a 272-residue protein with an N-terminal sequence identical to that of FTX, but failed to obtain hemolytically active protein. This result, together with the presence of multiple FTX family proteins in the mushroom, prompted us to determine the complete primary structure of FTX by protein sequence analysis. The N-terminal 72 and C-terminal 107 residues were sequenced by Edman degradation of fragments generated from alkylated FTX. These fragments were produced by enzymatic digestion with Achromobacter protease I or Staphylococcus aureus V8 protease and by chemical cleavage with CNBr, hydroxylamine, or 1% formic acid. The central part of FTX was sequenced using a surface-adhesive 7-kDa fragment, which was generated by tryptic digestion of FTX and recovered by rinsing the wall of a test tube with 6 M guanidine HCl. The 7-kDa peptide was cleaved with 12 M HCl, thermolysin, or S. aureus V8 protease to produce smaller peptides for sequence analysis. As a result, FTX was found to consist of 251 residues. The protein and nucleotide sequences agreed, except that the protein lacked the initial Met and the C-terminal 20 residues. Recombinant FTX (rFTX) with or without the C-terminal 20 residues (rFTX271 or rFTX251, respectively) was prepared to study the maturation process of FTX. Like natural FTX, rFTX251 existed as a monomer in solution and assembled into an SDS-stable, ring-shaped pore complex on human erythrocytes, causing hemolysis. In contrast, rFTX271, which existed as a dimer in solution, bound to the cells but failed to form a pore complex. The dimeric rFTX271 was converted to hemolytically active monomers upon cleavage between Lys(251) and Met(252) by trypsin.

  12. Cloning and characterization of mouse extracellular-signal-regulated protein kinase 3 as a unique gene product of 100 kDa.

    PubMed

    Turgeon, B; Saba-El-Leil, M K; Meloche, S

    2000-02-15

    MAP (mitogen-activated protein) kinases are a family of serine/threonine kinases that have a pivotal role in signal transduction. Here we report the cloning and characterization of a mouse homologue of extracellular-signal-regulated protein kinase (ERK)3. The mouse Erk3 cDNA encodes a predicted protein of 720 residues, which displays 94% identity with human ERK3. Transcription and translation of this cDNA in vitro generates a 100 kDa protein similar to the human gene product ERK3. Immunoblot analysis with an antibody raised against a unique sequence of ERK3 also recognizes a 100 kDa protein in mouse tissues. A single transcript of Erk3 was detected in every adult mouse tissue examined, with the highest expression being found in the brain. Interestingly, expression of Erk3 mRNA is acutely regulated during mouse development, with a peak of expression observed at embryonic day 11. The mouse Erk3 gene was mapped to a single locus on central mouse chromosome 9, adjacent to the dilute mutation locus and in a region syntenic to human chromosome 15q21. Finally, we provide several lines of evidence to support the existence of a unique Erk3 gene product of 100 kDa in mammalian cells.

  13. Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin.

    PubMed

    Peyretaillade, E; Broussolle, V; Peyret, P; Méténier, G; Gouy, M; Vivarès, C P

    1998-06-01

    An intronless gene encoding a protein of 592 amino acid residues with similarity to 70-kDa heat shock proteins (HSP70s) has been cloned and sequenced from the amitochondrial protist Encephalitozoon cuniculi (phylum Microsporidia). Southern blot analyses show the presence of a single gene copy located on chromosome XI. The encoded protein exhibits an N-terminal hydrophobic leader sequence and two motifs shared by proteobacterial and mitochondrially expressed HSP70 homologs. Phylogenetic analysis using maximum likelihood and evolutionary distances places the E. cuniculi sequence in the cluster of mitochondrially expressed HSP70s, with a higher evolutionary rate than those of homologous sequences. Similar results were obtained after cloning a fragment of the homologous gene in the closely related species E. hellem. The presence of a sequence resembling a nuclear targeting signal supports a role for the Encephalitozoon HSP70 as a molecular chaperone of nuclear proteins. No evidence for cytosolic or endoplasmic reticulum forms of HSP70 was obtained through PCR amplification. These data suggest that Encephalitozoon species evolved from an ancestor bearing mitochondria, which disagrees with the postulated presymbiotic origin of Microsporidia. The specific role and intracellular localization of the mitochondrial HSP70-like protein remain to be elucidated.
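
    Distance-based phylogenetic analysis starts from pairwise divergences between aligned sequences. The study's exact distance model is not stated. A common simple choice for proteins is the Poisson-corrected distance d = -ln(1 - p), where p is the fraction of differing sites. The correction accounts for multiple substitutions at the same site, so d grows faster than p as sequences diverge. This sketch covers only that distance step; it does not reproduce the maximum-likelihood analysis, and `poisson_distance` is a hypothetical name.

```python
import math


def poisson_distance(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Poisson-corrected amino acid distance d = -ln(1 - p).

    p is the proportion of differing residues over gap-free alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)
```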

  14. Possibility of the transformation of eEF-2 (100 kDa) to eEF-2 (65 kDa) in the peptide elongation process in vitro.

    PubMed

    Gajko, A; Sredzińska, K; Galasiński, W; Gindzieński, A

    1999-02-16

    Two active eEF-2 polypeptides of approximately 100 and 65 kDa were copurified from rat liver cells and separated. The fate of eEF-2 (100 kDa) during its binding to ribosomes and in the translocation step of the peptide elongation process was investigated. It was shown that eEF-2 (100 kDa) did not change its form during the process of binding to the ribosomes. In the postribosomal supernatant, obtained from the postincubation mixture of the elongation process, only eEF-2 (65 kDa) was found. These results suggest that the form of eEF-2 (100 kDa), when bound to the ribosome during the elongation process, is transformed to eEF-2 (65 kDa). Copyright 1999 Academic Press.

  15. Molecular cloning, overexpression, purification, and sequence analysis of the giant panda (Ailuropoda melanoleuca) ferritin light polypeptide.

    PubMed

    Fu, L; Hou, Y L; Ding, X; Du, Y J; Zhu, H Q; Zhang, N; Hou, W R

    2016-08-30

    The complementary DNA (cDNA) of the giant panda (Ailuropoda melanoleuca) ferritin light polypeptide (FTL) gene was successfully cloned using reverse transcription-polymerase chain reaction technology. We constructed a recombinant expression vector containing FTL cDNA and overexpressed it in Escherichia coli using pET28a plasmids. The expressed protein was then purified by nickel chelate affinity chromatography. The cloned cDNA fragment was 580 bp long and contained an open reading frame of 525 bp. The deduced protein sequence was composed of 175 amino acids and had an estimated molecular weight of 19.90 kDa, with an isoelectric point of 5.53. Topology prediction revealed one N-glycosylation site, two casein kinase II phosphorylation sites, one N-myristoylation site, two protein kinase C phosphorylation sites, and one cell attachment sequence. Alignment indicated that the nucleotide and deduced amino acid sequences are highly conserved across several mammals, including Homo sapiens, Cavia porcellus, Equus caballus, and Felis catus, among others. The FTL gene was readily expressed in E. coli, which gave rise to the accumulation of a polypeptide of the expected size (25.50 kDa, including an N-terminal polyhistidine tag).
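
    A calculated molecular weight like the 19.90 kDa above is typically obtained by summing average residue masses over the deduced sequence and adding one water for the free termini. The sketch below uses standard average residue masses, as tabulated by ExPASy. It is an illustration rather than the authors' tool, and the function name is hypothetical. Values may differ slightly between tools that use different mass tables.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def average_mass_kda(sequence: str) -> float:
    """Average molecular mass (kDa) of an unmodified polypeptide."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    total_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return total_da / 1000.0
```

    The same function applied to the tagged construct, with the vector-encoded residues prepended, would estimate the mass of the His-tagged product. Post-translational modifications are not modelled.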

  16. Sequence Analysis and Molecular Characterization of Clonorchis sinensis Hexokinase, an Unusual Trimeric 50-kDa Glucose-6-Phosphate-Sensitive Allosteric Enzyme

    PubMed Central

    Chen, Tingjin; Ning, Dan; Sun, Hengchang; Li, Ran; Shang, Mei; Li, Xuerong; Wang, Xiaoyun; Chen, Wenjun; Liang, Chi; Li, Wenfang; Mao, Qiang; Li, Ye; Deng, Chuanhuan; Wang, Lexun; Wu, Zhongdao; Huang, Yan; Xu, Jin; Yu, Xinbing

    2014-01-01

    Clonorchiasis, which is caused by infection with Clonorchis sinensis (C. sinensis), is highly associated with cholangiocarcinoma. Current diagnosis, treatment and transmission-interruption measures offer only limited opportunities to prevent infection, so integrated strategies to prevent and control clonorchiasis are urgently needed. Glycolytic enzymes are crucial molecules for trematode survival and have been targeted for drug development. Hexokinase of C. sinensis (CsHK), the first key regulatory enzyme of the glycolytic pathway, was characterized in this study. The calculated molecular mass (Mr) of CsHK was 50.0 kDa. The recombinant CsHK (rCsHK) was a homotrimer with an Mr of approximately 164 kDa, as determined by native PAGE and gel filtration. The highest activity was obtained with 50 mM glycine-NaOH at pH 10 and with 100 mM Tris-HCl at pH 8.5 and 10. rCsHK showed moderate thermal stability. Compared with the corresponding negative control, its enzymatic activity was significantly inhibited by praziquantel (PZQ) and anti-rCsHK serum. rCsHK was homotropically and allosterically activated by its substrates, including glucose, mannose, fructose, and ATP. ADP exhibited a mixed allosteric effect on rCsHK with respect to ATP, while inorganic pyrophosphate (PPi) displayed net allosteric activation with various allosteric systems. Fructose behaved as a dose-dependent V activator with glucose as the substrate. Glucose-6-phosphate (G6P) displayed net allosteric inhibition of rCsHK with respect to ATP or glucose with various allosteric systems in a dose-independent manner. CsHK mRNA and protein levels differed among the adult worm, metacercaria, excysted metacercaria and egg stages of C. sinensis, suggesting different energy requirements at different developmental stages. Our study furthers the understanding of the biological functions of CsHK and supports the need to screen for small-molecule inhibitors.
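
    Homotropic allosteric activation by a substrate usually shows up as a sigmoidal rate-versus-concentration curve. The abstract does not say which rate equation was fitted. The sketch below uses the Hill equation as a common empirical stand-in: a Hill coefficient n > 1 reflects positive cooperativity, and n = 1 reduces to Michaelis-Menten kinetics. The function name and any parameter values are hypothetical.

```python
def hill_rate(substrate: float, vmax: float, k_half: float, n: float) -> float:
    """Hill equation: v = Vmax * S**n / (K**n + S**n).

    k_half is the substrate concentration giving half-maximal rate;
    n > 1 gives the sigmoidal curve typical of positive homotropic cooperativity.
    """
    if substrate < 0 or k_half <= 0:
        raise ValueError("substrate must be >= 0 and k_half > 0")
    s_n = substrate ** n
    return vmax * s_n / (k_half ** n + s_n)
```

    Below K, a cooperative enzyme (n > 1) runs slower than a Michaelis-Menten enzyme with the same K. Above K it approaches Vmax more steeply, which produces the characteristic S-shaped curve.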

  17. An endogenous 55 kDa TNF receptor mediates cell death in a neural cell line.

    PubMed

    Sipe, K J; Srisawasdi, D; Dantzer, R; Kelley, K W; Weyhenmeyer, J A

    1996-06-01

    Tumor necrosis factor-alpha (TNF) is associated with developmental and injury-related events in the central nervous system (CNS). In the present study, we have examined the role of TNF on neurons using the clonal murine neuroblastoma line N1E-115 (N1E). N1E cells represent a well-defined model for studying neuronal development, since they can be maintained either as undifferentiated, mitotically active neuroblasts or as differentiated, mature neurons. Northern and reverse transcription-polymerase chain reaction (RT-PCR) analyses revealed that both undifferentiated and differentiated N1Es express transcripts for the 55 kDa TNF receptor (TNFR), but not the 75 kDa TNFR. The biological activity of the expressed TNF receptor was demonstrated by dose-dependent cytotoxicity in response to either recombinant murine or human TNF when the cells were incubated with the transcriptional inhibitor actinomycin D. The lack of 75 kDa receptor mRNA expression and the dose-dependent response to rHuTNF, an agonist specific for the murine 55 kDa receptor, suggest that TNF-induced cytotoxicity is mediated through the 55 kDa receptor in both undifferentiated and differentiated N1Es. Light microscopic observations, flow cytometric analysis of hypodiploid DNA, and electrophoretic analysis of nucleosomal DNA fragmentation of N1Es treated with actinomycin D and TNF revealed features characteristic of both necrotic and apoptotic cell death. These findings demonstrate that blast and mature N1E cells express the 55 kDa TNF receptor, which is responsible for inducing both necrotic and apoptotic death in these cells. The observation that actinomycin D renders N1E cells susceptible to the cytotoxic effects of TNF indicates that a sensitization step may be necessary for TNF cytotoxicity in neurons, such as removal of an endogenous protective factor or viral-mediated inhibition of transcription.

  18. Complete genomic sequence of a tobacco rattle virus isolate from Michigan-grown potatoes

    USDA-ARS's Scientific Manuscript database

    Tobacco rattle virus (TRV) causes stem mottle on potato leaves and necrotic arcs and rings in potato tubers, known as corky ringspot disease. Recently, TRV was reported in Michigan potato tubers cv. FL1879 exhibiting corky ringspot disease. Sequence analysis of the RNA-1-encoded 16 kDa gene of the...

  19. The Induction of Recombinant Protein Bodies in Different Subcellular Compartments Reveals a Cryptic Plastid-Targeting Signal in the 27-kDa γ-Zein Sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hofbauer, Anna; Peters, Jenny; Arcalis, Elsa

    2014-12-11

    Naturally occurring storage proteins such as zeins are used as fusion partners for recombinant proteins because they induce the formation of ectopic storage organelles known as protein bodies (PBs), where the proteins are stabilized by intermolecular interactions and the formation of disulfide bonds. Endogenous PBs are derived from the endoplasmic reticulum (ER). Here, we used different targeting sequences to determine whether ectopic PBs could be induced to form elsewhere in the cell. These PBs were composed of the N-terminal portion of mature 27 kDa γ-zein fused to a fluorescent protein. The addition of a transit peptide for targeting to plastids causes PB formation in the stroma. In the absence of any added targeting sequence, PBs were typically associated with the plastid envelope, revealing a cryptic plastid-targeting signal within the γ-zein cysteine-rich domain. The subcellular localization of the PBs influences their morphology and the solubility of the stored recombinant fusion protein. Our results indicate that the biogenesis and budding of PBs do not require ER-specific factors. They therefore confirm that γ-zein is a versatile fusion partner, offering unique opportunities for the accumulation and bioencapsulation of recombinant proteins in different subcellular compartments.

  20. MALDI-TOF mass spectrometry analysis of small molecular weight compounds (under 10 kDa) as biomarkers of rat hearts undergoing arecoline challenge.

    PubMed

    Chen, Tung-Sheng; Chang, Mu-Hsin; Kuo, Wei-Wen; Lin, Yueh-Min; Yeh, Yu-Lan; Day, Cecilia Hsuan; Lin, Chien-Chung; Tsai, Fuu-Jen; Tsai, Chang-Hai; Huang, Chih-Yang

    2013-04-01

    Statistical and clinical reports indicate that betel nut chewing is strongly associated with the progression of oral cancer, because some ingredients of betel nuts, especially arecoline, are potential cancer promoters. Early detection of cancer biomarkers is the best strategy for preventing cancer progression, and several methods have been suggested for investigating such biomarkers. Among them, the gel-based proteomics approach is the most powerful and recommended tool because of its high throughput. However, this approach is not suitable for screening biomarkers with molecular weights under 10 kDa because of the characteristics of gel electrophoresis. This study investigated biomarkers with molecular weights under 10 kDa in rats challenged with arecoline. Centrifugal filter vials with a membrane (10 kDa molecular weight cut-off) played a crucial role in this study. After centrifugation, the filtrate (containing compounds with molecular weights under 10 kDa) was collected and spotted on a sample plate for MALDI-TOF mass spectrometry analysis. Compared with controls, three extra peaks (m/z 1553.1611, 1668.2097 and 1740.1832) were found in sera, and two extra peaks (m/z 408.9719 and 524.9961) were found in heart tissue samples. These small compounds should play important roles and may be potential biomarker candidates in arecoline-challenged rats. This study reports a mass-based method for investigating small-molecular-weight biomarker candidates in different types of samples, including serum and tissue. In addition, the method is more time-efficient (1 working day) than the gel-based proteomics approach (5–7 working days).
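
    Finding "extra" peaks relative to controls amounts to comparing two peak lists. A sample peak counts as new if no control peak lies within a chosen mass tolerance. The study's peak-picking and matching criteria are not stated. This sketch assumes centroided m/z lists and a relative tolerance in parts per million. The default of 50 ppm is an arbitrary illustrative value, and the function name is hypothetical.

```python
def unmatched_peaks(sample_mz, control_mz, tol_ppm=50.0):
    """Return sample m/z values with no control peak within tol_ppm.

    tol_ppm is a relative tolerance (parts per million of the sample m/z);
    the 50 ppm default is an illustrative assumption.
    """
    extra = []
    for mz in sample_mz:
        tolerance = mz * tol_ppm * 1e-6  # absolute window in m/z units
        if not any(abs(mz - control) <= tolerance for control in control_mz):
            extra.append(mz)
    return extra
```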

  1. Purification and partial characterization of analogous 26-kDa rat submandibular and parotid gland integral membrane phosphoproteins that may have a role in exocytosis.

    PubMed

    Quissell, D O; Deisher, L M

    1992-04-01

    Rat submandibular and parotid gland exocytosis is primarily controlled by beta-adrenergic receptor stimulation. Protein phosphorylation mediated by activation of cAMP-dependent protein kinase may be directly involved, although its precise role in the regulation of salivary gland exocytosis is not fully understood. Previous studies suggest that analogous 26-kDa integral membrane phosphoproteins may play a direct role in regulating exocytosis. Studies were undertaken here to purify and partially characterize both phosphoproteins. The workflow began with endogenous phosphorylation with 32P, subcellular fractionation, and solubilization of the microsomal fraction in n-octyl beta-glucopyranoside. The 26-kDa integral membrane phosphoproteins were then purified by high-performance liquid chromatography (HPLC), followed by sodium dodecyl sulphate-polyacrylamide gel electrophoresis and electroelution of the proteins. Amino acid analysis indicated a significant number of serine residues. N-terminal sequence data demonstrated a high level of homology, and trypsin digestion followed by reversed-phase HPLC indicated the possibility of multiple phosphorylation sites.

  2. Analysis of amyloid fibrils in the cheetah (Acinonyx jubatus).

    PubMed

    Bergström, Joakim; Ueda, Mitsuharu; Une, Yumi; Sun, Xuguo; Misumi, Shogo; Shoji, Shozo; Ando, Yukio

    2006-06-01

    Recently, a high prevalence of amyloid A (AA) amyloidosis has been documented among captive cheetahs worldwide. Biochemical analysis of amyloid fibrils extracted from the liver of a captive cheetah in Japan unequivocally showed that protein AA was the main fibril constituent. Further characterization of the AA fibril components by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) and Western blot analysis revealed three main protein AA bands with approximate molecular weights of 8, 10 and 12 kDa. Mass spectrometry of the 12-kDa component observed by SDS-PAGE and Western blotting gave a peak at 12,381 Da, confirming its molecular weight. Our finding of a 12-kDa protein AA component provides evidence that the cheetah SAA sequence is longer than the previously reported 90 amino acid residues (approximately 10 kDa), and hence that SAA is part of the amyloid fibril.

  3. The glyoxysomal and plastid molecular chaperones (70-kDa heat shock protein) of watermelon cotyledons are encoded by a single gene

    PubMed Central

    Wimmer, Bernhard; Lottspeich, Friedrich; van der Klei, Ida; Veenhuis, Marten; Gietl, Christine

    1997-01-01

    The monoclonal anti-70-kDa heat shock protein (hsp70) antibody recognizes two hsps with molecular masses of 70 and 72 kDa in crude extracts from watermelon (Citrullus vulgaris) cotyledons. Immunocytochemistry on watermelon cotyledon tissue and on isolated glyoxysomes identified hsp70s in the matrix of glyoxysomes and plastids. Affinity purification and partial amino acid sequence determination revealed that the 70-kDa protein shares high sequence identity with cytosolic hsp70s from a number of plant species, while the 72-kDa protein was very similar to plastid hsp70s from pea and cucumber. A full-length cDNA clone encoding the 72-kDa hsp70 was isolated; it contains two in-frame start methionines within the N-terminal presequence, leading either to an N-terminal extension of 67 amino acids or to a shorter one of 47 amino acids. The longer presequence was necessary and sufficient to target a reporter protein into watermelon proplastids in vitro. The shorter extension, starting from the second methionine within the long version, harbored a consensus peroxisomal targeting signal (RT-X5-KL) that directed a reporter protein into peroxisomes of the yeast Hansenula polymorpha in vivo. Peroxisomal targeting was, however, prevented when the 67-residue presequence was fused to the reporter protein, indicating that the peroxisomal targeting signal 2 information is hidden in this context. We propose that the 72-kDa hsp70 is encoded by a single gene but targeted alternatively to two organelles by the modulated use of its presequence. PMID:9391076
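    The consensus signal quoted above, RT-X5-KL, reads as R, T, any five residues, then K, L. A minimal Python sketch of scanning a protein sequence for that literal pattern with a regular expression; the function name and the example sequence are hypothetical (the sequence is not the watermelon presequence), and practical PTS2 searches usually allow degenerate residues that this sketch does not:

```python
import re

# Consensus as written in the abstract: R-T, any five residues (X5), then K-L.
# The lookahead lets overlapping occurrences be reported as well.
PTS2_PATTERN = re.compile(r"(?=(RT.{5}KL))")


def find_pts2_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for every RT-X5-KL occurrence."""
    return [(m.start() + 1, m.group(1)) for m in PTS2_PATTERN.finditer(protein.upper())]


# Hypothetical toy sequence with the motif starting at position 4.
example = "MASRTAAGSEKLPQ"
print(find_pts2_sites(example))
```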

  4. The location of a disease-associated polymorphism and genomic structure of the human 52-kDa Ro/SSA locus (SSA1)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tsugu, H.; Horowitz, R.; Gibson, N.

    1994-12-01

    Sera from approximately 30% of patients with systemic lupus erythematosus (SLE) contain high titers of autoantibodies that bind to the 52-kDa Ro/SSA protein. We previously detected polymorphisms in the 52-kDa Ro/SSA gene (SSA1) with restriction enzymes, one of which is strongly associated with the presence of SLE (P < 0.0005) in African Americans. A higher disease frequency and more severe forms of the disease are commonly noted among these female patients. To determine the location and nature of this polymorphism, we obtained two clones that span 8.5 kb of the 52-kDa Ro/SSA locus, including its upstream regulatory region. Six exons were identified, and their nucleotide sequences plus adjacent noncoding regions were determined. No differences were found between these exons and the coding region of one of the reported cDNAs. The disease-associated polymorphic site, suggested by a restriction enzyme map and confirmed by DNA amplification and nucleotide sequencing, was located upstream of exon 1. This polymorphism may be a genetic marker for a disease-related variation in the coding region for the protein or in the upstream regulatory region of this gene. Although this RFLP is present in Japanese individuals, it is not associated with lupus in this population. 41 refs., 4 figs., 2 tabs.

  5. Cloning and expression of a nuclear encoded plastid specific 33 kDa ribonucleoprotein gene (33RNP) from pea that is light stimulated.

    PubMed

    Reddy, M K; Nair, S; Singh, B N; Mudgil, Y; Tewari, K K; Sopory, S K

    2001-01-24

    We report the cloning and sequencing of both the cDNA and the genomic DNA of a 33 kDa chloroplast ribonucleoprotein (33RNP) from pea. Analysis of the amino acid sequence predicted from the cDNA clone revealed that the encoded protein contains two RNA-binding domains, including the conserved consensus ribonucleoprotein sequences CS-RNP1 and CS-RNP2, in its C-terminal half, and a putative transit peptide sequence in its N-terminal region. Phylogenetic and multiple sequence alignment analyses of pea chloroplast RNP together with RNPs reported from other plant sources revealed that pea 33RNP is very closely related to Nicotiana sylvestris 31RNP and 28RNP, and also to 31RNP and 28RNP of Arabidopsis and spinach, respectively. The pea 33RNP was expressed in Escherichia coli and purified to homogeneity. In vitro import of the precursor protein into chloroplasts confirmed that the putative N-terminal transit peptide is a bona fide transit peptide and that 33RNP is localized in the chloroplast. The nucleic acid-binding properties of the recombinant protein, as revealed by South-Western analysis, showed that 33RNP has higher binding affinity for poly(U) and oligo(dT) than for ssDNA and dsDNA. The steady-state transcript level was higher in leaves than in roots, and expression of this gene is light stimulated. Sequence analysis of the genomic clone revealed that the gene contains four exons and three introns. We have also isolated and analyzed the 5' flanking region of the pea 33RNP gene.

  6. Cloning and sequencing of a gene encoding a 21-kilodalton outer membrane protein from Bordetella avium and expression of the gene in Salmonella typhimurium.

    PubMed Central

    Gentry-Weeks, C R; Hultsch, A L; Kelly, S M; Keith, J M; Curtiss, R

    1992-01-01

    Three gene libraries of Bordetella avium 197 DNA were prepared in Escherichia coli LE392 by using the cosmid vectors pCP13 and pYA2329, a derivative of pCP13 specifying spectinomycin resistance. The cosmid libraries were screened with convalescent-phase anti-B. avium turkey sera and polyclonal rabbit antisera against B. avium 197 outer membrane proteins. One E. coli recombinant clone produced a 56-kDa protein which reacted with convalescent-phase serum from a turkey infected with B. avium 197. In addition, five E. coli recombinant clones were identified which produced B. avium outer membrane proteins with molecular masses of 21, 38, 40, 43, and 48 kDa. At least one of these E. coli clones, which encoded the 21-kDa protein, reacted with both convalescent-phase turkey sera and antibody against B. avium 197 outer membrane proteins. The gene for the 21-kDa outer membrane protein was localized by Tn5seq1 mutagenesis, and the nucleotide sequence was determined by dideoxy sequencing. DNA sequence analysis of the gene for the 21-kDa protein revealed an open reading frame of 582 bases, predicting a protein of 194 amino acids. Comparison of the predicted amino acid sequence of the gene encoding the 21-kDa outer membrane protein with protein sequences in the National Biomedical Research Foundation protein sequence database indicated significant homology to the OmpA proteins of Shigella dysenteriae, Enterobacter aerogenes, E. coli, and Salmonella typhimurium and to Neisseria gonorrhoeae outer membrane protein III, Haemophilus influenzae protein P6, and Pseudomonas aeruginosa porin protein F. The gene (ompA) encoding the B. avium 21-kDa protein hybridized with 4.1-kb DNA fragments from EcoRI-digested chromosomal DNA of Bordetella pertussis and Bordetella bronchiseptica and with 6.0- and 3.2-kb DNA fragments from EcoRI-digested chromosomal DNA of B. avium and B. avium-like DNA, respectively. A 6.75-kb DNA fragment encoding the B. avium 21-kDa protein was subcloned into the
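    The link between ORF length and predicted protein length is codon arithmetic: 582 / 3 = 194 codons, matching the predicted 194 residues. That exact match suggests (an inference, not something the abstract states) that the quoted ORF length excludes the stop codon. A small Python helper illustrating the calculation; the function name is hypothetical:

```python
def residues_from_orf(orf_length_nt: int, includes_stop: bool = False) -> int:
    """Number of amino acids encoded by an ORF of the given length in nucleotides."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a whole number of codons (multiple of 3)")
    codons = orf_length_nt // 3
    # A stop codon terminates translation without adding a residue.
    return codons - 1 if includes_stop else codons


print(residues_from_orf(582))                      # length quoted without a stop codon
print(residues_from_orf(585, includes_stop=True))  # same protein if the stop were counted
```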

  7. Plants transformed with a tobacco mosaic virus nonstructural gene sequence are resistant to the virus.

    PubMed Central

    Golemboski, D B; Lomonossoff, G P; Zaitlin, M

    1990-01-01

    Nicotiana tabacum cv. Xanthi nn plants were transformed with nucleotides 3472-4916 of tobacco mosaic virus (TMV) strain U1. This sequence contains all but the three 3'-terminal nucleotides of the TMV 54-kDa gene, which encodes a putative component of the replicase complex. These plants were resistant to infection when challenged with either TMV U1 virions or TMV U1 RNA at concentrations of up to 500 micrograms/ml or 300 micrograms/ml, respectively, the highest concentrations tested. Resistance was also exhibited when plants were inoculated at 100 micrograms/ml with the closely related TMV mutant YSI/1, but not when plants were challenged at the same concentrations with the more distantly related TMV strains U2 or L or with cucumber mosaic virus. Although the copy number of the 54-kDa gene sequence varied in individual transformants from 1 to approximately 5, the level of resistance was not dependent on the number of integrated copies. The transformed plants accumulated a 54-kDa gene sequence-specific RNA transcript of the expected size, but no protein product was detected. PMID:2385595

  8. Two-step processing for activation of the cytolysin/hemolysin of Vibrio cholerae O1 biotype El Tor: nucleotide sequence of the structural gene (hlyA) and characterization of the processed products.

    PubMed

    Yamamoto, K; Ichinose, Y; Shinagawa, H; Makino, K; Nakata, A; Iwanaga, M; Honda, T; Miwatani, T

    1990-12-01

    Vibrio cholerae O1 biotype El Tor produces and secretes a 65-kDa cytolysin/hemolysin into the culture medium. We cloned the structural gene (hlyA) for the cytolysin from the total DNA of the V. cholerae O1 El Tor strain N86. Nucleotide sequence analysis of hlyA revealed an open reading frame of 2,223 bp that could encode a protein of 741 amino acids with a molecular weight of 81,961. Consistent with this, a 79-kDa protein was identified as the product of hlyA by maxicell analysis in Escherichia coli. The N-terminal residues of this 79-kDa HlyA protein and of the 65-kDa El Tor cytolysin purified from V. cholerae were Asn-26 and Asn-158, respectively. The 82- and 79-kDa precursors of the 65-kDa mature cytolysin were found in V. cholerae by pulse-chase labeling and Western blot (immunoblot) analysis of hlyA products. The hemolytic activity of the 79-kDa HlyA protein from E. coli was less than 5% of that of the 65-kDa cytolysin from V. cholerae. Our results suggest that in V. cholerae the 82-kDa preprotoxin synthesized in the cytoplasm is secreted through the membranes into the culture medium as the 79-kDa inactive protoxin after cleavage of the signal peptide, and is then further processed into the 65-kDa active cytolysin by release of the N-terminal 15-kDa fragment.
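    The sizes reported for the hlyA products can be cross-checked with residue arithmetic. The sketch below assumes an average residue mass of about 110 Da (a common rule of thumb, not a value from the paper) and assumes that every processed form still ends at residue 741, i.e. that processing removes only N-terminal segments, as the abstract describes. Under those assumptions the segment from Asn-26 to residue 157 comes to roughly 14.5 kDa, in line with the reported 15-kDa fragment:

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass
FULL_LENGTH = 2223 // 3      # residues encoded by the 2,223-bp hlyA ORF (741)


def segment_length(first: int, last: int) -> int:
    """Inclusive residue count for 1-based positions first..last."""
    return last - first + 1


def approx_kda(n_residues: int) -> float:
    """Rough mass estimate in kDa from a residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


segments = {
    "preprotoxin (1-741)": segment_length(1, FULL_LENGTH),
    "protoxin (Asn-26 to 741)": segment_length(26, FULL_LENGTH),
    "mature cytolysin (Asn-158 to 741)": segment_length(158, FULL_LENGTH),
    "released N-terminal fragment (26-157)": segment_length(26, 157),
}
for name, n in segments.items():
    print(f"{name}: {n} residues, ~{approx_kda(n):.1f} kDa")
```

    The estimates (about 81.5, 78.8, 64.2 and 14.5 kDa) land close to the reported 82-, 79-, 65- and 15-kDa species, which is all a rule-of-thumb check like this can show.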

  9. A Bayesian framework based on a Gaussian mixture model and radial-basis-function Fisher discriminant analysis (BayGmmKda V1.1) for spatial prediction of floods

    NASA Astrophysics Data System (ADS)

    Tien Bui, Dieu; Hoang, Nhat-Duc

    2017-09-01

    In this study, a probabilistic model named BayGmmKda is proposed for flood susceptibility assessment in a study area in central Vietnam. The new model is a Bayesian framework that combines a Gaussian mixture model (GMM), radial-basis-function Fisher discriminant analysis (RBFDA), and a geographic information system (GIS) database. Within the Bayesian framework, the GMM models the data distribution of the flood-influencing factors in the GIS database, whereas RBFDA constructs a latent variable intended to enhance model performance. The posterior probability output by the BayGmmKda model is then used as the flood susceptibility index. Experimental results showed that the proposed hybrid framework outperforms other benchmark models, including the adaptive neuro-fuzzy inference system and the support vector machine. To facilitate model implementation, a BayGmmKda software program has been developed in MATLAB. The BayGmmKda program can accurately establish a flood susceptibility map for the study region. Local authorities can therefore overlay this susceptibility map onto various land-use maps for land-use planning or management.
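    The core Bayesian step described here, turning class-conditional GMM likelihoods into a posterior probability used as a susceptibility index, can be sketched in a few lines. This is a minimal Python illustration under heavy assumptions: a single one-dimensional score instead of the paper's multi-factor GIS data, fixed hypothetical mixture parameters instead of fitted ones, and no RBFDA step. It is not the BayGmmKda MATLAB program:

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianMixture1D:
    """A one-dimensional Gaussian mixture density with fixed parameters."""
    weights: list[float]
    means: list[float]
    stds: list[float]

    def pdf(self, x: float) -> float:
        # Weighted sum of normal densities, one per mixture component.
        total = 0.0
        for w, mu, sigma in zip(self.weights, self.means, self.stds):
            z = (x - mu) / sigma
            total += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        return total


def flood_posterior(x: float, flood: GaussianMixture1D, non_flood: GaussianMixture1D,
                    prior_flood: float = 0.5) -> float:
    """Bayes' rule: P(flood | x) from class-conditional GMM likelihoods and a prior."""
    joint_flood = flood.pdf(x) * prior_flood
    joint_non_flood = non_flood.pdf(x) * (1.0 - prior_flood)
    return joint_flood / (joint_flood + joint_non_flood)


# Hypothetical parameters for a single score x; not taken from the paper.
flood_gmm = GaussianMixture1D(weights=[0.6, 0.4], means=[-1.0, -0.5], stds=[0.5, 0.8])
non_flood_gmm = GaussianMixture1D(weights=[1.0], means=[1.0], stds=[1.0])

for x in (-1.0, 0.0, 3.0):
    print(x, round(flood_posterior(x, flood_gmm, non_flood_gmm), 3))
```

    In a full model the mixture parameters would be estimated from labelled flood and non-flood locations (for example with expectation-maximization), and the posterior would be evaluated cell by cell to produce the susceptibility map.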

  10. Cloning and Characterization of an Outer Membrane Protein of Vibrio vulnificus Required for Heme Utilization: Regulation of Expression and Determination of the Gene Sequence

    PubMed Central

    Litwin, Christine M.; Byrne, Burke L.

    1998-01-01

    Vibrio vulnificus is a halophilic marine pathogen that has been associated with septicemia and serious wound infections in patients with iron overload and preexisting liver disease. For V. vulnificus, the ability to acquire iron from the host has been shown to correlate with virulence. V. vulnificus is able to use host iron sources such as hemoglobin and heme. We previously constructed a fur mutant of V. vulnificus that constitutively expresses at least two iron-regulated outer membrane proteins, of 72 and 77 kDa. The N-terminal amino acid sequence of the 77-kDa protein purified from the V. vulnificus fur mutant had 67% homology with the first 15 amino acids of the mature protein of the Vibrio cholerae heme receptor, HutA. In this report, we describe the cloning, DNA sequence, mutagenesis, and analysis of transcriptional regulation of the structural gene for HupA, the heme receptor of V. vulnificus. DNA sequencing of hupA demonstrated a single open reading frame encoding 712 amino acids that was 50% identical and 66% similar to the sequence of V. cholerae HutA and similar to those of other TonB-dependent outer membrane receptors. Primer extension analysis localized one promoter for the V. vulnificus hupA gene. Analysis of the promoter region of V. vulnificus hupA showed a sequence homologous to the consensus Fur box. Northern blot analysis showed that the transcript was strongly regulated by iron. An internal deletion in the V. vulnificus hupA gene, introduced by marker exchange, resulted in the loss of expression of the 77-kDa protein and the loss of the ability to use hemin or hemoglobin as a source of iron. The hupA deletion mutant of V. vulnificus will be helpful in future studies of the role of heme iron in V. vulnificus pathogenesis. PMID:9632577

  11. Anti-inflammatory effect of garlic 14-kDa protein on LPS-stimulated-J774A.1 macrophages.

    PubMed

    Rabe, Shahrzad Zamani Taghizadeh; Ghazanfari, Tooba; Siadat, Zahra; Rastin, Maryam; Rabe, Shahin Zamani Taghizadeh; Mahmoudi, Mahmoud

    2015-04-01

    Garlic 14-kDa protein is purified from garlic (Allium sativum L.), which is used in traditional medicine and exerts various immunomodulatory activities. The present study investigated the suppressive effect of garlic 14-kDa protein on LPS-induced expression of pro-inflammatory mediators, and the underlying mechanism, in inflammatory macrophages. J774A.1 macrophages were treated with 14-kDa protein (5-30 μg/ml) with or without LPS (1 μg/ml), and the release of inflammatory mediators such as prostaglandin E2 (PGE2), TNF-α and IL-1β was measured by ELISA. Nitric oxide (NO) production was determined using the Griess method. The anti-inflammatory activity of the 14-kDa protein was examined by measuring inducible nitric oxide synthase and cyclooxygenase-2 proteins by western blot. Expression of the nuclear NF-κB p65 subunit was also assessed by western blot. Garlic 14-kDa protein significantly inhibited the excessive production of NO, PGE2, TNF-α and IL-1β in lipopolysaccharide (LPS)-activated J774A.1 macrophages in a concentration-related manner, without cytotoxic effects. Western blot analysis demonstrated that garlic 14-kDa protein suppressed the corresponding inducible NO synthase expression and activated cyclooxygenase-2 protein expression. The inhibitory effect was mediated partly by a reduction in the activity and expression of the transcription factor NF-κB. Our results suggest, for the first time, that garlic 14-kDa protein exhibits anti-inflammatory properties in macrophages, possibly by suppressing inflammatory mediators via inhibition of the NF-κB signaling pathway. The traditional use of garlic as an anti-inflammatory remedy could be ascribed partly to its 14-kDa protein content. This protein might be a useful candidate for controlling inflammatory diseases and merits further investigation in vivo.

  12. The aqueous phase of Alzheimer's disease brain contains assemblies built from ∼4 and ∼7 kDa Aβ species.

    PubMed

    Mc Donald, Jessica M; O'Malley, Tiernan T; Liu, Wen; Mably, Alexandra J; Brinkmalm, Gunnar; Portelius, Erik; Wittbold, William M; Frosch, Matthew P; Walsh, Dominic M

    2015-11-01

    Much knowledge about amyloid β (Aβ) aggregation and toxicity has been acquired using synthetic peptides and mouse models, whereas less is known about soluble Aβ in the human brain. We analyzed aqueous extracts from multiple Alzheimer's disease (AD) brains using an array of techniques. Brains can contain at least four different Aβ assembly forms: (i) monomers, (ii) a ∼7 kDa Aβ species, and larger species (iii) from ∼30-150 kDa and (iv) >160 kDa. High molecular weight species are by far the most prevalent and appear to be built from ∼7 kDa Aβ species. The ∼7 kDa Aβ species resist denaturation by chaotropic agents, have a higher Aβ42/Aβ40 ratio than monomers, and are unreactive with antibodies to Asp1 of Aβ or to APP residues N-terminal of Asp1. Further analysis of brain-derived ∼7 kDa Aβ species, the mechanism by which they assemble and the structures they form should reveal therapeutic and diagnostic opportunities. Copyright © 2015 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.

  13. Front-End Electron Transfer Dissociation Coupled to a 21 Tesla FT-ICR Mass Spectrometer for Intact Protein Sequence Analysis

    NASA Astrophysics Data System (ADS)

    Weisbrod, Chad R.; Kaiser, Nathan K.; Syka, John E. P.; Early, Lee; Mullen, Christopher; Dunyach, Jean-Jacques; English, A. Michelle; Anderson, Lissa C.; Blakney, Greg T.; Shabanowitz, Jeffrey; Hendrickson, Christopher L.; Marshall, Alan G.; Hunt, Donald F.

    2017-09-01

    High-resolution mass spectrometry is a key technology for in-depth protein characterization. High-field Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) enables high-level interrogation of intact proteins in the greatest detail to date. However, an appropriate complement of fragmentation technologies must be paired with FTMS to provide comprehensive sequence coverage, as well as characterization of sequence variants and post-translational modifications. Here we describe the integration of front-end electron transfer dissociation (FETD) with a custom-built 21 tesla FT-ICR mass spectrometer, which yields unprecedented sequence coverage for proteins ranging from 2.8 to 29 kDa without the need for extensive spectral averaging (e.g., 60% sequence coverage for apo-myoglobin with four averaged acquisitions). The system is equipped with a multipole storage device, separate from the ETD reaction device, that allows accumulation of multiple ETD fragment ion fills. Consequently, an optimally large product ion population is accumulated before transfer to the ICR cell for mass analysis, which improves mass spectral signal-to-noise ratio, dynamic range and scan rate. We find a linear relationship between protein molecular weight and the minimum number of ETD reaction fills needed to achieve optimum sequence coverage, enabling more efficient use of instrument data acquisition time. Finally, real-time scaling of the number of ETD reaction fills during method-based acquisition is shown, and the implications for LC-MS/MS top-down analysis are discussed.

  14. Identification of a 27.8 kDa protein from flounder gill cells involved in lymphocystis disease virus binding and infection.

    PubMed

    Wang, Mu; Sheng, Xiu-Zhen; Xing, Jing; Tang, Xiao-Qian; Zhan, Wen-Bin

    2011-03-16

    In vitro, lymphocystis disease virus (LCDV) infection of flounder gill (FG) cell cultures causes an obvious cytopathic effect (CPE). We describe attempts to isolate and characterize the LCDV-binding molecule(s) on the plasma membrane of FG cells responsible for virus entry. Co-immunoprecipitation detected a 27.8 kDa molecule from FG cells that bound to LCDV. In a blocking ELISA, pre-incubation of FG cell membrane proteins with a specific antiserum raised against the 27.8 kDa protein blocked LCDV binding. Similarly, antiserum against the 27.8 kDa protein also inhibited LCDV infection of FG cells in vitro. Mass spectrometric analysis established that the 27.8 kDa protein and beta-actin had a strong association. These results strongly support the possibility that the 27.8 kDa protein is the putative receptor specific for LCDV infection of FG cells.

  15. Type I allergy to elderberry (Sambucus nigra) is elicited by a 33.2 kDa allergen with significant homology to ribosomal inactivating proteins.

    PubMed

    Förster-Waldl, E; Marchetti, M; Schöll, I; Focke, M; Radauer, C; Kinaciyan, T; Nentwich, I; Jäger, S; Schmid, E R; Boltz-Nitulescu, G; Scheiner, O; Jensen-Jarolim, E

    2003-12-01

    Patients suffering from allergic rhinoconjunctivitis and dyspnoea during summer may exhibit these symptoms after contact with flowers or dietary products of the elderberry tree Sambucus nigra. Patients with a history of summer hay fever were tested in a routine setting for sensitization to elderberry. Nine patients with allergic symptoms due to elderberry and specific sensitization were investigated in detail. We studied the responsible allergens in extracts from elderberry pollen, flowers and berries, and investigated cross-reactivity with allergens from birch, grass and mugwort. Sera from patients were tested for IgE reactivity to elderberry proteins by one-dimensional (1D) and 2D electrophoresis/immunoblotting. Inhibition studies with defined allergens and elderberry-specific antibodies were used to evaluate cross-reactivity. The main elderberry allergen was purified by gel filtration and reversed-phase HPLC and subjected to mass spectrometry. The in-gel-digested allergen was analysed by MS/MS sequence analysis and peptide mapping, and the N-terminal sequence of the predominant allergen was analysed. Of 3668 randomly tested patients, 0.6% showed a positive skin prick test and/or RAST to elderberry. IgE in patients' sera detected a predominant allergen of 33.2 kDa, with an isoelectric point at pH 7.0, in extracts from elderberry pollen, flowers and berries. Pre-incubation of sera with extracts from birch, mugwort or grass pollen resulted in insignificant or no inhibition of IgE binding to blotted elderberry proteins. Specific mouse antisera reacted exclusively with proteins from elderberry. N-terminal sequence analysis, as well as MS/MS spectrometry of the purified elderberry allergen, indicated homology with ribosomal inactivating proteins (RIPs). We present evidence that the elderberry plant S. nigra harbours allergenic potency. Independent methodologies argue for significant homology of the predominant 33.2 kDa elderberry allergen to RIPs. We

  16. Nup93, a Vertebrate Homologue of Yeast Nic96p, Forms a Complex with a Novel 205-kDa Protein and Is Required for Correct Nuclear Pore Assembly

    PubMed Central

    Grandi, Paola; Dang, Tam; Pané, Nelly; Shevchenko, Andrej; Mann, Matthias; Forbes, Douglass; Hurt, Ed

    1997-01-01

    Yeast and vertebrate nuclear pores display significant morphological similarity by electron microscopy, but sequence similarity between the respective proteins has been more difficult to observe. Herein we have identified a vertebrate nucleoporin, Nup93, in both human and Xenopus that has proved to be an evolutionarily related homologue of the yeast nucleoporin Nic96p. Polyclonal antiserum to human Nup93 detects corresponding proteins in human, rat, and Xenopus cells. Immunofluorescence and immunoelectron microscopy localize vertebrate Nup93 at the nuclear basket and at or near the nuclear entry to the gated channel of the pore. Immunoprecipitation from both mammalian and Xenopus cell extracts indicates that a small fraction of Nup93 physically interacts with the nucleoporin p62, just as yeast Nic96p interacts with the yeast p62 homologue. However, a large fraction of vertebrate Nup93 is extracted from pores and is also present in Xenopus egg extracts in complex with a newly discovered 205-kDa protein. Mass spectrometric sequencing of the human 205-kDa protein reveals that this protein is encoded by an open reading frame, KIAA0225, present in the human database. The putative human nucleoporin of 205 kDa has related sequence homologues in Caenorhabditis elegans and Saccharomyces cerevisiae. To analyze the role of the Nup93 complex in the pore, nuclei were assembled that lack the Nup93 complex after immunodepletion of a Xenopus nuclear reconstitution extract. The Nup93-complex–depleted nuclei are clearly defective for correct nuclear pore assembly. From these experiments, we conclude that the vertebrate and yeast pore have significant homology in their functionally important cores and that, with the identification of Nup93 and the 205-kDa protein, we have extended the knowledge of the nearest-neighbor interactions of this core in both yeast and vertebrates. PMID:9348540

  17. Tryptic digestion of human GPIIIa. Isolation and biochemical characterization of the 23 kDa N-terminal glycopeptide carrying the antigenic determinant for a monoclonal antibody (P37) which inhibits platelet aggregation.

    PubMed Central

    Calvete, J J; Rivas, G; Maruri, M; Alvarez, M V; McGregor, J L; Hew, C L; Gonzalez-Rodriguez, J

    1988-01-01

    Early tryptic digestion of pure human platelet glycoprotein IIIa (GPIIIa) cleaves the molecule once, 23 kDa from one of its termini. Automated Edman degradation demonstrates that GPIIIa and the smaller (23 kDa) tryptic fragment share the same N-terminal amino acid sequence. A further cleavage occurs in the larger (80 kDa) fragment, reducing its apparent molecular mass by 10 kDa. The 23 kDa fragment remains attached to the larger ones in unreduced samples. Stepwise reduction of early-digested GPIIIa with dithioerythritol selectively reduces the single disulphide bond joining the smaller (23 kDa) fragment to the larger (80/70 kDa) fragments. Two fractions were obtained by size-exclusion chromatography of early-digested GPIIIa after partial or full reduction and alkylation: the larger-size fraction contains the 80/70 kDa fragments, while the 23 kDa fragment is isolated in the smaller one. The amino acid compositions of these fractions do not differ very significantly from that of GPIIIa; however, the 23 kDa fragment contains only 10.2% sugars by weight and is richer in neuraminic acid. Four disulphide bonds are located in the 23 kDa glycopeptide and 20-21 in the 80/70 kDa glycopeptide. The epitope for P37, a monoclonal antibody which inhibits platelet aggregation [Melero & González-Rodríguez (1984) Eur. J. Biochem. 141, 421-427], is situated within the first 17 kDa of the N-terminal region of GPIIIa, which gives this extracellular region of GPIIIa special functional interest. On the other hand, the epitopes for the GPIIIa-specific monoclonal antibodies P6, P35, P40 and P97, which do not interfere with platelet aggregation, are located within the larger tryptic fragment (80/70 kDa). Thus, the antigenic areas available on the extracellular surface of GPIIIa for these five monoclonal antibodies are now more precisely delineated. PMID:2455507

  18. Regulatory sequence analysis tools.

    PubMed

    van Helden, Jacques

    2003-07-01

    The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
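    Position-specific scoring matrices, one of the motif representations mentioned above, can be illustrated generically. The following is a minimal Python sketch only, not RSAT's implementation, command-line interface or scoring options: it converts hypothetical per-position base counts into log2-odds scores against an assumed uniform background and reports sequence windows scoring at or above a threshold. The function names, counts and threshold are all illustrative:

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=None):
    """Turn per-position base counts into a log2-odds position-specific scoring matrix."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform background (assumption)
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (1-based start, window, score) for every window scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start + 1, window, round(score, 3)))
    return hits


# Hypothetical counts from eight aligned sites of a 3-bp motif "TGA".
counts = [{"T": 8}, {"G": 8}, {"A": 8}]
pssm = log_odds_matrix(counts)
print(scan("cctgacc", pssm, threshold=3.0))
```

    The pseudocount keeps bases never seen at a position from producing log(0); a genome-specific background model can be passed through the `background` argument in place of the uniform default.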

  19. Purification and characterization of a 22-kDa microsomal protein from rat parotid gland which is phosphorylated following stimulation by agonists involving cAMP as second messenger.

    PubMed

    Thiel, G; Schmidt, W E; Meyer, H E; Söling, H D

    1988-01-04

    Stimulation of secretion in exocrine glands by agonists involving cAMP as second messenger leads to the phosphorylation of the ribosomal protein S6 (protein I) and two other particulate proteins with apparent molecular masses of 24 kDa (protein II) and 22 kDa (protein III) [Jahn, R., Unger, C. & Söling, H. D. (1980) Eur. J. Biochem. 112, 345-352]. This report describes the purification and characterization of protein III. Solubilization studies indicate that protein III is an intrinsic membrane protein. It could be extracted from the endoplasmic reticulum membrane only with Triton X-100, SDS or concentrated formic or acetic acid. The purification of this protein involved extraction of the microsomes with Triton X-100, removal of the detergent by acetone precipitation, extraction of water-soluble proteins, lipids and lipoproteins, and preparative SDS polyacrylamide gel electrophoresis. The protein has a basic pI (greater than 8.7). For determination of the amino acid composition of protein III and for sequencing of its amino-terminal portion, the protein was electroeluted out of the gel, the detergent was removed, and the protein was finally purified by reversed-phase HPLC. Protein III could be phosphorylated in vitro by the catalytic subunit of the cAMP-dependent protein kinase to a degree of approximately 0.14 mol phosphate/mol protein. The only phosphopeptide obtained after in vitro phosphorylation and subsequent tryptic or chymotryptic digestion was identical with the phosphopeptide obtained after stimulation of intact rat parotid gland lobules with isoproterenol. The sequence of this peptide was Lys-Leu-Ser(P)-Glu-Ala-Asp-Asn-Arg. It was confirmed by an analysis of the synthetic peptide following in vitro phosphorylation with cAMP-dependent protein kinase. The first 41 N-terminal residues of protein III were sequenced. So far no sequence homology with other known peptides or proteins could be found.

  20. Nuclear 82-kDa choline acetyltransferase decreases amyloidogenic APP metabolism in neurons from APP/PS1 transgenic mice.

    PubMed

    Albers, Shawn; Inthathirath, Fatima; Gill, Sandeep K; Winick-Ng, Warren; Jaworski, Ewa; Wong, Daisy Y L; Gros, Robert; Rylett, R Jane

    2014-09-01

    Alzheimer disease (AD) is associated with increased amyloidogenic processing of amyloid precursor protein (APP) to β-amyloid peptides (Aβ), cholinergic neuron loss with decreased choline acetyltransferase (ChAT) activity, and cognitive dysfunction. Both 69-kDa ChAT and 82-kDa ChAT are expressed in cholinergic neurons in human brain and spinal cord with 82-kDa ChAT localized predominantly to neuronal nuclei, suggesting potential alternative functional roles for the enzyme. By gene microarray analysis, we found that 82-kDa ChAT-expressing IMR32 neural cells have altered expression of genes involved in diverse cellular functions. Importantly, genes for several proteins that regulate APP processing along amyloidogenic and non-amyloidogenic pathways are differentially expressed in 82-kDa ChAT-containing cells. The predicted net effect based on observed changes in expression patterns of these genes would be decreased amyloidogenic APP processing with decreased Aβ production. This functional outcome was verified experimentally as a significant decrease in BACE1 protein levels and activity and a concomitant reduction in the release of endogenous Aβ1-42 from neurons cultured from brains of AD-model APP/PS1 transgenic mice. The expression of 82-kDa ChAT in neurons increased levels of GGA3, which is involved in trafficking BACE1 to lysosomes for degradation. shRNA-induced decreases in GGA3 protein levels attenuated the 82-kDa ChAT-mediated decreases in BACE1 protein and activity and Aβ1-42 release. Evidence that 82-kDa ChAT can enhance GGA3 gene expression is shown by enhanced GGA3 gene promoter activity in SN56 neural cells expressing this ChAT protein. These studies indicate a novel relationship between cholinergic neurons and APP processing, with 82-kDa ChAT acting as a negative regulator of Aβ production. This decreased formation of Aβ could result in protection for cholinergic neurons, as well as protection of other cells in the vicinity that are sensitive to

  1. Ultratight crystal packing of a 10 kDa protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trillo-Muyo, Sergio; Jasilionis, Andrius; Domagalski, Marcin J.

    2013-03-01

    The crystal structure of the C-terminal domain of a putative U32 peptidase from G. thermoleovorans is reported; it is one of the most tightly packed protein structures reported to date. While small organic molecules generally crystallize into tightly packed lattices with little solvent content, proteins form air-sensitive, high-solvent-content crystals. Here, the crystallization and full structure analysis of a novel recombinant 10 kDa protein corresponding to the C-terminal domain of a putative U32 peptidase are reported. The orthorhombic crystal contained only 24.5% solvent and is therefore among the most tightly packed protein lattices ever reported.
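    The 24.5% solvent figure can be related to the Matthews coefficient, V_M = V_cell / (Z · M), the unit-cell volume per dalton of protein when Z molecules of mass M occupy the cell. The sketch below uses the common approximation solvent fraction ≈ 1 − 1.23 / V_M, which assumes a protein partial specific volume of about 0.74 cm³/g. The abstract does not say how its value was computed, and the function names are illustrative only.

```python
def matthews_coefficient(cell_volume_a3: float, molecules_per_cell: int, mass_da: float) -> float:
    """Unit-cell volume per dalton of protein, V_M, in cubic angstroms per Da."""
    return cell_volume_a3 / (molecules_per_cell * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction from V_M (assumes ~0.74 cm^3/g partial specific volume)."""
    return 1.0 - 1.23 / vm


def vm_from_solvent_fraction(fraction: float) -> float:
    """Invert solvent_fraction: the V_M implied by a given solvent fraction."""
    return 1.23 / (1.0 - fraction)


if __name__ == "__main__":
    vm = vm_from_solvent_fraction(0.245)
    print(f"24.5% solvent -> V_M = {vm:.2f} A^3/Da")  # prints 1.63
```

    Under this approximation, 24.5% solvent corresponds to V_M ≈ 1.63 Å³/Da, at the low end of the range observed for protein crystals.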

  2. Crystal structure of the 25 kDa subunit of human cleavage factor Im

    PubMed Central

    Coseno, Molly; Martin, Georges; Berger, Christopher; Gilmartin, Gregory; Keller, Walter; Doublié, Sylvie

    2008-01-01

    Cleavage factor Im is an essential component of the pre-messenger RNA 3′-end processing machinery in higher eukaryotes, participating in both the polyadenylation and cleavage steps. Cleavage factor Im is an oligomer composed of a small 25 kDa subunit (CF Im25) and a variable larger subunit of either 59, 68 or 72 kDa. The small subunit also interacts with RNA, poly(A) polymerase, and the nuclear poly(A)-binding protein. These protein–protein interactions are thought to be facilitated by the Nudix domain of CF Im25, a hydrolase motif with a characteristic α/β/α fold and a conserved catalytic sequence or Nudix box. We present here the crystal structures of human CF Im25 in its free and diadenosine tetraphosphate (Ap4A) bound forms at 1.85 and 1.80 Å, respectively. CF Im25 crystallizes as a dimer and presents the classical Nudix fold. Results from crystallographic and biochemical experiments suggest that CF Im25 makes use of its Nudix fold to bind but not hydrolyze ATP and Ap4A. The complex and apo protein structures provide insight into the active oligomeric state of CF Im and suggest a possible role of nucleotide binding in the polyadenylation and/or cleavage steps of pre-messenger RNA 3′-end processing. PMID:18445629

  3. A 170 kDa multi-domain cystatin of Fasciola gigantica is active in the male reproductive system.

    PubMed

    Geadkaew, Amornrat; Kosa, Nanthawat; Siricoon, Sinee; Grams, Suksiri Vichasri; Grams, Rudi

    2014-09-01

    Cystatins function as intra- and extracellular inhibitors of cysteine proteases and are expressed as single- or multi-domain proteins. We have previously described two single-domain type 1 cystatins in the trematode Fasciola gigantica that are released into the parasite's intestinal tract and exhibit inhibitory activity against endogenous and host cathepsin L and B proteases. In contrast, the 170 kDa multi-domain cystatin (FgMDC) presented here comprises a signal peptide and 12 tandemly repeated cystatin-like domains with similarity to type 2 single-domain cystatins. The domains show high sequence divergence, with identity values often <20% and only 26.8% between the best-matching domains, 6 and 10. Several domains contain degenerate QVVAG core motifs and/or lack other important residues of active type 2 cystatins. Domain-specific antisera detected multiple forms of FgMDC ranging from <10 to >120 kDa in molecular mass in immunoblots of parasite crude extracts and ES product, with a different banding pattern for each antiserum, demonstrating complex processing of the proprotein. The four domains with the most highly conserved QVVAG motifs were expressed in Escherichia coli, and the refolded recombinant proteins blocked cysteine protease activity in the parasite's ES product. Strikingly, immunohistochemical analysis using seven domain-specific antisera localized FgMDC in testis lobes and sperm. It is speculated that the processed cystatin-like domains have functions analogous to those of the mammalian group of male reproductive tissue-specific type 2 cystatins and act in spermiogenesis and fertilization. Copyright © 2014 Elsevier B.V. All rights reserved.
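    Percent identity values such as the 26.8% between domains 6 and 10 depend on how aligned columns are counted. A minimal sketch, assuming the two domains have already been aligned (gaps written as `-`) and that identity is taken over every column not gapped in both sequences. The abstract does not state which convention or alignment tool was used, and the short fragments below are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # an all-gap column carries no information
        columns += 1
        if x == y and x != gap:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    # Hypothetical core-motif fragments, not taken from FgMDC.
    print(percent_identity("QVVAG", "QLVAG"))  # 80.0
```

    Dividing by the length of the shorter ungapped sequence is another common convention; it gives different numbers for gapped alignments, so identity values are only comparable when computed the same way.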

  4. Evaluation of the immunomodulatory effect of the 14 kDa protein isolated from aged garlic extract on dendritic cells.

    PubMed

    Ahmadabad, Hasan Namdar; Hassan, Zuhair Mohammad; Safari, Elahe; Bozorgmehr, Mahmood; Ghazanfari, Tooba; Moazzeni, Seyed Mohammad

    2011-01-01

    Garlic is used all over the world to treat different diseases, and a wide range of its biological activities has been verified in vitro and in vivo. One of the major garlic proteins that has been isolated and purified is the 14 kDa protein, which has been shown to have immunomodulatory effects. In this study, the effect of the 14 kDa protein isolated from aged garlic extract (AGE) on the maturation and immunomodulatory activity of dendritic cells (DC) was investigated. Proteins were purified from AGE by biochemical methods; the semi-purified 14 kDa protein was run on Sephadex G50 gel filtration, and its purity was checked by SDS-PAGE. DC were isolated from the spleens of BALB/c mice by Nycodenz centrifugation and adherence to plastic dishes. The 14 kDa protein from AGE was added to overnight DC cultures, and the percentage of cells expressing CD40, CD86, and MHC-II was evaluated by flow cytometry. T-cell proliferation was also measured by the allogeneic mixed lymphocyte reaction (MLR). The purified 14 kDa protein from AGE increased the expression of CD40 on DC but did not influence CD86 or MHC-II. Furthermore, no significant difference was noticed in the MLR between DC pulsed with the 14 kDa protein and non-pulsed DC. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. The 30 kDa protein co-purified with chick liver glutathione S-transferases is a carbonyl reductase.

    PubMed

    Tsai, S P; Wang, L Y; Yeh, H I; Tam, M F

    1996-02-08

    An unidentified 30 kDa protein was co-purified with chick liver glutathione S-transferases on an S-hexylglutathione affinity column. The protein was isolated to apparent homogeneity by chromatofocusing. Its molecular mass was determined to be 30,277 ± 3 Da by mass spectrometry. The protein was digested with Achromobacter proteinase I, and amino acid sequence analyses of the resulting peptides showed a high degree of identity with human carbonyl reductase. The protein is active with menadione as substrate. Thus, it is identified as chick liver carbonyl reductase.

  6. Studying the highly bent spectra of FR II-type radio galaxies with the KDA EXT model

    NASA Astrophysics Data System (ADS)

    Kuligowska, Elżbieta

    2018-04-01

    Context. The Kaiser, Dennett-Thorpe & Alexander (KDA, 1997, MNRAS, 292, 723) EXT model, that is, the extension of the KDA model of Fanaroff & Riley (FR) II-type source evolution, is applied and confronted with the observational data for selected FR II-type radio sources with significantly aged radio spectra. Aim. A sample of FR II-type radio galaxies with radio spectra strongly bent at their highest frequencies is used for testing the usefulness of the KDA EXT model. Methods: The dynamical evolution of FR II-type sources predicted with the KDA EXT model is briefly presented and discussed. The results are then compared to the ones obtained with the classical KDA approach, assuming the source's continuous injection and self-similarity. Results: The results and corresponding diagrams obtained for the eight sample sources indicate that the KDA EXT model predicts the observed radio spectra significantly better than the best spectral fit provided by the original KDA model.

  7. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

    PubMed

    Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

    2015-03-01

    The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), based on the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, and the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high similarity to the FMN reductases of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
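    The step from a 567 bp ORF to a 188-residue protein is codon arithmetic: 567 / 3 = 189 codons, the last of which is the stop codon. The sketch below pairs that conversion with a simple forward-strand ORF scan (ATG to the first in-frame stop). It is a simplified illustration, not the ORF Finder program used in the study; dedicated tools typically also search the reverse strand and may accept alternative start codons. The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq: str) -> str:
    """Longest ATG-initiated ORF (stop codon included) in the three forward frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG downstream
    return best


def residues_encoded(orf_bp: int) -> int:
    """Amino acids encoded by an ORF whose length includes the terminal stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


if __name__ == "__main__":
    print(longest_forward_orf("CCATGAAATTTTAGCC"))  # ATGAAATTTTAG
    print(residues_encoded(567))  # 188
```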

  8. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

    PubMed Central

    DENG, PENG; TAN, XIAOQING; WU, YING; BAI, QUNHUA; JIA, YAN; XIAO, HONG

    2015-01-01

    The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), based on the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, and the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high similarity to the FMN reductases of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630

  9. Identification of an abundant 56 kDa protein implicated in food allergy as granule-bound starch synthase

    USDA-ARS's Scientific Manuscript database

    Rice, the staple food of South and East Asian countries, is considered to be hypoallergenic. However, several clinical studies have documented rice-induced allergy in sensitive patients. Rice proteins with molecular weights of 14-16 kDa, 26 kDa, 33 kDa and 56 kDa have been identified as allergens. Re...

  10. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), and a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
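    Transcription factor binding motifs are commonly represented as position weight matrices and scored along a sequence. The sketch below shows the generic log-odds form of that idea; it is not RSAT's own implementation, and the pseudocount, the uniform background, and the toy motif counts are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=None):
    """Convert per-position base counts into log2(P(base) / P_background(base))."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform background (assumption)
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def scan(seq, matrix):
    """Score every full-length window on the forward strand; skip windows with non-ACGT."""
    width = len(matrix)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue
        hits.append((i, sum(matrix[j][b] for j, b in enumerate(window))))
    return hits


if __name__ == "__main__":
    toy_counts = [{"T": 8}, {"G": 8}, {"A": 8}]  # hypothetical motif "TGA"
    pwm = log_odds_matrix(toy_counts)
    best = max(scan("CCTGACC", pwm), key=lambda hit: hit[1])
    print(best[0])  # 2
```

    Production tools usually also scan the reverse complement and attach significance estimates to scores; both are omitted here to keep the scoring step visible.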

  11. Crystal Structure of the 25 kDa Subunit of Human Cleavage Factor Im

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coseno, M.; Martin, G.; Berger, C.

    Cleavage factor Im is an essential component of the pre-messenger RNA 3'-end processing machinery in higher eukaryotes, participating in both the polyadenylation and cleavage steps. Cleavage factor Im is an oligomer composed of a small 25 kDa subunit (CF Im25) and a variable larger subunit of either 59, 68 or 72 kDa. The small subunit also interacts with RNA, poly(A) polymerase, and the nuclear poly(A)-binding protein. These protein-protein interactions are thought to be facilitated by the Nudix domain of CF Im25, a hydrolase motif with a characteristic α/β/α fold and a conserved catalytic sequence or Nudix box. We present here the crystal structures of human CF Im25 in its free and diadenosine tetraphosphate (Ap4A) bound forms at 1.85 and 1.80 Å, respectively. CF Im25 crystallizes as a dimer and presents the classical Nudix fold. Results from crystallographic and biochemical experiments suggest that CF Im25 makes use of its Nudix fold to bind but not hydrolyze ATP and Ap4A. The complex and apo protein structures provide insight into the active oligomeric state of CF Im and suggest a possible role of nucleotide binding in the polyadenylation and/or cleavage steps of pre-messenger RNA 3'-end processing.

  12. A conserved 19-kDa Eimeria tenella antigen is a profilin-like protein.

    PubMed

    Fetterer, R H; Miska, K B; Jenkins, M C; Barfield, R C

    2004-12-01

    A wide range of recombinant proteins from Eimeria species have been reported to offer some degree of protection against infection and disease, but the specific biological function of these proteins is largely unknown. Previous studies have demonstrated a 19-kDa protein of unknown function designated SZ-1 in sporozoites and merozoites of Eimeria acervulina that can be used to confer partial protection against coccidiosis. Reverse transcriptase-polymerase chain reaction indicated that the gene for SZ-1 is expressed by all the asexual stages of Eimeria tenella. Rabbit antisera to recombinant SZ-1 recognized an approximately 19-kDa protein from extracts of E. tenella sporozoites, merozoites, sporulated oocysts, and oocysts in various stages of sporulation. Immunofluorescence antibody staining indicated specific staining of E. tenella sporozoites and merozoites. Staining was most intense in the cytoplasm of the posterior end of the parasite. The primary amino acid sequence of E. tenella SZ-1, deduced from the E. tenella genome, indicated a conserved domain for the actin-regulatory protein profilin. A conserved binding site for poly-L-proline (PLP), characteristic of profilin, was also observed. SZ-1 was separated from soluble extract of E. tenella proteins by affinity chromatography using a PLP ligand, confirming the ability of SZ-1 to bind PLP. SZ-1 also partially inhibited the polymerization of actin. The current results are consistent with the classification of SZ-1 as a profilin-related protein.

  13. Molecular Cloning and Sequence Analysis of the Sta58 Major Antigen Gene of Rickettsia tsutsugamushi: Sequence homology and Antigenic Comparison of Sta58 to the 60-Kilodalton Family of Stress Proteins

    DTIC Science & Technology

    1990-05-01

    Sta58 antigen and the Sta56 strain- GroES, C. burnetii HtpA, Mycobacterium tuberculosis 12- specific major antigen of R. tsutsugamushi (strain Karp...kb HindIII fragment carrying the gene for the Sta58 tuberculosis, and Mycobacterium smegmatis (65-kDa anti- protein was subjected to DNA sequence...the Hsp60 and Hsp10 proteins. R. tsu., R. tsutsugamushi; M. lep., Mycobacterium leprae; C. bur., C. burnetii; Synech., Synechococcus strain 6301; T

  14. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), and a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Allergenic Characterization of 27-kDa Glycoprotein, a Novel Heat Stable Allergen, from the Pupa of Silkworm, Bombyx mori.

    PubMed

    Jeong, Kyoung Yong; Son, Mina; Lee, June Yong; Park, Kyung Hee; Lee, Jae-Hyun; Park, Jung-Won

    2016-01-01

    Boiled silkworm pupa is a traditional food in Asia, and silkworm pupa food allergy is common in these regions. To date, only one silkworm allergen, arginine kinase, has been identified. The purpose of this study was to identify novel food allergens in silkworm pupa by analyzing a protein extract after heat treatment. Heat-treated extracts were examined by proteomic analysis. A 27-kDa glycoprotein was identified, expressed in Escherichia coli, and purified. IgE reactivity of the recombinant protein was investigated by ELISA. High molecular weight proteins (above 100 kDa) elicited increased IgE binding after heat treatment compared with before heat treatment. The molecular identities of these proteins, however, could not be determined. IgE reactivity toward the 27-kDa glycoprotein also increased after heating the protein extract. The recombinant protein was recognized by IgE antibodies from 33.3% of allergic subjects. Glycation or aggregation of proteins by heating may create new IgE-binding epitopes. Heat-stable allergens are shown to be important in silkworm allergy. Sensitization to the 27-kDa glycoprotein from silkworm may contribute to elevated IgE to silkworm.

  16. Allergenic Characterization of 27-kDa Glycoprotein, a Novel Heat Stable Allergen, from the Pupa of Silkworm, Bombyx mori

    PubMed Central

    Son, Mina; Lee, June Yong

    2016-01-01

    Boiled silkworm pupa is a traditional food in Asia, and silkworm pupa food allergy is common in these regions. To date, only one silkworm allergen, arginine kinase, has been identified. The purpose of this study was to identify novel food allergens in silkworm pupa by analyzing a protein extract after heat treatment. Heat-treated extracts were examined by proteomic analysis. A 27-kDa glycoprotein was identified, expressed in Escherichia coli, and purified. IgE reactivity of the recombinant protein was investigated by ELISA. High molecular weight proteins (above 100 kDa) elicited increased IgE binding after heat treatment compared with before heat treatment. The molecular identities of these proteins, however, could not be determined. IgE reactivity toward the 27-kDa glycoprotein also increased after heating the protein extract. The recombinant protein was recognized by IgE antibodies from 33.3% of allergic subjects. Glycation or aggregation of proteins by heating may create new IgE-binding epitopes. Heat-stable allergens are shown to be important in silkworm allergy. Sensitization to the 27-kDa glycoprotein from silkworm may contribute to elevated IgE to silkworm. PMID:26770033

  17. Identification and sequencing of members of a drought-induced multigene family in Atriplex canescens (salt bush)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jing Chen; Cairney, J.; Newton, R.J.

    1991-05-01

    Atriplex canescens (Pursh.) Nutt. is known to have a high degree of morphological and physiological drought tolerance, which appears to be related to molecular responses. A cDNA library constructed from drought-induced messenger RNA was differentially screened with radioactively labelled cDNA probes synthesized from mRNA extracted from stressed and non-stressed Atriplex. Two clones, named 19-3 and 27-3, whose expression is induced by drought stress, have been characterized. Sequence analysis shows that they are more than 96% homologous. Each clone has an open reading frame that specifies a protein of 95 amino acids (12.77 kDa and 12.74 kDa, respectively). In vitro transcription and translation of each clone results in a single protein of apparent molecular weight 8.6 kDa. The disparity in size may be due to secondary structure dictated, at least in part, by a highly charged carboxy terminus, which may be important for the function of these proteins in drought tolerance.

  18. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera checking, repeat masking, clustering, and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases was obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and open reading frame (ORF) analysis of camel sequences that are homologous to known genes show that more than 80% of the contigs have an ORF >300 bp and that ∼40% of hits extend to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were done separately for different organisms, including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole-genome sequencing of the organism. PMID:20502665

  19. Analysis of the recE locus of Escherichia coli K-12 by use of polyclonal antibodies to exonuclease VIII.

    PubMed Central

    Luisi-DeLuca, C; Clark, A J; Kolodner, R D

    1988-01-01

    Exonuclease VIII (exoVIII) of Escherichia coli has been purified from a strain carrying a plasmid-encoded recE gene by using a new procedure. This procedure yielded 30 times more protein per gram of cells, and the protein had a twofold higher specific activity than the enzyme purified by the previously published procedure (J. W. Joseph and R. Kolodner, J. Biol. Chem. 258:10411-10417, 1983). The sequence of the 12 N-terminal amino acids was also obtained and found to correspond to one of the open reading frames predicted from the nucleic acid sequence of the recE region of Rac (C. Chu, A. Templin, and A. J. Clark, manuscript in preparation). Polyclonal antibodies directed against purified exoVIII were also prepared. Cell-free extracts prepared from strains containing a wide range of chromosomal- or plasmid-encoded point, insertion, and deletion mutations that result in expression of exoVIII were examined by Western blot (immunoblot) analysis. This analysis showed that two point sbcA mutations (sbcA5 and sbcA23) and the sbc insertion mutations led to the synthesis of the 140-kilodalton (kDa) polypeptide of wild-type exoVIII. Plasmid-encoded partial deletion mutations of recE reduced the size of the cross-reacting protein(s) in direct proportion to the size of the deletion, even though exonuclease activity was still present. The analysis suggests that 39 kDa of the 140-kDa exoVIII subunit is all that is essential for exonuclease activity. One of the truncated but functional exonucleases (the pRAC3 exonuclease) has been purified and confirmed to be a 41-kDa polypeptide. The first 18 amino acids from the N terminus of the 41-kDa pRAC3 exonuclease were sequenced and found to correspond to one of the translational start signals predicted from the nucleotide sequence of radC (Chu et al., in preparation). PMID:3056915

  20. Complementary DNA cloning, sequence analysis, and tissue transcription profile of a novel U2AF2 gene from the Chinese Banna mini-pig inbred line.

    PubMed

    Wang, S Y; Huo, J L; Miao, Y W; Cheng, W M; Zeng, Y Z

    2013-04-02

    U2 small nuclear RNA auxiliary factor 2 (U2AF2) is an important gene for pre-messenger RNA splicing in higher eukaryotes. In this study, the Banna mini-pig inbred line (BMI) U2AF2 coding sequence (CDS) was cloned, sequenced, and characterized. The U2AF2 complete CDS was amplified using the reverse transcription-polymerase chain reaction (RT-PCR) technique based on the conserved sequence information of cattle and known highly homologous swine expressed sequence tags. This novel gene was deposited into the National Center for Biotechnology Information database (Accession No. JQ839267). Sequence analysis revealed that the BMI U2AF2 coding sequence consisted of 1416 bp and encoded 471 amino acids with a molecular weight of 53.12 kDa. The protein sequence has high sequence homology with U2AF65 of 6 species - Homo sapiens (100%), Equus caballus (100%), Canis lupus (100%), Macaca mulatta (99.8%), Bos taurus (74.4%), and Mus musculus (74.4%). The phylogenetic tree analysis revealed that BMI U2AF65 has a closer genetic relationship with B. taurus U2AF65 than with U2AF65 of E. caballus, C. lupus, M. mulatta, H. sapiens, and M. musculus. RT-PCR analysis showed that BMI U2AF2 was most highly expressed in the brain; moderately expressed in the spleen, lung, muscle, and skin; and weakly expressed in the liver, kidney, and ovary. Its expression was nearly silent in the spinal cord, nerve fiber, heart, stomach, pancreas, and intestine. Three microRNA target sites were predicted in the CDS of BMI U2AF2 messenger RNA. Our results establish a foundation for further insight into this swine gene.
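    Calculated molecular weights such as the 53.12 kDa quoted here are usually sums of average residue masses plus one water molecule for the free termini. The sketch below assumes a commonly used table of average residue masses; because the U2AF2 sequence is not reproduced in this abstract, it is exercised only on toy peptides. As a rough cross-check, the rule of thumb of about 110 Da per residue gives 471 × 110 ≈ 51.8 kDa, close to but not identical with a composition-based value.

```python
# Average residue masses in daltons (free amino acid minus one water);
# values assumed from commonly used tables.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.01528


def average_mass_da(protein: str) -> float:
    """Average molecular mass of a linear peptide: residue masses plus one water."""
    protein = protein.upper()
    unknown = set(protein) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER_DA


if __name__ == "__main__":
    print(round(average_mass_da("G"), 2))   # 75.07, glycine
    print(round(average_mass_da("GG"), 2))  # 132.12, diglycine
```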

  1. Usefulness of 8 kDa protein of Fasciola hepatica in diagnosis of fascioliasis

    PubMed Central

    Kim, Kwangsig; Yang, Hyun Jong

    2003-01-01

    This study was designed to detect and evaluate the antigenicity of low molecular weight proteins of Fasciola hepatica in fascioliasis. A low molecular weight protein of F. hepatica was purified by ammonium sulfate precipitation and Sephacryl S-100 HR gel filtration. The protein obtained was estimated to be 8 kDa on 7.5-15% gradient sodium dodecyl sulfate gel electrophoresis. Immunoblotting showed that the 8 kDa protein reacted with human fascioliasis sera but not with sera from other trematodiases. This result suggests that the 8 kDa protein of F. hepatica is a diagnostic antigen for human fascioliasis that does not cross-react with other human trematodiases. PMID:12815325

  2. Trichinella spiralis: strong antibody response to a 49 kDa newborn larva antigen in infected rats.

    PubMed

    Salinas-Tobon, Maria Del Rosario; Navarrete-Leon, Anaid; Mendez-Loredo, Blanca Esther; Esquivel-Aguirre, Dalia; Martínez-Abrajan, Dulce Maria; Hernandez-Sanchez, Javier

    2007-02-01

    In this work, we analyzed the kinetics of anti-Trichinella spiralis newborn larva (NBL) antibodies (Ab), the antigenic recognition pattern of NBL proteins, and its dose effects. Wistar rats were infected with 0, 700, 2000, 4000 and 8000 muscle larvae (ML) and bled at different time intervals up to day 31 post infection (p.i.). Ab production was highest with the 2000 ML dose and lower with the 8000, 4000 and 700 ML doses. Abs were not detected until day 10, peaked on day 14 for the 2000 ML dose and on day 19 for the other doses, and thereafter declined slowly from 19 to 31 days p.i. In contrast, Abs to ML increased from day 10, peaked on day 19 and remained high until the end of the study. Abs bound strongly to at least three NBL components of 188, 205 and 49 kDa. The 188 and 205 kDa NBL antigens were recognized 10-26 days p.i., and the 49 kDa antigen from day 10 to day 31 p.i. Weak recognition of antigens of 52, 54, 62 and 83 kDa was also observed during the infection. Early recognition of 31, 43, 45, 55, 68 and 85 kDa ML antigens was observed, whereas the response to those of 43, 45, 48, 60, 64 and 97 kDa (described previously as TSL-1 antigens) occurred late in the infection. A follow-up of antigen recognition up to day 61 with the optimal immunization dose (2000 ML) showed a decline of Ab production to the 49 kDa NBL antigen at 42 days p.i., which suggested antigenic differences from the previously reported 43 kDa ML antigen strongly recognized late in the infection. To analyze the stage specificity of the 49 kDa NBL antigen, polyclonal antibodies (PoAb) were obtained from rats immunized with the 49 kDa NBL antigen. PoAb reacted strongly with the 49 kDa NBL component in NBL total soluble extract, but no reactivity was observed with soluble antigen of the other T. spiralis stages. Albeit with less intensity, the 49 kDa component was also recognized by PoAb, together with other antigens of 53, 97 and 107 kDa, in NBL excretory-secretory products (NBL-ESP). Thus, our results reveal

  3. Phosphorylation of Tat-interactive protein 60 kDa by protein kinase C epsilon is important for its subcellular localisation.

    PubMed

    Sapountzi, Vasileia; Logan, Ian R; Nelson, Glyn; Cook, Susan; Robson, Craig N

    2008-01-01

    Tat-interactive protein 60 kDa (Tip60) is a nuclear acetyltransferase that both coactivates and corepresses transcription factors and has a defined role in the DNA damage response. Here, we provide evidence that Tip60 is phosphorylated by protein kinase C epsilon (PKCε). In vitro, PKCε phosphorylates Tip60 on at least two sites within the acetyltransferase domain. In whole cells, activation of protein kinase C increases both the level of phosphorylated Tip60 and the interaction of Tip60 with PKCε. A phosphomimetic Tip60 mutant shows a subcellular localisation distinct from that of the wild-type protein in whole cells. Taken together, these findings suggest that the PKCε phosphorylation sites on Tip60 are important for its subcellular localisation. Regulating the subcellular localisation of Tip60 through phosphorylation provides a novel means of controlling its function.

  4. Definition of the complete Schistosoma mansoni hemoglobinase mRNA sequence and gene expression in developing parasites.

    PubMed

    el Meanawy, M A; Aji, T; Phillips, N F; Davis, R E; Salata, R A; Malhotra, I; McClain, D; Aikawa, M; Davis, A H

    1990-07-01

    Schistosoma mansoni uses a variety of proteases, termed hemoglobinases, to obtain nutrition from host globin. Previous reports characterized cDNAs encoding one of these enzymes, but those sequences did not define the complete primary structures of the mRNA and protein. The complete sequence of the 1390-base mRNA has now been determined; it encodes a 50 kDa primary translation product. In vitro translation coupled with immunoprecipitation, together with Western blots of parasite lysates, allowed visualization of the 50 kDa form. Production of the 31 kDa mature hemoglobinase from the 50 kDa species involves removal of both NH2- and COOH-terminal residues from the primary translation product. Expression of hemoglobinase mRNA and protein was examined during larval development of the parasite. Levels were low in young schistosomula; after 6-9 days in culture, hemoglobinase levels were high, correlating with the onset of red blood cell feeding. Immunoelectron microscopy was used to examine the location and function of hemoglobinase. In adult worms the enzyme was associated with the gut lumen and gut epithelium. In cercariae, the protease was observed in the head gland, suggesting new roles for it.

  5. Isoform composition and stoichiometry of the approx. 90-kDa heat shock protein associated with glucocorticoid receptors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mendel, D.B.; Orti, E.

    1988-05-15

    The authors observed that the approx. 90-kDa non-steroid-binding component of nonactivated glucocorticoid receptors purified from WEHI-7 mouse thymoma cells (identified as the approx. 90-kDa heat shock protein) consistently migrates as a doublet during polyacrylamide gel electrophoresis under denaturing and reducing conditions. Murine Meth A cells have recently been reported to contain a tumor-specific transplantation antigen (TSTA) that is related or identical to the approx. 90-kDa heat shock protein. The observation that TSTA and the approx. 90-kDa heat shock protein isolated from these cells exist as two isoforms of similar molecular mass and charge suggested that the doublet observed here is also due to two isoforms. The authors therefore conducted this study to determine whether TSTA and the approx. 90-kDa component of glucocorticoid receptors are indeed related, to establish whether the receptor preferentially binds one isoform of the approx. 90-kDa heat shock protein, and to investigate the stoichiometry of the nonactivated receptor complex. They used the BuGr1 and AC88 monoclonal antibodies to purify, respectively, receptor-associated and free approx. 90-kDa heat shock protein from WEHI-7 cells grown for 48 h with [35S]methionine to metabolically label proteins to steady state. This long-term metabolic labeling approach also enabled them to determine directly that the purified nonactivated glucocorticoid receptor contains a single steroid-binding protein and two approx. 90-kDa non-steroid-binding subunits. The consistency with which an approx. 1:2 stoichiometric ratio of steroid-binding to approx. 90-kDa protein is observed supports the view that the approx. 90-kDa heat shock protein is a true component of nonactivated glucocorticoid-receptor complexes.
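The stoichiometry argument rests on steady-state labeling: once the methionine pool is uniformly labeled, a protein's 35S counts are proportional to its molar amount multiplied by its methionine content. A minimal Python sketch of that calculation follows. It is an illustration of the general principle, not the authors' documented procedure, and the counts and methionine numbers are hypothetical placeholders rather than values from the study.

```python
# Hypothetical sketch: subunit stoichiometry from steady-state [35S]methionine
# labeling. At steady state every methionine carries the same specific
# radioactivity, so the moles of a protein scale with cpm / (number of Met).


def molar_amount(cpm: float, met_residues: int) -> float:
    """Relative moles of a protein from its 35S counts and methionine content."""
    if met_residues <= 0:
        raise ValueError("protein must contain at least one methionine")
    return cpm / met_residues


def stoichiometry(cpm_a: float, met_a: int, cpm_b: float, met_b: int) -> float:
    """Moles of protein B per mole of protein A."""
    return molar_amount(cpm_b, met_b) / molar_amount(cpm_a, met_a)


# Illustrative numbers only (not from the study): a steroid-binding protein
# with 20 Met residues and a partner protein with 15 Met residues.
ratio = stoichiometry(cpm_a=10_000, met_a=20, cpm_b=15_000, met_b=15)
print(f"{ratio:.2f} partner molecules per steroid-binding protein")
```

With these made-up inputs the partner yields twice as many moles, the kind of 1:2 ratio the abstract reports. Dividing by methionine content is what makes the raw counts comparable between proteins of different composition.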

  6. Accumulation of 19-kDa plasma membrane polypeptide during induction of freezing tolerance in wheat suspension-cultured cells by abscisic acid.

    PubMed

    Koike, M; Takezawa, D; Arakawa, K; Yoshida, S

    1997-06-01

    Suspension-cultured cells derived from immature embryos of winter wheat (Triticum aestivum L. cv. Chihoku) were used to investigate the mechanism of ABA-induced development of freezing tolerance. Cultured cells treated with 50 microM ABA for 5 d at 23 degrees C acquired the maximum level of freezing tolerance (LT50; -21.6 degrees C). The increased freezing tolerance of ABA-treated cells was closely associated with a marked accumulation of 19-kDa polypeptides in the plasma membrane. The 19-kDa polypeptide components were isolated by preparative gel electrophoresis and further separated into one major (AWPM-19) and other minor components by Tricine-SDS-PAGE. The N-terminal amino acid sequence of AWPM-19 was determined, and a cDNA clone encoding AWPM-19 was isolated by PCR from a library prepared from the ABA-treated cultured cells. The cDNA clone (WPM-1) encoded an 18.9 kDa hydrophobic polypeptide with four putative membrane-spanning domains and a high pI (10.2). Expression of WPM-1 mRNA was dramatically induced by 50 microM ABA within a few hours. These results suggest that AWPM-19 may be closely associated with the ABA-induced increase in freezing tolerance in wheat cultured cells.

  7. Formation of the 67-kDa laminin receptor by acylation of the precursor.

    PubMed

    Butò, S; Tagliabue, E; Ardini, E; Magnifico, A; Ghirelli, C; van den Brûle, F; Castronovo, V; Colnaghi, M I; Sobel, M E; Ménard, S

    1998-06-01

    Although the involvement of the 67-kDa laminin receptor (67LR) in tumor invasiveness has been clearly demonstrated, its molecular structure remains unresolved, since the only gene isolated so far is a full-length gene encoding a 37-kDa precursor protein (37LRP). A panel of recently obtained monoclonal antibodies directed against recombinant 37LRP was used to investigate the processing that leads to formation of the 67-kDa molecule. In soluble extracts of A431 human carcinoma cells, these reagents recognize the precursor as well as the mature 67LR and a 120-kDa molecule. Recovery of these proteins depended strikingly on the cell solubilization conditions: the 67LR is soluble in NP-40 lysis buffer, whereas the 37LRP is NP-40-insoluble. Inhibition of 67LR formation by cerulenin indicates that acylation is involved in processing of the receptor. This acylation is likely palmitoylation, as indicated by the sensitivity of NP-40-soluble extracts to hydroxylamine treatment. Immunoblotting with a polyclonal serum directed against galectin-3 showed that both the 67- and 120-kDa proteins carry galectin-3 epitopes, whereas the 37LRP does not. These data suggest that the 67LR is a heterodimer stabilized by strong intramolecular hydrophobic interactions mediated by fatty acids bound to the 37LRP and to a galectin-3 cross-reacting molecule.

  8. Insights into rubber biosynthesis from transcriptome analysis of Hevea brasiliensis latex.

    PubMed

    Chow, Keng-See; Wan, Kiew-Lian; Isa, Mohd Noor Mat; Bahari, Azlina; Tan, Siang-Hee; Harikrishna, K; Yeang, Hoong-Yeet

    2007-01-01

    Hevea brasiliensis is the most widely cultivated species for commercial production of natural rubber (cis-polyisoprene). In this study, 10,040 expressed sequence tags (ESTs) were generated from the latex of the rubber tree, which represents the cytoplasmic content of a single cell type, to analyse the latex transcription profile with emphasis on rubber biosynthesis-related genes. A total of 3,441 unique transcripts (UTs) were obtained after quality editing and assembly of EST sequences. Functional classification of UTs according to the Gene Ontology convention showed that 73.8% were related to genes of unknown function. Among highly expressed ESTs, a significant proportion encoded proteins related to rubber biosynthesis and stress or defence responses. Sequences encoding rubber particle membrane proteins (RPMPs) belonging to three protein families accounted for 12% of the ESTs. Characterization of these ESTs revealed nine RPMP variants (7.9-27 kDa), including the 14 kDa REF (rubber elongation factor) and the 22 kDa SRPP (small rubber particle protein). The expression of multiple RPMP isoforms in latex was shown using antibodies against REF and SRPP. Both EST and quantitative reverse transcription-PCR (QRT-PCR) analyses showed REF and SRPP to be the most abundant transcripts in latex. Beyond their role in rubber biosynthesis, comparative sequence analysis showed that the RPMPs are highly similar to plant sequences with stress-related functions. Implications of RPMP function in cis-polyisoprene biosynthesis are discussed in the context of transcript abundance and differential gene expression.

  9. Crystallization and X-ray data analysis of the 10 kDa C-terminal lid subdomain from Caenorhabditis elegans Hsp70

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Worrall, Liam; Walkinshaw, Malcolm D., E-mail: m.walkinshaw@ed.ac.uk

    Crystals of the C-terminal 10 kDa lid subdomain from the C. elegans chaperone Hsp70 have been obtained that diffract X-rays to ∼3.5 Å and belong to space group I2₁2₁2₁. Analysis of X-ray data and initial heavy-atom phasing reveals 24 monomers in the asymmetric unit related by 432 non-crystallographic symmetry. Hsp70 is an important molecular chaperone involved in the regulation of protein folding. Crystals of the C-terminal 10 kDa helical lid domain (residues 542–640) from a Caenorhabditis elegans Hsp70 homologue have been produced that diffract X-rays to ∼3.4 Å. Crystals belong to space group I2₁2₁2₁, with unit-cell parameters a = b = 197, c = 200 Å. The Matthews coefficient, self-rotation function and Patterson map indicate 24 monomers in the asymmetric unit, showing non-crystallographic 432 symmetry. Molecular-replacement studies using the corresponding domain from rat, the only eukaryotic homologue with a known structure, failed, and a mercury derivative was obtained. Preliminary MAD phasing, using SHELXD and SHARP to locate and refine the heavy-atom substructure and SOLOMON for density modification, produced interpretable maps with a clear protein–solvent boundary. Further density modification, model building and refinement are currently under way.
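The Matthews coefficient relates the unit-cell volume to the protein mass it contains. Below is a rough Python sketch that uses the cell parameters quoted above. It assumes Z = 8 for space group I2₁2₁2₁ (four general positions doubled by body centring), treats the monomer mass as exactly 10,000 Da (the rounded value in the abstract), and applies Matthews' rule-of-thumb solvent estimate of 1 - 1.23/V_M. The real monomer mass is likely somewhat higher, so the numbers are only indicative.

```python
# Matthews coefficient V_M = V_cell / (Z * n * M), in cubic angstroms per dalton,
# plus the common solvent-fraction estimate 1 - 1.23 / V_M.


def matthews(a: float, b: float, c: float, z: int, n_per_asu: int, mass_da: float):
    volume = a * b * c  # orthorhombic cell: all angles are 90 degrees
    vm = volume / (z * n_per_asu * mass_da)
    solvent = 1.0 - 1.23 / vm
    return vm, solvent


# Assumptions: Z = 8 for I2(1)2(1)2(1); 24 monomers per asymmetric unit;
# monomer mass taken as the rounded 10 kDa quoted in the abstract.
vm, solvent = matthews(197.0, 197.0, 200.0, z=8, n_per_asu=24, mass_da=10_000)
print(f"V_M = {vm:.2f} A^3/Da, solvent ~ {solvent:.0%}")
```

Under these assumptions V_M comes out near 4.0 Å³/Da, with roughly 70% solvent. That falls within the range usually observed for protein crystals, which makes 24 monomers per asymmetric unit plausible.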

  10. Biochemical, molecular, and phylogenetic analysis of pyruvate carboxylase in the yellow fever mosquito, Aedes aegypti.

    PubMed

    Tu, Z; Hagedorn, H H

    1997-02-01

    Pyruvate carboxylase (PC, pyruvate:carbon dioxide ligase [ADP-forming], EC 6.4.1.1) was purified from the yellow fever mosquito, Aedes aegypti. The purified PC showed two polypeptides of similar Mr (133 and 128 kDa) whose N-terminal sequences were very similar, if not identical. A polyclonal antiserum against the 133 kDa polypeptide cross-reacted strongly with the 128 kDa polypeptide. PC was found in all tissues examined; a semi-quantitative Western blot assay showed it to be concentrated in the indirect flight muscles and fat body preparations. The ratio of the 133 to the 128 kDa polypeptide differed among tissues and in an Aedes albopictus cell line. The indirect flight muscle was the only tissue in which the 128 kDa polypeptide was more abundant, while both the midgut and the cell line showed almost exclusively the 133 kDa polypeptide. Both polypeptides were present in varying amounts in brain, Malpighian tubule, ovary and fat body preparations. The two isoforms of PC could play different roles in the flight muscle and other tissues. Clones covering a complete PC cDNA of A. aegypti were obtained using a directional approach. The 3952 bp nucleotide sequence, including a 3585 bp coding region, was determined from these cDNA clones. The deduced 1195-amino-acid sequence has a calculated Mr of 132,200. A putative mitochondrial targeting sequence was identified by comparing the deduced amino acid sequence with the N-terminal sequences of the mature protein; its presence indicates that the mosquito PC encoded by the cloned cDNA may be localized in the mitochondria. After the targeting sequence, three functional domains were identified in the following order: biotin carboxylase (BC), carboxyltransferase (CT) and biotin carboxyl carrier protein (BCCP). The mosquito PC showed very high similarity to PCs from other sources (55.1-75.2% identity). Genomic Southern analysis indicated

  11. Joint Sequence Analysis: Association and Clustering

    ERIC Educational Resources Information Center

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  12. Identification of a 170-kDa protein associated with the vacuolar Na+/H+ antiport of Beta vulgaris.

    PubMed Central

    Barkla, B J; Blumwald, E

    1991-01-01

    The effect of adding amiloride to the growth medium on the Na+/H+ antiport activity of tonoplast vesicles isolated from sugar beet (Beta vulgaris L.) cell suspensions was tested. Cells grown in the presence of NaCl and amiloride displayed increased antiport activity. Kinetic analysis showed that, while the affinity of the antiport for Na+ ions did not change, the maximal velocity of Na+/H+ exchange increased markedly. These results suggest the addition of more antiport molecules to the tonoplast and/or an increase in the turnover rate of Na+/H+ exchange. The amiloride-induced increase in antiport activity was correlated with enhanced synthesis of a tonoplast 170-kDa polypeptide. Increased synthesis of this polypeptide was detected not only upon exposure of the cells to amiloride but also when the cells were exposed to high NaCl concentrations. Polyclonal antibodies against the 170-kDa polypeptide almost completely inhibited antiport activity. These results suggest that the 170-kDa polypeptide is associated with the vacuolar Na+/H+ antiport. PMID:1662387
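The kinetic reasoning (unchanged affinity, higher maximal velocity, hence more carriers or faster turnover) follows from Michaelis-Menten kinetics, where Vmax = kcat·[E] and Km is independent of carrier number. A minimal Python sketch with hypothetical parameter values, not data from the study, shows that raising Vmax alone scales the rate by the same factor at every Na+ concentration:

```python
# Michaelis-Menten rate v = Vmax * S / (Km + S). Because Vmax = kcat * [E],
# adding transporter copies or raising turnover (kcat) increases Vmax while
# leaving Km (the substrate concentration giving half-maximal rate) unchanged.


def rate(s: float, vmax: float, km: float) -> float:
    return vmax * s / (km + s)


KM = 10.0            # hypothetical Km for Na+ (mM), identical in both conditions
VMAX_CONTROL = 1.0   # arbitrary units
VMAX_TREATED = 2.5   # hypothetical increase after treatment

for s in (1.0, 10.0, 100.0):
    v0 = rate(s, VMAX_CONTROL, KM)
    v1 = rate(s, VMAX_TREATED, KM)
    print(f"[Na+]={s:>5} mM  control={v0:.3f}  treated={v1:.3f}  ratio={v1 / v0:.2f}")
```

A change in affinity would instead alter the rate ratio across substrate concentrations. A constant ratio is the signature of a pure Vmax effect.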

  13. Identification of a 170-kDa protein associated with the vacuolar Na+/H+ antiport of Beta vulgaris.

    PubMed

    Barkla, B J; Blumwald, E

    1991-12-15

    The effect of adding amiloride to the growth medium on the Na+/H+ antiport activity of tonoplast vesicles isolated from sugar beet (Beta vulgaris L.) cell suspensions was tested. Cells grown in the presence of NaCl and amiloride displayed increased antiport activity. Kinetic analysis showed that, while the affinity of the antiport for Na+ ions did not change, the maximal velocity of Na+/H+ exchange increased markedly. These results suggest the addition of more antiport molecules to the tonoplast and/or an increase in the turnover rate of Na+/H+ exchange. The amiloride-induced increase in antiport activity was correlated with enhanced synthesis of a tonoplast 170-kDa polypeptide. Increased synthesis of this polypeptide was detected not only upon exposure of the cells to amiloride but also when the cells were exposed to high NaCl concentrations. Polyclonal antibodies against the 170-kDa polypeptide almost completely inhibited antiport activity. These results suggest that the 170-kDa polypeptide is associated with the vacuolar Na+/H+ antiport.

  14. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) are a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL), and 682 organisms are currently supported. Since 1998, the tools have been used by several hundred researchers worldwide. Several predictions made with RSAT have been validated experimentally and published.
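RSAT's random controls draw sequences from Bernoulli or Markov background models. The sketch below is not RSAT code. It is a minimal, hypothetical Python illustration of the general idea: estimate an order-k Markov model from a training sequence (order 0 is a Bernoulli model) and sample a random sequence with the same local composition. The function names and the toy training sequence are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def train_markov(seq: str, order: int):
    """Count next-letter frequencies after every context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def generate(counts, order: int, length: int, seed_context: str, rng: random.Random) -> str:
    """Sample a sequence letter by letter from the trained Markov model."""
    out = list(seed_context)
    while len(out) < length:
        context = "".join(out[-order:]) if order else ""
        nxt = counts.get(context)
        if not nxt:  # context never seen in training: fall back to a uniform draw
            out.append(rng.choice("ACGT"))
            continue
        letters, weights = zip(*nxt.items())
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out[:length])


rng = random.Random(0)
background = "ACGTTGCAACGTAGCTAGCTTACGATCGATCGGATCCA"  # toy training sequence
model = train_markov(background, order=2)
random_seq = generate(model, order=2, length=50, seed_context=background[:2], rng=rng)
print(random_seq)
```

Higher orders preserve more of the training sequence's oligonucleotide structure. That is why Markov backgrounds are usually preferred over Bernoulli ones when estimating how often a word or motif occurs by chance.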

  15. Effects of porcine 25 kDa amelogenin and its proteolytic derivatives on bone sialoprotein expression.

    PubMed

    Nakayama, Y; Yang, L; Mezawa, M; Araki, S; Li, Z; Wang, Z; Sasaki, Y; Takai, H; Nakao, S; Fukae, M; Ogata, Y

    2010-10-01

    Amelogenins are hydrophobic proteins that are the major component of developing enamel. Enamel matrix derivative has been used for periodontal regeneration. Bone sialoprotein is an early phenotypic marker of osteoblast differentiation. In this study, we examined the ability of porcine amelogenins to regulate bone sialoprotein transcription. To determine the molecular basis of the transcriptional regulation of the bone sialoprotein gene by amelogenins, we conducted northern hybridization, transient transfection analyses and gel mobility shift assays using the osteoblast-like ROS 17/2.8 cells. Amelogenins (100 ng/mL) up-regulated bone sialoprotein mRNA at 3 h, with maximal mRNA expression occurring at 12 h (25 and 20 kDa) and 6 h (13 and 6 kDa). Amelogenins (100 ng/mL, 12 h) increased luciferase activities in pLUC3 (nucleotides -116 to +60), and 6 kDa amelogenin up-regulated pLUC4 (nucleotides -425 to +60) activity. The tyrosine kinase inhibitor inhibited amelogenin-induced luciferase activities, whereas the protein kinase A inhibitor abolished 25 kDa amelogenin-induced bone sialoprotein transcription. The effects of amelogenins were abrogated by 2-bp mutations in the fibroblast growth factor 2 response element (FRE). Gel-shift assays with radiolabeled FRE, homeodomain-protein binding site (HOX) and transforming growth factor-beta1 activation element (TAE) double-strand oligonucleotides revealed increased binding of nuclear proteins from amelogenin-stimulated ROS 17/2.8 cells at 3 h (25 and 13 kDa) and 6 h (20 and 6 kDa). These results demonstrate that porcine 25 kDa amelogenin and its proteolytic derivatives stimulate bone sialoprotein transcription by targeting FRE, HOX and TAE in the bone sialoprotein gene promoter, and that full-length amelogenin and amelogenin cleavage products are able to regulate bone sialoprotein transcription via different signaling pathways. (c) 2010 John Wiley & Sons A/S.

  16. Computer-assisted prediction of HLA-DR binding and experimental analysis for human promiscuous Th1-cell peptides in the 24 kDa secreted lipoprotein (LppX) of Mycobacterium tuberculosis.

    PubMed

    Al-Attiyah, R; Mustafa, A S

    2004-01-01

    The secreted 24 kDa lipoprotein (LppX) is an antigen specific to the Mycobacterium tuberculosis complex and M. leprae. The present study aimed to identify the promiscuous T helper 1 (Th1)-cell epitopes of the M. tuberculosis LppX (MT24, Rv2945c) antigen using 15 overlapping synthetic peptides (25-mers overlapping by 10 residues) covering the complete protein sequence. Analysis of the Rv2945c sequence for binding to 51 alleles of nine serologically defined HLA-DR molecules, using a virtual matrix-based prediction program (ProPred), showed that eight of the 15 peptides were predicted to bind promiscuously to ≥10 alleles from ≥3 serologically defined HLA-DR molecules. The Th1-cell reactivity of all the peptides was assessed in antigen-induced proliferation and interferon-gamma (IFN-gamma)-secretion assays with peripheral blood mononuclear cells (PBMCs) from 37 bacille Calmette-Guérin (BCG)-vaccinated healthy subjects. Of the 37 donors, 17, representing an HLA-DR-heterogeneous group, responded to one or more Rv2945c peptides in the Th1-cell assays. Although each peptide stimulated PBMCs from one or more donors, the best responses (12/17 responders, 71%) were observed with peptide p14 (aa 196-220), suggesting highly promiscuous presentation of p14 to Th1 cells. In addition, the p14 sequence is identical in the LppX of M. tuberculosis, M. bovis and M. leprae, which further supports the usefulness of Rv2945c and p14 in subunit vaccine design against both tuberculosis and leprosy.
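The peptide design (25-mers overlapping by 10 residues, so each new peptide starts 15 residues after the previous one) can be reproduced with a short tiling function. The minimal Python sketch below clips the last window at the protein's C-terminus. The protein length of 233 residues is an assumption for illustration, since the abstract does not state it, and the function name is hypothetical.

```python
def overlapping_peptides(length: int, size: int = 25, overlap: int = 10):
    """1-based inclusive (start, end) windows tiling a protein of `length` residues."""
    step = size - overlap  # 15 residues between successive peptide starts
    windows = []
    start = 1
    while True:
        end = min(start + size - 1, length)  # clip the final peptide at the C-terminus
        windows.append((start, end))
        if end == length:
            break
        start += step
    return windows


# 233 residues is an assumed protein length, used only for illustration.
peps = overlapping_peptides(233)
print(len(peps), peps[13])  # peptide p14 is the 14th window (index 13)
```

With these parameters the 14th window spans residues 196-220, which matches the coordinates given for p14 and suggests the peptides were numbered consecutively from the N-terminus.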

  17. Recombinant expression, isolation, and proteolysis of extracellular matrix-secreted phosphoprotein-24 kDa.

    PubMed

    Murray, Elsa J Brochmann; Murray, Samuel S; Simon, Robert; Behnam, Keyvan

    2007-01-01

    Secreted phosphoprotein-24 kDa (spp24) is an extracellular matrix protein first cloned from bone. Bovine spp24 is synthesized as a 203-amino-acid protein that undergoes cleavage of a secretory peptide to form the mature protein (spp24, residues 24 to 203). Although not osteogenic itself, spp24 is degraded in bone to a pro-osteogenic protein, spp18.5. Both spp18.5 and spp24 contain a cyclic TRH1 (TGF-beta receptor II homology-1) domain similar to that found in the receptor itself and in fetuin. A synthetic peptide corresponding to the TRH1 domain of spp18.5 and spp24 specifically binds BMP-2 and enhances the rate and magnitude of BMP-2-induced ectopic bone formation in vivo. The parental protein, spp24, exhibits a high affinity for bone and mineral complexes, but its abundance there is low, suggesting that it is rapidly degraded. The availability of recombinant spp24 and its degradation products would facilitate elucidation of their structure-function relationships. We describe here the expression of His6-tagged bovine spp24 (residues 24 to 203) in E. coli, its purification by high-resolution IMAC (immobilized metal affinity chromatography), and the characterization by mass spectrometry of the full-length 21.5 kDa recombinant protein and its two major degradation products of 16 kDa and 14.5 kDa (spp24 residues 24 to 157 and 24 to 143, respectively). The recombinant spp24 protein was resistant to proteolysis by MC3T3-E1 osteoblastic cell extracts in the absence of calcium; in the presence of 4 mM Ca2+, however, it could undergo essentially complete proteolysis to small peptides, bypassing the 16 kDa and 14.5 kDa intermediates. This confirms the proteolytic susceptibility of spp24 and suggests that spp24 levels in bone may be regulated, in part, by calcium-dependent proteolysis mediated by osteoblastic cells.

  18. Molecular cloning and sequence analysis of two carbonic anhydrase in the swimming crab Portunus trituberculatus and its expression in response to salinity and pH stress.

    PubMed

    Pan, Luqing; Hu, Dongxu; Liu, Maoqi; Hu, Yanyan; Liu, Shengnan

    2016-01-15

    Carbonic anhydrase (CA) is involved in ion transport, acid-base balance and pH regulation by catalyzing the interconversion of CO2 and HCO3(-). In this study, full-length cDNA sequences of two CA isoforms were identified from Portunus trituberculatus: a cytoplasmic carbonic anhydrase (PtCAc) and a glycosyl-phosphatidylinositol-linked carbonic anhydrase (PtCAg). PtCAc contains an ORF of 816 bp encoding a protein of 30.18 kDa, and PtCAg an ORF of 927 bp encoding a protein of 34.09 kDa. Comparison of the deduced amino acid sequences of the two isoforms with other crustacean CA sequences showed high conservation of the residues and domains essential to enzyme function. Both PtCAc and PtCAg were expressed in gill, muscle, hepatopancreas, hemocytes and gonad. Expression of the PtCAc and PtCAg genes was studied under salinity and pH challenge. When salinity decreased (30 to 20 ppt), PtCAc mRNA expression increased significantly at 24 and 48 h, peaking at 24 h; PtCAg mRNA expression followed the same pattern. However, when salinity increased (30 to 35 ppt), only PtCAc mRNA expression increased significantly, at 48 h. When pH changed, only PtCAc mRNA expression increased significantly at 12 h, under low pH. PtCAg mRNA expression increased significantly at 12-48 h, with no significant difference between the pH-challenged and control groups at other time points. These results provide a basis for understanding CA function and the mechanisms underlying the response to environmental change in crustaceans. Copyright © 2015 Elsevier B.V. All rights reserved.
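The reported protein sizes can be sanity-checked against the ORF lengths. The minimal Python sketch below assumes each ORF includes its stop codon, which the abstract does not state. It also uses the common rule-of-thumb average of about 110 Da per residue, so its estimates are approximate and will not exactly match masses computed from the actual sequences.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass of one amino acid residue


def orf_to_protein(orf_bp: int, includes_stop: bool = True):
    """Return (residue count, approximate mass in kDa) for an ORF of `orf_bp` bases."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    residues = codons - 1 if includes_stop else codons  # a stop codon adds no residue
    return residues, residues * AVG_RESIDUE_MASS_DA / 1000.0


for name, bp in (("PtCAc", 816), ("PtCAg", 927)):
    n, kda = orf_to_protein(bp)
    print(f"{name}: {n} residues, ~{kda:.1f} kDa")
```

The estimates (about 29.8 and 33.9 kDa) are close to the reported 30.18 and 34.09 kDa. The small gap reflects the crude average residue mass rather than any discrepancy in the sequences.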

  19. Auditory sequence analysis and phonological skill

    PubMed Central

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E.; Turton, Stuart; Griffiths, Timothy D.

    2012-01-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  20. Molecular cloning and analysis of Schizosaccharomyces pombe Reb1p: sequence-specific recognition of two sites in the far upstream rDNA intergenic spacer.

    PubMed Central

    Zhao, A; Guo, A; Liu, Z; Pape, L

    1997-01-01

    The coding sequences for a Schizosaccharomyces pombe sequence-specific DNA binding protein, Reb1p, have been cloned. The predicted S. pombe Reb1p is 24-29% identical to mouse TTF-1 (transcription termination factor-1) and the Saccharomyces cerevisiae REB1 protein, both of which direct termination of RNA polymerase I-catalyzed transcripts. The S. pombe Reb1 cDNA encodes a predicted polypeptide of 504 amino acids with a predicted molecular weight of 58.4 kDa. S. pombe Reb1p is unusual in that the bipartite DNA binding motif originally identified in the S. cerevisiae and Kluyveromyces lactis REB1 proteins is uninterrupted; S. pombe Reb1p may therefore contain the smallest natural REB1-homologous DNA binding domain. Its genomic coding sequences are interrupted by two introns. A recombinant histidine-tagged Reb1 protein bearing the rDNA binding domain has two homologous, sequence-specific binding sites in the S. pombe rDNA intergenic spacer, located between 289 and 480 nt downstream of the end of the approximately 25S rRNA coding sequences. The two binding sites lie 13-14 bp downstream of two of the three proposed in vivo termination sites. The core of this 17 bp site, AGGTAAGGGTAATGCAC, is specifically protected by Reb1p in footprinting analysis. PMID:9016645
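Locating a defined binding site such as the 17 bp Reb1p core in a longer sequence is a basic pattern-matching step, and it has to check both DNA strands. The minimal Python sketch below performs exact matching only, which is a simplifying assumption, since real protein binding tolerates some mismatches. The toy sequence is constructed purely for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_site(seq: str, site: str):
    """Return (0-based position, strand) for exact matches of `site` on either strand."""
    seq, site = seq.upper(), site.upper()
    rc = reverse_complement(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


SITE = "AGGTAAGGGTAATGCAC"  # 17 bp core protected by Reb1p in footprinting
# Toy sequence (hypothetical): one forward copy and one reverse-strand copy.
toy = "TTT" + SITE + "CCCC" + reverse_complement(SITE) + "GG"
print(find_site(toy, SITE))
```

Scanning both strands matters because a double-stranded binding site can occur in either orientation relative to the annotated sequence.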

  1. A putative peroxidase cDNA from turnip and analysis of the encoded protein sequence.

    PubMed

    Romero-Gómez, S; Duarte-Vázquez, M A; García-Almendárez, B E; Mayorga-Martínez, L; Cervantes-Avilés, O; Regalado, C

    2008-12-01

    A putative peroxidase cDNA was isolated from turnip roots (Brassica napus L. var. purple top white globe) by reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE). Total RNA extracted from mature turnip roots was used as the template for RT-PCR, with a degenerate primer designed to amplify the highly conserved distal motif of plant peroxidases. The resulting partial sequence was used to design the remaining specific primers for 5' and 3' RACE. Two cDNA fragments were purified, sequenced, and aligned with the partial RT-PCR sequence, yielding a complete overlapping sequence labeled BnPA (GenBank Accession No. AY423440, named podC). The full-length cDNA is 1167 bp long and contains a 1077 bp open reading frame (ORF) encoding a deduced 358-amino-acid peroxidase polypeptide. The putative peroxidase (BnPA) has a calculated Mr of 34 kDa and an isoelectric point (pI) of 4.5, and shows no significant identity with other reported turnip peroxidases. Sequence alignment showed that only three peroxidases have significant identity with BnPA: AtP29a (84%) and AtPA2 (81%) from Arabidopsis thaliana, and HRPA2 (82%) from horseradish (Armoracia rusticana). Work is in progress to clone this gene into a suitable host to study its specific role and the possible biotechnological applications of this alternative peroxidase source.

  2. High expression of 23 kDa protein of augmenter of liver regeneration (ALR) in human hepatocellular carcinoma

    PubMed Central

    Yu, Hai-Ying; Zhu, Man-Hua; Xiang, Dai-Rong; Li, Jun; Sheng, Ji-Fang

    2014-01-01

    Background: Augmenter of liver regeneration (ALR) is an important polypeptide that participates in liver regeneration. Two forms of ALR protein are expressed in hepatocytes. Previous data have shown that ALR is essential for cell survival and has potential antimetastatic properties in hepatocellular carcinoma (HCC). Aims: The study aimed to evaluate the expression levels of the two forms of ALR protein in HCC and their possible significance in HCC development. Methods: A BALB/c mouse monoclonal antibody against ALR protein was prepared to detect ALR protein in HCC by Western blotting and immunohistochemistry. ALR mRNA expression levels were measured by real-time polymerase chain reaction in HCC tissues and compared with those in paracancerous liver tissues from 22 HCC patients. Results: ALR mRNA expression in HCC liver tissues (1.51 × 10⁶ copies/μL) was higher than in paracancerous tissues (1.04 × 10⁴ copies/μL). ALR protein expression was also enhanced in HCC liver tissues, and Western blotting showed the enhanced ALR protein to be 23 kDa. Immunohistochemical analysis showed that the 23 kDa ALR protein was located mainly in the hepatocyte cytosol. Conclusion: The 23 kDa ALR protein was highly expressed in HCC and may play an important role in hepatocarcinogenesis. PMID:24940072

  3. Calmyonemin: a 23 kDa analogue of algal centrin occurring in contractile myonemes of Eudiplodinium maggii (ciliate).

    PubMed

    David, C; Viguès, B

    1994-01-01

    Myonemes are bundles of thin filaments (3-6 nm in diameter) that mediate calcium-induced contraction of all or part of the cell body in a number of protists. In Eudiplodinium maggii, a rumen ciliate that lacks uniform ciliation of the cell body, myonemes converge toward the bases of apical ciliary zones that can be retracted under stress conditions, immobilizing the cell. A monoclonal antibody (mAb A69) was produced that identifies a calcium-binding protein in immunoblot and immunoprecipitation experiments and specifically labels the myonemes in immunoelectron microscopy. The solubility properties, apparent molecular weight (23 kDa) and isoelectric point (4.9) of the myonemal protein are similar to the values reported for the calcium-modulated contractile protein centrin. Western blot analysis indicates that the 23 kDa protein cross-reacts antigenically with anti-centrin antibodies. In addition, the 23 kDa protein displays calcium-induced changes in both electrophoretic and chromatographic behaviour, and contains calcium-binding domains that conform to the EF-hand structure, as known for centrin. Based on these observations, we conclude that a calcium-binding protein with major similarities to centrin occurs in the myonemes of E. maggii. We postulate that this protein plays an essential role in myoneme-mediated retraction of the ciliature.

  4. Processing of carcinoembryonic antigen by Kupffer cells: recognition of a penta-peptide sequence.

    PubMed

    Gangopadhyay, A; Thomas, P

    1996-10-01

    Carcinoembryonic antigen (CEA) binds to an 80-kDa cell surface receptor on Kupffer cells via the peptide sequence PELPK (residues 108-112) located at the hinge region between the N and A1 immunoglobulin-like domains. This study aimed to analyze the specificity of the peptide binding, determine the biodistribution of the 80-kDa receptor, and characterize the processing of CEA by this receptor. We synthesized a number of bovine serum albumin (BSA) derivatives carrying PELPK and related sequences: a series of peptides (YPELPK, YPDLPK, YPDLPR, and YPELGK) was conjugated to BSA using N-hydroxysuccinimidyl-4-azidobenzoate. When 125I-labeled peptide conjugates, CEA, and BSA were injected intravenously into rats, CEA and the PELPK-albumin conjugate were cleared rapidly, whereas the other peptide conjugates and BSA cleared much more slowly. Activity of 125I-labeled CEA and PELPK-albumin conjugate per gram of tissue was highest in the liver and spleen. Clearance of 125I-CEA was inhibited by higher concentrations of the PELPK-albumin conjugate. With isolated rat Kupffer cells, only CEA and the PELPK-albumin conjugate were bound and internalized in vitro, and CEA binding was inhibited by higher concentrations of the PELPK-albumin conjugate. Similarly, binding of the PELPK-albumin conjugate was inhibited by unlabeled CEA. Cross-linking with a heterobifunctional agent followed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) showed that PELPK-albumin reacts with an 80-kDa protein on the Kupffer cell surface. This semisynthetic ligand (PELPK-albumin) allows the function of the 80-kDa receptor to be examined without interference from other properties of CEA, including its ability to bind lectins and to cause homotypic aggregation of cells. The consequences of CEA binding to the 80-kDa receptor may have implications for the development of hepatic metastasis from colorectal cancer.

  5. Genome organisation and sequence comparison suggest intraspecies incongruence in M RNA of Watermelon bud necrosis virus.

    PubMed

    Kumar, Rakesh; Mandal, B; Geetanjali, A S; Jain, R K; Jaiwal, P K

    2010-08-01

    Watermelon bud necrosis virus (WBNV), a member of the genus Tospovirus, family Bunyaviridae, is an important viral pathogen in watermelon cultivation in India. The complete genome sequence of WBNV was not previously available. In the present study, the complete M RNA sequence and the genome organisation of a WBNV isolate infecting watermelon in Delhi (WBNV-wDel) were determined. The M RNA was 4,794 nucleotides (nt) long and potentially coded for a movement protein (NSm) of 34.22 kDa (307 amino acids) on the viral sense strand and a Gn/Gc glycoprotein precursor of 127.15 kDa (1,121 amino acids) on the complementary strand. The two open reading frames were separated by an intergenic region of 402 nt. The 5' and 3' untranslated regions were 55 and 47 nt long, respectively, and contained complementary termini typical of tospoviruses. WBNV-wDel was most closely related (79.1% identity) to Groundnut bud necrosis virus, an important tospovirus that occurs in several crops in India, and differed (63.3-75.2% identity) from the other cucurbit-infecting tospoviruses known to occur in Taiwan and Japan. Sequence analysis of NSm and Gn/Gc revealed phylogenetic incongruence between WBNV-wDel and another isolate originating from central India (WBNV-Wm-Som isolate). The Wm-Som isolate showed evolutionary divergence from the wDel isolate in the Gn/Gc protein (74.6% identity), potentially due to recombination with the other tospoviruses known to occur in India. This is the first report of a comparison of complete sequences of M RNA of WBNV.
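
    The segment lengths reported for the WBNV-wDel M RNA add up to its stated total length, which a short calculation confirms. The sketch assumes (our assumption, not stated in the abstract) that each ORF length includes its stop codon, i.e. (aa + 1) × 3 nt, and that the 5' UTR, NSm ORF, intergenic region, Gn/Gc ORF and 3' UTR tile the M RNA end to end without overlap. The helper name orf_nt is illustrative only.

```python
def orf_nt(aa: int) -> int:
    """Nucleotides in an ORF encoding `aa` residues, counting the stop codon."""
    return (aa + 1) * 3


# Segments in order along the M RNA, lengths in nt (values from the abstract).
segments = {
    "5' UTR": 55,
    "NSm ORF": orf_nt(307),         # 924 nt
    "intergenic region": 402,
    "Gn/Gc ORF": orf_nt(1121),      # 3366 nt
    "3' UTR": 47,
}

total = sum(segments.values())
for name, length in segments.items():
    print(f"{name:>18}: {length:5d} nt")
print(f"{'total':>18}: {total:5d} nt")  # 4794, the reported M RNA length
```

    The exact match to 4,794 nt supports the stop-inclusive reading of the ORF lengths; without stop codons the sum would fall 6 nt short.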

  6. Molecular cloning, sequencing, and expression of Eimeria tenella HSP70 partial gene.

    PubMed

    Bogado, A L G; Martins, G F; Sasse, J P; Guimarães, J da S; Garcia, J L

    2017-03-15

    Members of the genus Eimeria are protozoan parasites of the subphylum Apicomplexa (family Eimeriidae) and belong to the coccidia group. Eimeria tenella is one of the most pathogenic species owing to its ability to penetrate the mucosa and cause inflammation and damage. It is an obligate intracellular parasite that causes disease by destroying host cells during multiplication. Heat shock protein 70 (HSP70) is a molecular chaperone that protects cells against stress. The objective of this study was to clone, sequence, and express E. tenella HSP70 protein. After selecting the region of highest hydrophilicity in the hsp70 gene, we cloned the complementary DNA (cDNA) into a pTrcHis2-TOPO vector and transformed it into TOP10 Escherichia coli cells; after induction, the bacteria expressed a 23-kDa protein in insoluble form at approximately 5 mg/L. In summary, the partial hsp70 gene was successfully expressed in E. coli as an insoluble 23-kDa protein, and the antigenic characteristics predicted by hydrophilicity analysis suggest its potential for developing a vaccine against avian coccidiosis.

  7. A recombinant 63-kDa form of Bacillus anthracis protective antigen produced in the yeast Saccharomyces cerevisiae provides protection in rabbit and primate inhalational challenge models of anthrax infection.

    PubMed

    Hepler, Robert W; Kelly, Rosemarie; McNeely, Tessie B; Fan, Hongxia; Losada, Maria C; George, Hugh A; Woods, Andrea; Cope, Leslie D; Bansal, Alka; Cook, James C; Zang, Gina; Cohen, Steven L; Wei, Xiaorong; Keller, Paul M; Leffel, Elizabeth; Joyce, Joseph G; Pitt, Louise; Schultz, Loren D; Jansen, Kathrin U; Kurtz, Myra

    2006-03-06

    Infection by Bacillus anthracis is preventable by prophylactic vaccination with several naturally derived and recombinant vaccine preparations. Existing data suggest that protection is mediated by antibodies directed against the protective antigen (PA) component of the anthrax toxin complex. PA is an 83-kDa protein cleaved in vivo to yield a biologically active 63-kDa protein. To evaluate the potential of yeast as an expression system for the production of recombinant PA, and to determine whether yeast-purified rPA63 can protect against a lethal inhalational challenge, the sequence of the 63-kDa form of PA was codon-optimized and expressed in the yeast Saccharomyces cerevisiae. Highly purified rPA63 isolated from Saccharomyces under denaturing conditions demonstrated reduced biological activity in a macrophage-killing assay compared with non-denatured rPA83 purified from Escherichia coli. Rabbits and non-human primates (NHP) immunized with rPA63 and later challenged with a lethal dose of B. anthracis spores were generally protected from infection. These results indicate that epitopes present in the 63-kDa form of PA can protect rabbits and non-human primates from a lethal spore challenge, and further suggest that a fully functional rPA63 is not required to provide these epitopes.

  8. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures and on software- and hardware-accelerated algorithms for bioinformatics data analysis, with an emphasis on one of the most important sequence analysis applications: hidden Markov models (HMM). We present a detailed performance comparison, on various computing platforms, of sequence analysis tools recently developed by the bioinformatics community. The data- and compute-intensive nature of sequence analysis makes it very attractive to optimize and parallelize using both traditional software approaches and innovative hardware acceleration technologies. PMID:25937944

  9. Accumulation of 52 kDa glycine rich protein in auxin-deprived strawberry fruits and its role in fruit growth. [Fragaria ananassa]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reddy, A.S.N.; Poovaiah, B.W.

    1987-04-01

    Growth of strawberry (Fragaria ananassa Duch) receptacles can be stopped at any stage by deachening the fruits and can be resumed by exogenous application of auxin. In their earlier studies the authors demonstrated auxin-regulated polypeptide changes at different stages of strawberry fruit development. Removal of achenes from fruits to deprive them of auxin resulted in the accumulation of a 52 kDa polypeptide. This polypeptide is associated with the cell wall, and its concentration increases in a time-dependent manner in auxin-deprived receptacles. Incorporation studies with [³⁵S]methionine showed increased labelling of the 52 kDa polypeptide in auxin-deprived receptacles within 12 h after removal of the achenes. Amino acid analysis revealed that the 52 kDa polypeptide is rich in glycine. Their studies with normal and mutant strawberry receptacles indicate that the synthesis and accumulation of this glycine-rich protein correlate with cessation of receptacle growth. These results suggest a role for the glycine-rich protein in growth.

  10. Different effects of 25-kDa amelogenin on the proliferation, attachment and migration of various periodontal cells

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Xiting; Shu, Rong; Liu, Dali

    Previous studies have assumed that amelogenin is responsible for the therapeutic effect of enamel matrix derivative (EMD) in periodontal tissue healing and regeneration. However, this hypothesis is difficult to confirm because both EMD and amelogenins are complex mixtures of multiple proteins. Adding to the difficulty, periodontal tissue regeneration involves various cell types and a sequence of associated cellular events, including the attachment, migration and proliferation of various cells. In this study, we investigated the potential effect of a 25-kDa recombinant porcine amelogenin (rPAm) on primary cultured periodontal ligament fibroblasts (PDLF), gingival fibroblasts (GF) and gingival epithelial cells (GEC). The cells were treated with rPAm at a concentration of 10 μg/mL. We found that rPAm significantly promoted the proliferation and migration of PDLF, but not their adhesion. The proliferation and adhesion of GF were also significantly enhanced by rPAm, whereas their migration was greatly inhibited. Interestingly, rPAm inhibited the growth rate, cell adhesion and migration of GEC. These data suggest that rPAm may play an essential role in periodontal regeneration through activation of periodontal fibroblasts and inhibition of gingival epithelial cell behaviors.

  11. SOBA: sequence ontology bioinformatics analysis.

    PubMed

    Moore, Barry; Fan, Guozhen; Eilbeck, Karen

    2010-07-01

    The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multinational sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, software libraries and pipelines exist that are designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, for genome comparison, and by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.

  12. Molecular characterization of a novel luteovirus infecting apple by next-generation sequencing.

    PubMed

    Shen, Pan; Tian, Xin; Zhang, Song; Ren, Fang; Li, Ping; Yu, Yun-Qi; Li, Ruhui; Zhou, Changyong; Cao, Mengji

    2018-03-01

    A new single-stranded positive-sense RNA virus was discovered in this work; it shares its highest nucleotide (nt) sequence identity, 53.4%, with the genome sequence of the South Korean isolate of cherry-associated luteovirus (ChALV-SK, genus Luteovirus). It is provisionally named apple-associated luteovirus (AaLV). The complete genome sequence of AaLV comprises 5,890 nt and contains eight open reading frames (ORFs) in an arrangement typical of members of the genus Luteovirus. When compared with other members of the family Luteoviridae, ORF1 of AaLV was found to encompass another ORF, ORF1a, which encodes a putative 32.9-kDa protein. The ORF1-ORF2 region (RNA-dependent RNA polymerase, RdRP) showed the greatest amino acid (aa) sequence identity (59.7%) to that of the Czech Republic isolate of cherry-associated luteovirus (ChALV-CZ, genus Luteovirus). The results of genome sequence comparisons and phylogenetic analysis suggest that AaLV is a member of a novel species in the genus Luteovirus. To our knowledge, it is the sixth member of the genus Luteovirus reported to naturally infect rosaceous plants.

  13. Homonuclear Hartmann-Hahn transfer with reduced relaxation losses by use of the MOCCA-XY16 multiple pulse sequence

    NASA Astrophysics Data System (ADS)

    Furrer, Julien; Kramer, Frank; Marino, John P.; Glaser, Steffen J.; Luy, Burkhard

    2004-01-01

    Homonuclear Hartmann-Hahn transfer is one of the most important building blocks in modern high-resolution NMR. It constitutes a very efficient transfer element for the assignment of proteins, nucleic acids, and oligosaccharides. Nevertheless, in macromolecules exceeding approximately 10 kDa, TOCSY experiments can show decreasing sensitivity due to fast transverse relaxation processes that are active during the mixing periods. In this article we propose the MOCCA-XY16 multiple pulse sequence, originally developed for efficient TOCSY transfer through residual dipolar couplings, as a homonuclear Hartmann-Hahn sequence with improved relaxation properties. A theoretical analysis of the coherence transfer via scalar couplings and its relaxation behavior is given, together with experimental transfer curves for MOCCA-XY16 relative to the well-characterized DIPSI-2 multiple pulse sequence.

  14. Homonuclear Hartmann-Hahn transfer with reduced relaxation losses by use of the MOCCA-XY16 multiple pulse sequence.

    PubMed

    Furrer, Julien; Kramer, Frank; Marino, John P; Glaser, Steffen J; Luy, Burkhard

    2004-01-01

    Homonuclear Hartmann-Hahn transfer is one of the most important building blocks in modern high-resolution NMR. It constitutes a very efficient transfer element for the assignment of proteins, nucleic acids, and oligosaccharides. Nevertheless, in macromolecules exceeding approximately 10 kDa, TOCSY experiments can show decreasing sensitivity due to fast transverse relaxation processes that are active during the mixing periods. In this article we propose the MOCCA-XY16 multiple pulse sequence, originally developed for efficient TOCSY transfer through residual dipolar couplings, as a homonuclear Hartmann-Hahn sequence with improved relaxation properties. A theoretical analysis of the coherence transfer via scalar couplings and its relaxation behavior is given, together with experimental transfer curves for MOCCA-XY16 relative to the well-characterized DIPSI-2 multiple pulse sequence.

  15. Image sequence analysis workstation for multipoint motion analysis

    NASA Astrophysics Data System (ADS)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze the motion of objects from video sequences. The system combines the software and hardware environment of a modern graphics-oriented workstation with digital image acquisition, processing and display techniques. In addition to automating data-reduction tasks and increasing their throughput, the system aims to provide less invasive measurement methods by tracking objects that are more complex than reflective markers. Grey-level image processing and spatial/temporal adaptation of the processing parameters are used to locate and track more complex object features under uncontrolled lighting and background conditions. Applications of such an automated, noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots and aircraft in flight. The system's key features are: 1) acquisition and storage of image sequences by digitizing and storing real-time video; 2) computer-controlled movie-loop playback, freeze-frame display and digital image enhancement; 3) multiple leading-edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body; 5) field-of-view and spatial calibration; 6) image sequence and measurement database management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  16. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, compared with the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing can be described as the product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias or errors is Seq = Sa·I_N, where I_N is the N × N identity matrix. Any bias in sequencing changes I_N to a non-identity matrix. We identified a diagonal censorship matrix (CEN), which describes the elimination or statistically significant downsampling of specific reads during the sequencing process. PMID:24416071
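
    The operator view described in this abstract can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not the authors' implementation: diagonal matrices are stored as lists of their diagonal entries, a bias matrix B (the identity I_N when unbiased, or a hypothetical censorship diagonal such as CEN) is applied to the copy-number vector n, and the stochastic sampling step is simulated as drawing a fixed number of reads with probability proportional to the biased copy numbers. All function names, the read depth and the toy vector are hypothetical.

```python
import random


def apply_diagonal(diag, v):
    """Multiply vector v by the diagonal matrix whose diagonal is `diag`."""
    return [d * x for d, x in zip(diag, v)]


def sample(v, depth, rng):
    """Toy stochastic sampling step: draw `depth` reads with probability
    proportional to the entries of v and return the observed read counts."""
    counts = [0] * len(v)
    for i in rng.choices(range(len(v)), weights=v, k=depth):
        counts[i] += 1
    return counts


def sequence(n, depth, bias, rng):
    """Seq applied to n: sampling after a diagonal bias matrix B (B n first)."""
    return sample(apply_diagonal(bias, n), depth, rng)


rng = random.Random(0)
n = [100, 50, 0, 25]          # toy copy numbers for N = 4 possible sequences
identity = [1, 1, 1, 1]       # I_N: unbiased sequencing
censor = [1, 1, 1, 0]         # hypothetical CEN: sequence 4 is eliminated
print(sequence(n, 1000, identity, rng))
print(sequence(n, 1000, censor, rng))
```

    With the identity diagonal, observed counts fluctuate around the input proportions; with the censorship diagonal, the censored sequence never appears in the reads even though it is present in the library, which is the kind of effect the CEN matrix is meant to capture.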

  17. The VPH1 gene encodes a 95-kDa integral membrane polypeptide required for in vivo assembly and activity of the yeast vacuolar H(+)-ATPase.

    PubMed

    Manolson, M F; Proteau, D; Preston, R A; Stenbit, A; Roberts, B T; Hoyt, M A; Preuss, D; Mulholland, J; Botstein, D; Jones, E W

    1992-07-15

    Yeast vacuolar acidification-defective (vph) mutants were identified using the pH-sensitive fluorescence of 6-carboxyfluorescein diacetate (Preston, R. A., Murphy, R. F., and Jones, E. W. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 7027-7031). Vacuoles purified from yeast bearing the vph1-1 mutation had no detectable bafilomycin-sensitive ATPase activity or ATP-dependent proton pumping. The peripherally bound nucleotide-binding subunits of the vacuolar H(+)-ATPase (60 and 69 kDa) were no longer associated with vacuolar membranes yet were present in wild type levels in yeast whole cell extracts. The VPH1 gene was cloned by complementation of the vph1-1 mutation and independently cloned by screening a lambda gt11 expression library with antibodies directed against a 95-kDa vacuolar integral membrane protein. Deletion disruption of VPH1 revealed that the gene is not essential for viability but is required for vacuolar H(+)-ATPase assembly and vacuolar acidification. VPH1 encodes a predicted polypeptide of 840 amino acid residues (molecular mass 95.6 kDa) and contains six putative membrane-spanning regions. Cell fractionation and immunodetection demonstrate that Vph1p is a vacuolar integral membrane protein that co-purifies with vacuolar H(+)-ATPase activity. Multiple sequence alignments show extensive homology over the entire lengths of the following four polypeptides: Vph1p, the 116-kDa polypeptide of the rat clathrin-coated vesicles/synaptic vesicle proton pump, the predicted polypeptide encoded by the yeast gene STV1 (Similar To VPH1, identified as an open reading frame next to the BUB2 gene), and the TJ6 mouse immune suppressor factor.

  18. Electron microscopic analysis and structural characterization of novel NADP(H)-containing methanol: N,N'-dimethyl-4-nitrosoaniline oxidoreductases from the gram-positive methylotrophic bacteria Amycolatopsis methanolica and Mycobacterium gastri MB19.

    PubMed Central

    Bystrykh, L V; Vonck, J; van Bruggen, E F; van Beeumen, J; Samyn, B; Govorukhina, N I; Arfman, N; Duine, J A; Dijkhuizen, L

    1993-01-01

    The quaternary protein structure of two methanol:N,N'-dimethyl-4-nitrosoaniline (NDMA) oxidoreductases purified from Amycolatopsis methanolica and Mycobacterium gastri MB19 was analyzed by electron microscopy and image processing. The enzymes are decameric proteins (displaying fivefold symmetry) with estimated molecular masses of 490 to 500 kDa based on their subunit molecular masses of 49 to 50 kDa. Both methanol:NDMA oxidoreductases possess a tightly but noncovalently bound NADP(H) cofactor at an NADPH-to-subunit molar ratio of 0.7. These cofactors are redox active toward alcohol and aldehyde substrates. Both enzymes contain significant amounts of Zn2+ and Mg2+ ions. The primary amino acid sequences of the A. methanolica and M. gastri MB19 methanol:NDMA oxidoreductases share a high degree of identity, as indicated by N-terminal sequence analysis (63% identity among the first 27 N-terminal amino acids), internal peptide sequence analysis, and overall amino acid composition. The amino acid sequence analysis also revealed significant similarity to a decameric methanol dehydrogenase of Bacillus methanolicus C1. PMID:8449887

  19. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison have benefited greatly from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from global genome analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has provided models based on communication systems theory to describe information transmission channels at the cell level and during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation to broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
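
    Block entropy, one of the quantities this review mentions, is the Shannon entropy of the distribution of length-k words (k-mers) in a sequence. The sketch below uses the simple plug-in estimator over overlapping k-mers; it is one illustrative choice among the estimators the review covers, not a method taken from it, and the function name block_entropy is hypothetical.

```python
from collections import Counter
from math import log2


def block_entropy(seq: str, k: int) -> float:
    """Plug-in Shannon entropy (bits) of the overlapping k-mers in seq.

    H_k = sum over observed k-mers w of p(w) * log2(1 / p(w)),
    with p(w) estimated as count(w) / (number of k-mers).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return sum((c / total) * log2(total / c) for c in counts.values())


print(block_entropy("ACGTACGTACGT", 1))  # 2.0: four equally frequent bases
print(block_entropy("AAAAAAAAAAAA", 2))  # 0.0: a single repeated 2-mer
```

    For DNA, H_1 is at most 2 bits and H_k at most 2k bits; values well below that bound indicate repetitive or low-complexity sequence. Plug-in estimates are biased downward for large k relative to sequence length, which is why dedicated block-entropy estimators exist.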

  20. Cardioprotective effects of 70-kDa heat shock protein in transgenic mice.

    PubMed

    Radford, N B; Fina, M; Benjamin, I J; Moreadith, R W; Graves, K H; Zhao, P; Gavva, S; Wiethoff, A; Sherry, A D; Malloy, C R; Williams, R S

    1996-03-19

    Heat shock proteins are proposed to limit injury resulting from diverse environmental stresses, but direct metabolic evidence for such a cytoprotective function in vertebrates has been largely limited to studies of cultured cells. We generated lines of transgenic mice to express human 70-kDa heat shock protein constitutively in the myocardium. Hearts isolated from these animals demonstrated enhanced recovery of high energy phosphate stores and correction of metabolic acidosis following brief periods of global ischemia sufficient to induce sustained abnormalities of these variables in hearts from nontransgenic littermates. These data demonstrate a direct cardioprotective effect of 70-kDa heat shock protein to enhance postischemic recovery of the intact heart.

  1. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) lie at a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better-established use of Markovian approaches for discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would pass before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized to non-genomic alphabets, as well as interest in its use for the graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of Big Data and MapReduce as a new computational paradigm, there has been a surprising third act in the IM story: multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears set for a recasting of IMs with a central role in processing next-generation sequencing results.
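
    Chaos Game Representation, the iterated map named in this review, places each nucleotide at a corner of the unit square and moves halfway from the current point toward the corner of the next base. A minimal sketch follows; the corner assignment (A, C, G, T at the four corners as shown) and the start point (0.5, 0.5) are common conventions that we assume here, and other assignments are in use.

```python
# Assumed corner convention; other assignments of bases to corners exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(seq: str, start=(0.5, 0.5)):
    """Chaos Game Representation coordinates of a DNA sequence.

    Each new point is the midpoint between the previous point and the
    corner assigned to the current nucleotide.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(cgr("ACG"))  # [(0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125)]
```

    Because each step halves the distance to a corner, any two positions whose preceding k bases agree fall in the same sub-square of side 2^-k, whatever came before. This is the order-free property the review describes: one set of coordinates encodes suffix (Markov) contexts of every length at once.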

  2. [Cloning, sequencing and prokaryotic expression of cDNAs for the antifreeze protein family from the beetle Tenebrio molitor].

    PubMed

    Liu, Zhong-Yuan; Wang, Yun; Lü, Guo-Dong; Wang, Xian-Lei; Zhang, Fu-Chun; Ma, Ji

    2006-12-01

    The partial cDNA sequence coding for the antifreeze proteins of Tenebrio molitor was obtained by RT-PCR. Sequence analysis revealed nine putative cDNAs with a high degree of homology to Tenebrio molitor antifreeze proteins. The recombinant plasmid pGEX-4T-1-tmafp-XJ430 was introduced into E. coli BL21, and expression of a GST fusion protein was induced with IPTG. SDS-PAGE of the fusion protein showed that the antifreeze protein migrated at a size of 38 kDa. Immunization was performed by intramuscular injection of pCDNA3-tmafp-XJ430, and the resulting antiserum was detected by ELISA. The antibody titer was 1:2,000. Western blot analysis showed that the antiserum was specific for the antifreeze protein. This finding could lead to further investigation of the properties and function of antifreeze proteins.

  3. Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

    PubMed Central

    Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

    2002-01-01

    Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate, indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames, including the cehA gene, were found in the 10-kb region, and sequence analysis shows that the cehA gene is flanked by two copies of an insertion sequence-like element, suggesting that it forms part of a composite transposon. PMID:11872471

  4. Expression, purification, and characterization of a bifunctional 99-kDa peptidoglycan hydrolase from Pediococcus acidilactici ATCC 8042.

    PubMed

    García-Cano, Israel; Campos-Gómez, Manuel; Contreras-Cruz, Mariana; Serrano-Maldonado, Carlos Eduardo; González-Canto, Augusto; Peña-Montes, Carolina; Rodríguez-Sanoja, Romina; Sánchez, Sergio; Farrés, Amelia

    2015-10-01

    Pediococcus acidilactici ATCC 8042 is a lactic acid bacterium that inhibits pathogenic microorganisms such as Staphylococcus aureus through the production of two proteins with lytic activity, one of 110 kDa and the other of 99 kDa. The 99-kDa protein is highly homologous to a putative peptidoglycan hydrolase (PGH) reported in the genome of P. acidilactici 7_4, in which two different lytic domains have been identified but not characterized. The aim of this work was the biochemical characterization of the recombinant 99-kDa enzyme. The enzyme was cloned and expressed successfully and retains its activity against Micrococcus lysodeikticus. Its N-acetylglucosaminidase activity is higher, but N-acetylmuramoyl-L-alanine amidase activity can also be detected spectrophotometrically. The protein was then purified by gel filtration chromatography. Antibacterial activity showed an optimal pH of 6.0 and was stable between pH 5.0 and 7.0. The optimal temperature for activity was 60 °C, and all activity was lost after 1 h of incubation at 70 °C. Fewer strains were susceptible to the recombinant 99-kDa enzyme than to the mixture of the 110- and 99-kDa PGHs of P. acidilactici, a result that suggests synergy between these two enzymes. This is the first PGH from lactic acid bacteria (LAB) shown to possess two lytic sites. The results of this study will aid in the design of new antibacterial agents of natural origin that can combat foodborne disease and improve hygienic practices in the industrial sector.

  5. Cardioprotective effects of 70-kDa heat shock protein in transgenic mice.

    PubMed Central

    Radford, N B; Fina, M; Benjamin, I J; Moreadith, R W; Graves, K H; Zhao, P; Gavva, S; Wiethoff, A; Sherry, A D; Malloy, C R; Williams, R S

    1996-01-01

    Heat shock proteins are proposed to limit injury resulting from diverse environmental stresses, but direct metabolic evidence for such a cytoprotective function in vertebrates has been largely limited to studies of cultured cells. We generated lines of transgenic mice to express human 70-kDa heat shock protein constitutively in the myocardium. Hearts isolated from these animals demonstrated enhanced recovery of high energy phosphate stores and correction of metabolic acidosis following brief periods of global ischemia sufficient to induce sustained abnormalities of these variables in hearts from nontransgenic littermates. These data demonstrate a direct cardioprotective effect of 70-kDa heat shock protein to enhance postischemic recovery of the intact heart. PMID:8637874

  6. Carboxyl methylation of 21-23 kDa membrane proteins in intact neuroblastoma cells is increased with differentiation.

    PubMed

    Haklai, R; Kloog, Y

    1990-01-01

    Evidence is presented for specific enzymatic methylation of 21-23 kDa membrane proteins in intact neuroblastoma N1E 115 cells, which is increased in dimethylsulfoxide-induced differentiated cells. Methylation of these proteins has characteristics typical of enzymatic reactions in which base-labile volatile methyl groups are incorporated into proteins, consistent with the formation of protein carboxyl methylesters. However, these methylesters of the 21-23 kDa proteins are relatively stable compared to other protein carboxyl methylesters. The 3-fold increase in methylated 21-23 kDa proteins in the differentiated cells suggests that this methylation is biologically significant in the differentiation of cell membranes.

  7. Neurospora tryptophan synthase: N-terminal analysis and the sequence of the pyridoxal phosphate active site peptide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pratt, M.L.; Hsu, P.Y.; DeMoss, J.A.

    1986-05-01

    Tryptophan synthase (TS), which catalyzes the final step of tryptophan biosynthesis, is a multifunctional protein requiring pyridoxal phosphate (B6P) for two of its three distinct enzyme activities. TS from Neurospora has a blocked N-terminus, is a homodimer of 150 kDa, and binds one mole of B6P per mole of subunit. The authors have shown the N-terminal residue to be acyl-serine. The B6P active site of the holoenzyme was labelled by reduction of the B6P Schiff base with [3H]NaBH4, which resulted in a proportionate loss of activity in the two B6P-requiring reactions. SDS-polyacrylamide gel electrophoresis of CNBr-generated peptides showed the labelled active-site peptide to be 6 kDa. The sequence of this peptide, purified to apparent homogeneity by a combination of C-18 reversed-phase and TSK gel filtration HPLC, is: gly-arg-pro-gly-gln-leu-his-lys-ala-glu-arg-leu-thr-glu-tyr-ala-gly-gly-ala-gln-ile-xxx-leu-lys-arg-glu-asp-leu-asn-his-xxx-gly-xxx-his-***-ile-asn-asn-ala-leu. Although four residues (xxx, ***) are unidentified, this peptide is minimally 78% homologous with the corresponding peptide from yeast TS, in which residue *** is the lysine that binds B6P.

  8. Rickettsia asembonensis Characterization by Multilocus Sequence Typing of Complete Genes, Peru.

    PubMed

    Loyola, Steev; Flores-Mendoza, Carmen; Torre, Armando; Kocher, Claudine; Melendrez, Melanie; Luce-Fedrow, Alison; Maina, Alice N; Richards, Allen L; Leguia, Mariana

    2018-05-01

    While studying rickettsial infections in Peru, we detected Rickettsia asembonensis in fleas from domestic animals. We characterized 5 complete genomic regions (17-kDa, gltA, ompA, ompB, and sca4) and conducted multilocus sequence typing and phylogenetic analyses. The molecular isolate from Peru is distinct from the original R. asembonensis strain from Kenya.

  9. mESAdb: microRNA Expression and Sequence Analysis Database

    PubMed Central

    Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657

  10. mESAdb: microRNA expression and sequence analysis database.

    PubMed

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

  11. U2 small nuclear ribonucleoprotein particle (snRNP) auxiliary factor of 65 kDa, U2AF65, can promote U1 snRNP recruitment to 5' splice sites.

    PubMed Central

    Förch, Patrik; Merendino, Livia; Martínez, Concepción; Valcárcel, Juan

    2003-01-01

    The splicing factor U2AF(65), U2 small nuclear ribonucleoprotein particle (snRNP) auxiliary factor of 65 kDa, binds to pyrimidine-rich sequences at 3' splice sites to recruit U2 snRNP to pre-mRNAs. We report that U2AF(65) can also promote the recruitment of U1 snRNP to weak 5' splice sites that are followed by uridine-rich sequences. The arginine- and serine-rich domain of U2AF(65) is critical for U1 recruitment, and we discuss the role of its RNA-RNA annealing activity in this novel function of U2AF(65). PMID:12558503

  12. T-cell epitope analysis using subtracted expression libraries (TEASEL): application to a 38-kDa autoantigen recognized by T cells from an insulin-dependent diabetic patient.

    PubMed Central

    Neophytou, P I; Roep, B O; Arden, S D; Muir, E M; Duinkerken, G; Kallan, A; de Vries, R R; Hutton, J C

    1996-01-01

    Studies on circulating T cells and antibodies in newly diagnosed type 1 diabetic patients and rodent models of autoimmune diabetes suggest that beta-cell membrane proteins of 38 kDa may be important molecular targets of autoimmune attack. Biochemical approaches to the isolation and identification of the 38-kDa autoantigen have been hampered by the restricted availability of islet tissue and the low abundance of the protein. A procedure of epitope analysis for CD4+ T cells using subtracted expression libraries (TEASEL) was developed and used to clone a 70-amino acid pancreatic beta-cell peptide incorporating an epitope recognized by a 38-kDa-reactive CD4+ T-cell clone (1C6) isolated from a human diabetic patient. The minimal epitope was mapped to a 10-amino acid synthetic peptide containing a DR1 consensus binding motif. Database searches did not reveal the identity of the protein, though a weak homology to the bacterial superantigens SEA (Streptococcus pyogenes exotoxin A) and SEB (Staphylococcus aureus enterotoxin B) (23% identity) was evident. The TEASEL procedure might be used to identify epitopes of other autoantigens recognized by CD4+ T cells in diabetes and might be more generally applicable to the study of low-abundance autoantigens in other tissue-specific autoimmune diseases. PMID:8700877

  13. In vivo exposure to ozone produces an increase in a 72-kDa heat shock protein in guinea pigs.

    PubMed

    Su, W Y; Gordon, T

    1997-09-01

    Although several lines of evidence have suggested that oxidizing agents can induce heat shock proteins (HSPs) in vitro, little is known about the induction of HSPs during in vivo exposure to oxidants. Guinea pigs were exposed to ozone for 6 h and euthanized up to 72 h later. Proteins from lavage cells and lung tissue were characterized by immunoblotting with 72- and 73/72-kDa HSP monoclonal antibodies. Although 73-kDa HSP was expressed constitutively in lung tissue, it was not affected by ozone. In contrast, 72-kDa HSP was significantly increased in lavage cells and lung tissue of animals exposed to 0.4 and 0.66 parts/million of ozone. Both heat treatment and arsenite induced 72-kDa HSP in cultured alveolar macrophages. The increase in 72-kDa HSP in the lavage cell pellet peaked at 24 h after ozone, whereas the influx of polymorphonuclear leukocytes peaked at 4 h. Examination of the induction of HSPs by ozone may provide clues to the development of ozone tolerance in humans and animals.

  14. Cloning, sequencing and phylogenetic analysis of the small GTPase gene cdc-42 from Ancylostoma caninum.

    PubMed

    Yang, Yurong; Zheng, Jing; Chen, Jiaxin

    2012-12-01

    CDC-42 is a member of the Rho GTPase subfamily that is involved in many signaling pathways, including mitosis, cell polarity, cell migration and cytoskeleton remodeling. Here, we present the first characterization of a full-length cDNA encoding the small GTPase cdc-42, designated as Accdc-42, isolated from the parasitic nematode Ancylostoma caninum. The encoded protein contains 191 amino acid residues with a predicted molecular weight of 21 kDa and displays a high level of identity with the Rho-family GTPase protein CDC-42. Phylogenetic analysis revealed that Accdc-42 was most closely related to Caenorhabditis briggsae cdc-42. Comparison with selected sequences from the free-living nematode Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Danio rerio, Mus musculus and human genomes showed that Accdc-42 is highly conserved. AcCDC-42 demonstrates the highest identity to CDC-42 from C. briggsae (94.2%), and it also exhibits 91.6% identity to CDC-42 from C. elegans and 91.1% from Brugia malayi. Additionally, the transcript of Accdc-42 was analyzed during the different developmental stages of the worm. Accdc-42 was expressed in the L1/L2 larvae, L3 larvae and female and male adults of A. caninum. Copyright © 2012 Elsevier Inc. All rights reserved.

  15. Identification and molecular characterization of 48 kDa calcium binding protein as calreticulin from finger millet (Eleusine coracana) using peptide mass fingerprinting and transcript profiling.

    PubMed

    Singh, Manoj; Metwal, Mamta; Kumar, Vandana A; Kumar, Anil

    2016-01-30

    Attempts were made to identify and characterize the calcium binding proteins (CaBPs) in grain filling stages of finger millet using proteomics, bioinformatics and molecular approaches. A distinct blue band of 48 kDa stained by Stains-all was eluted and identified as calreticulin (CRT) using nano liquid chromatography-tandem mass spectrometry (nano LC-MS). Based on the top hits of the peptide mass fingerprinting results, conserved primers were designed from calreticulin sequences of different cereals to isolate the CRT gene from finger millet. Sequence analysis of the resulting 600-bp amplicon showed up to 91% similarity with CRT gene(s) of rice and other plant species, and the gene was designated EcCRT1. Transcript profiling of EcCRT1 showed different levels of relative expression at different stages of developing spikes. Higher expression of EcCRT1 transcripts and protein was observed in later stages of developing spikes, which might be due to greater translational synthesis of EcCRT1 protein during seed maturation in finger millet. Preferentially higher synthesis of this CaBP during later stages of grain filling may be responsible for the sequestration of calcium in the endoplasmic reticulum of finger millet grains. © 2015 Society of Chemical Industry.

  16. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Various bioinformatics tools help identify and characterize these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats. We found that the sequence identity between repeats falls in the range of 20-40% and that the superimposed structures of most sequence repeats maintain a similar overall fold. Analysis of sequence repeats at the functional level reveals that most repeats contribute to protein function through functionally involved residues in the repeat regions. We also found that sequence repeats in single- and two-domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. Cloning, sequencing, purification, and crystal structure of Grenache (Vitis vinifera) polyphenol oxidase.

    PubMed

    Virador, Victoria M; Reyes Grajeda, Juan P; Blanco-Labra, Alejandro; Mendiola-Olaya, Elizabeth; Smith, Gary M; Moreno, Abel; Whitaker, John R

    2010-01-27

    The full-length cDNA sequence (P93622_VITVI) of polyphenol oxidase (PPO) from grape Vitis vinifera L., cv Grenache, was found to encode a translated protein of 607 amino acids with an expected molecular weight of ca. 67 kDa and a predicted pI of 6.83. The translated amino acid sequence was 99% identical to that of a white grape berry PPO (1) (5 potential sequence differences out of 607 amino acids). The protein was purified from Grenache grape berries by traditional methods and crystallized with ammonium acetate by the hanging-drop vapor diffusion method. The crystals were orthorhombic, space group C222(1). The structure was obtained at 2.2 Å resolution with synchrotron radiation, using the 39 kDa isozyme of sweet potato PPO (PDB code: 1BT1) as a phase donor. The cell parameters (a, b, and c and alpha, beta, and gamma) and the number of asymmetric units in the unit cell differed between the crystals of the two proteins. The structures of the two enzymes are quite similar in overall fold, in the location of the helix bundles at the core, and in the active site, in which three histidines bind each of the two catalytic copper ions and one of the histidines is engaged in a thioether linkage with a cysteine residue. It is proposed that formation of the Cys-His thioether linkage may constitute the activation step. No evidence of phosphorylation or glycosylation was found in the electron density map. The mass of the crystallized protein appears to be only 38.4 kDa, and the processing in the grape berry that leads to this smaller size is discussed.
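    The 99% figure above follows from the quoted counts (602 of 607 positions identical). A minimal Python sketch of percent identity over a pairwise alignment is given below as an illustration; the function name `percent_identity`, the use of `-` as the gap character, and the choice to exclude gapped columns from the denominator are assumptions, since conventions differ between tools and the abstract does not state which one was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned columns without gaps.

    Hypothetical helper: excluding gapped columns is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Worked check with the counts quoted in the abstract: 607 positions, 5 differences.
print(round(100.0 * (607 - 5) / 607, 2))  # 99.18
```

    Other tools divide by the full alignment length or by the length of the shorter sequence, which gives different numbers when gaps are present.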

  18. Proteomic analysis of carbon concentrating chemolithotrophic bacteria Serratia sp. for sequestration of carbon dioxide.

    PubMed

    Bharti, Randhir K; Srivastava, Shaili; Thakur, Indu Shekhar

    2014-01-01

    A chemolithotrophic bacterium enriched in the chemostat in the presence of sodium bicarbonate as the sole carbon source was identified as Serratia sp. by 16S rRNA sequencing. The carbon dioxide sequestering capacity of the bacterium was detected through the enzymes carbonic anhydrase and ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). The purified carbonic anhydrase had a molecular weight of 29 kDa. The molecular weight of RuBisCO was 550 kDa as determined by fast protein liquid chromatography (FPLC); however, sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) showed two subunits with molecular weights of 56 and 14 kDa. Western blot analysis of the crude protein and the purified sample with antibodies against RuBisCO large-subunit polypeptides showed a strong band at around 56 kDa. Whole-cell soluble proteins of Serratia sp. grown under autotrophic and heterotrophic conditions were resolved by two-dimensional gel electrophoresis and analyzed by MALDI-TOF/MS for differential expression. Of the 63 protein spots analyzed, 48 were significantly up-regulated in the autotrophically grown cells; seven enzymes were involved in autotrophic carbon fixation pathways and other metabolic activities of the bacterium, including lipid metabolism, indicating its potential for carbon dioxide sequestration and production of biomaterials.

  19. Proteomic Analysis of Carbon Concentrating Chemolithotrophic Bacteria Serratia sp. for Sequestration of Carbon Dioxide

    PubMed Central

    Bharti, Randhir K.; Srivastava, Shaili; Thakur, Indu Shekhar

    2014-01-01

    A chemolithotrophic bacterium enriched in the chemostat in the presence of sodium bicarbonate as the sole carbon source was identified as Serratia sp. by 16S rRNA sequencing. The carbon dioxide sequestering capacity of the bacterium was detected through the enzymes carbonic anhydrase and ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). The purified carbonic anhydrase had a molecular weight of 29 kDa. The molecular weight of RuBisCO was 550 kDa as determined by fast protein liquid chromatography (FPLC); however, sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) showed two subunits with molecular weights of 56 and 14 kDa. Western blot analysis of the crude protein and the purified sample with antibodies against RuBisCO large-subunit polypeptides showed a strong band at around 56 kDa. Whole-cell soluble proteins of Serratia sp. grown under autotrophic and heterotrophic conditions were resolved by two-dimensional gel electrophoresis and analyzed by MALDI-TOF/MS for differential expression. Of the 63 protein spots analyzed, 48 were significantly up-regulated in the autotrophically grown cells; seven enzymes were involved in autotrophic carbon fixation pathways and other metabolic activities of the bacterium, including lipid metabolism, indicating its potential for carbon dioxide sequestration and production of biomaterials. PMID:24619032

  20. Oleosins (24 and 18 kDa) are hydrolyzed not only in extracted soybean oil bodies but also in soybean germination.

    PubMed

    Chen, Yeming; Zhao, Luping; Cao, Yanyun; Kong, Xiangzhen; Hua, Yufei

    2014-01-29

    After oil bodies (OBs) were extracted from ungerminated soybean by pH 6.8 extraction, it was found that the 24 and 18 kDa oleosins were hydrolyzed in the extracted OBs, which contained many OB extrinsic proteins (i.e., lipoxygenase, β-conglycinin, γ-conglycinin, β-amylase, glycinin, Gly m Bd 30K (Bd 30K), and P34 probable thiol protease (P34)) as well as OB intrinsic proteins. In this study, some properties (specificity, optimal pH and temperature) of the proteases that hydrolyze the 24 and 18 kDa oleosins were examined, together with oleosin hydrolysis during soybean germination, and the close relationship between Bd 30K/P34 and the proteases was also discussed. The results showed that (1) the proteases were OB extrinsic proteins that hydrolyzed the 24 and 18 kDa oleosins with high specificity, cleaving specific peptide bonds to form limited hydrolysis products; (2) the 24 and 18 kDa oleosins were not hydrolyzed in the absence of Bd 30K and P34 (or some proteins undetectable by Tricine-SDS-PAGE); (3) the protease of the 24 kDa oleosin was strongly resistant to alkaline pH while that of the 18 kDa oleosin was weakly resistant, and Bd 30K and P34, resolved into two spots on a two-dimensional electrophoresis gel, showed the same trend; (4) the 16 kDa oleosin as well as the 24 and 18 kDa oleosins were hydrolyzed during soybean germination, and Bd 30K and P34 were always present in the OBs extracted from germinated soybean, even when all oleosins were hydrolyzed; (5) the optimal temperature and pH of the proteases were in the ranges of 35-50 °C and pH 6.0-6.5, respectively, while 60 °C or pH 11.0 could denature them.

  1. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    PubMed

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can generate random DNA sequences, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and the A, T, G and C base contents with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
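    For readers unfamiliar with the Nussinov algorithm named above, a minimal Python sketch of the classic version follows. It maximizes the number of nested base pairs by dynamic programming and traces back one optimal structure in dot-bracket notation. It is not RDNAnalyzer's code, which extends the algorithm in ways the abstract does not describe; the Watson-Crick pair set, the minimum hairpin loop of 3 unpaired bases, and the function names `nussinov` and `dot_bracket` are assumptions for illustration.

```python
def nussinov(seq: str, min_loop: int = 3) -> tuple[int, list[tuple[int, int]]]:
    """Classic Nussinov: maximize the number of nested base pairs in seq.

    Returns the maximum pair count and one optimal list of (i, j) pairs.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}  # assumed pair rules
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, []
    # dp[i][j] = max pairs in seq[i..j]; spans of min_loop + 1 bases or fewer stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            # case 2: j pairs with k, leaving at least min_loop bases inside the pair
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one structure that achieves dp[0][n - 1].
    structure: list[tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure.append((k, j))
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], sorted(structure)


def dot_bracket(n: int, structure: list[tuple[int, int]]) -> str:
    """Render base pairs as a dot-bracket string of length n."""
    chars = ["."] * n
    for i, j in structure:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


score, pairs_found = nussinov("GGGAAATCC")
print(score, dot_bracket(9, pairs_found))  # 2 ((.....))
```

    The recursion runs in O(n^3) time; practical predictors replace the pair count with nearest-neighbor free energies, which this sketch does not attempt.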

  2. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

    PubMed Central

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can generate random DNA sequences, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and the A, T, G and C base contents with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
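    Several of the routine operations listed above (sequence cleaning, reverse complement, transcription, base counts and percentages) reduce to a few lines of string handling. The Python sketch below is an illustration only, not RDNAnalyzer's implementation; in particular, treating the "sequence cleaner" as a filter that keeps only A, C, G and T is an assumption, as are all function names.

```python
from collections import Counter

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def clean(seq: str) -> str:
    """Hypothetical cleaner: uppercase and drop anything that is not A, C, G or T."""
    return "".join(base for base in seq.upper() if base in "ACGT")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse so the result reads 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Coding-strand DNA to mRNA: T becomes U."""
    return coding_strand.upper().replace("T", "U")


def base_composition(seq: str) -> dict[str, tuple[int, float]]:
    """Count of each base and its percentage of the total length."""
    counts = Counter(seq.upper())
    total = len(seq)
    return {
        base: (counts[base], 100.0 * counts[base] / total if total else 0.0)
        for base in "ACGT"
    }


dna = clean("atg c-n\nCG")
print(dna, reverse_complement(dna), transcribe(dna))  # ATGCCG CGGCAT AUGCCG
print(base_composition(dna))
```

    Translation would add a codon table on top of `transcribe`; it is left out here to keep the sketch short.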

  3. A 115 kDa calmodulin-binding protein is located in rat liver endosome fractions.

    PubMed Central

    Enrich, C; Bachs, O; Evans, W H

    1988-01-01

    The distribution of calmodulin-binding polypeptides in various rat liver subcellular fractions was investigated. Plasma-membrane, endosome, Golgi and lysosome fractions were prepared by established procedures. The calmodulin-binding polypeptides present in the subcellular fractions were identified by using an overlay technique after transfer from gels to nitrocellulose sheets. Distinctive populations of calmodulin-binding polypeptides were present in all the fractions examined except lysosomes. A major 115 kDa calmodulin-binding polypeptide of pI 4.3 was localized to the endosome subfractions, and it emerges as a candidate endosome-specific protein. Partitioning of endosome fractions between aqueous and Triton X-114 phases indicated that the calmodulin-binding polypeptide was hydrophobic. Major calmodulin-binding polypeptides of 140 and 240 kDa and minor polypeptides of 40-60 kDa were present in plasma membranes. The distribution of calmodulin in the various endosome and plasma-membrane fractions was also analysed, and the results indicated that the amounts were high compared with those in the cytosol. PMID:3214436

  4. NexGen Production – Sequencing and Analysis

    ScienceCinema

    Muzny, Donna

    2018-01-16

    Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  5. Detection of Multiple Budding Yeast Cells and a Partial Sequence of 43-kDa Glycoprotein Coding Gene of Paracoccidioides brasiliensis from a Case of Lacaziosis in a Female Pacific White-Sided Dolphin (Lagenorhynchus obliquidens).

    PubMed

    Minakawa, Tomoko; Ueda, Keiichi; Tanaka, Miyuu; Tanaka, Natsuki; Kuwamura, Mitsuru; Izawa, Takeshi; Konno, Toshihiro; Yamate, Jyoji; Itano, Eiko Nakagawa; Sano, Ayako; Wada, Shinpei

    2016-08-01

    Lacaziosis, formerly called lobomycosis, is a zoonotic mycosis caused by Lacazia loboi that is found in humans and dolphins; it is endemic in countries bordering the Atlantic and Indian Oceans and along the Pacific coast of Japan. Susceptible cetacean species include the bottlenose dolphin (Tursiops truncatus), the Indian Ocean bottlenose dolphin (T. aduncus), and the estuarine dolphin (Sotalia guianensis); however, no cases have been recorded in other cetacean species. We diagnosed a case of lacaziosis in a Pacific white-sided dolphin (Lagenorhynchus obliquidens) kept in an aquarium in Japan. The dolphin was a female estimated to be more than 14 years old at the end of June 2015 and had been captured off the coast of the Sea of Japan in 2001. Multiple lobose, solid granulomatous lesions with or without ulcers appeared on the skin of her jaw, back, flipper and fluke in July 2014. The granulomatous skin lesions in the present case were similar to those of our previous cases. Multiple budding and chains of round yeast cells were detected in the biopsied samples. The partial sequence of the 43-kDa glycoprotein coding gene, obtained by nested PCR and sequencing, revealed a genotype different from those of both Amazonian and Japanese lacaziosis in bottlenose dolphins and was 99 % identical to sequences derived from Paracoccidioides brasiliensis, a sister fungal species of L. loboi. This is the first case of lacaziosis in a Pacific white-sided dolphin.

  6. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…
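    The abstract is truncated, so the kernels developed in this work are not described here. As background only, the sketch below shows the k-spectrum kernel, a standard string kernel that compares two sequences by the inner product of their k-mer count vectors; it is not presented as this thesis's method, and the default `k = 3` and the function names are arbitrary choices.

```python
import math
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Counts of every length-k substring (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of x and y."""
    sx, sy = kmer_spectrum(x, k), kmer_spectrum(y, k)
    if len(sx) > len(sy):  # iterate over the smaller spectrum
        sx, sy = sy, sx
    # Counter returns 0 for k-mers it has not seen, so absent k-mers contribute nothing.
    return sum(count * sy[kmer] for kmer, count in sx.items())


def normalized_spectrum_kernel(x: str, y: str, k: int = 3) -> float:
    """Cosine-normalized kernel in [0, 1]; 1.0 means identical k-mer profiles."""
    denom = math.sqrt(spectrum_kernel(x, x, k) * spectrum_kernel(y, y, k))
    return spectrum_kernel(x, y, k) / denom if denom else 0.0


print(spectrum_kernel("ABAB", "BAB", k=2))  # 3
```

    Because only k-mers present in both sequences contribute, the cost grows with sequence length rather than with the 4^k (or 20^k) size of the full feature space.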

  7. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool for DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to sequence ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for the diagnosis of diseases such as cystic fibrosis, caused by base deletions and point mutations, have also been demonstrated. Experimental details will be presented at the meeting.

  8. DNA sequence and characterization of GcvA, a LysR family regulatory protein for the Escherichia coli glycine cleavage enzyme system.

    PubMed Central

    Wilson, R L; Stauffer, G V

    1994-01-01

    The gene encoding GcvA, the trans-acting regulatory protein for the Escherichia coli glycine cleavage enzyme system, has been sequenced. The gcvA locus contains an open reading frame of 930 nucleotides that could encode a protein with a molecular mass of 34.4 kDa, consistent with the results of minicell analysis indicating that GcvA is a polypeptide of approximately 33 kDa. The deduced amino acid sequence of GcvA revealed that this protein shares similarity with the LysR family of activator proteins. The transcription start site was found to be 72 bp upstream of the presumed translation start site. A chromosomal deletion of gcvA resulted in the inability of cells to activate the expression of a gcvT-lacZ gene fusion when grown in the presence of glycine and an inability to repress gcvT-lacZ expression when grown in the presence of inosine. The regulation of gcvA was examined by constructing a gcvA-lacZ gene fusion in which beta-galactosidase synthesis is under the control of the gcvA regulatory region. Although gcvA expression appears to be autogenously regulated over a two- to threefold range, it is neither induced by glycine nor repressed by inosine. PMID:8188587
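    As a rough consistency check on the size quoted above, the ORF length can be converted to an approximate protein mass. The sketch assumes that the 930 nucleotides include the stop codon and uses a rule-of-thumb average residue mass of about 110 Da; neither assumption comes from the abstract, and the 34.4 kDa figure was presumably computed from the actual deduced sequence.

```python
ORF_NT = 930            # ORF length reported in the abstract
AVG_RESIDUE_DA = 110    # rule-of-thumb mean residue mass (assumption, not from the abstract)

codons = ORF_NT // 3    # 310 codons
residues = codons - 1   # assumption: the last codon is the stop codon
approx_kda = residues * AVG_RESIDUE_DA / 1000

print(residues, approx_kda)  # 309 33.99
```

    The estimate of about 34 kDa lands between the calculated 34.4 kDa and the roughly 33 kDa seen in minicells, which is as close as an average-mass estimate can be expected to get.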

  9. Sequencing, Assembly and Analysis of Human Microbial Communities

    ScienceCinema

    Petrosino, Joe

    2018-02-02

    Joe Petrosino of Baylor College of Medicine discusses using next generation sequencing technologies to study human microbial communities associated with health and disease on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  10. Analysis of noise-induced temporal correlations in neuronal spike sequences

    NASA Astrophysics Data System (ADS)

    Reinoso, José A.; Torrent, M. C.; Masoller, Cristina

    2016-11-01

    We investigate temporal correlations in sequences of noise-induced neuronal spikes, using a symbolic method of time-series analysis. We focus on the sequence of time-intervals between consecutive spikes (inter-spike-intervals, ISIs). The analysis method, known as ordinal analysis, transforms the ISI sequence into a sequence of ordinal patterns (OPs), which are defined in terms of the relative ordering of consecutive ISIs. The ISI sequences are obtained from extensive simulations of two neuron models (FitzHugh-Nagumo, FHN, and integrate-and-fire, IF), with correlated noise. We find that, as the noise strength increases, temporal order gradually emerges, revealed by the existence of more frequent ordinal patterns in the ISI sequence. While in the FHN model the most frequent OP depends on the noise strength, in the IF model it is independent of the noise strength. In both models, the correlation time of the noise affects the OP probabilities but does not modify the most probable pattern.
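    The ordinal-pattern transformation described above can be sketched in a few lines of Python. Each window of `d` consecutive ISIs is replaced by the permutation of indices that sorts it (an argsort encoding; rank-based encodings are equally common), and pattern probabilities are the relative frequencies of these permutations. The embedding length `d = 3`, the stable tie-breaking (equal ISIs keep their original order), and the function names are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import permutations


def inter_spike_intervals(spike_times: list[float]) -> list[float]:
    """Differences between consecutive spike times."""
    return [b - a for a, b in zip(spike_times, spike_times[1:])]


def ordinal_patterns(isis: list[float], d: int = 3) -> list[tuple[int, ...]]:
    """Map each window of d consecutive ISIs to the index order that sorts it."""
    patterns = []
    for start in range(len(isis) - d + 1):
        window = isis[start:start + d]
        # Stable argsort: ties keep their original order.
        patterns.append(tuple(sorted(range(d), key=window.__getitem__)))
    return patterns


def pattern_probabilities(isis: list[float], d: int = 3) -> dict[tuple[int, ...], float]:
    """Relative frequency of each of the d! possible ordinal patterns."""
    counts = Counter(ordinal_patterns(isis, d))
    total = sum(counts.values())
    return {p: (counts[p] / total if total else 0.0) for p in permutations(range(d))}


isis = inter_spike_intervals([0.0, 1.0, 3.0, 6.0, 8.0, 9.0])
print(isis)                    # [1.0, 2.0, 3.0, 2.0, 1.0]
print(ordinal_patterns(isis))  # [(0, 1, 2), (0, 2, 1), (2, 1, 0)]
```

    Returning all d! patterns, including those never observed, makes it easy to compare probability vectors across noise strengths or between the two neuron models.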

  11. Design and Analysis of Single-Cell Sequencing Experiments.

    PubMed

    Grün, Dominic; van Oudenaarden, Alexander

    2015-11-05

    Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate the cell type composition of a sample. However, single-cell sequencing comes with major technical challenges and yields complex data output. In this Primer, we provide an overview of available methods and discuss experimental design and single-cell data analysis. We hope that these guidelines will enable a growing number of researchers to leverage the power of single-cell sequencing. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Long-read sequencing data analysis for yeasts.

    PubMed

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with a small genome and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce a chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, LRSDAY was designed to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. Applied to an S. cerevisiae strain with ∼100× Pacific Biosciences (PacBio) data, the basic workflow running with four threads takes ∼41 h to generate a complete and well-annotated genome. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  13. A 92-kDa human immunostimulatory protein.

    PubMed Central

    Fontan, E; Briend, E; Saklani-Jusforgues, H; d'Alayer, J; Vandekerckhove, J; Fauve, R M

    1994-01-01

    We purified to apparent homogeneity a human urinary glycoprotein of 92 kDa (HGP.92) that, administered intravenously at 250 micrograms/kg, fully protected mice against a lethal inoculum of Listeria monocytogenes. Since HGP.92 protected scid mice, which lack B and T lymphocytes, this increased resistance to Listeria did not appear to be lymphocyte mediated. Furthermore, inflammatory macrophages incubated with 6 nM HGP.92 inhibited the growth of Lewis carcinoma cells in vitro. These two activities appeared to depend on an oligosaccharide moiety, as they were lost after N-Glycanase treatment of HGP.92. Thus, the biological activity of HGP.92 was in some way related to a glycan moiety. PMID:8078887

  14. High-throughput sequencing: a failure mode analysis.

    PubMed

    Yang, George S; Stott, Jeffery M; Smailus, Duane; Barber, Sarah A; Balasundaram, Miruna; Marra, Marco A; Holt, Robert A

    2005-01-04

    Basic manufacturing principles are becoming increasingly important in high-throughput sequencing facilities where there is a constant drive to increase quality, increase efficiency, and decrease operating costs. While high-throughput centres report failure rates typically on the order of 10%, the causes of sporadic sequencing failures are seldom analyzed in detail and have not, in the past, been formally reported. Here we report the results of a failure mode analysis of our production sequencing facility based on detailed evaluation of 9,216 ESTs generated from two cDNA libraries. Two categories of failures are described: process-related failures (failures due to equipment or sample handling) and template-related failures (failures that are revealed by close inspection of electropherograms and are likely due to properties of the template DNA sequence itself). Preventative action based on a detailed understanding of failure modes is likely to improve the performance of other production sequencing pipelines.

  15. Categorizing accident sequences in the external radiotherapy for risk analysis

    PubMed Central

    2013-01-01

    Purpose: This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. Materials and Methods: This study reviews 59 accident cases from two retrospective safety analyses that extensively collected incidents in external radiotherapy. The two accident analysis reports, which accumulated past incidents, are investigated to identify accident sequences, including initiating events, failures of safety measures, and consequences. The accidents are classified by treatment stage and source of error for initiating events, by type of failure in the safety measures, and by type of undesirable consequence and number of affected patients. The accident sequences are then grouped into categories on the basis of similarity of progression; as a result, the cases can be categorized into 14 groups of accident sequences. Results: The results indicate that risk analysis needs to pay attention not only to the planning stage but also to the calibration stage, which is performed prior to the main treatment process. They also show that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration stage. Conclusion: This study is expected to provide insights into accident sequences for prospective risk analysis through the review of past experience. PMID:23865005
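    The event tree analysis mentioned above starts from an initiating event and branches on whether each safety measure succeeds or fails. Every path through the tree is one accident sequence. The sketch below is generic and is not the study's tree. It assumes independent barriers with fixed failure-on-demand probabilities. The barrier names and all numbers are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def event_tree(initiator_freq: float,
               barriers: Sequence[Tuple[str, float]]) -> List[Tuple[Tuple[str, ...], float]]:
    """Enumerate every success/failure path through the safety barriers.

    Each barrier is (name, probability of failure on demand). Barriers are
    assumed independent, so a path's frequency is the initiating-event
    frequency times the product of its branch probabilities.
    """
    sequences = []
    for failures in product((False, True), repeat=len(barriers)):
        freq = initiator_freq
        labels = []
        for (name, p_fail), failed in zip(barriers, failures):
            freq *= p_fail if failed else 1.0 - p_fail
            labels.append(f"{name}:{'fail' if failed else 'ok'}")
        sequences.append((tuple(labels), freq))
    return sequences


# Hypothetical calibration-error initiating event (frequency per year) and barriers.
barriers = [("independent_check", 0.1), ("in_vivo_dosimetry", 0.2)]
for path, freq in event_tree(0.01, barriers):
    print(" -> ".join(path), f"{freq:.1e}")
```

    The path frequencies sum to the initiating-event frequency. The path in which every barrier fails is the sequence that leads to patient harm. Real event trees often prune branches once a barrier succeeds, and they may use dependent probabilities.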

  16. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  17. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  18. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
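    The first two DSAP steps (cleanup and clustering) can be sketched on the tab-delimited "sequence, copy number" input described above. This is a simplified illustration, not DSAP's implementation. The adaptor prefix, the minimum length, and the exact-match clustering rule are all hypothetical choices.

```python
from collections import Counter
from typing import Iterable, Optional

ADAPTOR = "TCGTATGC"  # hypothetical 3' adaptor prefix, for illustration only


def clean_tag(seq: str, adaptor: str = ADAPTOR, min_len: int = 15) -> Optional[str]:
    """Trim the adaptor and discard short or homopolymer (poly-A/T/C/G/N) tags."""
    seq = seq.upper()
    cut = seq.find(adaptor)
    if cut != -1:
        seq = seq[:cut]
    if len(seq) < min_len or len(set(seq)) == 1:
        return None
    return seq


def cluster_tags(lines: Iterable[str]) -> Counter:
    """Group identical cleaned tags, summing their copy numbers."""
    clusters: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        seq, count = line.rstrip("\n").split("\t")
        cleaned = clean_tag(seq)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters


example = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGC\t5",  # tag with adaptor still attached
    "TGAGGTAGTAGGTTGTATAGTT\t3",          # same tag, already trimmed
    "AAAAAAAAAAAAAAAAAAAA\t7",            # poly-A tag, removed
]
print(cluster_tags(example))
```

    The clustered tags and their summed counts are the kind of input that the later homology-matching steps (against Rfam and miRBase) would consume.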

  19. Sequence information gain based motif analysis.

    PubMed

    Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

    2015-11-09

    The detection of regulatory regions in candidate sequences is essential for understanding the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information-theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Q-residuals projections or information-theoretic detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that for 70% of the studied transcription factor binding sites the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in modelling the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of cis-regulatory sequence detection based on information theory. This generalisation allows transcription factor binding sites to be detected with maximum performance regardless of the co-variability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.

  20. ReadXplorer—visualization and analysis of mapped sequences

    PubMed Central

    Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

    2014-01-01

    Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of ever-growing genomic and transcriptomic next-generation sequencing datasets. Results: ReadXplorer is software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A distinctive feature of ReadXplorer is the quality classification of the read mappings. This classification is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) a normalizable plot and (iii) a histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capabilities cover RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, and genomic feature and general coverage analysis. For RNA-Seq data in particular, it offers differential gene expression analysis, transcription start site and operon detection, as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
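    The RPKM values mentioned above normalize a feature's read count both by feature length (per kilobase) and by sequencing depth (per million mapped reads). The sketch below implements the standard definition. The abstract does not say which mapping quality classes ReadXplorer counts, so that detail is left out, and the input numbers are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of feature per Million mapped reads.

    Equivalent to (read_count / (gene_length_bp / 1e3)) / (total_mapped_reads / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2 kb feature in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```

    Folding both scaling factors into a single 1e9 constant avoids two separate divisions. The result is the same as normalizing by kilobases and by millions of reads in turn.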

  1. Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods.

    PubMed

    Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara

    2017-01-01

    The sequencing of the transcriptomes of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data but widely applied to single-cell data. We discuss results obtained on two real datasets and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as is characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Overall, the results obtained in our study suggest that it is difficult to identify a best-performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and gain better accuracy of results.

  2. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  3. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  4. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    PubMed

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
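    As a point of reference for the "measures based solely on the k-mer frequencies" mentioned above, the sketch below counts k-mers, computes the classic D2 statistic (the number of matching k-mer pairs), and converts a cosine-normalized D2 into a dissimilarity. It then writes a square matrix in PHYLIP distance format. This is a simple frequency-based measure for illustration, not CAFE's code, and not the background-corrected d2* or d2S measures. The sequence names and k value are hypothetical.

```python
import math
from collections import Counter
from typing import List


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: Counter, y: Counter) -> int:
    """D2 statistic: sum over shared k-mers of count_x * count_y."""
    return sum(n * y[w] for w, n in x.items())  # Counter returns 0 for missing k-mers


def cosine_dissimilarity(x: Counter, y: Counter) -> float:
    """0.5 * (1 - cosine similarity) of the k-mer count vectors; 0 means identical profiles."""
    similarity = d2(x, y) / (math.sqrt(d2(x, x)) * math.sqrt(d2(y, y)))
    return max(0.0, 0.5 * (1.0 - similarity))  # clamp tiny negative rounding error


def phylip_distance_matrix(names: List[str], seqs: List[str], k: int = 3) -> str:
    """Square distance matrix in PHYLIP format: taxon count, then 10-character names."""
    counts = [kmer_counts(s, k) for s in seqs]
    lines = [str(len(names))]
    for name, ci in zip(names, counts):
        row = " ".join(f"{cosine_dissimilarity(ci, cj):.4f}" for cj in counts)
        lines.append(f"{name[:10]:<10}{row}")
    return "\n".join(lines)


print(phylip_distance_matrix(["seqA", "seqB", "seqC"],
                             ["ACGTACGTTGCA", "ACGTACGTTGCC", "TTTTGGGGCCCC"]))
```

    The output can be fed to PHYLIP's distance-based tree programs. The more expensive measures named in the abstract additionally subtract the k-mer counts expected under a background model before comparing sequences.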

  5. SAXS and other spectroscopic analysis of 12S cruciferin isolated from the seeds of Brassica nigra

    NASA Astrophysics Data System (ADS)

    Khaliq, Binish; Falke, Sven; Negm, Amr; Buck, Friedrich; Munawar, Aisha; Saqib, Maria; Mahmood, Seema; Ahmad, Malik Shoaib; Betzel, Christian; Akrem, Ahmed

    2017-06-01

    Oilseeds of the plant family Brassicaceae are important for providing both lipid and protein to human nutrition. Cruciferins (12S globulins) are seed storage proteins that are attracting attention because of their allergenic and pathogenicity-related nature. This study describes the purification and characterization of a trimeric (∼190 kDa) cruciferin protein from the seeds of Brassica nigra (L.). Cruciferin was first partially purified by ammonium sulfate precipitation (30% saturation constant) and further purified by size exclusion chromatography. N-terminal amino-acid sequence analysis showed 82% sequence homology with cruciferin from Arabidopsis thaliana. The 50-55 kDa monomeric cruciferin produced multiple bands in two major molecular weight ranges (α-polypeptides of 28-32 kDa and β-polypeptides of 17-20 kDa) under reducing conditions of SDS-PAGE. 2D gel electrophoretic analysis further separated the bands into their isoforms, with major pI ranges between 5.7 and 8.0 (α-polypeptides) and 5.5-8.5 (β-polypeptides). Dynamic Light Scattering (DLS) showed the monodisperse nature of the cruciferin, with a hydrodynamic radius of 5.8 ± 0.1 nm confirming the trimeric nature of the protein. Circular Dichroism (CD) spectra showed both α-helices and β-sheets in the native conformation of the trimeric protein. The pure cruciferin protein (40 mg/ml) was successfully crystallized; however, the crystals diffracted only to low resolution (8 Å). Small-angle x-ray scattering (SAXS) was applied to gain insights into the three-dimensional structure in solution. SAXS showed that the radius of gyration is 4.24 ± 0.25 nm and confirmed the nearly globular shape. The SAXS-based ab initio dummy model of B. nigra cruciferin was compared with 11S globulins.
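    A radius of gyration like the one reported above is commonly estimated from the low-q part of a SAXS curve with the Guinier approximation, ln I(q) = ln I(0) − (Rg²/3)·q². The abstract does not say whether Guinier analysis or a pair-distance distribution was used. The sketch below shows only the standard Guinier fit, applied to synthetic, noise-free data generated with Rg = 4.24 nm. The data are not the study's measurements.

```python
import math
from typing import Sequence, Tuple


def guinier_rg(q: Sequence[float], intensity: Sequence[float]) -> Tuple[float, float]:
    """Fit ln I(q) = ln I(0) - (Rg^2 / 3) * q^2 by ordinary least squares.

    Returns (Rg, I0). Only points in the Guinier region (roughly q*Rg < 1.3
    for a globular particle) should be passed in.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; check the q range or aggregation")
    rg = math.sqrt(-3.0 * slope)
    i0 = math.exp(mean_y - slope * mean_x)
    return rg, i0


# Synthetic curve generated with Rg = 4.24 nm (illustration only).
rg_true = 4.24
q = [0.05 + 0.01 * i for i in range(20)]  # nm^-1, so q*Rg stays below ~1.02
intensity = [1000.0 * math.exp(-(rg_true ** 2) * qi ** 2 / 3.0) for qi in q]
rg, i0 = guinier_rg(q, intensity)
print(f"Rg = {rg:.2f} nm, I(0) = {i0:.1f}")
```

    With real data, the choice of q range matters more than the fitting method. The upper q·Rg limit and a check for upward curvature (a sign of aggregation) are the usual safeguards.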

  6. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    PubMed

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes that the reference database has extensive sequence sampling. This is often not the case, which limits the reliability of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
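    The neighbor-joining step mentioned above repeatedly merges the pair of taxa that minimizes the Q-criterion, then recomputes distances to the new internal node. The sketch below is a textbook implementation on a small hypothetical distance matrix. It is not the PHYLIP code that galaxie calls, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Dist = Dict[str, Dict[str, float]]


def symmetric(pairs: Dict[Tuple[str, str], float]) -> Dist:
    """Build a symmetric nested-dict distance matrix from {(a, b): d} pairs."""
    d: Dist = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def neighbor_joining(dist: Dist) -> str:
    """Return an unrooted neighbor-joining tree in Newick format."""
    nodes: List[str] = list(dist)
    d = {a: dict(dist[a]) for a in nodes}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.3f},{b}:{lb:.3f})"
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"


pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
tree = neighbor_joining(symmetric(pairs))
print(tree)
```

    A query sequence would be identified by the clade it falls into in such a tree, rather than by its single best BLAST hit. That is the motivation stated above.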

  7. Initial sequencing and comparative analysis of the mouse genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  8. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  9. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  10. A DNA sequence analysis package for the IBM personal computer.

    PubMed Central

    Lagrimini, L M; Brentano, S T; Donelson, J E

    1984-01-01

    We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433

  11. cDNA, genomic sequence cloning and overexpression of ribosomal protein S25 gene (RPS25) from the Giant Panda.

    PubMed

    Hao, Yan-Zhe; Hou, Wan-Ru; Hou, Yi-Ling; Du, Yu-Jie; Zhang, Tian; Peng, Zheng-Song

    2009-11-01

    RPS25 is a component of the 40S small ribosomal subunit encoded by the RPS25 gene, which is specific to eukaryotes. Few studies of the RPS25 gene in animals have been reported. The Giant Panda (Ailuropoda melanoleuca), known as a "living fossil", is of increasing concern to the world community. Studies on RPS25 of the Giant Panda could provide scientific data for inquiring into the hereditary traits of the gene and formulating a protective strategy for the Giant Panda. The cDNA of the RPS25 cloned from the Giant Panda is 436 bp in size, containing an open reading frame of 378 bp encoding 125 amino acids. The genomic sequence is 1,992 bp long and was found to possess four exons and three introns. Alignment analysis indicated that the coding sequence shows high nucleotide homology to those of Homo sapiens, Bos taurus, Mus musculus and Rattus norvegicus as determined by Blast analysis: 92.6, 94.4, 89.2 and 91.5%, respectively. Primary structure analysis revealed that the molecular weight of the putative RPS25 protein is 13.7421 kDa, with a theoretical pI of 10.12. Topology prediction showed one N-glycosylation site, one cAMP- and cGMP-dependent protein kinase phosphorylation site, two protein kinase C phosphorylation sites and one tyrosine kinase phosphorylation site in the RPS25 protein of the Giant Panda. The RPS25 gene was overexpressed in E. coli BL21, and Western blotting of the RPS25 protein was also performed. The results indicated that the RPS25 gene can indeed be expressed in E. coli and that the RPS25 protein fused with an N-terminal His tag gave rise to the accumulation of an expected 17.4 kDa polypeptide. The cDNA and the genomic sequence of RPS25 were cloned successfully for the first time from the Giant Panda using RT-PCR technology and Touchdown-PCR, respectively, and both were sequenced and analyzed preliminarily; then the cDNA of the RPS25 gene was overexpressed in E. coli BL21 and immunoblotted, which is the first

  12. Modeling and Docking Studies on Novel Mutants (K71L and T204V) of the ATPase Domain of Human Heat Shock 70 kDa Protein 1

    PubMed Central

    Elengoe, Asita; Naser, Mohammed Abu; Hamdan, Salehhuddin

    2014-01-01

    The purpose of exploring protein interactions between human adenovirus and heat shock protein 70 is to exploit a potentially synergistic interaction to enhance anti-tumoral efficacy and decrease toxicity in cancer treatment. However, the protein interaction of Hsp70 with E1A 32 kDa of human adenovirus serotype 5 remains to be elucidated. In this study, two residues of the ATPase domain of human heat shock 70 kDa protein 1 (PDB: 1HJO) were mutated, and 3D mutant models (K71L and T204V) were then constructed using PyMol software. The structures were evaluated by PROCHECK, ProQ, ERRAT, Verify 3D and ProSA modules. All evidence suggests that all protein models are acceptable and of good quality. The E1A 32 kDa motif was retrieved from UniProt (P03255) and subjected to docking interaction with NBD, K71L and T204V using the Autodock 4.2 program. The lowest binding energy, −9.09 kcal/mol, was obtained for the novel T204V mutant. Moreover, the protein-ligand complex structures were validated by RMSD, RMSF, hydrogen bond and salt bridge analysis. This revealed that the T204V-E1A 32 kDa motif complex was the most stable among all three complex structures. This study provides information about the interaction between Hsp70 and the E1A 32 kDa motif, which points to future perspectives for designing rational drugs and vaccines in cancer therapy. PMID:24758925

  13. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  14. Purification and Immunobiochemical Characterization of a 31 kDa Cross-Reactive Allergen from Phaseolus vulgaris (Kidney Bean)

    PubMed Central

    Kasera, Ramkrashan; Singh, Anand Bahadur; Lavasa, Shakuntala; Nagendra, Komarla; Arora, Naveen

    2013-01-01

    Background: Legumes are a rich source of proteins but are also potential elicitors of IgE-mediated food allergy. This study aimed to isolate and characterize a major allergen of Phaseolus vulgaris (kidney bean) and determine its allergenicity. Methodology: Kidney bean allergen was purified using a Q Sepharose column (anion exchanger), and high-intensity eluates were pooled for further purification using Superdex 75 (gel filtration) and a C18 column (RP-HPLC). Patients with a history of kidney bean allergy were skin prick tested (SPT) with crude kidney bean extract and the purified protein. Specific IgE was estimated in sera by enzyme-linked immunosorbent assay (ELISA). The purified protein and its cross-reactivity were characterized by immunobiochemical methods. The purified protein was identified by tandem mass spectrometry. Principal Findings: The purified protein appeared as a single band at 31 kDa on SDS-PAGE and showed IgE binding with 88% of patients' sera by ELISA and immunoblotting. SPT with the purified protein identified 78% of kidney-bean-hypersensitive patients. Significant release of histamine from sensitized basophils was observed after challenge with the purified protein. PAS staining suggested it to be a glycoprotein, but no change in IgE binding was observed after periodate oxidation. The 31 kDa protein remained stable for 60 min on incubation with pepsin. The purified protein had high allergenic potential, since it required only 102 ng of self protein for 50% IgE inhibition. Mass spectrometric analysis identified it as phytohemagglutinin. It also showed hemagglutination with human RBCs. Cross-reactivity was observed with peanut and black gram, with IC50 values of 185 and 228 ng, respectively. Conclusion/Significance: A 31 kDa major allergen of kidney bean was purified and identified as phytohemagglutinin with cross-reactivity to peanut and black gram. PMID:23671655

  15. Multilevel analysis of sports video sequences

    NASA Astrophysics Data System (ADS)

    Han, Jungong; Farin, Dirk; de With, Peter H. N.

    2006-01-01

    We propose a fully automatic and flexible framework for analysis and summarization of tennis broadcast video sequences, using visual features and specific game-context knowledge. Our framework can analyze a tennis video sequence at three levels, which provides a broad range of different analysis results. The proposed framework includes novel pixel-level and object-level tennis video processing algorithms, such as a moving-player detection taking both the color and the court (playing-field) information into account, and a player-position tracking algorithm based on a 3-D camera model. Additionally, we employ scene-level models for detecting events, like service, base-line rally and net-approach, based on a number of real-world visual features. The system can summarize three forms of information: (1) all court-view playing frames in a game, (2) the moving trajectory and real speed of each player, as well as the relative position between the player and the court, and (3) the semantic event segments in a game. The proposed framework is flexible in choosing the level of analysis that is desired. It is effective because the framework makes use of several visual cues obtained from the real-world domain to model important events like service, thereby increasing the accuracy of the scene-level analysis. The paper presents attractive experimental results highlighting the system efficiency and analysis capabilities.

  16. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost, high-throughput genotyping technology based on next-generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa subsp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate estimation of tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when additional quality filtering was applied. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  17. Complete nucleotide sequence of jasmine virus H, a new member of the family Tombusviridae.

    PubMed

    Zhuo, Tao; Zhu, Li-Juan; Lu, Cheng-Cong; Jiang, Chao-Yang; Chen, Zi-Yin; Zhang, Guangzhi; Wang, Zong-Hua; Jovel, Juan; Han, Yan-Hong

    2018-03-01

    Jasmine virus H (JaVH) is a novel virus associated with symptoms of yellow mosaic on jasmine. The JaVH genome is 3,867 nt in length with five open reading frames (ORFs) encoding a 27-kDa protein (ORF 1), an 87-kDa replicase protein (ORF 2), two centrally located movement proteins (ORF 3 and 4), and a 37-kDa capsid protein (ORF 5). Based on genomic and phylogenetic analysis, JaVH is predicted to be a member of the genus Pelarspovirus in the family Tombusviridae.
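    The abstract reports five ORFs in the 3,867-nt JaVH genome but does not say how they were predicted. As a rough illustration of locating candidate ORFs, the sketch below scans the three forward reading frames for ATG-to-stop stretches. The function name `find_orfs`, the `min_codons` cutoff and the forward-strand-only scan are assumptions for illustration, not the authors' method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG-initiated ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. Within a frame, scanning
    resumes after each stop codon, so nested in-frame ATGs are not reported
    separately (only the longest ORF ending at a given stop is kept).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until an in-frame stop codon is reached.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop before the end: no complete ORF left in this frame
            if (j - i) // 3 >= min_codons:  # codon count excludes the stop
                orfs.append((i, j + 3, frame))
            i = j + 3
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"  # toy input, not JaVH sequence
    print(find_orfs(toy, min_codons=1))  # → [(2, 17, 2)]
```

A real annotation would also apply a biologically motivated length cutoff and, for genomes where it matters, scan the reverse complement as well.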

  18. Molecular cloning and developmental expression of the catalytic and 65-kDa regulatory subunits of protein phosphatase 2A in Drosophila.

    PubMed Central

    Mayer-Jaekel, R E; Baumgartner, S; Bilbe, G; Ohkura, H; Glover, D M; Hemmings, B A

    1992-01-01

    cDNA clones encoding the catalytic subunit and the 65-kDa regulatory subunit (PR65) of protein phosphatase 2A from Drosophila melanogaster have been isolated by homology screening with the corresponding human cDNAs. The Drosophila clones were used to analyze the spatial and temporal expression of the transcripts encoding these two proteins. The Drosophila PR65 cDNA clones contained an open reading frame of 1773 nucleotides encoding a protein of 65.5 kDa. The predicted amino acid sequence showed 75 and 71% identity to the human PR65 alpha and beta isoforms, respectively. As previously reported for the mammalian PR65 isoforms, Drosophila PR65 is composed of 15 imperfect repeating units of approximately 39 amino acids. The residues contributing to this repeat structure also show the highest sequence conservation between species, indicating that these repeats are functionally important. The gene encoding Drosophila PR65 was located at 29B1,2 on the second chromosome. A major transcript of 2.8 kilobases (kb) encoding the PR65 subunit and two transcripts of 1.6 and 2.5 kb encoding the catalytic subunit were detected throughout Drosophila development. All of these mRNAs were most abundant during early embryogenesis and were expressed at lower levels in larvae and adult flies. In situ hybridization at different developmental stages showed colocalization of the PR65 and catalytic subunit transcripts. mRNA expression is high in the nurse cells and oocytes, consistent with high, uniformly distributed expression in early embryos. Later in embryonic development, expression remains high in the nervous system and the gonads, but overall transcript levels decrease. In third instar larvae, high levels of mRNA were observed in the brain, imaginal discs and salivary glands. These results indicate that protein phosphatase 2A transcript levels change during development in a tissue- and time-specific manner. PMID:1320961

  19. Hydroquinone: O-glucosyltransferase from cultivated Rauvolfia cells: enrichment and partial amino acid sequences.

    PubMed

    Arend, J; Warzecha, H; Stöckigt, J

    2000-01-01

    Plant cell suspension cultures of Rauvolfia can produce large amounts of arbutin by glucosylating exogenously added hydroquinone. A four-step purification procedure using anion exchange, hydrophobic interaction, hydroxyapatite chromatography and chromatofocusing achieved an approximately 390-fold enrichment of the glucosyltransferase involved, with a yield of 0.5%. SDS-PAGE showed an Mr of 52 kDa for the enzyme. Proteolysis of the pure enzyme with endoproteinase LysC yielded six peptide fragments of 9-23 amino acids, which were sequenced. Sequence alignment of the six peptides showed high homology to glycosyltransferases from other higher plants.

  20. EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

    PubMed

    Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

    2018-01-01

    Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.

  1. Noncoding sequence classification based on wavelet transform analysis: part I

    NASA Astrophysics Data System (ADS)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding sequences. Coding sequences are those that are translated into proteins. The identification of coding sequences has been widely reported in the literature owing to their well-studied periodicity. Noncoding sequences represent the majority of the human genome and play an important role in gene regulation and cell differentiation. However, noncoding sequences do not exhibit periodicities that correlate with their functions. The ENCODE (Encyclopedia of DNA Elements) and Roadmap Epigenomics projects have cataloged human noncoding sequences by function. We study the characteristics of noncoding sequences with wavelet analysis of genomic signals.
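    The abstract does not specify how DNA is turned into a "genomic signal" or which wavelet is used. A minimal sketch, assuming a purine/pyrimidine (+1/-1) mapping and a multilevel Haar transform (both choices are assumptions, and the function names are hypothetical), shows the general idea of decomposing a sequence-derived signal into per-scale detail coefficients.

```python
import math

# Assumed mapping (not from the paper): purines +1, pyrimidines -1.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def dna_to_signal(seq: str) -> list[float]:
    """Map a DNA string to a numeric signal, skipping non-ACGT symbols."""
    return [PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE]


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficients; an odd trailing sample is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    n = len(signal) - len(signal) % 2
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def haar_decompose(signal: list[float], levels: int):
    """Apply haar_step repeatedly, collecting detail coefficients per level."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


if __name__ == "__main__":
    sig = dna_to_signal("ATGCGCATTTAAGCGC")  # toy sequence
    approx, details = haar_decompose(sig, levels=3)
    # Energy of the detail coefficients at each scale: a simple per-scale feature.
    for level, d in enumerate(details, start=1):
        print(level, round(sum(x * x for x in d), 3))
```

Because the Haar transform is orthonormal, the signal's energy is split across scales without loss, which is what makes per-scale energies usable as features.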

  2. [Development of laboratory sequence analysis software based on WWW and UNIX].

    PubMed

    Huang, Y; Gu, J R

    2001-01-01

    Sequence analysis tools based on WWW and UNIX were developed to meet the needs of molecular genetics research in our laboratory. General principles of computer analysis of DNA and protein sequences are also briefly discussed.

  3. Identification and subspecific differentiation of Mycobacterium scrofulaceum by automated sequencing of a region of the gene (hsp65) encoding a 65-kilodalton heat shock protein.

    PubMed Central

    Swanson, D S; Pan, X; Musser, J M

    1996-01-01

    Mycobacterium scrofulaceum is most commonly recovered from children with cervical lymphadenitis, although it also accounts for approximately 2% of the mycobacterial infections in AIDS patients. Species assignment of M. scrofulaceum isolated by conventional techniques can be difficult and time-consuming. To develop a strategy for rapid species assignment of these organisms, a 360-bp region of the gene (hsp65) encoding a 65-kDa heat shock protein in 37 isolates from diverse sources was sequenced. Eight hsp65 alleles were identified, and these sequences formed phylogenetic clusters and lineages largely distinct from other Mycobacterium species. There was incomplete correlation between serovar designation and hsp65 allele assignment. The hsp65 data correlated strongly with the results of sequence analysis of the gene coding for 16S rRNA. Automated DNA sequencing of a 360-bp region of the hsp65 gene provides a rapid and unambiguous method for species assignment of these acid-fast organisms for diagnostic purposes. PMID:8940463

  4. DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

    PubMed Central

    Palzkill, T G; Oliver, S G; Newlon, C S

    1986-01-01

    Four fragments of Saccharomyces cerevisiae chromosome III DNA that carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences matching at least 10 of the 11 bases of a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis, as well as deletion and transposon insertion mutagenesis of ARS fragments, supports a role for the 3' conserved sequence in promoting ARS activity. PMID:3529036
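    The "at least 10 out of 11 bases" criterion is a simple motif scan with a mismatch allowance. The sketch below assumes the commonly cited form of the 11-bp ARS core consensus, (A/T)TTTAT(A/G)TTT(A/T), which the abstract does not spell out; the function names are hypothetical and this is not the authors' procedure.

```python
# IUPAC codes used by the consensus below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG"}

# Assumption: commonly cited 11-bp ARS core consensus (A/T)TTTAT(A/G)TTT(A/T),
# written on its T-rich strand with IUPAC codes.
ARS_CORE = "WTTTATRTTTW"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def match_score(window: str, motif: str) -> int:
    """Count positions where the base is allowed by the IUPAC motif symbol."""
    return sum(1 for base, sym in zip(window, motif) if base in IUPAC[sym])


def find_core_matches(seq: str, motif: str = ARS_CORE, min_matches: int = 10):
    """Return (start, strand, window, score) for windows with >= min_matches hits.

    Both strands are scanned; `start` is always a 0-based forward-strand coordinate.
    """
    seq = seq.upper()
    k = len(motif)
    revcomp = seq.translate(_COMPLEMENT)[::-1]
    hits = []
    for strand, s in (("+", seq), ("-", revcomp)):
        for i in range(len(s) - k + 1):
            window = s[i:i + k]
            score = match_score(window, motif)
            if score >= min_matches:
                # Convert reverse-strand offsets back to forward coordinates.
                start = i if strand == "+" else len(seq) - i - k
                hits.append((start, strand, window, score))
    return hits


if __name__ == "__main__":
    print(find_core_matches("GGGATTTATGTTTACCC"))  # → [(3, '+', 'ATTTATGTTTA', 11)]
```

Setting `min_matches=10` reproduces the 10-of-11 threshold mentioned in the abstract; raising it to 11 restricts the scan to exact consensus matches.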

  5. Human endomembrane H+ pump strongly resembles the ATP-synthetase of Archaebacteria.

    PubMed Central

    Südhof, T C; Fried, V A; Stone, D K; Johnston, P A; Xie, X S

    1989-01-01

    Preparations of mammalian H+ pumps that acidify intracellular vesicles contain eight or nine polypeptides, ranging in size from 116 to 17 kDa. Biochemical analysis indicates that the 70- and 58-kDa polypeptides are subunits critical for ATP hydrolysis. The amino acid sequences of these major catalytic subunits (58 and 70 kDa) of the endomembrane H+ pump have not been determined in animal cells. We report here the complete sequence of the 58-kDa subunit derived from a human kidney cDNA clone and partial sequences of the 70- and 58-kDa subunits purified from clathrin-coated vesicles of bovine brain. The amino acid sequences of both proteins strongly resemble the sequences of the corresponding subunits of the vacuolar H+ pumps of Archaebacteria, plants, and fungi. The archaebacterial enzyme is believed to use an H+ gradient to synthesize ATP. Thus, a common ancestral protein has given rise to an H+ pump that synthesizes ATP in one organism and hydrolyzes it in another, and this pump is highly conserved from prokaryotes to humans. The same pump appears to mediate the acidification of intracellular organelles, including coated vesicles, lysosomes, and secretory granules, as well as extracellular fluids such as urine. PMID:2527371

  6. Variation of expression defects in cell surface 190-kDa protein antigen of Streptococcus mutans.

    PubMed

    Lapirattanakul, Jinthana; Nomura, Ryota; Matsumoto-Nakano, Michiyo; Srisatjaluk, Ratchapin; Ooshima, Takashi; Nakano, Kazuhiko

    2015-05-01

    Streptococcus mutans, which comprises four serotypes (c, e, f and k), possesses a 190-kDa cell surface protein antigen (PA) for initial tooth adhesion. We used Western blot analysis to determine PA expression in 750 S. mutans isolates from 150 subjects and found a significantly higher prevalence of isolates with PA expression defects in serotypes f and k than in serotypes c and e. The defect patterns could be classified into three types: no PA expression in either whole bacterial cells or their supernatants (Type N1), PA expression seen mainly in supernatants (Type N2), and only low PA expression in whole bacterial cells (Type W). The defects were caused by mutations in the gene encoding PA and in the transcriptional processing of this gene for Type N1, defects in the sortase gene for Type N2, and low PA mRNA expression for Type W. Because the cellular hydrophobicity and phagocytosis susceptibility of the PA-defective isolates were significantly lower than those of isolates with normal expression, such defective isolates may be implicated in bacteremia-associated systemic diseases other than dental caries. Additionally, multilocus sequence typing was used to characterize S. mutans clones in which 65-100% of isolates had PA defects. We thus describe the molecular basis of variation in PA expression defects in S. mutans and emphasize the strong association between PA expression defects and serotypes f and k, as well as the clonal relationships among these isolates. Copyright © 2015 Elsevier GmbH. All rights reserved.

  7. Monoclonal antibodies against 27.8 kDa protein receptor efficiently block lymphocystis disease virus infection in flounder Paralichthys olivaceus gill cells.

    PubMed

    Sheng, Xiu-Zhen; Wang, Mu; Xing, Jing; Zhan, Wen-Bin

    2012-08-13

    In previous research using co-immunoprecipitation, a 27.8 kDa protein in flounder Paralichthys olivaceus gill (FG) cells was found to bind lymphocystis disease virus (LCDV). In this paper, 13 hybridomas secreting monoclonal antibodies (MAbs) against the 27.8 kDa protein were obtained, and two MAbs, designated 2G11 and 3D9, were cloned by limiting dilution. In indirect enzyme-linked immunosorbent assay (ELISA) and western blotting, the MAbs reacted specifically with the 27.8 kDa protein of FG cells. Confocal fluorescence microscopy and immunogold electron microscopy (IEM) showed that the epitopes recognized by these MAbs were located primarily on the cell membrane of FG cells and occasionally in the cytoplasm near the membrane. In a blocking ELISA, the MAbs blocked LCDV binding after pre-incubation with isolated FG cell membrane proteins, and they also inhibited LCDV infection of FG cells in culture. Moreover, several target tissues of LCDV in flounder, including gill, stomach, intestine and liver, showed the presence of the 27.8 kDa LCDV receptor. These results strongly support the possibility that the 27.8 kDa protein is the putative receptor specific for LCDV infection of FG cells in flounder.

  8. Quantiprot - a Python package for quantitative analysis of protein sequences.

    PubMed

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of this approach is that quantitative properties define a multidimensional solution space in which sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python that provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application for the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared with actually observed sequences.
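    Quantiprot's own interface is not described in the abstract, so the following is a plain-Python illustration of two measures it mentions, the n-gram distribution and a Zipf's law coefficient, not the package's API. Estimating the coefficient as the negative slope of a least-squares fit of log frequency against log rank is an assumed estimator; the package may compute it differently.

```python
import math
from collections import Counter


def ngram_counts(seq: str, n: int) -> Counter:
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_coefficient(counts: Counter) -> float:
    """Estimate the Zipf exponent s from rank-frequency data.

    Fits log(frequency) = c - s * log(rank) by ordinary least squares and
    returns s. Requires at least two distinct n-grams.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # negative of the fitted slope


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary toy fragment
    bigrams = ngram_counts(protein, 2)
    print(bigrams.most_common(3))
    print(round(zipf_coefficient(bigrams), 3))
```

For frequencies that follow an ideal Zipf law (frequency proportional to 1/rank), the estimator returns exactly 1; values computed this way could serve as one coordinate of the feature space the abstract describes.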

  9. In silico analysis of subtilisin from Glaciozyma antarctica PI12

    NASA Astrophysics Data System (ADS)

    Mustafha, Siti Mardhiah; Murad, Abdul Munir Abdul; Mahadi, Nor Muhammad; Kamaruddin, Shazilah; Bakar, Farah Diba Abu

    2015-09-01

    Subtilisins are major industrial enzymes with a wide range of applications, especially in the detergent industry. In this study, a cDNA encoding a subtilisin (GaSUBT) was isolated from the psychrophilic yeast Glaciozyma antarctica PI12, PCR-amplified and sequenced. Various bioinformatics tools were used to characterize GaSUBT. The GaSUBT cDNA contains 1,587 bp encoding 529 amino acids. The predicted molecular weight of the deduced protein is 55.34 kDa, with an isoelectric point of 6.25. GaSUBT is predicted to possess a signal peptide and a pro-peptide consisting of a peptidase inhibitor I9 sequence. Alignment of the deduced amino acid sequence with other subtilisins in the NCBI database showed that the sequences surrounding the catalytic triad, which forms the catalytic domain, are well conserved.
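    A predicted molecular weight such as the 55.34 kDa reported for GaSUBT is conventionally obtained by summing average residue masses over the deduced sequence and adding one water molecule for the termini. The sketch below shows that calculation, assuming an unmodified linear polypeptide and a commonly tabulated set of average residue masses; the deduced GaSUBT sequence is not given in the abstract, so the reported value cannot be reproduced here, and the function name is hypothetical.

```python
# Average residue masses in daltons (free amino acid minus water), as commonly
# tabulated for average-mass calculations.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def average_mass_da(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    unknown = set(protein) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    print(round(average_mass_da("GG"), 2))  # glycylglycine, about 132.12 Da
```

Dividing the result by 1000 gives kDa; signal-peptide or pro-peptide removal would change the mass of the mature enzyme, so the sequence passed in determines which form is described.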

  10. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the genomes of humans and a few model organisms, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools capable of easily identifying structures and extracting features from DNA sequences. Among the most important structures in a DNA sequence are repeats, which often have to be masked before protein-coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence-time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it is largely species-independent and works well on very short sequences. Efficient codon indices are key elements of successful gene-finding algorithms and are particularly useful for determining whether a suspected EST belongs to a coding or noncoding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, the method is very efficient, allowing whole-genome analysis on a PC.
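    The core quantity here, a recurrence time, is the distance from one occurrence of a word (k-mer) to the next occurrence of the same word; tandem repeats show up as many short, identical recurrence times. The sketch below computes these times. The helper `period3_fraction` is a hypothetical, illustrative summary and is not the codon index defined in the paper, whose formula the abstract does not give.

```python
from collections import defaultdict


def recurrence_times(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer to the gaps between its successive occurrences.

    A recurrence time is the distance from one occurrence of a k-mer to the
    next occurrence of the same k-mer (overlapping occurrences included).
    """
    last_seen = {}
    times = defaultdict(list)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times[word].append(i - last_seen[word])
        last_seen[word] = i
    return dict(times)


def period3_fraction(times: dict[str, list[int]]) -> float:
    """Hypothetical summary: share of recurrence times that are multiples of 3.

    Illustrative only; this is NOT the codon index defined by Cao et al.
    """
    all_times = [t for ts in times.values() for t in ts]
    if not all_times:
        return 0.0
    return sum(1 for t in all_times if t % 3 == 0) / len(all_times)


if __name__ == "__main__":
    rt = recurrence_times("ACGACGACG", 3)  # a perfect 3-bp tandem repeat
    print(rt)
    print(period3_fraction(rt))
```

A single pass with a dictionary of last positions makes the computation linear in sequence length, which is consistent with the abstract's point that genome-scale analysis is feasible on a PC.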

  11. Infrared thermal facial image sequence registration analysis and verification

    NASA Astrophysics Data System (ADS)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequences are preprocessed for registration before further analysis, so that variance caused by minor and irregular subject movements is reduced. Without affecting subjects' comfort and with minimal harm, this study proposes an infrared thermal facial image sequence registration process that reduces the deviations caused by subjects' unconscious head movements. A fixed reference image for registration is produced by localizing the centroid of the eye region and applying image translation and rotation. The thermal image sequence is then registered automatically using the proposed two-stage genetic algorithm. The deviation before and after registration is quantified with image quality indices. The results show that the proposed registration process localizes facial images accurately, which will benefit correlation analysis of psychological information related to the facial area.

  12. Cloning of the Gene Encoding a 22-Kilodalton Cell Surface Antigen of Mycobacterium bovis BCG and Analysis of Its Potential for DNA Vaccination against Tuberculosis

    PubMed Central

    Lefèvre, Philippe; Denis, Olivier; De Wit, Lucas; Tanghe, Audrey; Vandenbussche, Paul; Content, Jean; Huygen, Kris

    2000-01-01

    Using spleen cells from mice vaccinated with live Mycobacterium bovis BCG, we previously generated three monoclonal antibodies reactive against a 22-kDa protein present in mycobacterial culture filtrate (CF) (K. Huygen et al., Infect. Immun. 61:2687–2693, 1993). These monoclonal antibodies were used to screen an M. bovis BCG genomic library made in phage λgt11. The gene encoding a 233-amino-acid (aa) protein, including a putative 26-aa signal sequence, was isolated, and sequence analysis indicated that the protein was 98% identical with the M. tuberculosis Lppx protein and that it contained a sequence 94% identical with the M. leprae 38-mer polypeptide 13B3 recognized by T cells from killed M. leprae-immunized subjects. Flow cytometry and cell fractionation demonstrated that the 22-kDa CF protein is also highly expressed in the bacterial cell wall and membrane compartment but not in the cytosol. C57BL/6, C3H, and BALB/c mice were vaccinated with plasmid DNA encoding the 22-kDa protein and analyzed for immune response and protection against intravenous M. tuberculosis challenge. Whereas DNA vaccination induced elevated antibody responses in C57BL/6 and particularly in C3H mice, Th1-type cytokine response, as measured by interleukin-2 and gamma interferon secretion, was only modest, and no protection against intravenous M. tuberculosis challenge was observed in any of the three mouse strains tested. Therefore, the 22-kDa antigen seems to have little potential for a DNA vaccine against tuberculosis, but it may be a good candidate for a mycobacterial antigen detection test. PMID:10678905

  13. High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.

    PubMed

    Harrison, Lucas B; Hanson, Nancy D

    2017-06-01

    Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
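    The reported performance figures follow directly from the confusion counts in the abstract (66 true positives, 0 false negatives, 124 true negatives, 1 false positive, 191 isolates in total). A short arithmetic check:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Counts reported in the abstract for the blinded 191-isolate study.
TP, FN, TN, FP = 66, 0, 124, 1

assert TP + FN + TN + FP == 191  # the counts cover every isolate in the study
print(f"sensitivity = {sensitivity(TP, FN):.1%}")  # prints sensitivity = 100.0%
print(f"specificity = {specificity(TN, FP):.1%}")  # prints specificity = 99.2%
```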

  14. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    PubMed Central

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform-specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation patterns present in a sample. The obtained profiles are then summarized and compared between samples. To assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 shows close agreement between the sequencing results and one of the industry-leading CpG methylation platforms, indicating that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely
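    Two calculations underlie the validation described above: a per-CpG methylation level from bisulfite reads, and a Pearson correlation between sequencing-derived levels and array values. The sketch below shows both under the standard assumption that, after bisulfite conversion, unmethylated cytosines read as T while methylated cytosines remain C. It is an illustration, not TABSAT's implementation, and the function names are hypothetical.

```python
import math
from typing import Optional


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Fraction methylated at one CpG from bisulfite reads.

    Unmethylated C reads as T after conversion while methylated C stays C,
    so the level is C / (C + T). Returns None when there is no coverage.
    """
    total = c_reads + t_reads
    return None if total == 0 else c_reads / total


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(methylation_level(30, 10))  # 30 C reads and 10 T reads at one CpG
    print(pearson_r([0.1, 0.5, 0.9], [0.2, 0.4, 0.8]))  # toy values, not study data
```

In practice, sites with no or very low coverage would be excluded before correlating, which is why `methylation_level` returns `None` rather than a number for uncovered sites.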

  15. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system), including a 384-multicapillary sequencer (the so-called RISA sequencer), was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system, chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing, in which 350 bp is read from the 3′ and 5′ ends for clustering, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  16. The 60 kDa heat shock proteins in the hyperthermophilic archaeon Sulfolobus shibatae.

    PubMed

    Kagawa, H K; Osipiuk, J; Maltsev, N; Overbeek, R; Quaite-Randall, E; Joachimiak, A; Trent, J D

    1995-11-10

    One of the most abundant proteins in the hyperthermophilic archaeon Sulfolobus shibatae is the 59 kDa heat shock protein (TF55) that is believed to form a homo-oligomeric double ring complex structurally similar to the bacterial chaperonins. We discovered a second protein subunit in the S. shibatae ring complex (referred to as alpha) that is stoichiometric with TF55 (renamed beta). The gene and flanking regions of alpha were cloned and sequenced and its inferred amino acid sequence has 54.4% identity and 74.4% similarity to beta. Transcription start sites for both alpha and beta were mapped and three potential transcription regulatory regions were identified. Northern analyses of cultures shifted from normal growth temperatures (70 to 75 degrees C) to heat shock temperatures (85 to 90 degrees C) indicated that the levels of alpha and beta mRNAs increased during heat shock, but at all temperatures their relative proportions remained constant. Monitoring protein synthesis by autoradiography of total proteins from cultures pulse labeled with L(-)[35S]methionine at normal and heat shock temperatures indicated significant increases in alpha and beta synthesis during heat shock. Under extreme heat shock conditions (≥90 degrees C) alpha and beta appeared to be the only two proteins synthesized. The purified alpha and beta subunits combined to form high molecular mass complexes with similar mobilities on native polyacrylamide gels to the complexes isolated directly from cells. Equal proportions of the two subunits gave the greatest yield of the complex, which we refer to as a "rosettasome". It is argued that the rosettasome consists of two homo-oligomeric rings; one of alpha and the other of beta. Polyclonal antibodies against alpha and beta from S. shibatae cross-reacted with proteins of similar molecular mass in 10 out of the 17 archaeal species tested, suggesting that the two rosettasome proteins are highly conserved among the archaea. The archaeal sequences were

  17. Effects of pre- and pro-sequence of thaumatin on the secretion by Pichia pastoris.

    PubMed

    Ide, Nobuyuki; Masuda, Tetsuya; Kitabatake, Naofumi

    2007-11-23

    Thaumatin is a 22-kDa sweet-tasting protein containing eight disulfide bonds. When thaumatin is expressed in Pichia pastoris using the thaumatin cDNA fused with both the alpha-factor signal sequence and the Kex2 protease cleavage site from Saccharomyces cerevisiae, the N-terminal sequence of the secreted thaumatin molecule is not processed correctly. To examine the role of the thaumatin cDNA-encoded N-terminal pre-sequence and C-terminal pro-sequence in the processing of thaumatin and the efficiency of thaumatin production in P. pastoris, four expression plasmids with different pre-sequences and pro-sequences were constructed and transformed into P. pastoris. Transformants containing the pre-thaumatin gene, which carries the native plant signal sequence, secreted thaumatin into the medium. The N-terminal amino acid sequence of the secreted thaumatin molecule was processed correctly. The production yield of thaumatin was not affected by the C-terminal pro-sequence, and the pro-sequence was not processed in P. pastoris, indicating that the pro-sequence is not necessary for thaumatin synthesis.

  18. Sequence analysis and characterization of pyruvate kinase from Clonorchis sinensis, a 53.1-kDa homopentamer, implicated immune protective efficacy against clonorchiasis.

    PubMed

    Chen, Tingjin; Jiang, Hongye; Sun, Hengchang; Xie, Zhizhi; Ren, Pengli; Zhao, Lu; Dong, Huimin; Shi, Mengchen; Lv, Zhiyue; Wu, Zhongdao; Li, Xuerong; Yu, Xinbing; Huang, Yan; Xu, Jin

    2017-11-09

    Clonorchiasis, caused by Clonorchis sinensis, is classified as one of the most neglected tropical diseases and affects more than 15 million people globally. This hepatobiliary disease is highly associated with cholangiocarcinoma. As key molecules in the infectivity and subsistence of trematodes, glycolytic enzymes have been targets for drug and vaccine development. Clonorchis sinensis pyruvate kinase (CsPK), a crucial glycolytic enzyme, was characterized in this research. Differences were observed in the sequences and spatial structures of CsPK and PKs from humans, rats, mice and rabbits. CsPK possessed a characteristic active site signature (IKLIAKIENHEGV) and some unique sites but lacked the N-terminal domain. The predicted subunit molecular mass (Mr) of CsPK was 53.1 kDa. Recombinant CsPK (rCsPK) was a homopentamer with an Mr of approximately 290 kDa by both native PAGE and gel filtration chromatography. Significant differences in the protein and mRNA levels of CsPK were observed among four life stages of C. sinensis (egg, adult worm, excysted metacercaria and metacercaria), suggesting that these developmental stages may be associated with diverse energy demands. CsPK was widely distributed in adult worms. Moreover, an intense Th1-biased immune response was persistently elicited in rats immunized with rCsPK. Also, rat anti-rCsPK sera suppressed C. sinensis adult subsistence both in vivo and in vitro. The sequences and spatial structures, molecular mass, and expression profile of CsPK have been characterized. rCsPK was indicated to be a homopentamer. Rat anti-rCsPK sera suppressed C. sinensis adult subsistence both in vivo and in vitro. CsPK is worthy of further study as a promising target for drug and vaccine development.

  19. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    PubMed

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that are currently identified by sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. Here we studied a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters were all below 3%, and those between the M. abscessus and M. massiliense clusters ranged from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster appearing diffuse. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the

  20. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

    PubMed Central

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that are currently identified by sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. Here we studied a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters were all below 3%, and those between the M. abscessus and M. massiliense clusters ranged from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster appearing diffuse. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering
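    The divergence percentages above compare strains on a concatenated multi-gene alignment. A minimal sketch of that bookkeeping follows, assuming the per-locus sequences are already aligned and that "divergence" means the uncorrected p-distance (the study may have used a model-corrected distance). The strain data are toy values, and `concatenate` and `p_distance_percent` are hypothetical helpers, not code from the paper.

```python
# Gene order follows the eight-locus MLSA scheme named in the abstract;
# the sequences below are toy data, not real alleles.
GENES = ["argH", "cya", "glpK", "gnd", "murC", "pgm", "pta", "purH"]


def concatenate(loci: dict, order: list = GENES) -> str:
    """Join per-gene aligned sequences in a fixed gene order."""
    missing = [gene for gene in order if gene not in loci]
    if missing:
        raise KeyError(f"missing loci: {missing}")
    return "".join(loci[gene] for gene in order)


def p_distance_percent(a: str, b: str) -> float:
    """Uncorrected pairwise divergence in percent; gapped columns are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    return 100.0 * differences / compared if compared else 0.0


strain_1 = {gene: "ACGTACGTAC" for gene in GENES}
strain_2 = dict(strain_1, pta="ACGTACGTAA")  # one substitution in pta
print(p_distance_percent(concatenate(strain_1), concatenate(strain_2)))  # 1.25
```

    Keeping the gene order fixed matters: two strains are only comparable column by column if every supermatrix row is assembled in the same order.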

  1. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS's Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  2. DNAApp: a mobile application for sequencing data analysis

    PubMed Central

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: Numerous applications have been developed for decoding and visualizing ab1 DNA sequencing files on Windows and Mac platforms, yet none exists for the increasingly popular smartphone operating systems. Decoding sequencing files cannot easily be done with browser-accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to built-in analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that facilitate the use of online Web tools for a full range of analyses. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps could raise productivity and help meet the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882

  3. DNAApp: a mobile application for sequencing data analysis.

    PubMed

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    Numerous applications have been developed for decoding and visualizing ab1 DNA sequencing files on Windows and Mac platforms, yet none exists for the increasingly popular smartphone operating systems. Decoding sequencing files cannot easily be done with browser-accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to built-in analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that facilitate the use of online Web tools for a full range of analyses. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps could raise productivity and help meet the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.
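    The built-in tools listed above (reverse complementation, protein translation and motif search) operate on a decoded base string. The sketch below assumes the ab1 trace has already been decoded (Biopython, for example, reads ab1 files through `SeqIO` with the "abi" format) and uses the standard genetic code. It is an illustration, not DNAApp's implementation; translating unknown codons as `X` is a choice made here.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate codon by codon from `frame`; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def find_motif(seq: str, motif: str) -> list:
    """0-based start positions of (possibly overlapping) exact matches."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


print(translate(reverse_complement("TTAGGCCAT")))  # MA*
```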

  4. Complete nucleotide sequences and genome characterization of a novel double-stranded RNA virus infecting Rosa multiflora.

    PubMed

    Salem, Nidá M; Golino, Deborah A; Falk, Bryce W; Rowhani, Adib

    2008-01-01

    Three double-stranded (ds) RNAs were detected in Rosa multiflora plants showing rose spring dwarf (RSD) symptoms. Northern blot analysis revealed three dsRNAs in preparations of both dsRNA and total RNA from R. multiflora plants. The complete sequences of the dsRNAs (referred to as dsRNA 1, dsRNA 2 and dsRNA 3) were determined based on a combination of shotgun cloning of dsRNA cDNAs and reverse transcription-polymerase chain reaction (RT-PCR). The largest dsRNA (dsRNA 1) was 1,762 bp long with a single open reading frame (ORF) that encoded a putative polypeptide containing 479 amino acid residues with a molecular mass of 55.9 kDa. This polypeptide contains amino acid sequence motifs conserved in the RNA-dependent RNA polymerases (RdRp) of members of the family Partitiviridae. Both dsRNA 2 (1,475 bp) and dsRNA 3 (1,384 bp) contained single ORFs, encoding putative proteins of unknown function. The 5' untranslated regions (UTR) of all three segments shared regions of high sequence homology. Phylogenetic analysis using the RdRp sequences of the various partitiviruses revealed that the new sequences would constitute the genome of a virus in the family Partitiviridae. This virus would cluster with Fragaria chiloensis cryptic virus and Raphanus sativus cryptic virus 2. We suggest that the three dsRNA segments constitute the genome of a novel cryptic virus infecting roses; we propose the name Rosa multiflora cryptic virus (RMCV). Detection primers were developed and used for RT-PCR detection of RMCV in rose plants.
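    Statements such as "a single open reading frame encoding 479 residues" rest on an ORF scan of the assembled segment. A minimal forward-strand sketch is shown below, assuming an ORF runs from ATG to the first in-frame stop codon; it ignores the opposite strand and ORFs without a stop, and it is not the procedure used by the authors.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0) means none found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


start, end = longest_orf("CCATGAAATTTTAGCC")
print(start, end, (end - start) // 3 - 1)  # residues exclude the stop codon
```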

  5. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.
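    The abstract splits variants at a minor allele frequency (MAF) of 1%. For a biallelic variant in diploid samples, MAF is the smaller of the two allele frequencies computed from genotype counts. The sketch below applies only the 1% cutoff stated above; the genotype counts are illustrative, and the function names are hypothetical.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF of a biallelic variant from diploid genotype counts."""
    n_samples = n_hom_ref + n_het + n_hom_alt
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    # Each sample carries two alleles; heterozygotes contribute one alt allele.
    alt_freq = (2 * n_hom_alt + n_het) / (2 * n_samples)
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float, threshold: float = 0.01) -> str:
    """'common' at or above the threshold (MAF >= 1% in the abstract), else 'rare'."""
    return "common" if maf >= threshold else "rare"


maf = minor_allele_frequency(900, 100, 0)
print(maf, frequency_class(maf))
```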

  6. RSAT 2018: regulatory sequence analysis tools 20th anniversary.

    PubMed

    Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

    2018-05-02

    RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools into the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
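    Motif scanning, the second application listed, is commonly done with a position weight matrix (PWM) of log-odds scores. The sketch below shows that generic technique on the forward strand only, assuming a uniform background, a pseudocount of 1 and a hand-picked score threshold. It is not RSAT's own scanning program, and the function names are hypothetical.

```python
import math


def pwm_from_counts(counts: list, background: dict = None, pseudocount: float = 1.0) -> list:
    """Convert per-position base counts into log2-odds weights."""
    background = background or {base: 0.25 for base in "ACGT"}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in "ACGT"
        })
    return pwm


def scan(seq: str, pwm: list, threshold: float) -> list:
    """Return (position, score) for every window scoring at or above threshold."""
    seq, width = seq.upper(), len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


tata = pwm_from_counts([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(scan("GGTATAGG", tata, threshold=5.0))
```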

  7. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

    PubMed

    Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

    2018-05-02

    Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
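    The pipeline described above (k-mers, then complex numbers, then a stationary wavelet transform, then a feature vector) can be sketched as follows. Several pieces are assumptions: the base-to-complex mapping and positional weighting are hypothetical, the transform is an undecimated Haar transform with circular boundaries, and the features are mean band energies. The published SSAW mapping and feature definition may differ.

```python
import math

# Hypothetical base-to-complex mapping; SSAW's published mapping may differ.
BASE_VALUE = {"A": 1 + 1j, "C": -1 + 1j, "G": -1 - 1j, "T": 1 - 1j}


def kmer_signal(seq: str, k: int = 3) -> list:
    """Map each valid k-mer to one complex number (positionally weighted sum)."""
    seq = seq.upper()
    signal = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in BASE_VALUE for base in kmer):
            signal.append(sum(BASE_VALUE[b] * 0.5 ** p for p, b in enumerate(kmer)))
    return signal


def haar_swt(x: list, levels: int) -> list:
    """Undecimated (a trous) Haar transform with circular boundary handling.

    Returns one detail band per level followed by the final approximation band.
    """
    n = len(x)
    approx = list(x)
    bands = []
    for level in range(levels):
        step = 2 ** level  # taps are spread apart instead of downsampling
        detail = [(approx[i] - approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        approx = [(approx[i] + approx[(i + step) % n]) / math.sqrt(2) for i in range(n)]
        bands.append(detail)
    bands.append(approx)
    return bands


def ssaw_like_features(seq: str, k: int = 3, levels: int = 3) -> list:
    """Mean energy of each band: a fixed-length vector whatever the sequence length."""
    signal = kmer_signal(seq, k)
    if not signal:
        raise ValueError("sequence has no valid k-mers")
    return [sum(abs(c) ** 2 for c in band) / len(band) for band in haar_swt(signal, levels)]


def feature_distance(u: list, v: list) -> float:
    return math.dist(u, v)


print(feature_distance(ssaw_like_features("ACGTTGCAACGT"), ssaw_like_features("ACGTTGCAACGA")))
```

    Because every sequence yields `levels + 1` numbers, sequences of different lengths can be compared or clustered directly, which is the property that makes such alignment-free features cheap to use.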

  8. Mitochondrial sequence analysis for forensic identification using pyrosequencing technology.

    PubMed

    Andréasson, H; Asp, A; Alderborn, A; Gyllensten, U; Allen, M

    2002-01-01

    Over recent years, requests for mtDNA analysis in the field of forensic medicine have notably increased, and the results of such analyses have proved very useful in forensic cases where nuclear DNA analysis cannot be performed. Traditionally, mtDNA has been analyzed by DNA sequencing of the two hypervariable regions, HVI and HVII, in the D-loop. DNA sequence analysis using conventional Sanger sequencing is very robust but time-consuming and labor-intensive. By contrast, mtDNA analysis based on pyrosequencing technology provides fast and accurate results from the human mtDNA present in many types of evidence materials in forensic casework. The assay has been developed to determine polymorphic sites in the mitochondrial D-loop as well as the coding region to further increase the discrimination power of mtDNA analysis. The pyrosequencing technology for analysis of mtDNA polymorphisms has been tested with regard to sensitivity, reproducibility, and success rate when applied to control samples and actual casework materials. The results show that the method is very accurate and sensitive; the results are easily interpreted and provide a high success rate on casework samples. The panel of pyrosequencing reactions for the mtDNA polymorphisms was chosen to give optimal discrimination power in relation to the number of bases determined.

  9. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    PubMed

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome-wide distribution of chromatin-associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is the study of time-dependent changes in the distribution of such proteins and marks using serial ChIP-seq experiments performed in a time-resolved manner. Although such time-resolved studies are becoming increasingly common, software to analyze these data in a robust, automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. Using two simulated data sets, we provide users with guidance on experimental design for modeling time course (TC) ChIP-seq data with TDCA. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments.
TC experiments are reduced to intuitive parametric values that facilitate biologically
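    Fitting coverage at one locus to a sigmoidal curve reduces a time course to a few parameters such as a baseline, an amplitude, a half-time and a rate. The sketch below assumes a four-parameter logistic model and, in place of whatever optimizer TDCA uses, a simple grid search over half-time and rate in which baseline and amplitude are solved exactly by linear least squares. The grids, toy data and function names are hypothetical.

```python
import math


def logistic(t: float, t_half: float, rate: float) -> float:
    return 1.0 / (1.0 + math.exp(-rate * (t - t_half)))


def fit_sigmoid(times, coverage, t_half_grid, rate_grid) -> dict:
    """Fit coverage(t) ~ baseline + amplitude * logistic(t; t_half, rate).

    (t_half, rate) are chosen by grid search; for each candidate pair the
    model is linear in (baseline, amplitude), solved by ordinary least squares.
    """
    n = len(times)
    y_mean = sum(coverage) / n
    best = None
    for t_half in t_half_grid:
        for rate in rate_grid:
            s = [logistic(t, t_half, rate) for t in times]
            s_mean = sum(s) / n
            s_xx = sum((si - s_mean) ** 2 for si in s)
            if s_xx == 0:
                continue  # flat curve over the sampled times: not identifiable
            s_xy = sum((si - s_mean) * (yi - y_mean) for si, yi in zip(s, coverage))
            amplitude = s_xy / s_xx
            baseline = y_mean - amplitude * s_mean
            sse = sum((baseline + amplitude * si - yi) ** 2 for si, yi in zip(s, coverage))
            if best is None or sse < best[0]:
                best = (sse, baseline, amplitude, t_half, rate)
    if best is None:
        raise ValueError("no identifiable grid point")
    _, baseline, amplitude, t_half, rate = best
    return {"baseline": baseline, "amplitude": amplitude, "t_half": t_half, "rate": rate}


times = list(range(13))
coverage = [10 + 50 * logistic(t, 6.0, 1.0) for t in times]  # noiseless toy data
fit = fit_sigmoid(times, coverage, [0.5 * i for i in range(25)], [0.25 * i for i in range(1, 9)])
print(fit)
```

    A grid search is slow but never diverges, and the half-time it returns is directly interpretable as the time at which a locus reaches half of its change in coverage. Very large rates or time spans can overflow `math.exp`, so the grids should stay in a sensible range.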

  10. Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences

    NASA Astrophysics Data System (ADS)

    Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou

    2017-01-01

    Interaction patterns among different warehouses could make warehouse-out behavioral sequences less predictable. First, we apply coupling detrended fluctuation analysis to warehouse-out quantities and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the type of steel product. Second, we track the sources of multifractality in the warehouse-out sequences by shuffling and surrogating the original ones, and find that the fat-tailed distribution contributes more to the multifractal features than long-term memory does, again regardless of steel product type. From the perspective of warehouse contribution, some warehouses steadily contribute more to multifractality than others. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate coupling multifractality, and show that multiple behavioral sequences exhibit significant coupling multifractal features that emerge, and are usually restricted, within relatively large time-scale intervals.
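    The core of coupling detrended fluctuation analysis is a detrended covariance of two cumulative profiles, computed window by window. The sketch below covers only the second-order (q = 2) case, often called DCCA, with linear detrending in non-overlapping windows taken from the start of the series. The multifractal generalization over several q orders and the Hurst surface used in the paper are not reproduced, and the function names are hypothetical.

```python
import math
import random


def _profile(series: list) -> list:
    """Cumulative sum of deviations from the mean."""
    mean = sum(series) / len(series)
    profile, running = [], 0.0
    for value in series:
        running += value - mean
        profile.append(running)
    return profile


def _detrend(segment: list) -> list:
    """Residuals after removing a least-squares straight line."""
    n = len(segment)
    t_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment)) / s_tt
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(segment)]


def detrended_covariance(x: list, y: list, scale: int) -> float:
    """F^2_xy(scale); with x == y this is the ordinary DFA-1 F^2(scale)."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if not 3 <= scale <= len(x):
        raise ValueError("scale must lie between 3 and the series length")
    px, py = _profile(x), _profile(y)
    n_windows = len(x) // scale
    total = 0.0
    for w in range(n_windows):
        window = slice(w * scale, (w + 1) * scale)
        rx, ry = _detrend(px[window]), _detrend(py[window])
        total += sum(a * b for a, b in zip(rx, ry)) / scale
    return total / n_windows


def scaling_exponent(x: list, y: list, scales: list) -> float:
    """Slope of log F(s) against log s, with F(s) = sqrt(|F^2_xy(s)|)."""
    points = [(math.log(s), 0.5 * math.log(abs(detrended_covariance(x, y, s)))) for s in scales]
    mean_u = sum(u for u, _ in points) / len(points)
    mean_v = sum(v for _, v in points) / len(points)
    num = sum((u - mean_u) * (v - mean_v) for u, v in points)
    den = sum((u - mean_u) ** 2 for u, _ in points)
    return num / den


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(4096)]
print(scaling_exponent(noise, noise, [16, 32, 64, 128, 256]))  # close to 0.5 for white noise
```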

  11. CAFE: aCcelerated Alignment-FrEe sequence analysis

    PubMed Central

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A.; Waterman, Michael S.

    2017-01-01

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. PMID:28472388
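    The simplest measures mentioned above depend only on k-mer frequencies. The sketch below computes one such measure (cosine dissimilarity on raw k-mer counts) for every pair of sequences and writes the result as a square PHYLIP-style distance matrix. It does not implement CVTree, $d_2^*$ or $d_2^S$, which additionally need a background model, and it is not CAFE's code. The toy sequences and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_counts(seq: str, k: int) -> Counter:
    """Count k-mers, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )


def cosine_dissimilarity(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two count vectors (0 = identical profiles)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(seqs: dict, k: int = 5):
    names = list(seqs)
    counts = [kmer_counts(seqs[name], k) for name in names]
    matrix = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        matrix[i][j] = matrix[j][i] = cosine_dissimilarity(counts[i], counts[j])
    return names, matrix


def to_phylip(names: list, matrix: list) -> str:
    """Square distance matrix: taxon count, then 10-character names and rows."""
    lines = [f"{len(names):>5}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name[:10]:<10}" + "".join(f" {d:.4f}" for d in row))
    return "\n".join(lines)


names, matrix = distance_matrix({"s1": "ACGTACGTTT", "s2": "ACGTACGAAA", "s3": "GGGCCCGGGC"}, k=3)
print(to_phylip(names, matrix))
```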

  12. Integrated databanks access and sequence/structure analysis services at the PBIL.

    PubMed

    Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

    2003-07-01

    The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools for nucleic acid and protein sequence analysis. This server allows users to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our databank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the databanks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. A distinctive feature of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of reusable lists, it is possible to process large data sets. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.

  13. Automated sequence analysis and editing software for HIV drug resistance testing.

    PubMed

    Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

    2012-05-01

    Access to antiretroviral treatment in resource-limited settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drug resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. The aim was to develop automated sequence analysis and editing software to support high-throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 cases (6.18% of differences, 0.03% of all AA) for RT. The ART-A Software is a time-saving tool for pre-analyzing HIV and viral quasispecies sequences in high-throughput laboratories and for highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.
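    Reporting "ambiguities and mixed calls" usually means writing an IUPAC ambiguity code wherever more than one base contributes substantial signal at a position. The sketch below shows that idea, assuming per-base peak heights (or read counts) are already available for each aligned position and assuming an illustrative 20% minor-signal cutoff. It is not the ART-A Software's algorithm, and the function names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def call_position(signal: dict, min_fraction: float = 0.2) -> str:
    """Call one position from per-base signal (peak heights or read counts).

    Every base reaching `min_fraction` of the total is kept, so mixtures become
    ambiguity codes; the 0.2 default is an illustrative assumption.
    """
    total = sum(signal.get(base, 0.0) for base in "ACGT")
    if total <= 0:
        return "N"
    present = frozenset(b for b in "ACGT" if signal.get(b, 0.0) / total >= min_fraction)
    return IUPAC.get(present, "N")


def consensus(positions: list, min_fraction: float = 0.2) -> str:
    return "".join(call_position(p, min_fraction) for p in positions)


def mixed_positions(seq: str) -> list:
    """0-based indices of ambiguous (mixed) calls that may need manual review."""
    return [i for i, base in enumerate(seq) if base not in "ACGT"]


seq = consensus([{"A": 100}, {"A": 60, "G": 40}, {"C": 95, "T": 5}])
print(seq, mixed_positions(seq))  # ARC [1]
```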

  14. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    PubMed

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses through the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. The number and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to supporting analysis of CSR junctions sequenced with an HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.
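    The three junction structures named above (blunt, insertion, microhomology) follow from how far a junction read matches the donor switch region from its start and the acceptor switch region from its end. The sketch below rests on strong simplifying assumptions: the read begins at the donor reference's first base and ends at the acceptor reference's last base, and matching is exact, with no sequencing errors. CSReport's actual breakpoint search is not described here, and `classify_junction` is a hypothetical helper.

```python
def classify_junction(read: str, donor: str, acceptor: str) -> tuple:
    """Classify a switch junction as blunt, insertion or microhomology.

    p = bases matching the donor from the read's start,
    q = bases matching the acceptor from the read's end.
    If p + q exceeds the read length, the overlap is microhomology; if it
    falls short, the unmatched middle is an insertion.
    """
    read, donor, acceptor = read.upper(), donor.upper(), acceptor.upper()
    p = 0
    while p < min(len(read), len(donor)) and read[p] == donor[p]:
        p += 1
    q = 0
    while q < min(len(read), len(acceptor)) and read[-1 - q] == acceptor[-1 - q]:
        q += 1
    overlap = p + q - len(read)
    if overlap > 0:
        return ("microhomology", overlap)
    if overlap == 0:
        return ("blunt", 0)
    return ("insertion", -overlap)


print(classify_junction("GGGACGTTT", "GGGACGAA", "TTACGTTT"))  # ('microhomology', 3)
```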

  15. Molluscan mega-hemocyanin: an ancient oxygen carrier tuned by a ~550 kDa polypeptide

    PubMed Central

    2010-01-01

    Background: The allosteric respiratory protein hemocyanin occurs in gastropods as tubular di-, tri- and multimers of a 35 × 18 nm, ring-like decamer with a collar complex at one opening. The decamer comprises five subunit dimers. The subunit, a 400 kDa polypeptide, is a concatenation of eight paralogous functional units. Their exact topology within the quaternary structure has recently been solved by 3D electron microscopy, providing a molecular model of an entire didecamer (two conjoined decamers). Here we study keyhole limpet hemocyanin (KLH2) tridecamers to unravel the exact association mode of the third decamer. Moreover, we introduce and describe a more complex type of hemocyanin tridecamer discovered in fresh/brackish-water cerithioid snails (Leptoxis, Melanoides, Terebralia). Results: The "typical" KLH2 tridecamer is partially hollow, whereas the cerithioid tridecamer is almost completely filled with material; it was therefore termed "mega-hemocyanin". In both types, the staggering angle between adjoining decamers is 36°. The cerithioid tridecamer comprises two typical decamers based on the canonical 400 kDa subunit, flanking a central "mega-decamer" composed of ten unique ~550 kDa subunits. The additional ~150 kDa per subunit substantially enlarge the internal collar complex. Preliminary oxygen binding measurements indicate a moderate hemocyanin oxygen affinity in Leptoxis (p50 ~9 mmHg), and a very high affinity in Melanoides (~3 mmHg) and Terebralia (~2 mmHg). Species-specific and individual variation in the proportions of the two subunit types was also observed, leading to differences in the oligomeric states found in the hemolymph. Conclusions: In cerithioid hemocyanin tridecamers ("mega-hemocyanin") the collar complex of the central decamer is substantially enlarged and modified. The preliminary O2 binding curves indicate that there are species-specific functional differences in the cerithioid mega-hemocyanins which might reflect different physiological

  16. SPAR: small RNA-seq portal for analysis of sequencing experiments.

    PubMed

    Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee

    2018-05-04

    The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.

  17. Noncoding sequence classification based on wavelet transform analysis: part II

    NASA Astrophysics Data System (ADS)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez-Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding sequences. We hypothesize that the characteristic periodicities of noncoding sequences are related to their function. We describe a procedure for identifying these characteristic periodicities using wavelet analysis. Our results show that three groups of noncoding sequences, each with a different biological function, may be differentiated by their wavelet coefficients within a specific frequency range.
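    The abstract does not say how sequences are converted to numeric signals or which wavelet is used. As a minimal sketch, assuming a purine/pyrimidine (+1/-1) encoding and the orthonormal Haar wavelet (both are assumptions, not the authors' method), the Python below decomposes a sequence and reports the detail-coefficient energy per scale, so a periodicity shows up as energy concentrated at one scale.

```python
import math
from typing import List, Sequence, Tuple

# Assumed mapping: purines +1, pyrimidines -1, anything else (N, gaps) 0.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def encode(seq: str) -> List[float]:
    """Map a DNA string to a numeric signal."""
    return [PURINE_PYRIMIDINE.get(base, 0.0) for base in seq.upper()]


def haar_dwt(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Multi-level orthonormal Haar transform.

    Returns the final approximation and one list of detail coefficients per
    level (level 1 = finest scale, adjacent-base differences). An odd trailing
    sample is dropped at each level, which is a simplification.
    """
    approx = list(signal)
    details: List[List[float]] = []
    root2 = math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:
            approx = approx[:-1]
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(x - y) / root2 for x, y in pairs])
        approx = [(x + y) / root2 for x, y in pairs]
    return approx, details


def detail_energy(details: List[List[float]]) -> List[float]:
    """Sum of squared detail coefficients at each scale."""
    return [sum(c * c for c in level) for level in details]


for s in ("ATATATAT", "AACCAACC"):
    _, det = haar_dwt(encode(s), levels=3)
    print(s, [round(e, 3) for e in detail_energy(det)])
```

    In these toy inputs, period-2 alternation puts all detail energy at the finest scale, while period-4 alternation moves it to the next scale. A real analysis may use other encodings, smoother wavelets and sliding windows.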

  18. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    DOE PAGES

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...

    2014-09-01

    Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  19. The respiratory arsenate reductase from Bacillus selenitireducens strain MLS10

    USGS Publications Warehouse

    Afkar, E.; Lisak, J.; Saltikov, C.; Basu, P.; Oremland, R.S.; Stolz, J.F.

    2003-01-01

    The respiratory arsenate reductase from the Gram-positive, haloalkaliphilic Bacillus selenitireducens strain MLS10 was purified and characterized. It is a membrane-bound heterodimer (150 kDa) composed of two subunits, ArrA (110 kDa) and ArrB (34 kDa), with an apparent Km for arsenate of 34 µM and a Vmax of 2.5 µmol min-1 mg-1. Optimal activity occurred at pH 9.5 and 150 g l-1 of NaCl. Metal analysis (inductively coupled plasma mass spectrometry) of the holoenzyme and sequence analysis of the catalytic subunit (ArrA; the gene for which was cloned and sequenced) indicate that it is a member of the DMSO reductase family of molybdoproteins. © 2003 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
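    To make the reported kinetic constants concrete, the sketch below evaluates the Michaelis-Menten rate law with the abstract's apparent Km (34 µM) and Vmax (2.5 µmol min-1 mg-1). It assumes simple single-substrate kinetics without inhibition, which the abstract does not state; the function and constant names are illustrative.

```python
# Constants reported in the abstract for B. selenitireducens arsenate reductase.
KM_ARSENATE_UM = 34.0  # apparent Km for arsenate, in µM
VMAX = 2.5             # Vmax, in µmol min^-1 mg^-1


def michaelis_menten_rate(s_um: float, km_um: float = KM_ARSENATE_UM,
                          vmax: float = VMAX) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S]).

    Assumes simple Michaelis-Menten kinetics with no inhibition;
    [S] must be in the same units as Km (µM here).
    """
    if s_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * s_um / (km_um + s_um)


if __name__ == "__main__":
    for s in (3.4, 34.0, 340.0):  # 0.1x, 1x and 10x the apparent Km
        v = michaelis_menten_rate(s)
        print(f"[arsenate] = {s:6.1f} uM -> v = {v:.3f} umol/min/mg")
```

    At an arsenate concentration equal to Km the rate is Vmax/2 = 1.25 µmol min-1 mg-1, and a tenfold excess over Km gives about 91% of Vmax (2.5 × 10/11 ≈ 2.27).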

  20. An Imaging And Graphics Workstation For Image Sequence Analysis

    NASA Astrophysics Data System (ADS)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of modern graphics-oriented workstations with digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missiles, stores and other flying objects in various flight regimes, including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study the three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence database generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interfaces to a variety of video players and film transport subsystems.
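    The abstract lists centroid tracking at up to 60 fields per second but does not describe the algorithm. A minimal sketch, assuming a fixed intensity threshold and an intensity-weighted centroid (a common textbook approach, not necessarily the workstation's), with velocity estimated by finite differences between successive fields:

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # 2-D grid of pixel intensities
Point = Tuple[float, float]        # (row, column) in pixels


def centroid(frame: Frame, threshold: float) -> Optional[Point]:
    """Intensity-weighted centroid of pixels at or above `threshold`."""
    total = row_acc = col_acc = 0.0
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value >= threshold:
                total += value
                row_acc += r * value
                col_acc += c * value
    if total <= 0:
        return None  # no (positive) object pixels in this field
    return row_acc / total, col_acc / total


def velocities(track: Sequence[Point], fields_per_second: float = 60.0) -> List[Point]:
    """Finite-difference velocity (pixels/s) between consecutive fields.

    Assumes every field produced a centroid (no None entries).
    """
    return [((r1 - r0) * fields_per_second, (c1 - c0) * fields_per_second)
            for (r0, c0), (r1, c1) in zip(track, track[1:])]


frames = [
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],  # bright object at row 1, column 1
    [[0, 0, 0], [0, 0, 9], [0, 0, 0]],  # moved one pixel to the right
]
track = [centroid(f, threshold=5) for f in frames]
print(track, velocities(track))
```

    Converting pixels per second into physical units would require the spatial calibration listed among the system's features.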

  1. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    PubMed

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
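    The quartet idea above can be sketched as follows, assuming each variant is represented as a set of point mutations relative to a common parent and that a quartet is {P, P+x, P+y, P+x+y}. Any three members that form a "corner" (a variant with two single-mutation neighbours) determine the missing fourth. This is an illustrative reconstruction, not the LacED implementation; the mutation labels are examples, and the same-residue check assumes labels of the form "E104K".

```python
import re
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Variant = FrozenSet[str]  # set of point mutations relative to the parent


def _positions_unique(variant: Variant) -> bool:
    """Reject combinations with two substitutions at the same residue."""
    keys = []
    for mutation in variant:
        m = re.search(r"\d+", mutation)
        keys.append(m.group() if m else mutation)
    return len(keys) == len(set(keys))


def predict_from_triplets(known: Iterable[Variant]) -> Set[Variant]:
    """Predict the missing member of every incomplete quartet.

    For a variant V with single-mutation neighbours N1 and N2, the fourth
    quartet member is the symmetric difference V ^ N1 ^ N2.
    """
    present = set(known)
    predicted: Set[Variant] = set()
    for v in present:
        neighbours = [w for w in present if len(v ^ w) == 1]
        for n1, n2 in combinations(neighbours, 2):
            candidate = v ^ n1 ^ n2
            if candidate not in present and _positions_unique(candidate):
                predicted.add(candidate)
    return predicted


parent = frozenset()
known = [parent, frozenset({"E104K"}), frozenset({"G238S"})]
print(predict_from_triplets(known))
```

    The same rule also fills in a missing single mutant or a missing parent, since the symmetric difference is indifferent to which corner of the quartet is absent.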

  2. Dot-Blot Immunoassay of Fasciola gigantica Infection using 27 kDa and Adult Worm Regurge Antigens in Egyptian Patients

    PubMed Central

    Kamel, Hanan H.; Saad, Ghada A.

    2013-01-01

    The purpose of the present study was to evaluate the potential role of the 27-kilodalton (kDa) antigen versus Fasciola gigantica adult worm regurge antigens in a dot-blot assay, and to assess this assay as a practical tool for diagnosing fascioliasis in Egyptian patients. A Fasciola gigantica antigen with an approximate molecular mass of 27 kDa was obtained from adult worms by simple elution from SDS-PAGE gels. A dot-blot assay was developed and compared with adult worm regurge antigens for the detection of specific antibodies in patients infected with F. gigantica in Egypt. Control sera were obtained from patients with other parasitic infections and from healthy volunteers to assess the test and compare the antigens. The sensitivity, specificity, positive and negative predictive values of the dot-blot using the adult worm regurge were 80%, 90%, 94.1%, and 69.2%, respectively, while those using the 27 kDa antigen were all 100%, which confirms the diagnostic potential of this antigen. All patients infected with Fasciola were positive, with cross-reactivity reported with Schistosoma mansoni serum samples. This 27 kDa dot-blot assay proved to be a promising test that can be used for the serodiagnosis of fascioliasis in Egyptian patients, especially those presenting with hepatic disease. It is a specific, sensitive and easy-to-perform method for rapid diagnosis, particularly when more complex laboratory tests are unavailable. PMID:23710084
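    The four accuracy measures quoted above follow from a 2×2 table of test results against true infection status. The Python sketch below computes them; the counts in the example are illustrative only, chosen as one of several tables consistent with the percentages reported for the regurge antigen, since the abstract does not give the actual counts.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 diagnostic-accuracy measures.

    tp/fn: infected patients testing positive/negative;
    fp/tn: control subjects testing positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Illustrative counts, not the study's data (the abstract reports only percentages).
m = diagnostic_metrics(tp=16, fp=1, fn=4, tn=9)
print({name: round(value, 3) for name, value in m.items()})
```

    Unlike sensitivity and specificity, PPV and NPV also depend on the proportion of infected subjects in the tested group, so they do not transfer directly to populations with a different prevalence.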

  3. Dot-blot immunoassay of Fasciola gigantica infection using 27 kDa and adult worm regurge antigens in Egyptian patients.

    PubMed

    Kamel, Hanan H; Saad, Ghada A; Sarhan, Rania M

    2013-04-01

    The purpose of the present study was to evaluate the potential role of the 27-kilodalton (kDa) antigen versus Fasciola gigantica adult worm regurge antigens in a dot-blot assay, and to assess this assay as a practical tool for diagnosing fascioliasis in Egyptian patients. A Fasciola gigantica antigen with an approximate molecular mass of 27 kDa was obtained from adult worms by simple elution from SDS-PAGE gels. A dot-blot assay was developed and compared with adult worm regurge antigens for the detection of specific antibodies in patients infected with F. gigantica in Egypt. Control sera were obtained from patients with other parasitic infections and from healthy volunteers to assess the test and compare the antigens. The sensitivity, specificity, positive and negative predictive values of the dot-blot using the adult worm regurge were 80%, 90%, 94.1%, and 69.2%, respectively, while those using the 27 kDa antigen were all 100%, which confirms the diagnostic potential of this antigen. All patients infected with Fasciola were positive, with cross-reactivity reported with Schistosoma mansoni serum samples. This 27 kDa dot-blot assay proved to be a promising test that can be used for the serodiagnosis of fascioliasis in Egyptian patients, especially those presenting with hepatic disease. It is a specific, sensitive and easy-to-perform method for rapid diagnosis, particularly when more complex laboratory tests are unavailable.

  4. Structure of the putative 32 kDa myrosinase-binding protein from Arabidopsis (At3g16450.1) determined by SAIL-NMR.

    PubMed

    Takeda, Mitsuhiro; Sugimori, Nozomi; Torizawa, Takuya; Terauchi, Tsutomu; Ono, Akira M; Yagi, Hirokazu; Yamaguchi, Yoshiki; Kato, Koichi; Ikeya, Teppei; Jee, Jungoo; Güntert, Peter; Aceti, David J; Markley, John L; Kainosho, Masatsune

    2008-12-01

    The product of gene At3g16450.1 from Arabidopsis thaliana is a 32 kDa, 299-residue protein classified as resembling a myrosinase-binding protein (MyroBP). MyroBPs are found in plants as part of a complex with the glucosinolate-degrading enzyme myrosinase, and are suspected to play a role in myrosinase-dependent defense against pathogens. Many MyroBPs and MyroBP-related proteins are composed of repeated homologous sequences with unknown structure. We report here the three-dimensional structure of the At3g16450.1 protein from Arabidopsis, which consists of two tandem repeats. Because the protein is larger than those amenable to high-throughput analysis by uniform (13)C/(15)N labeling methods, we used stereo-array isotope labeling (SAIL) technology to prepare an optimally (2)H/(13)C/(15)N-labeled sample. NMR data sets collected using the SAIL protein enabled us to assign (1)H, (13)C and (15)N chemical shifts to 95.5% of all atoms, even at a low protein concentration (0.2 mM). We collected additional NOESY data and determined the three-dimensional structure using the CYANA software package. The structure, the first for a MyroBP family member, revealed that the At3g16450.1 protein consists of two independent but similar lectin-fold domains, each composed of three beta-sheets.

  5. PRADA: pipeline for RNA sequencing data analysis.

    PubMed

    Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W

    2014-08-01

    Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic, automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many types of information through multifaceted analysis of raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program. Availability: http://sourceforge.net/projects/prada/; contact: gadgetz@broadinstitute.org or rverhaak@mdanderson.org. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    PubMed

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
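    For reference, the commonly used sparse group LASSO objective for a logistic (case-control) model can be written as below. This is the textbook formulation, not necessarily the authors' exact parametrization; the symbols are assumptions introduced for illustration: $y_i \in \{0,1\}$ is case status, $x_i$ holds exon-level rare-variant aggregation scores, exons are grouped by gene $g$ with $p_g$ exons, and $\alpha \in [0,1]$ mixes the two penalties.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,\eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
  + (1-\alpha)\,\lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
  + \alpha\,\lambda\,\lVert \beta \rVert_1,
\qquad \eta_i = \beta_0 + x_i^{\top}\beta
```

    Setting $\alpha = 1$ gives the ordinary LASSO (as in a gene-based model with one aggregated feature per gene); the group term can set all exon coefficients of a gene to zero at once, while the $\ell_1$ term selects individual exons within the genes that remain.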

  7. A novel 35 kDa frog liver acid metallophosphatase.

    PubMed

    Szalewicz, A; Radomska, B; Strzelczyk, B; Kubicz, A

    1999-04-12

    The lower-molecular-weight (35 kDa) acid phosphatase from frog (Rana esculenta) liver is a glycometalloenzyme that is susceptible to activation by reducing agents and displays tartrate and fluoride resistance. Metal chelators (EDTA, 1,10-phenanthroline) inactivate the enzyme reversibly in a time- and temperature-dependent manner. The apoenzyme is reactivated by divalent transition metal cations, namely cobalt, zinc, ferrous iron, manganese, cadmium and nickel, to 130%, 75%, 63%, 62%, 55% and 34% of the original activity, respectively. Magnesium, calcium, cupric and ferric ions were ineffective in this process. Metal analysis by inductively coupled plasma-atomic emission spectrometry revealed the presence of zinc, iron and magnesium. The time course of apoenzyme reactivation, the stabilization effect and the relatively high resistance to oxidizing conditions indicate that the zinc ion is crucial for enzyme activity. The presence of iron was additionally confirmed by the visible absorption spectrum of the enzyme, which shows a shoulder at 417 nm, and by the electron paramagnetic resonance line of high-spin iron(III) with g_eff of 2.4. An active center containing either zinc alone or both zinc and iron ions is proposed. The frog liver lower-molecular-weight acid phosphatase is a novel metallophosphatase of lower vertebrate origin, distinct from the mammalian tartrate-resistant purple acid phosphatases.

  8. [Complete genome sequencing and sequence analysis of BCG Tice].

    PubMed

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study was to obtain the complete genome sequence of Bacille Calmette-Guérin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to design more rational vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regionally low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting PCR reaction conditions, and constructing clones for sequencing, we closed all the gaps. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank at the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4,334,064 base pairs in length, with a GC content of 65.65%. The problems encountered and strategies used during the finishing step of BCG Tice sequencing are described here, in the hope of sharing experience with others involved in the finishing step of genome sequencing. The microarray data were verified by our results.
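    A genome-wide GC content such as the 65.65% reported above is the fraction of G and C among all bases. The Python sketch below computes it from FASTA text; it assumes a single-record FASTA (one finished chromosome) and, as a labeled assumption, excludes ambiguity codes such as N from the denominator, which makes no difference for a gap-free finished genome.

```python
from typing import Iterable


def read_fasta_sequence(lines: Iterable[str]) -> str:
    """Concatenate sequence lines, skipping '>' headers and blank lines.

    Assumes a single-record FASTA (e.g. one finished chromosome).
    """
    return "".join(line.strip() for line in lines
                   if line.strip() and not line.startswith(">"))


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous A/C/G/T bases."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


record = [">example_contig", "GGCCAT", "GCNN"]  # toy input, not BCG data
seq = read_fasta_sequence(record)
print(len(seq), round(gc_content(seq), 2))
```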

  9. Molecular Analysis of a Novel Methanesulfonic Acid Monooxygenase from the Methylotroph Methylosulfonomonas methylovora

    PubMed Central

    de Marco, Paolo; Moradas-Ferreira, Pedro; Higgins, Timothy P.; McDonald, Ian; Kenna, Elizabeth M.; Murrell, J. Colin

    1999-01-01

    Methylosulfonomonas methylovora M2 is an unusual gram-negative methylotrophic bacterium that can grow on methanesulfonic acid (MSA) as the sole source of carbon and energy. Oxidation of MSA by this bacterium is carried out by a multicomponent MSA monooxygenase (MSAMO). Cloning and sequencing of a 7.5-kbp SphI fragment of chromosomal DNA revealed four tightly linked genes encoding this novel monooxygenase. Analysis of the deduced MSAMO polypeptide sequences indicated that the enzyme contains a two-component hydroxylase of the mononuclear-iron-center type. The large subunit of the hydroxylase, MsmA (48 kDa), contains a typical Rieske-type [2Fe–2S] center with an unusual iron-binding motif and, together with the small subunit of the hydroxylase, MsmB (20 kDa), showed a high degree of identity with a number of dioxygenase enzymes. However, the other components of the MSAMO, MsmC, the ferredoxin component, and MsmD, the reductase, more closely resemble those found in other classes of oxygenases. MsmC has a high degree of identity to ferredoxins from toluene and methane monooxygenases, which are enzymes characterized by possessing hydroxylases containing μ-oxo bridge binuclear iron centers. MsmD is a reductase of 38 kDa with a typical chloroplast-like [2Fe–2S] center and conserved flavin adenine dinucleotide- and NAD-binding motifs and is similar to a number of mono- and dioxygenase reductase components. Preliminary analysis of the genes encoding MSAMO from a marine MSA-degrading bacterium, Marinosulfonomonas methylotropha, revealed the presence of msm genes highly related to those found in Methylosulfonomonas, suggesting that MSAMO is a novel type of oxygenase that may be conserved in all MSA-utilizing bacteria. PMID:10094704

  10. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145
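    FAST itself is written in Perl on BioPerl. Purely as a conceptual analogue of a header filter in the spirit of fasgrep (not FAST's code or command-line interface), the Python sketch below parses Multi-FastA text, keeps records whose header matches a regular expression, and writes them back out, so each step can be chained like a Unix filter.

```python
import re
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from Multi-FastA text lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None and line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def grep_records(records: Iterable[Record], pattern: str,
                 invert: bool = False) -> Iterator[Record]:
    """Keep records whose header matches a regex (or does not, if invert)."""
    rx = re.compile(pattern)
    for header, seq in records:
        if bool(rx.search(header)) != invert:
            yield header, seq


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records back to FastA, wrapping sequences at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


data = [">seq1 Homo sapiens", "ACGT", "AC", ">seq2 Mus musculus", "GGG"]
print(format_fasta(grep_records(parse_fasta(data), r"sapiens")), end="")
```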

  11. Complete nucleotide sequence of clematis chlorotic mottle virus, a new member of the family Tombusviridae.

    PubMed

    McLaughlin, Margaret; Lockhart, Ben; Jordan, Ramon; Denton, Geoff; Mollov, Dimitre

    2017-05-01

    Clematis chlorotic mottle virus (ClCMV) is a previously undescribed virus associated with symptoms of yellow mottling and veining, chlorotic ring spots, line pattern mosaics, and flower distortion and discoloration on ornamental Clematis. The ClCMV genome is 3,880 nt in length with five open reading frames (ORFs) encoding a 27-kDa protein (ORF 1), an 87-kDa replicase protein (ORF 2), two centrally located movement proteins (ORF 3 and 4), and a 37-kDa capsid protein (ORF 5). Based on morphological, genomic, and phylogenetic analysis, ClCMV is predicted to be a member of the genus Pelarspovirus in the family Tombusviridae.

  12. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    PubMed

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally, and the evolution of drug resistance is anticipated. Protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, so quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Sequences are flagged if they have more than five 1-2-base insertions; more than one 3-base insertion; more than one deletion; more than six PR or more than 18 RT ambiguous bases; more than three consecutive PR or more than four consecutive RT nucleic acid mutations; any stop codons; more than three PR or more than six RT ambiguous amino acids; more than three consecutive PR or more than four consecutive RT amino acid mutations; any unique amino acids; or a genetic distance of less than 0.5% or more than 15% from another submitted sequence. Thresholds are user-modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting flagged sequences, histograms of pairwise genetic distances, neighbor-joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure the use of high-quality HIV PR/RT sequences in the interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
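    A few of the flagging rules listed above can be sketched in Python (SQUAT itself runs in R, and this is not its code). The sketch covers only ambiguous bases, stop codons and pairwise distance, using the abstract's thresholds; it assumes sequences are already aligned and start at the first codon of the reading frame, uses the uncorrected p-distance as the "genetic distance", and applies the distance rule to any other submitted sequence. All names are hypothetical.

```python
from typing import Iterable, List

# Subset of the thresholds quoted in the abstract (user-modifiable in SQUAT).
THRESHOLDS = {
    "PR": {"max_ambiguous_bases": 6, "max_stop_codons": 0},
    "RT": {"max_ambiguous_bases": 18, "max_stop_codons": 0},
}
MIN_DISTANCE, MAX_DISTANCE = 0.005, 0.15  # 0.5% and 15%
IUPAC_AMBIGUOUS = set("RYKMSWBDHVN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_ambiguous(seq: str) -> int:
    return sum(base in IUPAC_AMBIGUOUS for base in seq.upper())


def count_stop_codons(seq: str) -> int:
    """Count in-frame stop codons, assuming seq starts at codon position 1."""
    s = seq.upper()
    return sum(s[i:i + 3] in STOP_CODONS for i in range(0, len(s) - 2, 3))


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among sites unambiguous in both sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in sites) / len(sites) if sites else 0.0


def flag_sequence(seq: str, region: str, others: Iterable[str] = ()) -> List[str]:
    """Return the names of the (subset of) quality rules this sequence fails."""
    limits = THRESHOLDS[region]
    flags = []
    if count_ambiguous(seq) > limits["max_ambiguous_bases"]:
        flags.append("ambiguous_bases")
    if count_stop_codons(seq) > limits["max_stop_codons"]:
        flags.append("stop_codons")
    for other in others:
        d = p_distance(seq, other)
        if d < MIN_DISTANCE or d > MAX_DISTANCE:
            flags.append("genetic_distance")
            break
    return flags


print(flag_sequence("ATGTAAATG", "PR"))
```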

  13. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980

  14. Analysis of Metagenomic Sequences: From Megabases to Terabases

    ScienceCinema

    Kyrpides, Nikos

    2018-05-04

    Nikos Kyrpides of the DOE Joint Genome Institute discusses metagenomics and the challenge of dealing with terabases of data on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  15. Identification of a major 50-kDa molecular weight human B-cell growth factor with Tac antigen-inducing activity on B cells.

    PubMed

    Kawano, M; Matsushima, K; Oppenheim, J J

    1987-08-01

    A bioassay was developed using human small B cells adherent to anti-human IgM (anti-mu)-coated wells. These B cells were stimulated to proliferate by culture supernatants of concanavalin A (Con A)-activated human peripheral blood lymphocytes (Con A Sup) even in the presence of high concentrations of anti-mu coated on assay wells. Human B-cell growth factor (BCGF) activities were partially purified from Con A Sup. Preparative chromatography (Sephacryl S-200 and isoelectrofocusing) yielded a major peak of BCGF activity for B cells adherent to anti-mu-coated wells with a molecular weight of 50,000 (50 kDa) and a pI of 7.6. The 50-kDa BCGF was further purified by sequential chromatography using DEAE-Sephacel, CM-Sepharose, Sephacryl S-200, CM-high performance liquid chromatography (HPLC), and hydroxyapatite (HA)-HPLC. The HA-HPLC-purified 50-kDa BCGF was free of interleukin-1 (IL-1), interleukin-2 (IL-2), and interferon activities, but could support growth of BCL1 cells, similar to BCGF-II. Neither IL-1 nor interferon-gamma had any growth-stimulating effect in our B-cell proliferation assay with or without BCGF in Iscove's synthetic assay medium. BCGF-induced proliferation of B cells adherent to anti-mu-coated wells could be markedly augmented by the simultaneous or sequential addition of recombinant human IL-2 (rIL-2). When cultured for 3 days with 50-kDa BCGF, about 40% of B cells adherent to anti-mu-coated wells expressed Tac antigen, and monoclonal anti-Tac antibody inhibited rIL-2 enhancement of proliferation of 50-kDa BCGF-preactivated B cells. In addition, 50-kDa BCGF could induce Tac antigen on an Epstein-Barr virus-transformed B-cell line (ORSON) in the presence of a suboptimal dose of phorbol myristate acetate (PMA) and also on a natural killer-like cell line (YT cells). We have therefore identified a major 50-kDa BCGF activity with Tac antigen-inducing activity that also has a synergistic effect with IL-2 on normal B-cell proliferation.

  16. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    PubMed Central

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. After phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
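    The relabelling and duplicate-removal steps described above can be sketched in Python. This is not REFGEN's code; it assumes NCBI-style headers where the accession is the first token and the species appears in square brackets (e.g. "XP_001.1 hypothetical protein [Homo sapiens]"), and it replaces characters other than letters, digits, "_" and "." with underscores, on the assumption that such labels are safe for most phylogenetics programs. All function names are hypothetical.

```python
import re
from typing import Iterable, List, Sequence, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)
SPECIES_RX = re.compile(r"\[([^\]]+)\]")  # assumes NCBI-style "[Genus species]"


def make_label(header: str, fields: Sequence[str] = ("species", "accession"),
               sep: str = "_") -> str:
    """Build a label from the chosen fields and sanitize it."""
    tokens = header.split()
    match = SPECIES_RX.search(header)
    values = {
        "accession": tokens[0] if tokens else "unknown",
        "species": match.group(1) if match else "unknown",
    }
    label = sep.join(values[f] for f in fields)
    return re.sub(r"[^A-Za-z0-9_.]", "_", label)


def relabel(records: Iterable[Record],
            fields: Sequence[str] = ("species", "accession")
            ) -> Tuple[List[Record], List[str]]:
    """Rename every record and drop repeats of an accession already seen.

    Returns the renamed records and the removed accessions, which play the
    role of the report file mentioned in the abstract.
    """
    seen, renamed, removed = set(), [], []
    for header, seq in records:
        tokens = header.split()
        accession = tokens[0] if tokens else ""
        if accession in seen:
            removed.append(accession)
            continue
        seen.add(accession)
        renamed.append((make_label(header, fields), seq))
    return renamed, removed


print(make_label("XP_001.1 hypothetical protein [Homo sapiens]"))
```

    Keeping a mapping from new labels back to the original headers would allow the reverse renaming of tree files after analysis.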

  17. Meta sequence analysis of human blood peptides and their parent proteins.

    PubMed

    Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

    2010-04-18

    Sequence analysis of blood peptides and their properties will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides per protein, in order of prevalence, along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information from current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi-square analysis of peptide-to-protein distributions confirmed significant agreement between groups on the identified proteins. Copyright 2010. Published by Elsevier B.V.
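
    A per-protein summary of the kind described above (up to five peptides per protein in order of prevalence, plus a peptide count) could be produced along these lines with Python's built-in sqlite3 module. The table layout, the function names and the reading of "peptide count" as the total number of peptide identifications are assumptions for illustration, not the authors' schema.

```python
import sqlite3


def build_db(observations):
    """Load (protein_id, peptide) identifications, one row per observation."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE obs (protein TEXT, peptide TEXT)")
    con.executemany("INSERT INTO obs VALUES (?, ?)", observations)
    return con


def top_peptides(con, limit=5):
    """Return {protein: (peptide_count, [up to `limit` peptides, most frequent first])}."""
    rows = con.execute(
        """
        SELECT protein, peptide, COUNT(*) AS n
        FROM obs
        GROUP BY protein, peptide
        ORDER BY protein, n DESC, peptide
        """
    ).fetchall()
    summary = {}
    for protein, peptide, n in rows:
        count, peptides = summary.get(protein, (0, []))
        if len(peptides) < limit:
            peptides.append(peptide)  # rows arrive in order of prevalence
        summary[protein] = (count + n, peptides)  # count includes all peptides
    return summary
```

    Ranking is done in Python rather than with SQL window functions so the sketch does not depend on the SQLite version bundled with the interpreter.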

  18. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay

    2018-01-04

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  19. The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools

    PubMed Central

    Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

    2016-01-01

    Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Owing to advances in conventional sequencing and the development of high-throughput technologies, the field of sequence determination is being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections, combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) has become a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease of pigs, which has a high socio-economic impact worldwide. The DB has been redesigned and now allows the storage and analysis of traditionally used, well-established genomic regions and of larger genomic regions, including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized, easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988

  20. Intact and Top-Down Characterization of Biomolecules and Direct Analysis Using Infrared Matrix-Assisted Laser Desorption Electrospray Ionization Coupled to FT-ICR Mass Spectrometry

    PubMed Central

    Sampson, Jason S.; Murray, Kermit K.; Muddiman, David C.

    2013-01-01

    We report the implementation of an infrared laser onto our previously reported matrix-assisted laser desorption electrospray ionization (MALDESI) source with ESI post-ionization, yielding multiply charged peptides and proteins. Infrared (IR)-MALDESI is demonstrated for atmospheric pressure desorption and ionization of biological molecules ranging in molecular weight from 1.2 to 17 kDa. High-resolving-power, high-mass-accuracy single-acquisition Fourier transform ion cyclotron resonance (FT-ICR) mass spectra were generated from liquid- and solid-state peptide and protein samples by desorption with an infrared laser (2.94 µm) followed by ESI post-ionization. Intact and top-down analysis of equine myoglobin (17 kDa) desorbed from the solid state with ESI post-ionization demonstrates the sequencing capabilities of IR-MALDESI coupled to FT-ICR mass spectrometry. Carbohydrates and lipids were detected through direct analysis of milk and egg yolk using both UV- and IR-MALDESI with minimal sample preparation. Three of the four classes of biological macromolecules (proteins, carbohydrates, and lipids) have been ionized and detected using MALDESI with minimal sample preparation. Sequencing of O-linked glycans, cleaved from mucin using reductive β-elimination chemistry, is also demonstrated. PMID:19185512
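
    ESI post-ionization places several protons on each molecule, so an ion of neutral mass M carrying z protons appears at m/z = (M + z * m_p) / z, where m_p is about 1.007276 Da, the proton mass. The sketch below applies this standard relation. The 17,000 Da value in the tests is a round figure taken from the abstract's "17 kDa", not the exact mass of myoglobin, and the helper names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass, z):
    """m/z of the [M + zH]z+ ion formed by adding z protons to neutral mass M (Da)."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (neutral_mass + z * PROTON_MASS) / z


def charge_from_adjacent(mz_high_charge, mz_low_charge):
    """Charge z of the higher-m/z peak, given two adjacent charge states (z+1 and z).

    From M = z * (m2 - p) = (z + 1) * (m1 - p) it follows that z = (m1 - p) / (m2 - m1),
    where m1 is the lower m/z (charge z+1) and m2 the higher m/z (charge z)."""
    z = (mz_high_charge - PROTON_MASS) / (mz_low_charge - mz_high_charge)
    return round(z)


def neutral_mass(mz_value, z):
    """Invert mz(): recover the neutral mass from one peak of known charge."""
    return z * (mz_value - PROTON_MASS)
```

    Two adjacent peaks in a charge-state envelope are enough to infer both the charge and the neutral mass, which is one simple way to read multiply charged protein spectra.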

  1. Molecular and Structural Characterization of the Tegumental 20.6-kDa Protein in Clonorchis sinensis as a Potential Druggable Target.

    PubMed

    Kim, Yu-Jung; Yoo, Won Gi; Lee, Myoung-Ro; Kang, Jung-Mi; Na, Byoung-Kuk; Cho, Shin-Hyeong; Park, Mi-Yeoun; Ju, Jung-Won

    2017-03-04

    The tegument, representing the membrane-bound outer surface of platyhelminth parasites, plays an important role in the regulation of the host immune response and in parasite survival. A comprehensive understanding of tegumental proteins can provide drug candidates for use against helminth-associated diseases, such as clonorchiasis caused by the liver fluke Clonorchis sinensis. However, little is known regarding the physicochemical properties of C. sinensis teguments. In this study, a novel 20.6-kDa tegumental protein of the C. sinensis adult worm (CsTegu20.6) was identified and characterized by molecular and in silico methods. The complete coding sequence of 525 bp was derived from cDNA clones and encodes a protein of 175 amino acids. A homology search using BLASTX showed that CsTegu20.6 shares 29% to 39% identity with previously known tegumental proteins in C. sinensis. Domain analysis indicated the presence of a calcium-binding EF-hand domain containing a basic helix-loop-helix structure and a dynein light chain domain exhibiting a ferredoxin fold. Because appropriate templates were unavailable, we used a modified method to obtain an accurate tertiary structure of the CsTegu20.6 protein. The CsTegu20.6 protein sequence was split into two domains based on the disordered region, and the structure of each domain was then modeled using I-TASSER. A final full-length structure was obtained by combining the two structures and refining the whole structure. The refined CsTegu20.6 structure was used to identify a potential CsTegu20.6 inhibitor based on protein structure-compound interaction analysis. The recombinant protein was expressed in Escherichia coli and purified by nickel-nitrilotriacetic acid affinity chromatography. In C. sinensis, CsTegu20.6 mRNA was abundant in adults and metacercariae, but not in eggs. Immunohistochemistry revealed that CsTegu20.6 localized to the surface of the tegument in the adult fluke. Collectively, our results contribute to a

  2. Subgenotype analysis of Cryptosporidium isolates from humans, cattle, and zoo ruminants in Portugal.

    PubMed

    Alves, Margarida; Xiao, Lihua; Sulaiman, Irshad; Lal, Altaf A; Matos, Olga; Antunes, Francisco

    2003-06-01

    Cryptosporidium parvum and Cryptosporidium hominis isolates from human immunodeficiency virus-infected patients, cattle, and wild ruminants were characterized by PCR and DNA sequencing analysis of the 60-kDa glycoprotein gene. Seven alleles were identified, three corresponding to C. hominis and four corresponding to C. parvum. One new allele was found (IId), and one (IIb) had only been found in Portugal. Isolates from cattle and wild ruminants clustered in two alleles. In contrast, human isolates clustered in seven alleles, showing extensive allelic diversity.

  3. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    DTIC Science & Technology

    2016-09-01

    AWARD NUMBER: W81XWH-14-1-0080. TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR... PREPARED FOR: U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702-5012. DISTRIBUTION STATEMENT: Approved for Public Release...

  4. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    PubMed

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in the efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general-purpose compression algorithms, namely gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32- and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. Contact: sharmila@atc.tcs.com. Supplementary data are available at Bioinformatics online.
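
    The abstract compares DELIMINATE with gzip, bzip2 and lzma but does not describe its algorithm. As a point of reference only, the sketch below measures those three general-purpose codecs (all in Python's standard library) against a simple 2-bit-per-base packing. The 2-bit scheme is a generic baseline, not DELIMINATE's method, and it handles only uppercase A/C/G/T; real data containing N or other IUPAC codes would need extra handling.

```python
import bz2
import gzip
import lzma

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def two_bit_pack(seq):
    """Pack an uppercase A/C/G/T string at 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]  # raises KeyError for N, lowercase, etc.
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def two_bit_unpack(data, length):
    """Inverse of two_bit_pack; `length` trims the padding in the last byte."""
    seq = [BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0)]
    return "".join(seq[:length])


def compression_report(seq):
    """Sizes in bytes: raw ASCII, the three general-purpose codecs, and 2-bit packing."""
    raw = seq.encode("ascii")
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw)),
        "bzip2": len(bz2.compress(raw)),
        "lzma": len(lzma.compress(raw)),
        "2-bit": len(two_bit_pack(seq)),
    }
```

    The packing is loss-less for A/C/G/T only because the original length is stored separately; a real format would also need to record that length and any non-ACGT symbols.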

  5. Cloning, sequencing, and expression of the gene encoding the high-molecular-weight cytochrome c from Desulfovibrio vulgaris Hildenborough.

    PubMed Central

    Pollock, W B; Loutfi, M; Bruschi, M; Rapp-Giles, B J; Wall, J D; Voordouw, G

    1991-01-01

    By using a synthetic deoxyoligonucleotide probe designed to recognize the structural gene for cytochrome cc3 from Desulfovibrio vulgaris Hildenborough, a 3.7-kb XhoI genomic DNA fragment containing the cc3 gene was isolated. The gene encodes a precursor polypeptide of 58.9 kDa, with an NH2-terminal signal sequence of 31 residues. The mature polypeptide (55.7 kDa) has 16 heme binding sites of the form C-X-X-C-H. Covalent binding of heme to these 16 sites gives a holoprotein of 65.5 kDa with properties similar to those of the high-molecular-weight cytochrome c (Hmc) isolated from the same strain by Higuchi et al. (Y. Higuchi, K. Inaka, N. Yasuoka, and T. Yagi, Biochim. Biophys. Acta 911:341-348, 1987). Since the data indicate that cytochrome cc3 and Hmc are the same protein, the gene has been named hmc. The Hmc polypeptide contains 31 histidinyl residues, 16 of which are integral to heme binding sites. Thus, only 15 of the 16 hemes can have bis-histidinyl coordination. A comparison of the arrangement of heme binding sites and coordinated histidines in the amino acid sequences of cytochrome c3 and Hmc from D. vulgaris Hildenborough suggests that the latter contains three cytochrome c3-like domains. Cloning of the D. vulgaris Hildenborough hmc gene into the broad-host-range vector pJRD215 and subsequent conjugational transfer of the recombinant plasmid into D. desulfuricans G200 led to expression of a periplasmic Hmc gene product with covalently bound hemes. PMID:1846136
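
    The heme arithmetic in this abstract can be checked with a short sketch. Each C-X-X-C-H motif contributes one histidine, so 31 histidines minus 16 motif histidines leaves 15 available as second (distal) ligands, which is why at most 15 of the 16 hemes can be bis-histidinyl coordinated. The heme mass of about 616.5 Da, and the assumption that covalent attachment adds it unchanged, are approximations introduced here; the test sequence is hypothetical.

```python
import re

HEME_MASS_DA = 616.5  # approx. mass of one heme (Fe-protoporphyrin IX), assumed added intact


def heme_sites(seq):
    """0-based start positions of C-X-X-C-H motifs, overlapping matches included."""
    return [m.start() for m in re.finditer(r"(?=C..CH)", seq)]


def max_bis_his_hemes(n_sites, n_his):
    """Upper bound on hemes with bis-histidinyl coordination.

    Each C-X-X-C-H motif supplies one (proximal) His; the second His of each
    bis-His heme must come from outside the motifs."""
    return min(n_sites, n_his - n_sites)


def holo_mass_kda(apo_kda, n_hemes):
    """Approximate holoprotein mass (kDa) after covalent attachment of n_hemes hemes."""
    return apo_kda + n_hemes * HEME_MASS_DA / 1000
```

    With the abstract's numbers, holo_mass_kda(55.7, 16) gives about 65.6 kDa, close to the reported 65.5 kDa given the rounding of the inputs.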

  6. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but their genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity of this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with those of other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential, including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with biological functions, such as angiotensin-converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the L. helveticus genome is its remarkable similarity in gene content to many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles, including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continue to accumulate rapidly, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  7. The 170-kDa glucose-regulated stress protein is an endoplasmic reticulum protein that binds immunoglobulin.

    PubMed Central

    Lin, H Y; Masso-Welch, P; Di, Y P; Cai, J W; Shen, J W; Subjeck, J R

    1993-01-01

    Anoxia, glucose starvation, calcium ionophore A23187, EDTA, glucosamine, and several other conditions that adversely affect the function of the endoplasmic reticulum (ER) induce the synthesis of the glucose-regulated class of stress proteins (GRPs). The primary GRPs induced by these stresses migrate at 78 and 94 kDa (GRP78 and GRP94). In addition, another protein of approximately 150-170 kDa (GRP170) has been observed previously and is coordinately induced with GRP78 and GRP94. To characterize this novel stress protein, we prepared an antiserum against purified GRP170. Immunofluorescence, endoglycosidase H sensitivity, and protease resistance of this protein in microsomes indicate that GRP170 is an ER lumenal glycoprotein retained in a pre-Golgi compartment. In Chinese hamster ovary cells under stress conditions, immunoprecipitation of GRP170 with our antibody coprecipitates GRP78 (also referred to as the B cell immunoglobulin-binding protein) and GRP94, members of the same stress protein family. ATP depletion, by immunoprecipitation in the presence of apyrase, does not affect the interaction between GRP78 and GRP170 but results in the coprecipitation of an unidentified 60-kDa protein. GRP170 is also coprecipitated with immunoglobulin (Ig) in four different B cell hybridomas expressing surface IgM, cytoplasmic Ig light chain only, cytoplasmic Ig heavy chain only, or an antigen-specific secreted IgG. Moreover, in WEHI-231 B cells expressing surface IgM, anti-IgM coprecipitates GRP78 and GRP94 as well as GRP170, and antibodies against GRP170 and GRP94 reciprocally coprecipitate GRP94/GRP170 as well as GRP78. These results suggest that this 170-kDa GRP is a retained ER lumenal glycoprotein that is constitutively present and that may play a role in immunoglobulin folding and assembly together with, or sequentially after, GRP78 and GRP94. PMID:8305733

  8. Analysis and Characterization of Proteins Associated with Outer Membrane Vesicles Secreted by Cronobacter spp.

    PubMed Central

    Kothary, Mahendra H.; Gopinath, Gopal R.; Gangiredla, Jayanthi; Rallabhandi, Prasad V.; Harrison, Lisa M.; Yan, Qiong Q.; Chase, Hannah R.; Lee, Boram; Park, Eunbi; Yoo, YeonJoo; Chung, Taejung; Finkelstein, Samantha B.; Negrete, Flavia J.; Patel, Isha R.; Carter, Laurenda; Sathyamoorthy, Venugopal; Fanning, Séamus; Tall, Ben D.

    2017-01-01

    Little is known about the secretion of outer membrane vesicles (OMVs) by Cronobacter. In this study, OMVs isolated from Cronobacter sakazakii, Cronobacter turicensis, and Cronobacter malonaticus were examined by electron microscopy (EM), and their associated outer membrane proteins (OMPs) and genes were analyzed by SDS-PAGE, protein sequencing, BLAST, PCR, and DNA microarray. EM of stained cells revealed that the OMVs are secreted as pleomorphic micro-vesicles that cascade from the cell's surface. SDS-PAGE analysis identified protein bands with molecular weights from 18 kDa to >100 kDa that were homologous to OMPs such as GroEL; OmpA, C, E, F, and X; MipA proteins; a conjugative plasmid transfer protein; and an outer membrane auto-transporter protein (OMATP). PCR analyses showed that most of the OMP genes were present in all seven Cronobacter species, while a few genes (the OMATP gene, groEL, ompC, mipA, ctp, and ompX) were absent in some phylogenetically related species. Microarray analysis demonstrated sequence divergence among the OMP genes that was not captured by PCR. These results support previous findings that OmpA and OmpX may be involved in the virulence of Cronobacter and are packaged within secreted OMVs. They also suggest that other OMV-packaged OMPs may be involved in roles such as stress response, cell wall and plasmid maintenance, and extracellular transport. PMID:28232819

  9. An intron-containing glycoside hydrolase family 9 cellulase gene encodes the dominant 90 kDa component of the cellulosome of the anaerobic fungus Piromyces sp. strain E2.

    PubMed Central

    Steenbakkers, Peter J M; Ubhayasekera, Wimal; Goossen, Harry J A M; van Lierop, Erik M H M; van der Drift, Chris; Vogels, Godfried D; Mowbray, Sherry L; Op den Camp, Huub J M

    2002-01-01

    The cellulosome produced by Piromyces sp. strain E2 during growth on filter paper was purified by an optimized cellulose-affinity method in which the cellulose-bound protein was washed with EDTA and then eluted with water. Three dominant proteins, with molecular masses of 55, 80 and 90 kDa, were identified in the cellulosome preparation. Treatment of the cellulose-bound cellulosome with a number of denaturing agents was also tested. Incubation with 0.5% (w/v) SDS or 8 M urea released most cellulosomal proteins but left most of the 80, 90 and 170 kDa components bound. To investigate the major 90 kDa cellulosome protein further, the corresponding gene, cel9A, was isolated using immunoscreening and N-terminal sequencing. Inspection of the cel9A genomic organization revealed four introns, allowing the construction of a consensus for introns in anaerobic fungi. The 2800 bp cDNA clone contained an open reading frame of 2334 bp encoding a 757-residue extracellular protein. Cel9A includes a 445-residue glycoside hydrolase family 9 catalytic domain and is thus the first fungal representative of this large family. Both modelling of the catalytic domain and the activity measured with low-level expression in Escherichia coli indicated that Cel9A is an endoglucanase. The catalytic domain is followed by a putative beta-sheet module of 160 amino acids of unknown function, a threonine-rich linker and three fungal docking domains. Homology modelling of the Cel9A dockerins suggested that all the cysteine residues present are involved in disulphide bridges. The results presented here are used to discuss the evolution of glycoside hydrolase family 9 enzymes. PMID:12071852

  10. Library preparation and data analysis packages for rapid genome sequencing.

    PubMed

    Pomraning, Kyle R; Smith, Kristina M; Bredeweg, Erin L; Connolly, Lanelle R; Phatale, Pallavi A; Freitag, Michael

    2012-01-01

    High-throughput sequencing (HTS) has quickly become a valuable tool for comparative genetics and genomics and is now regularly carried out in laboratories that are not connected to large sequencing centers. Here we describe an updated version of our protocol for constructing single- and paired-end Illumina sequencing libraries, beginning with purified genomic DNA. The present protocol can also be used for "multiplexing," i.e. the analysis of several samples in a single flowcell lane by generating "barcoded" or "indexed" Illumina sequencing libraries in a way that is independent of Illumina-supported methods. To analyze sequencing results, we suggest several independent approaches, but end users should be aware that this is a quickly evolving field and that many alignment (or "mapping") and counting algorithms are currently being developed and tested.
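
    Multiplexing works by tagging each library with a short barcode and sorting reads back to their samples after sequencing. The sketch below assumes inline barcodes at the 5' end of each read and tolerates one mismatch. Both choices, the barcode sequences and the function names are hypothetical and are not taken from the protocol, which may place indexes elsewhere.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Bin reads by an inline barcode at the 5' end of each read.

    barcodes: dict of sample name -> barcode, all of equal length (hypothetical).
    Returns dict sample -> list of barcode-trimmed reads, plus an 'undetermined' bin."""
    length = len(next(iter(barcodes.values())))
    bins = {sample: [] for sample in barcodes}
    bins["undetermined"] = []
    for read in reads:
        if len(read) < length:
            bins["undetermined"].append(read)
            continue
        prefix = read[:length]
        scored = sorted((hamming(prefix, bc), sample) for sample, bc in barcodes.items())
        best_dist, best_sample = scored[0]
        # Reject reads too far from every barcode, or equally close to two of them.
        tie = len(scored) > 1 and scored[1][0] == best_dist
        if best_dist <= max_mismatches and not tie:
            bins[best_sample].append(read[length:])  # strip the barcode
        else:
            bins["undetermined"].append(read)
    return bins
```

    Allowing a mismatch is only safe when the barcodes differ from each other at three or more positions; otherwise a single sequencing error could move a read into the wrong sample.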

  11. Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

    PubMed

    Sakai, Ryo; Aerts, Jan

    2014-01-01

    The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict the conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the task is to compare two or more sets of aligned sequences. We present a new visual representation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source programming environment Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves the visual comparison of two or more sets and additionally encodes information on the conservation of sequential positions. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher-level data exploration tasks.
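
    Sequence logos scale each column by its information content: log2 of the alphabet size minus the Shannon entropy of the residues in that column. The sketch below computes that quantity for two alignments and takes the per-column difference, a simple numerical comparison of two sets. It is not the Sequence Diversity Diagram encoding; it ignores gaps and applies no small-sample correction.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Per-column information content in bits: log2(alphabet_size) - Shannon entropy.

    Gap characters ('-') are ignored; an all-gap column scores 0."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in alignment if s[i] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


def compare_sets(set_a, set_b, alphabet_size=20):
    """Per-column difference in information content between two equal-width alignments."""
    a = column_information(set_a, alphabet_size)
    b = column_information(set_b, alphabet_size)
    return [x - y for x, y in zip(a, b)]
```

    Use alphabet_size=4 for nucleotide alignments and the default of 20 for proteins.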

  12. A basic analysis toolkit for biological sequences

    PubMed Central

    Giancarlo, Raffaele; Siragusa, Alessandro; Siragusa, Enrico; Utro, Filippo

    2007-01-01

    This paper presents a software library, nicknamed BATS, for several basic sequence analysis tasks, namely local alignment via approximate string matching, and global alignment via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not previously been implemented in a single, consistent software package, as we do here. Our main contribution is therefore to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the string matching and alignment problems mentioned above. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at under the GNU GPL. PMID:17877802
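
    Two of the tasks listed for BATS, global alignment via longest common subsequence (LCS) and significance via z-scores, can be illustrated in a few lines of Python. This is an independent sketch, not BATS code. The LCS uses the textbook dynamic program, and the z-score compares the observed LCS length with lengths obtained after shuffling one sequence, which is one common choice of null model assumed here.

```python
import math
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (O(len(a) * len(b)) time)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend the diagonal
            else:
                curr.append(max(prev[j], curr[j - 1]))  # drop the current char of a or of b
        prev = curr
    return prev[-1]


def lcs_z_score(a, b, trials=200, seed=0):
    """z-score of the observed LCS length against LCS lengths with b shuffled."""
    if trials < 2:
        raise ValueError("need at least two shuffles to estimate a spread")
    rng = random.Random(seed)
    observed = lcs_length(a, b)
    letters = list(b)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(lcs_length(a, "".join(letters)))
    mean = sum(scores) / trials
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (trials - 1))
    if sd == 0:
        return 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    return (observed - mean) / sd
```

    Shuffling preserves composition, so a high z-score indicates an ordering of shared characters that composition alone does not explain.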

  13. Sequence analysis of the phs operon in Salmonella typhimurium and the contribution of thiosulfate reduction to anaerobic energy metabolism.

    PubMed Central

    Heinzinger, N K; Fujimoto, S Y; Clark, M A; Moreno, M S; Barrett, E L

    1995-01-01

    The phs chromosomal locus of Salmonella typhimurium is essential for the dissimilatory anaerobic reduction of thiosulfate to hydrogen sulfide. Sequence analysis of the phs region revealed a functional operon with three open reading frames, designated phsA, phsB, and phsC, which encode polypeptides of 82.7, 21.3, and 28.5 kDa, respectively. The predicted products of phsA and phsB exhibited significant homology with the catalytic and electron transfer subunits of several other anaerobic molybdoprotein oxidoreductases, including Escherichia coli dimethyl sulfoxide reductase, nitrate reductase, and formate dehydrogenase. Simultaneous comparison of PhsA with seven homologous molybdoproteins revealed numerous similarities among all eight throughout the entire reading frame and, hence, significant amino acid conservation among molybdoprotein oxidoreductases. Comparison of PhsB with six other homologous sequences revealed four highly conserved iron-sulfur clusters. The predicted phsC product was highly hydrophobic and similar in size to the hydrophobic subunits of the molybdoprotein oxidoreductases containing subunits homologous to PhsA and PhsB. Thus, phsABC appears to encode thiosulfate reductase. Single-copy phs-lac translational fusions required both anaerobiosis and thiosulfate for full expression, whereas multicopy phs-lac translational fusions responded to either thiosulfate or anaerobiosis, suggesting that oxygen and thiosulfate control of phs involves negative regulation. A possible role for thiosulfate reduction in anaerobic respiration was examined. Thiosulfate did not significantly increase the final densities of anaerobic cultures grown on any of the 18 carbon sources tested. On the other hand, washed stationary-phase cells depleted of ATP synthesized small amounts of ATP upon addition of formate and thiosulfate, suggesting that thiosulfate reduction plays a unique role in anaerobic energy conservation by S. typhimurium. PMID:7751291

  14. Interaction of human platelets with laminin and identification of the 67 kDa laminin receptor on platelets.

    PubMed Central

    Tandon, N N; Holland, E A; Kralisz, U; Kleinman, H K; Robey, F A; Jamieson, G A

    1991-01-01

    A microtitre adhesion assay has been developed to define parameters affecting the adherence of washed platelets to laminin. Adherence was optimally supported by Mg2+ and was inhibited by Ca2+ and by anti-laminin Fab fragments, but significant adhesion (75-90% of control) was found both in heparinized plasma containing physiological levels of bivalent cations and in plasma anti-coagulated with EGTA. Adherence was unaffected by platelet activation with ADP but was decreased by 50% by treatment with alpha-thrombin (1 unit/ml, 5 min). Adherence was unaffected by monospecific polyclonal antibodies to glycoprotein (GP) Ib and GPIV, and was normal with platelets from two patients with Glanzmann's thrombasthaenia, indicating that GPIb, the GPIIb/IIIa complex and GPIV are not involved in platelet-laminin interaction. Affinity chromatography of Triton-solubilized membranes on laminin-Sepharose followed by elution with 0.2 M-glycine/HCl (pH 2.85) identified a major band with a molecular mass of 67 kDa in the reduced and of 53 kDa in the unreduced form. This protein gave a positive reaction on Western blotting with a monospecific polyclonal antibody raised against the high-affinity laminin receptor isolated from human breast carcinoma tissue. The adhesion of platelets to laminin was inhibited by two monoclonal IgM antibodies specific to the LR-1 domain of the 67 kDa receptor. The binding protein was surface-oriented, as shown by flow cytofluorimetry and by the fact that it could be iodinated in intact platelets, but it was not labelled by the periodate-borotritide procedure, suggesting that it did not contain terminal sialic acid. The laminin-derived peptides Tyr-Ile-Gly-Ser-Arg and Cys-Asp-Pro-Gly-Tyr-Ile-Gly-Ser-Arg-NH2, which constitute a complementary binding domain in laminin for the 67 kDa receptor, themselves supported platelet adhesion, bound to the receptor and inhibited the adhesion of platelets to laminin. In addition, Fab fragments of anti

  15. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    NASA Astrophysics Data System (ADS)

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates the half-life of a protein to the identity of its N-terminal residue. Destabilizing N-terminal residues are created by enzymatic reactions or chemical modifications. The destabilized substrate is then recognized by PROTEOLYSIS 6 (PRT6), an E3 ligase, resulting in its degradation by the proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but has not yet been studied in fleshy-fruited plants. This study was therefore carried out in tomato, the model for fleshy-fruited plants. BLASTX analysis identified Solyc09g010830 as the tomato PRT6 gene, based on its sequence similarity to PRT6 in A. thaliana. In silico gene expression analysis showed that PRT6 was highly expressed in tomato fruit at the breaker +5 stage. Co-expression analysis suggested that PRT6 may be involved not only in abiotic but also in biotic stress responses. The objective is to analyze the sequence of the PRT6 gene in tomato and to characterize it.

  16. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread in genome sequences and play many important roles in plants. To reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs across plant species and the plant kingdom by analyzing twelve sequenced plant genomes. First, in the twelve genomes studied, the predominant SSRs were those with repeat units of 1-3 nucleotides. Second, among mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except in P. patens). With increasing SSR repeat number, the A/T percentage in C. reinhardtii showed no significant change, whereas the A/T percentage in terrestrial plant species gradually declined. Third, among dinucleotide SSRs, the AT/TA percentage increased along with the evolution of the plant kingdom and also increased with repeat number in terrestrial plant species. This trend was more pronounced in dicotyledons than in monocotyledons. The CG/GC percentage showed the opposite pattern to that of AT/TA. Fourth, among trinucleotide SSRs, the percentages of combinations including two or three A/T bases rose along with the evolution of the plant kingdom; meanwhile, with increasing SSR repeat number, different species had different combinations as their dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to their evolutionary position or to specific changes in their genome sequences. The results showed that SSRs not only follow a general pattern in the evolution of the plant kingdom but are also associated with the evolution of specific genome sequences. The study of the evolutionary regularities of SSRs provides new insights for analyzing plant genome evolution.
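
    Counting SSRs of the kind analyzed here (units of 1-3 nucleotides) requires scanning a genome for perfect tandem repeats. The sketch below uses regular expressions with hypothetical minimum copy numbers (10 for mono-, 6 for di- and 5 for trinucleotide units), since the thresholds used in the study are not stated in the abstract, and it skips units that are themselves repeats (such as AA). The last helper computes the A/T share of mononucleotide SSRs, the quantity tracked across species above.

```python
import re


def primitive(unit):
    """True if unit is not a repeat of a shorter unit ('AT' -> True, 'AA' -> False)."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_copies=None):
    """Return sorted (start, unit, copies) for perfect SSRs in an uppercase DNA string."""
    if min_copies is None:
        min_copies = {1: 10, 2: 6, 3: 5}  # hypothetical thresholds per unit length
    hits = []
    for k, n in min_copies.items():
        # Lookahead so a run can be detected starting at every position.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, n - 1))
        end = -1
        for m in pattern.finditer(seq):
            start, run, unit = m.start(), m.group(1), m.group(2)
            if start < end or not primitive(unit):
                continue  # inside an SSR already reported, or e.g. 'AA' as a di-unit
            hits.append((start, unit, len(run) // k))
            end = start + len(run)
    return sorted(hits)


def mono_at_fraction(hits):
    """Fraction of mononucleotide SSRs whose unit is A or T (0.0 if there are none)."""
    mono = [unit for _, unit, _ in hits if len(unit) == 1]
    return sum(u in "AT" for u in mono) / len(mono) if mono else 0.0
```

    Analogous fractions for AT/TA among dinucleotide SSRs, or for A/T-rich trinucleotide units, follow the same pattern.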

  17. A functional U-statistic method for association analysis of sequencing data.

    PubMed

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of sequencing data poses tremendous challenges for data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders), multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanisms. Although jointly analyzing these phenotypes could increase the power to identify disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, the functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data and then tests the association of these functions with multiple phenotypes using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene with a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could increase the power of association analysis. In simulations, we compared our method with the multivariate outcome score test (MOST) and found that our test outperformed MOST. In a real data application, we applied our method to sequencing data from the Minnesota Twin Study (MTS) and found potential associations of several nicotinic receptor subunit (CHRN) genes, including CHRNB3, with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.
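
    A U-statistic of order 2 averages a kernel over all pairs of individuals. The sketch below pairs a genetic-similarity kernel with a phenotypic-similarity kernel and assesses significance by permuting phenotypes. It is a generic illustration, not the FU method: FU first smooths each individual's variants into a function, whereas here raw genotype values and user-supplied kernels (hypothetical) are used directly.

```python
import random
from itertools import combinations


def u_statistic(genotypes, phenotypes, kernel_g, kernel_p):
    """Order-2 U-statistic: mean over all unordered pairs (i, j) of
    kernel_g(G_i, G_j) * kernel_p(Y_i, Y_j)."""
    pairs = list(combinations(range(len(genotypes)), 2))
    if not pairs:
        raise ValueError("need at least two individuals")
    total = sum(
        kernel_g(genotypes[i], genotypes[j]) * kernel_p(phenotypes[i], phenotypes[j])
        for i, j in pairs
    )
    return total / len(pairs)


def permutation_p_value(genotypes, phenotypes, kernel_g, kernel_p, n_perm=999, seed=0):
    """One-sided permutation p-value: shuffling phenotypes breaks any genotype link."""
    rng = random.Random(seed)
    observed = u_statistic(genotypes, phenotypes, kernel_g, kernel_p)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if u_statistic(genotypes, shuffled, kernel_g, kernel_p) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

    For multiple phenotypes, kernel_p can act on a tuple of measurements (for example, a negative distance that combines binary and continuous traits), which is where a kernel formulation is convenient.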

  18. The Isolation and Characterization of Glycosylated Phosphoproteins from Herring Fish Bones

    PubMed Central

    Zhou, Hai-Yan; Salih, Erdjan; Glimcher, Melvin J.

    2010-01-01

    Past studies of bone extracellular matrix phosphoproteins such as osteopontin and bone sialoprotein have yielded important biological information regarding their role in calcification and the regulation of cellular activity. Most of these studies have been limited to proteins extracted from mammalian and avian vertebrates and nonvertebrates. The present work describes the isolation and purification of two major, highly glycosylated and phosphorylated extracellular matrix proteins of 70 and 22 kDa from herring fish bones. The 70-kDa phosphoprotein has some characteristics of osteopontin with respect to amino acid composition and susceptibility to thrombin cleavage. Unlike osteopontin, however, it was found to contain high levels of sialic acid, similar to bone sialoprotein. The 22-kDa protein has very different properties, such as a very high content of phosphoserine (∼270 Ser(P) residues/1000 amino acid residues), Ala, and Asx residues. N-terminal amino acid sequence analysis of both the 70-kDa (NPIMA(M)ETTS(M)DSKVNPLL) and the 22-kDa (NQDMAMEASSDPEAA) fish phosphoproteins indicates that these unique amino acid sequences are unlike any published in protein databases. An enzyme-linked immunosorbent assay revealed that the 70-kDa phosphoprotein was present principally in bone and in calcified scales, whereas the 22-kDa phosphoprotein was detected only in bone. Immunohistological analysis revealed diffusely positive immunostaining for both the 70- and 22-kDa phosphoproteins throughout the bone matrix. Overall, this work adds support to the concept that the mechanism of biological calcification has common evolutionary and fundamental bases throughout vertebrate species. PMID:20833721

  19. Genome-wide gene–gene interaction analysis for next-generation sequencing

    PubMed Central

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-01-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that traditional pairwise interaction analysis, which is suitable for common variants, is difficult to apply to rare variants because of prohibitive computation time, the large number of tests and low power. The major challenges for successfully detecting interactions with NGS data are (1) the need for a paradigm shift in interaction analysis; (2) severe multiple testing; and (3) heavy computation. To meet these challenges, we shift the paradigm from interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as the unit of analysis and use functional data analysis techniques as dimension-reduction tools to develop a novel statistic that collectively tests interaction between all possible pairs of SNPs within two genomic regions. Through extensive simulations, we demonstrate that functional logistic regression for interaction analysis has correct type I error rates and higher power to detect interaction than currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and to the Framingham Heart Study (FHS) dataset, and to the early-onset myocardial infarction (EOMI) exome sequence datasets of European origin from the NHLBI's Exome Sequencing Project. We found that 6 of 27 pairs of significantly interacting genes in the FHS were replicated in the independent WTCCC study, and we identified 24 pairs of significantly interacting genes in the EOMI study after Bonferroni correction. PMID:26173972
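
    The multiple-testing burden that motivates moving from SNP pairs to gene pairs can be made concrete with a small calculation. The Python sketch below only counts pairwise tests and applies a Bonferroni threshold; the gene and SNP counts are hypothetical round numbers rather than figures from the study, and the functional logistic regression statistic itself is not reproduced.

```python
from math import comb


def bonferroni_threshold(n_units, alpha=0.05):
    """Number of pairwise interaction tests among n_units and the per-test
    Bonferroni significance threshold alpha / n_tests."""
    n_tests = comb(n_units, 2)
    return n_tests, alpha / n_tests


def significant_pairs(p_values, n_units, alpha=0.05):
    """Keep the pairs whose p-value falls below the Bonferroni threshold."""
    _, threshold = bonferroni_threshold(n_units, alpha)
    return {pair: p for pair, p in p_values.items() if p < threshold}


# Hypothetical sizes: 20,000 genes versus 1,000,000 SNPs.
n_gene_tests, gene_threshold = bonferroni_threshold(20_000)
print(n_gene_tests, f"{gene_threshold:.2e}")  # 199990000 2.50e-10
print(bonferroni_threshold(1_000_000)[0])     # 499999500000
```

    Testing at the gene level shrinks the number of tests by several orders of magnitude, which relaxes the per-test threshold accordingly.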

  20. In silico analysis of β-mannanases and β-mannosidase from Aspergillus flavus and Trichoderma virens UKM1

    NASA Astrophysics Data System (ADS)

    Yee, Chai Sin; Murad, Abdul Munir Abdul; Bakar, Farah Diba Abu

    2013-11-01

    Genes encoding endo-β-1,4-mannanases from Trichoderma virens UKM1 (manTV) and Aspergillus flavus UKM1 (manAF) were analysed with bioinformatic tools. In addition, the A. flavus NRRL 3357 genome database was screened for a β-mannosidase gene (mndA-AF), which was also analysed. These three genes were analysed to understand their properties. manTV and manAF consist of 1,332 bp and 1,386 bp, encoding 443 and 461 amino acid residues, respectively. Both endo-β-1,4-mannanases belong to glycosyl hydrolase family 5 and contain a family 1 carbohydrate-binding module (CBM1). In contrast, mndA-AF is a 2,745-bp gene encoding a protein of 914 amino acid residues; this β-mannosidase belongs to glycosyl hydrolase family 2. The predicted molecular weights of manTV, manAF and mndA-AF are 47.74 kDa, 49.71 kDa and 103 kDa, respectively. All three predicted protein sequences possess a signal peptide and are highly conserved relative to other fungal β-mannanases and β-mannosidases.
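
    The reported gene lengths and residue counts are mutually consistent if each length is read as an intron-free ORF that includes its stop codon, and a predicted molecular weight is the sum of the residue masses plus one water. The Python sketch below checks the first point and implements the second under stated assumptions: intron-free ORFs, unmodified polypeptides, and commonly tabulated average residue masses. The protein sequences are not given in the abstract, so the 47.74, 49.71 and 103 kDa values cannot be reproduced here.

```python
# Commonly tabulated average residue masses (Da) of the 20 standard amino
# acids, i.e. free amino-acid mass minus one water molecule.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def residues_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an intron-free ORF of orf_bp bases."""
    if orf_bp % 3:
        raise ValueError("ORF length is not a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def average_mass_da(protein):
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


for name, bp in [("manTV", 1332), ("manAF", 1386), ("mndA-AF", 2745)]:
    print(name, residues_from_orf(bp))  # 443, 461 and 914 residues
```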

  1. Molecular cloning and nucleotide sequences of the genes for two essential proteins constituting a novel enzyme system for heptaprenyl diphosphate synthesis.

    PubMed

    Koike-Takeshita, A; Koyama, T; Obata, S; Ogura, K

    1995-08-04

    The genes encoding two dissociable components essential for Bacillus stearothermophilus heptaprenyl diphosphate synthase (all-trans-hexaprenyl-diphosphate:isopentenyl-diphosphate hexaprenyl-trans-transferase, EC 2.5.1.30) were cloned, and their nucleotide sequences were determined. Sequence analyses revealed three open reading frames within 2,350 base pairs, designated ORF-1, ORF-2, and ORF-3 in order of nucleotide sequence, which encode proteins of 220, 234, and 323 amino acids, respectively. Deletion experiments showed that expression of the enzymatic activity requires ORF-1 and ORF-3, but ORF-2 is not essential. This enzyme was thus shown genetically to consist of two different protein components with molecular masses of 25 kDa (Component I) and 36 kDa (Component II), encoded by two of the three tandem genes. The protein encoded by ORF-1 has no similarity to any protein registered so far. However, the protein encoded by ORF-3 shows 32% similarity to the farnesyl diphosphate synthase of the same bacterium and has seven highly conserved regions that have been shown to be typical of prenyltransferases (Koyama, T., Obata, S., Osabe, M., Takeshita, A., Yokoyama, K., Uchida, M., Nishino, T., and Ogura, K. (1993) J. Biochem. (Tokyo) 113, 355-363).

  2. Molecular differentiation and phylogenetic relationships of three Angiostrongylus species and Angiostrongylus cantonensis geographical isolates based on a 66-kDa protein gene of A. cantonensis (Nematoda: Angiostrongylidae).

    PubMed

    Eamsobhana, Praphathip; Lim, Phaik Eem; Zhang, Hongman; Gan, Xiaoxian; Yong, Hoi Sen

    2010-12-01

    The phylogenetic relationships and molecular differentiation of three species of angiostrongylid nematodes (Angiostrongylus cantonensis, Angiostrongylus costaricensis and Angiostrongylus malaysiensis) were studied using the AC primers for a 66-kDa protein gene of A. cantonensis. The AC primers successfully amplified the genomic DNA of these angiostrongylid nematodes. No amplification was detected for the DNA of Ascaris lumbricoides, Ascaris suum, Anisakis simplex, Gnathostoma spinigerum, Toxocara canis, and Trichinella spiralis. The maximum-parsimony (MP) consensus tree and the maximum-likelihood (ML) tree both showed that the Angiostrongylus taxa could be divided into two major clades, Clade 1 (A. costaricensis) and Clade 2 (A. cantonensis and A. malaysiensis), with full bootstrap support. A. costaricensis is the most distant taxon. A. cantonensis is the sister group of A. malaysiensis, and these two taxa (species) are clearly separated. There is no clear distinction among the A. cantonensis samples from four geographical localities (Thailand, China, Japan and Hawaii); only some samples group together, with bootstrap support ranging from none or low to moderate. The published nucleotide sequence of the A. cantonensis adult-specific native 66-kDa protein mRNA (clone L5-400 from Taiwan, U17585) appears to be very distant from the A. cantonensis samples from Thailand, China, Japan and Hawaii, with uncorrected p-distance values ranging from 26.87% to 29.92%.
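
    The uncorrected p-distance quoted above is the proportion of aligned sites at which two sequences differ, with no correction for multiple substitutions. A minimal Python sketch follows. It assumes the sequences are already aligned and that gapped or ambiguous sites are dropped pair by pair, which is one common convention but not necessarily the one used in the study.

```python
def p_distance(seq_a, seq_b, gap="-", ambiguous="N?"):
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Sites with a gap or an ambiguous base in either sequence are skipped
    (pairwise deletion); the result is differences / compared sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if gap in (a, b) or a in ambiguous or b in ambiguous:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned fragments (hypothetical): 2 differences over 10 sites.
print(f"{100 * p_distance('ACGTACGTAC', 'ACGTTCGTAA'):.2f}%")  # 20.00%
```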

  3. Relationships among genera of the Saccharomycotina from multigene sequence analysis

    USDA-ARS's Scientific Manuscript database

    Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported by bootstrap analysis, are viewed as phylogenetically circumscribed ge...

  4. Sirius PSB: a generic system for analysis of biological sequences.

    PubMed

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or, when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites, and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) building a classifier, (2) deploying a classifier, (3) searching for proteins similar to query proteins, and (4) preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple, interactive graphical user interface. Besides being a convenient tool, Sirius PSB introduces two novelties in sequence analysis. First, a genetic algorithm is used to identify informative features in the feature space. Second, instead of the conventional approach of searching for similar proteins via sequence similarity, we introduce searching via feature similarity. To demonstrate the capabilities of Sirius PSB, we built two prediction models, one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive with current state-of-the-art models when evaluated on public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions, when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.
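
    The idea of searching by feature similarity rather than sequence similarity can be illustrated with a toy example. In the Python sketch below, the feature vector (amino acid composition), the cosine similarity measure and the function names are assumptions chosen for illustration; the abstract does not say which features or metric Sirius PSB uses. Because composition ignores residue order, two proteins with no alignable sequence similarity can still score as highly similar.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Feature vector: fraction of each of the 20 standard amino acids."""
    seq = seq.upper()
    n = sum(seq.count(aa) for aa in AMINO_ACIDS)
    return [seq.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u, norm_v = sqrt(sum(x * x for x in u)), sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search_by_features(query, database, top_k=3):
    """Rank database entries by feature similarity to the query."""
    q = composition(query)
    scored = sorted(
        ((cosine(q, composition(s)), name) for name, s in database.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:top_k]]


# Hypothetical toy database.
db = {"p1": "KKKKRRRR", "p2": "LLLLIIII", "p3": "KRKRKRKE"}
hits = search_by_features("RRKK", db)
print([name for name, _ in hits])  # ['p1', 'p3', 'p2']
```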

  5. Bayesian Correlation Analysis for Sequence Count Data

    PubMed Central

    Lau, Nelson; Perkins, Theodore J.

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task in statistics and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose levels are measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and the uncertainty in those levels, which arises from varying sequencing depth across experiments and from varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449

  6. Congruence analysis of point clouds from unstable stereo image sequences

    NASA Astrophysics Data System (ADS)

    Jepping, C.; Bethmann, F.; Luhmann, T.

    2014-06-01

    This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, in photogrammetric recordings of car crash tests, where onboard high-speed stereo cameras are used to measure 3D surfaces. These measurements yield 3D point clouds of deformed surfaces for a complete stereo sequence. The first objective of this research is the development and investigation of methods for detecting corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough to handle occlusions and other disturbances reliably. The second objective is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose, a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.
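
    The RANSAC-based congruence step can be sketched as follows: repeatedly fit a 3D similarity transformation (scale, rotation, translation) to three randomly chosen corresponding points, count how many points it maps within a tolerance, keep the largest consensus set as the stable area, and refit on it. The closed-form SVD fit (Umeyama's method), the three-point sample size, the tolerance, the iteration count and the use of NumPy are assumptions for this sketch; the abstract does not specify how each transformation is estimated or how point groups are chosen.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares similarity transform dst ~ c * R @ src + t (Umeyama's SVD method)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # force a proper rotation, not a reflection
    R = U @ S @ Vt
    c = np.trace(np.diag(D) @ S) / (xs ** 2).sum(axis=1).mean()
    t = mu_d - c * R @ mu_s
    return c, R, t


def transform_points(c, R, t, pts):
    """Apply x -> c * R @ x + t to each row of pts."""
    return c * pts @ R.T + t


def ransac_stable_points(src, dst, tol=1e-3, iterations=200, seed=0):
    """Boolean mask of points consistent with one similarity transform between epochs."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(src), size=3, replace=False)
        model = fit_similarity(src[idx], dst[idx])
        inliers = np.linalg.norm(dst - transform_points(*model, src), axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best, fit_similarity(src[best], dst[best])  # refit on the consensus set


# Synthetic two-epoch example: 30 stable points move as one scaled rigid body,
# and the last 10 are additionally displaced, as a deformed region would be.
rng = np.random.default_rng(1)
epoch1 = rng.random((40, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
epoch2 = transform_points(1.2, R_true, np.array([0.5, -0.2, 1.0]), epoch1)
epoch2[30:] += rng.uniform(0.2, 0.5, size=(10, 3))
stable, (scale, R_est, t_est) = ransac_stable_points(epoch1, epoch2)
print(stable.sum(), round(scale, 6))  # expected: the 30 stable points, scale close to 1.2
```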

  7. A proteomic analysis of Pakistan Daboia russelii russelii venom and assessment of potency of Indian polyvalent and monovalent antivenom.

    PubMed

    Mukherjee, Ashis K; Kalita, Bhargab; Mackessy, Stephen P

    2016-07-20

    To address the dearth of knowledge on the biochemical composition of Pakistan Russell's Viper (Daboia russelii russelii) venom (RVV), the venom proteome was analyzed and several biochemical and pharmacological properties of the venom were investigated. SDS-PAGE (reduced) analysis indicated that proteins/peptides in the molecular mass ranges of ~56.0-105.0 kDa, 31.6-51.0 kDa, 15.6-30.0 kDa, 9.0-14.2 kDa and 5.6-7.2 kDa contribute approximately 9.8%, 12.1%, 13.4%, 34.1% and 30.5%, respectively, of Pakistan RVV. Proteomic analysis of gel-filtration peaks of RVV identified 75 proteins/peptides belonging to 14 distinct snake venom protein families. Phospholipases A2 (32.8%), Kunitz-type serine protease inhibitors (28.4%), and snake venom metalloproteases (21.8%) comprised the majority of Pakistan RVV proteins, while 11 additional families accounted for 0.2-6.5%. The occurrence of aminotransferase, endo-β-glycosidase, and disintegrins is reported for the first time in RVV. Several RVV proteins/peptides share significant sequence homology across Viperidae subfamilies. Pakistan RVV was well recognized by both the polyvalent (PAV) and monovalent (MAV) antivenoms manufactured in India; nonetheless, immunological cross-reactivity determined by ELISA, and neutralization of the pro-coagulant/anticoagulant activity of RVV and its fractions, was greater for MAV than for PAV. The study establishes the proteome profile of Pakistan RVV, indicating the presence of diverse proteins and peptides that play a significant role in the pathophysiology of RVV bite. Further, the proteomic findings will contribute to understanding the variation in venom composition with geographical location and to identifying pharmacologically important proteins in Pakistan RVV. Copyright © 2016. Published by Elsevier B.V.

  8. The 53-kDa proteolytic product of precursor starch-hydrolyzing enzyme of Aspergillus niger has Taka-amylase-like activity.

    PubMed

    Ravi-Kumar, K; Venkatesh, K S; Umesh-Kumar, S

    2007-04-01

    The 53-kDa amylase secreted by Aspergillus niger as a result of proteolytic processing of the precursor starch-hydrolyzing enzyme was resistant to acarbose, a potent alpha-glucosidase inhibitor. Production of the enzyme was induced when A. niger was grown in starch medium containing the inhibitor. Antibodies against the precursor enzyme cross-reacted with the 54-kDa Taka-amylase protein of A. oryzae. The 53-kDa enzyme resembled Taka-amylase in most of its properties and also hydrolyzed starch to maltose of alpha-anomeric configuration. However, it did not degrade the maltotriose formed during the reaction and was not inhibited by zinc ions.

  9. Universal sequence map (USM) of arbitrary discrete sequences

    PubMed Central

    2002-01-01

    Background: For over a decade, the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but has not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space that conserve all of its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis, without the context of a fixed memory length. A simple example would be inferring the homology between two sequences solely by comparing the coordinates of any two homologous units. Results: We have identified such an iterative function for the bijective mapping of discrete sequences into objects of a continuous state space that enables scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences of arbitrary length with an arbitrary number of unique units, and generates a representation in which map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). CGR enables the representation of four-unit-type sequences (such as DNA) as an order-free Markov chain transition table. The properties of USM are illustrated with test data and can be verified for other data using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules. PMID:11895567
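
    USM builds on the Chaos Game Representation, whose core iteration is short enough to show directly: each symbol moves the current point halfway toward the corner assigned to that symbol. The Python sketch below implements only classic four-corner CGR for DNA, not USM itself; the corner assignment and the (0.5, 0.5) starting point are common conventions assumed here. Because each step halves the distance between two trajectories, sequences that end in the same k symbols finish within 2^-k of each other along each axis, which is why map distance reflects shared context.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq, start=(0.5, 0.5)):
    """Chaos Game Representation: one point in the unit square per symbol."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the symbol's corner
        points.append((x, y))
    return points


print(cgr_coordinates("AC"))  # [(0.25, 0.25), (0.125, 0.625)]

# Different prefixes, same 4-symbol suffix: final points differ by at most 2**-4 per axis.
p = cgr_coordinates("GGGGACGT")[-1]
q = cgr_coordinates("TTTTACGT")[-1]
print(max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= 2 ** -4)  # True
```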

  10. Sequence determination and analysis of the NSs genes of two tospoviruses.

    PubMed

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs genes of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of tomato spotted wilt virus (TSWV), while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.

  11. Phylogenetic relationships of Malassezia species based on multilocus sequence analysis.

    PubMed

    Castellá, Gemma; Coutinho, Selene Dall' Acqua; Cabañes, F Javier

    2014-01-01

    Members of the genus Malassezia are lipophilic basidiomycetous yeasts that are part of the normal cutaneous microbiota of humans and other warm-blooded animals. Currently, this genus consists of 14 species that have been characterized by phenetic and molecular methods. Although several molecular methods have been used to identify and/or differentiate Malassezia species, sequencing of the rRNA genes and the chitin synthase-2 gene (CHS2) is the most widely employed. There is little information in the genus Malassezia about the β-tubulin gene, a gene that has been used for the analysis of complex species groups. The aim of the present study was to sequence a fragment of the β-tubulin gene of Malassezia species and to analyze their phylogenetic relationships using a multilocus sequence approach based on two rRNA genes (ITS including 5.8S rRNA, and the D1/D2 region of 26S rRNA) together with two protein-encoding genes (CHS2 and β-tubulin). The phylogenetic study of the partial β-tubulin gene sequences indicated that this molecular marker can be used to assess diversity and identify new species. The multilocus sequence analysis of the four loci provides robust support for delineating species at the terminal nodes and could help to estimate divergence times for the origin and diversification of Malassezia species.

  12. Combined sequence and structure analysis of the fungal laccase family.

    PubMed

    Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

    2003-08-20

    Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This analysis identified a set of four ungapped sequence regions, L1-L4, as overall signature sequences that can be used to identify laccases and to distinguish them within the broader class of multi-copper oxidases. The 12 amino acid residues that serve as copper ligands in these enzymes are housed within the four identified conserved regions, of which L2 and L4 conform to the previously reported copper signature sequences of multi-copper oxidases, while L1 and L3 are distinctive to the laccases. Mapping regions L1-L4 onto the three-dimensional structure of the Coprinus cinereus laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active-site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4, at both the structure and the sequence levels, suggest that the quasi C-2 symmetric active-site conformational motif may have arisen from a structural duplication event that neither sequence homology analysis nor structure homology analysis alone would have revealed. Although sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal
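
    The abstract does not spell out how the ungapped signature regions L1-L4 were delimited. One simple way to locate candidate regions of this kind in a multiple sequence alignment is sketched below in Python. It assumes that a column counts as conserved when it contains no gaps and its most frequent residue reaches an identity threshold; the thresholds, the function name and the toy alignment are hypothetical.

```python
def conserved_blocks(alignment, min_identity=0.9, min_length=5, gap="-"):
    """Half-open (start, end) column ranges of ungapped, conserved runs."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    def is_conserved(col):
        column = [row[col] for row in alignment]
        if gap in column:
            return False  # ungapped regions only
        top = max(column.count(residue) for residue in set(column))
        return top / len(column) >= min_identity

    blocks, start = [], None
    for col in range(n_cols + 1):  # the extra step closes a run at the end
        if col < n_cols and is_conserved(col):
            if start is None:
                start = col
        elif start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks


toy = ["MHWHGAK-L", "MHWHGAKQL", "MHWHGVRQL"]  # hypothetical aligned fragments
print(conserved_blocks(toy, min_identity=0.9, min_length=4))  # [(0, 5)]
print(conserved_blocks(toy, min_identity=0.6, min_length=4))  # [(0, 7)]
```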

  13. Combining conversation analysis and event sequencing to study health communication.

    PubMed

    Pecanac, Kristen E

    2018-06-01

    Good communication is essential in patient-centered care. The purpose of this paper is to describe conversation analysis and event sequencing and to explain how integrating these methods strengthened the analysis in a study of communication between clinicians and surrogate decision makers in an intensive care unit. Conversation analysis was first used to determine how clinicians introduced the need for decision-making regarding life-sustaining treatment and how surrogate decision makers responded. Event sequence analysis was then used to determine the transitional probability (the probability of one event leading to another in the interaction) that a given type of clinician introduction would lead to surrogate resistance or alignment. Conversation analysis provides a detailed analysis of the interaction between participants in a conversation. When combined with a quantitative analysis of the communication patterns in an interaction, these data add information on the communication strategies that produce positive outcomes. Researchers can apply this mixed-methods approach to identify beneficial conversational practices and design interventions to improve health communication. © 2018 Wiley Periodicals, Inc.
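
    A transitional probability of this kind is the conditional probability that one coded event is immediately followed by another, estimated from counts of adjacent event pairs. The Python sketch below assumes each interaction has already been coded as an ordered list of event labels; the labels are hypothetical, not the study's coding scheme.

```python
from collections import Counter, defaultdict


def transitional_probabilities(sequences):
    """P(next event = b | current event = a), estimated from adjacent pairs."""
    counts = defaultdict(Counter)
    for events in sequences:
        for a, b in zip(events, events[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(following.values()) for b, n in following.items()}
        for a, following in counts.items()
    }


# Hypothetical coded interactions: clinician introduction type -> surrogate response.
coded = [
    ["intro_options", "resistance"],
    ["intro_options", "alignment"],
    ["intro_options", "alignment"],
    ["intro_recommendation", "alignment"],
]
probs = transitional_probabilities(coded)
print(probs["intro_options"])  # resistance: 1/3, alignment: 2/3
```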

  14. Structure of the Putative 32 kDa Myrosinase Binding Protein from Arabidopsis (At3g16450.1) Determined by SAIL-NMR

    PubMed Central

    Takeda, Mitsuhiro; Sugimori, Nozomi; Torizawa, Takuya; Terauchi, Tsutomu; Ono, Akira Mei; Yagi, Hirokazu; Yamaguchi, Yoshiki; Kato, Koichi; Ikeya, Teppei; Jee, JunGoo; Güntert, Peter; Aceti, David J.; Markley, John L.; Kainosho, Masatsune

    2009-01-01

    The product of gene At3g16450.1 from Arabidopsis thaliana is a 32 kDa, 299-residue protein classified as resembling a myrosinase-binding protein (MyroBP). MyroBPs are found in plants as part of a complex with the glucosinolate-degrading enzyme myrosinase and are suspected to play a role in myrosinase-dependent defense against pathogens. Many MyroBPs and MyroBP-related proteins are composed of repeated homologous sequences of unknown structure. We report here the three-dimensional structure of the At3g16450.1 protein from Arabidopsis, which consists of two tandem repeats. Because the protein is larger than those amenable to high-throughput analysis by uniform 13C/15N labeling methods, we used our stereo-array isotope labeling (SAIL) technology to prepare an optimally 2H/13C/15N-labeled sample. NMR data sets collected with the SAIL protein enabled us to assign 1H, 13C and 15N chemical shifts to 95.5% of all atoms, even at the low concentration (0.2 mM) of the protein product. We collected additional NOESY data and solved the three-dimensional structure with the CYANA software package. The structure, the first for a MyroBP family member, revealed that the At3g16450.1 protein consists of two independent but similar lectin-fold domains composed of three β-sheets. PMID:19021763

  15. Interactive computer programs for the graphic analysis of nucleotide sequence data.

    PubMed Central

    Luckow, V A; Littlewood, R K; Rownd, R H

    1984-01-01

    A group of interactive computer programs has been developed to aid in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437
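
    Two of these functions, pattern search (c) and the two-dimensional homology matrix (e), are simple to sketch in modern terms. The Python below is an illustration, not a reconstruction of the 1984 programs: it reports 0-based start positions of a recognition sequence (the EcoRI site GAATTC is used as the example) and builds a text dot matrix marking positions where two sequences share an identical word of a chosen length.

```python
import re


def find_sites(seq, site):
    """0-based start positions of every occurrence of site, overlaps included."""
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer("(?=%s)" % re.escape(site.upper()), seq.upper())]


def dot_matrix(seq_a, seq_b, word=1):
    """Rows follow seq_a, columns follow seq_b; '*' marks identical words."""
    return [
        "".join(
            "*" if seq_a[i:i + word] == seq_b[j:j + word] else "."
            for j in range(len(seq_b) - word + 1)
        )
        for i in range(len(seq_a) - word + 1)
    ]


print(find_sites("GAATTCAAGAATTC", "GAATTC"))  # [0, 8]
for row in dot_matrix("ACGTT", "ACGTA", word=2):
    print(row)  # a diagonal of '*' marks the shared words AC, CG, GT
```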

  16. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    PubMed

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations, mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. Multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching for conserved domains revealed that most SA sequences are made up of three repeated domains (about 600 residues), as extensively characterized for human SA. In contrast, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar to higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  17. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.

    PubMed

    Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong

    2012-01-25

    Multiplexing has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows only limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were analyzed simultaneously using combinations of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured in the two clinical groups. The strong correlation between different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. By employing the PBS approach in NGS, large-scale multiplexed pooled samples can practically be analyzed in parallel, so that high-throughput sequencing economically meets the needs of samples with low sequencing-throughput demand.
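
    The combinatorial idea behind pair-barcoding is that m forward and n reverse barcodes label m × n samples, so 4 forward and 8 reverse barcodes distinguish 32 libraries. The Python sketch below assumes each read has already been split into its forward barcode, reverse barcode and insert, and uses exact matching. The barcode sequences are hypothetical, and error-tolerant matching, which a real pipeline may need, is left out.

```python
from collections import defaultdict
from itertools import product


def build_sample_sheet(forward, reverse):
    """Assign one sample name to every (forward, reverse) barcode combination."""
    return {pair: f"sample_{i + 1:02d}" for i, pair in enumerate(product(forward, reverse))}


def demultiplex(records, sheet):
    """Group inserts by sample; reads whose barcode pair is unknown are set aside."""
    by_sample, unassigned = defaultdict(list), []
    for fwd, rev, insert in records:
        sample = sheet.get((fwd, rev))
        if sample is None:
            unassigned.append(insert)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


forward = ["AAC", "AGT", "CTA", "GCA"]                              # hypothetical
reverse = ["TTG", "TCA", "GAT", "GGC", "CAA", "CTT", "ATC", "ACG"]  # hypothetical
sheet = build_sample_sheet(forward, reverse)
print(len(sheet))  # 32

reads = [("AAC", "TTG", "read1"), ("GCA", "ACG", "read2"), ("NNN", "TTG", "read3")]
assigned, leftover = demultiplex(reads, sheet)
print(assigned, leftover)  # {'sample_01': ['read1'], 'sample_32': ['read2']} ['read3']
```

    In this toy setting, the fraction of reads assigned to both barcodes (reported as about 64% in the pilot runs) corresponds to `1 - len(leftover) / len(reads)`.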

  18. Complementation of a mutant cell line: central role of the 91 kDa polypeptide of ISGF3 in the interferon-alpha and -gamma signal transduction pathways.

    PubMed Central

    Müller, M; Laxton, C; Briscoe, J; Schindler, C; Improta, T; Darnell, J E; Stark, G R; Kerr, I M

    1993-01-01

    Mutants in complementation group U3, completely defective in the response of all genes tested to interferons (IFNs) alpha and gamma, do not express the 91 and 84 kDa polypeptide components of interferon-stimulated gene factor 3 (ISGF3), a transcription factor known to play a primary role in the IFN-alpha response pathway. The 91 and 84 kDa polypeptides are products of a single gene. They result from differential splicing and differ only in a 38 amino acid extension at the C-terminus of the 91 kDa polypeptide. Complementation of U3 mutants with cDNA constructs expressing the 91 kDa product at levels comparable to those observed in induced wild-type cells completely restored the response to both IFN-alpha and -gamma and the ability to form ISGF3. Complementation with the 84 kDa component similarly restored the ability to form ISGF3 and, albeit to a lower level, the IFN-alpha response of all genes tested so far. It failed, however, to restore the IFN-gamma response of any gene analysed. The precise nature of the DNA motifs and combination of factors required for the transcriptional response of all genes inducible by IFN-alpha and -gamma remains to be established. The results presented here, however, emphasize the apparent general requirement of the 91 kDa polypeptide in the primary transcriptional response to both types of IFN. PMID:7693454

  19. Computational Analysis of Mouse piRNA Sequence and Biogenesis

    PubMed Central

    Betel, Doron; Sheridan, Robert; Marks, Debora S; Sander, Chris

    2007-01-01

    The recent discovery of a new class of 30-nucleotide-long RNAs in mammalian testes, called PIWI-interacting RNAs (piRNAs), with similarities to microRNAs and repeat-associated small interfering RNAs (rasiRNAs), has raised puzzling questions regarding their biogenesis and function. We report a comparative analysis of currently available piRNA sequence data from the pachytene stage of mouse spermatogenesis that sheds light on their sequence diversity and mechanism of biogenesis. We conclude that (i) there are at least four times as many piRNAs in mouse testes as are currently known; (ii) piRNAs, which originate from long precursor transcripts, are generated by quasi-random enzymatic processing that is guided by a weak sequence signature at the piRNA 5′ ends, resulting in a large number of distinct sequences; and (iii) many of the piRNA clusters contain inverted-repeat segments capable of forming double-stranded RNA fold-back segments that may initiate piRNA processing analogous to transposon silencing. PMID:17997596

  20. Molecular Cloning, Characterization, and Expression Analysis of a Prolyl 4-Hydroxylase from the Marine Sponge Chondrosia reniformis.

    PubMed

    Pozzolini, Marina; Scarfì, Sonia; Mussino, Francesca; Ferrando, Sara; Gallus, Lorenzo; Giovine, Marco

    2015-08-01

    Prolyl 4-hydroxylase (P4H) catalyzes the hydroxylation of proline residues in collagen. P4H has two functional subunits, α and β. Here, we report the cDNA cloning, characterization, and expression analysis of the α and β subunits of the P4H derived from the marine sponge Chondrosia reniformis. The amino acid sequence of the α subunit is 533 residues long with an Mr of 59.14 kDa, while the β subunit comprises 526 residues with an Mr of 58.75 kDa. Phylogenetic analyses showed that αP4H and βP4H are more closely related to the mammalian sequences than to known invertebrate P4Hs. Western blot analysis of cross-linked sponge lysate proteins revealed a band of 240 kDa corresponding to an α2β2 tetramer. This result suggests that P4H from marine sponges shares the same quaternary structure as the homologous vertebrate enzymes. Gene expression analyses showed that αP4H transcript levels are higher in the choanosome than in the ectosome, while a study of factors affecting its expression in sponge fragmorphs revealed that soluble silicates had no effect on αP4H levels, whereas ascorbic acid strongly upregulated αP4H mRNA. Finally, treatment with two different tumor necrosis factor (TNF)-alpha inhibitors caused a significant downregulation of αP4H gene expression in fragmorphs, demonstrating, for the first time in Porifera, a positive involvement of TNF in sponge matrix biosynthesis. The molecular characterization of P4H genes involved in collagen hydroxylation, including the mechanisms that regulate their expression, is a key step for future recombinant sponge collagen production and may be pivotal to understanding pathological mechanisms related to extracellular matrix deposition in higher organisms.
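
    As a quick consistency check on the cross-linking result, the mass expected for an α2β2 tetramer can be computed from the deduced subunit masses. This sketch simply sums the Mr values quoted above and ignores signal-peptide removal and post-translational modifications (an assumption), so it only shows that roughly 236 kDa is compatible with the observed band of about 240 kDa.

```python
ALPHA_KDA = 59.14  # deduced Mr of the alpha subunit (from the abstract)
BETA_KDA = 58.75   # deduced Mr of the beta subunit (from the abstract)


def assembly_mass(alpha: float, beta: float, n_alpha: int = 2, n_beta: int = 2) -> float:
    """Mass of an alpha(n_alpha) beta(n_beta) assembly as the sum of its subunits."""
    return n_alpha * alpha + n_beta * beta


mass = assembly_mass(ALPHA_KDA, BETA_KDA)
print(f"alpha2beta2: {mass:.2f} kDa")  # → alpha2beta2: 235.78 kDa
```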

  1. Food Fish Identification from DNA Extraction through Sequence Analysis

    ERIC Educational Resources Information Center

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed third- and fourth-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish samples, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  2. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

    PubMed Central

    Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

    2011-01-01

    High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods at the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated with each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
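
    One way to see how a taxonomy-supervised matrix feeds a β-diversity comparison: once each sequence has been assigned to a taxon, each community becomes a vector of counts per taxon, and a dissimilarity can be computed between any two vectors without aligning or clustering the reads. The sketch below uses Bray–Curtis dissimilarity as one common β-diversity measure; the taxon names and counts are invented, and the abstract does not say that this particular index was used.

```python
from collections import Counter


def bray_curtis(a: Counter, b: Counter) -> float:
    """Bray-Curtis dissimilarity between two taxon-count profiles (0 = identical)."""
    taxa = set(a) | set(b)
    shared = sum(min(a[t], b[t]) for t in taxa)  # Counter returns 0 for absent taxa
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


# Hypothetical genus-level assignment counts for two communities.
soil = Counter({"Acidobacterium": 40, "Bacillus": 30, "Pseudomonas": 30})
gut = Counter({"Bacteroides": 60, "Bacillus": 10, "Pseudomonas": 30})
print(round(bray_curtis(soil, gut), 3))  # → 0.6
```

    Because each sample is reduced to counts over a shared taxonomy, the cost of comparing samples grows with the number of taxa rather than with the number of reads.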

  3. Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proof-of-Concept Experiments.

    DTIC Science & Technology

    1992-05-01

    Contents: DNA Analysis Strategy (2.1 Representation of DNA Bases; 2.2 DNA Analysis Strategy); 3.0 Custom Generators for DNA Sequences (3.1 Hardware Design)... List of figures: ...of the DNA bases where each base is represented by a 7-bit-long pseudorandom sequence. Figure 4: Coarse analysis of a DNA sequence. Figure 5: Fine...a 20-base-long database. List of tables: Table 1: Short representations of the DNA bases where each base is represented by 7-bit-long
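
    The strategy outlined in this report, representing each base by a 7-bit pseudorandom sequence and locating a query by correlation, can be imitated digitally. This is a sketch under stated assumptions, not a model of the optical hardware: the four codes are taken to be cyclic shifts of the length-7 m-sequence 1110100 (the report's actual codes are not reproduced here), bits are mapped to ±1 chips, and the encoded query is slid along the encoded database. An exact match gives the largest possible score, 7 per base.

```python
M_SEQ = [1, 1, 1, 0, 1, 0, 0]  # length-7 maximal-length sequence (assumed code)


def rotate(seq, k):
    """Cyclically rotate a list left by k positions."""
    return seq[k:] + seq[:k]


# Hypothetical code book: one cyclic shift of the m-sequence per base, as +1/-1 chips.
CODES = {base: [1 if bit else -1 for bit in rotate(M_SEQ, k)]
         for k, base in enumerate("ACGT")}


def encode(dna: str):
    """Concatenate the 7-chip code of every base."""
    return [chip for base in dna for chip in CODES[base]]


def correlate(query: str, database: str):
    """Sliding correlation score of the encoded query at every chip offset."""
    q, d = encode(query), encode(database)
    return [sum(x * y for x, y in zip(q, d[off:off + len(q)]))
            for off in range(len(d) - len(q) + 1)]


scores = correlate("GTTG", "ACGTTGCA")
best = max(range(len(scores)), key=scores.__getitem__)
print(best // 7, scores[best])  # → 2 28  (query found at base 2, score 7 * 4)
```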

  4. Pathway analysis with next-generation sequencing data.

    PubMed

    Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

    2015-04-01

    Although pathway analysis methods have been developed and successfully applied to association studies of common variants, statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators have observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their inability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic based on smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic can capture position-level variant information and account for gametic phase disequilibrium. Through extensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with rare variants, common variants, or both has correct type I error rates. We also evaluate the power of the SFPCA-based statistic and of 22 additional existing statistics, and find that the SFPCA-based statistic has much higher power than the other existing statistics in all scenarios considered. To further evaluate its performance, we apply the SFPCA-based statistic to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic yields much smaller P-values for pathway association than other existing methods.
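
    The Bonferroni step mentioned above is simple to state: with m pathways tested at family-wise level α, a pathway is declared significant only if its P-value is at most α/m. A minimal sketch with made-up pathway names and P-values (the EOMI results are not reproduced here):

```python
def bonferroni_significant(pvalues: dict, alpha: float = 0.05) -> list:
    """Return the names whose P-value passes the Bonferroni threshold alpha / m."""
    threshold = alpha / len(pvalues)  # m = number of tests
    return sorted(name for name, p in pvalues.items() if p <= threshold)


# Hypothetical pathway-level P-values from an association test.
pvals = {"pathway_A": 1e-6, "pathway_B": 0.02, "pathway_C": 2e-5, "pathway_D": 0.3}
print(bonferroni_significant(pvals))  # → ['pathway_A', 'pathway_C']
```

    Here the threshold is 0.05 / 4 = 0.0125, so pathway_B would pass an unadjusted 0.05 cut-off but not the corrected one.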

  5. Analysis of the regulatory region of the protease III (ptr) gene of Escherichia coli K-12.

    PubMed

    Claverie-Martin, F; Diaz-Torres, M R; Kushner, S R

    1987-01-01

    The ptr gene of Escherichia coli encodes protease III (Mr 110,000) and a 50-kDa polypeptide, both of which are found in the periplasmic space. The gene is physically located between the recC and recB loci on the E. coli chromosome. The nucleotide sequence of a 1167-bp EcoRV-ClaI fragment of chromosomal DNA containing the promoter region and 885 bp of the ptr coding sequence has been determined. S1 nuclease mapping analysis showed that the major 5' end of the ptr mRNA was localized 127 bp upstream from the ATG start codon. The open reading frame (ORF), preceded by a Shine-Dalgarno sequence, extends to the end of the sequenced DNA. Downstream from the -35 and -10 regions is a sequence that strongly fits the consensus sequence of known nitrogen-regulated promoters. A signal peptide of 23 amino acid residues is present at the N terminus of the derived amino acid sequence. Both the cleavage site and the ORF were confirmed by sequencing the N terminus of mature protease III.

  6. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
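
    The idea of loading similarity-search results into a relational database and summarizing them with SQL can be sketched with Python's built-in sqlite3 module. The two-table schema, column names, E-value cutoff, and rows below are hypothetical and are not the seqdb_demo or search_demo schemas described in the unit; the query counts, for each kingdom, how many query proteins have at least one significant hit there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: library proteins with their kingdom, and search hits.
conn.executescript("""
CREATE TABLE protein (acc TEXT PRIMARY KEY, kingdom TEXT);
CREATE TABLE hit (query_id TEXT, subject_acc TEXT REFERENCES protein(acc), evalue REAL);
""")
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [("P1", "Eukaryota"), ("P2", "Archaea"), ("P3", "Eukaryota")])
conn.executemany("INSERT INTO hit VALUES (?, ?, ?)",
                 [("ecoA", "P1", 1e-30), ("ecoA", "P2", 1e-12),
                  ("ecoB", "P3", 1e-8), ("ecoC", "P2", 0.5)])

# Distinct E. coli query proteins with a significant hit (E < 1e-6), per kingdom.
rows = conn.execute("""
    SELECT p.kingdom, COUNT(DISTINCT h.query_id)
    FROM hit AS h JOIN protein AS p ON p.acc = h.subject_acc
    WHERE h.evalue < 1e-6
    GROUP BY p.kingdom
    ORDER BY p.kingdom
""").fetchall()
print(rows)  # → [('Archaea', 1), ('Eukaryota', 2)]
```

    Keeping hits and taxonomy in separate tables means the same stored search can be re-summarized at a different cutoff or grouping without rerunning the similarity search.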

  7. Faradaurate nanomolecules: a superstable plasmonic 76.3 kDa cluster.

    PubMed

    Dass, Amala

    2011-12-07

    Information on the emergence of the characteristic plasmonic optical properties of nanoscale noble-metal particles has been limited, due in part to the problem of preparing homogeneous material for ensemble measurements. Here, we report the identification, isolation, and mass spectrometric and optical characterization of a 76.3 kDa thiolate-protected gold nanoparticle. This giant molecule is far larger than any metal-cluster compound (one with direct metal-to-metal bonding) previously known as a homogeneous molecular substance, and it is the first to exhibit clear plasmonic properties. The observed plasmon emergence phenomena in nanomolecules are of great interest, and the availability of absolutely homogeneous and characterized samples is thus critical to establishing their origin. © 2011 American Chemical Society

  8. Ultrahigh-Resolution Differential Ion Mobility Separations of Conformers for Proteins above 10 kDa: Onset of Dipole Alignment?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shvartsburg, Alexandre A.

    2014-11-04

    Biomacromolecules tend to assume numerous structures in solution or the gas phase. It has been possible to resolve disparate conformational families but not unique geometries within each, and drastic peak broadening has been the bane of protein analyses by chromatography, electrophoresis, and ion mobility spectrometry (IMS). The new differential IMS (FAIMS) approach using hydrogen-rich gases was recently found to separate conformers of the small protein ubiquitin with the same peak width and resolving power (up to ~400) as for peptides. The present work explores the reach of this approach for larger proteins, exemplified by cytochrome c and myoglobin. Resolution similar to that for ubiquitin was largely achieved with longer separations, while the onset of peak broadening and coalescence with shorter separations suggests that the present technique is limited to proteins under ~20 kDa. This capability may enable distinguishing whole proteins with differing residue sequences or localizations of posttranslational modifications. Small features at negative compensation voltages that markedly grow from cytochrome c to myoglobin indicate the dipole alignment of rare conformers in accord with theory, further supporting the concept of pendular macroions in FAIMS.

  9. Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization.

    PubMed

    Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari

    2016-04-01

    Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualizes sequence data together with a phylogenetic tree within a modern, user-friendly interface: The interface hides extraneous options, supports context sensitive menus, drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data through URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: These case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the AGPLv3. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance for improving resource allocation and supporting fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters, or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g., crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database), and the results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period as the raw data. This comparison enables estimating the degree of deviation of the real data from a Poisson process. Generalizations of these clustering methods to functional measures, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value

  11. Asymmetry of perceived key movement in chorale sequences: converging evidence from a probe-tone analysis.

    PubMed

    Cuddy, L L; Thompson, W F

    1992-01-01

    In a probe-tone experiment, two groups of listeners--one trained, the other untrained, in traditional music theory--rated the goodness of fit of each of the 12 notes of the chromatic scale to four-voice harmonic sequences. Sequences were 12 simplified excerpts from Bach chorales, 4 nonmodulating, and 8 modulating. Modulations occurred either one or two steps in either the clockwise or the counterclockwise direction on the cycle of fifths. A consistent pattern of probe-tone ratings was obtained for each sequence, with no significant differences between listener groups. Two methods of analysis (Fourier analysis and regression analysis) revealed a directional asymmetry in the perceived key movement conveyed by modulating sequences. For a given modulation distance, modulations in the counterclockwise direction effected a clearer shift in tonal organization toward the final key than did clockwise modulations. The nature of the directional asymmetry was consistent with results reported for identification and rating of key change in the sequences (Thompson & Cuddy, 1989a). Further, according to the multiple-regression analysis, probe-tone ratings did not merely reflect the distribution of tones in the sequence. Rather, ratings were sensitive to the temporal structure of the tonal organization in the sequence.
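
    The modulation distances in this design are steps on the cycle of fifths. A small helper makes the convention concrete: a key's position on the cycle is its pitch class times 7, modulo 12, and the signed step count takes the shorter way around, positive for clockwise (toward the sharp side, e.g. C to G) and negative for counterclockwise (toward the flat side, e.g. C to F). The function names and sign convention are assumptions for illustration; the study itself analysed probe-tone ratings with Fourier and regression methods.

```python
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11}


def cycle_position(key: str) -> int:
    """Position of a key's tonic on the cycle of fifths (C = 0, G = 1, D = 2, ...)."""
    return (PITCH_CLASS[key] * 7) % 12


def fifths_steps(src: str, dst: str) -> int:
    """Signed steps from src to dst on the cycle of fifths.

    Positive = clockwise, negative = counterclockwise; the 6-step tritone is +6.
    """
    d = (cycle_position(dst) - cycle_position(src)) % 12
    return d if d <= 6 else d - 12


print([fifths_steps("C", k) for k in ("G", "D", "F", "Bb")])  # → [1, 2, -1, -2]
```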

  12. Computational analysis of sequence selection mechanisms.

    PubMed

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  13. Characterization, efficacy, pharmacokinetics, and biodistribution of 5kDa mPEG modified tetrameric canine uricase variant.

    PubMed

    Zhang, Chun; Fan, Kai; Luo, Hua; Ma, Xuefeng; Liu, Riyong; Yang, Li; Hu, Chunlan; Chen, Zhenmin; Min, Zhiqiang; Wei, Dongzhi

    2012-07-01

    PEGylated uricase is a promising anti-gout drug, but the only commercially marketed one, 10 kDa mPEG-modified porcine-like uricase (Pegloticase), can only be used for intravenous infusion. In this study, a tetrameric canine uricase variant was modified by covalent conjugation of all accessible ɛ-amino sites of lysine residues with a smaller 5 kDa mPEG (mPEG-UHC). The average modification degree and PEGylation homogeneity were evaluated. Approximately 9.4 chains of 5 kDa mPEG were coupled to each uricase monomer, and the main conjugates contained 7-11 mPEG chains per subunit. mPEG-UHC showed significant therapeutic or preventive effects on uric acid nephropathy and acute urate arthritis in three different animal models. The clearance rate after intravenous injection of mPEG-UHC varied significantly between species, at 2.61 mL/h/kg for rats and 0.21 mL/h/kg for monkeys. The long elimination half-life of mPEG-UHC in a non-human primate (191.48 h, intravenous injection) suggests long-term effects in humans. Moreover, the acceptable bioavailability of mPEG-UHC after subcutaneous administration in monkeys (94.21%) suggests that subcutaneous injection may be regarded as a candidate administration route in clinical trials. Non-specific tissue distribution was observed after administration of (125)I-labeled mPEG-UHC in rats, and elimination by the kidneys into the urine is the primary excretion route. Copyright © 2012 Elsevier B.V. All rights reserved.
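
    To give a feel for what a 191.48 h half-life means, the fraction of drug remaining over time can be computed under a simple one-compartment, first-order elimination model. That model is an assumption made for this sketch, not the pharmacokinetic analysis performed in the study.

```python
import math

HALF_LIFE_H = 191.48  # elimination half-life in monkeys (intravenous), from the abstract


def fraction_remaining(hours: float, half_life: float = HALF_LIFE_H) -> float:
    """Fraction of the initial dose left after `hours`, assuming first-order decay."""
    k = math.log(2) / half_life  # elimination rate constant (1/h)
    return math.exp(-k * hours)


for days in (1, 7, 14):
    print(f"day {days}: {fraction_remaining(24 * days):.2f}")
```

    Under this assumption roughly half of a dose would still be present a week after injection, which is the kind of reasoning behind reading a long half-life as a sign of long-lasting effect.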

  14. MetaSeq: privacy preserving meta-analysis of sequencing-based association studies.

    PubMed

    Singh, Angad Pal; Zafer, Samreen; Pe'er, Itsik

    2013-01-01

    Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies test for association groupwise, pooling multiple potentially causal alleles along each gene. They are subject to similar power constraints and are therefore likely to resort to meta-analysis as well. The problem arises when considering the privacy of genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant, thus requiring the exchange of the identity of each sequenced variant. Because such variants are often rare, exchanging them can reveal the identity of their carriers and jeopardize privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare sequenced alleles for meta-analysis of sequencing data without disclosing allele identity and counts, thereby protecting sample identity. This apparently paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt the identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data do not convey the identity of a mutation and therefore do not expose carrier identity. The exchange relies on a third party, trusted to follow the protocol although not trusted to learn about the raw data. We show the applicability of this method to publicly available exome-sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.
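
    The key idea, that parties exchange counts keyed by encrypted gene and variant identifiers rather than by the identifiers themselves, can be illustrated with a keyed hash. In this sketch the collaborating sites share a secret key that the third party does not hold, each site replaces every variant name with an HMAC-SHA256 tag, and the third party sums case and control counts per tag without learning which variant a tag denotes. This is a simplified stand-in chosen for illustration, not MetaSeq's actual cryptographic protocol; the key, variant names, and counts are invented.

```python
import hashlib
import hmac
from collections import defaultdict

SHARED_KEY = b"hypothetical key known only to the sequencing sites"


def pseudonym(variant_id: str) -> str:
    """Keyed hash of a variant identifier; identical across sites sharing the key."""
    return hmac.new(SHARED_KEY, variant_id.encode(), hashlib.sha256).hexdigest()


def site_report(counts: dict) -> dict:
    """What a site sends: {pseudonym: (case_count, control_count)}."""
    return {pseudonym(v): c for v, c in counts.items()}


def aggregate(reports) -> dict:
    """Third party: sum case/control counts per pseudonym without seeing variant IDs."""
    totals = defaultdict(lambda: [0, 0])
    for report in reports:
        for tag, (cases, controls) in report.items():
            totals[tag][0] += cases
            totals[tag][1] += controls
    return dict(totals)


site1 = site_report({"GENE1:c.100A>G": (2, 0), "GENE2:c.55del": (1, 1)})
site2 = site_report({"GENE1:c.100A>G": (1, 0)})
totals = aggregate([site1, site2])
print(totals[pseudonym("GENE1:c.100A>G")])  # → [3, 0]
```

    Only the key holders can map a tag back to a variant, by hashing candidate identifiers themselves; the aggregator sees tags and totals only.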

  15. Sequence-dependent modelling of local DNA bending phenomena: curvature prediction and vibrational analysis.

    PubMed

    Vlahovicek, K; Munteanu, M G; Pongor, S

    1999-01-01

    Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted, not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including the affinity of protein binding, kinking, and sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from that of straight sequence motifs of identical base-pair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions: sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server has been set up for the prediction of curvature and the generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
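
    The step of obtaining a frequency spectrum by Fourier analysis of displacement values in the time domain can be sketched with a plain discrete Fourier transform. The two-mode "displacement" trace, the sampling interval, and the function names are invented for illustration and do not come from the SDAB finite element model.

```python
import cmath
import math


def amplitude_spectrum(samples, dt):
    """One-sided DFT amplitude spectrum as a list of (frequency, amplitude) pairs."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        spectrum.append((k / (n * dt), abs(coef) / n))
    return spectrum


# Hypothetical displacement trace: two bending modes at frequencies 5 and 12.
dt, n = 0.01, 200
trace = [math.sin(2 * math.pi * 5 * i * dt) + 0.3 * math.sin(2 * math.pi * 12 * i * dt)
         for i in range(n)]
spectrum = amplitude_spectrum(trace, dt)
peak_freq, _ = max(spectrum[1:], key=lambda fa: fa[1])  # skip the zero-frequency term
print(peak_freq)  # → 5.0
```

    Comparing such spectra for two sequences of identical composition, one curved and one straight, is the kind of contrast the abstract reports; the direct O(n²) transform is used here only to keep the sketch dependency-free.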

  16. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  17. The Effects of Threonine Phosphorylation on the Stability and Dynamics of the Central Molecular Switch Region of 18.5-kDa Myelin Basic Protein

    PubMed Central

    De Avila, Miguel; Polverini, Eugenia; Harauz, George

    2013-01-01

    The classic isoforms of myelin basic protein (MBP) are essential for the formation and maintenance of myelin in the central nervous system of higher vertebrates. The protein is involved in all facets of the development, compaction, and stabilization of the multilamellar myelin sheath, and also interacts with cytoskeletal and signaling proteins. The predominant 18.5-kDa isoform of MBP is an intrinsically-disordered protein that is a candidate auto-antigen in the human demyelinating disease multiple sclerosis. A highly-conserved central segment within classic MBP consists of a proline-rich region (murine 18.5-kDa sequence –T92-P93-R94-T95-P96-P97-P98-S99–) containing a putative SH3-ligand, adjacent to a region that forms an amphipathic α-helix (P82-I90) upon interaction with membranes, or under membrane-mimetic conditions. The T92 and T95 residues within the proline-rich region can be post-translationally modified through phosphorylation by mitogen-activated protein (MAP) kinases. Here, we have investigated the structure of the α-helical and proline-rich regions in dilute aqueous buffer, and have evaluated the effects of phosphorylation at T92 and T95 on the stability and dynamics of the α-helical region, by utilizing four 36-residue peptides (S72–S107) with differing phosphorylation status. Nuclear magnetic resonance spectroscopy reveals that both the α-helical and the proline-rich regions are disordered in aqueous buffer, whereas they are both structured in a lipid environment (cf., Ahmed et al., Biochemistry 51, 7475-9487, 2012). Thermodynamic analysis of trifluoroethanol-titration curves monitored by circular dichroism spectroscopy reveals that phosphorylation, especially at residue T92, impedes formation of the amphipathic α-helix. This conclusion is supported by molecular dynamics simulations, which further illustrate that phosphorylation reduces the folding reversibility of the α-helix upon temperature perturbation and affects the global

  18. The effects of threonine phosphorylation on the stability and dynamics of the central molecular switch region of 18.5-kDa myelin basic protein.

    PubMed

    Vassall, Kenrick A; Bessonov, Kyrylo; De Avila, Miguel; Polverini, Eugenia; Harauz, George

    2013-01-01

    The classic isoforms of myelin basic protein (MBP) are essential for the formation and maintenance of myelin in the central nervous system of higher vertebrates. The protein is involved in all facets of the development, compaction, and stabilization of the multilamellar myelin sheath, and also interacts with cytoskeletal and signaling proteins. The predominant 18.5-kDa isoform of MBP is an intrinsically-disordered protein that is a candidate auto-antigen in the human demyelinating disease multiple sclerosis. A highly-conserved central segment within classic MBP consists of a proline-rich region (murine 18.5-kDa sequence -T92-P93-R94-T95-P96-P97-P98-S99-) containing a putative SH3-ligand, adjacent to a region that forms an amphipathic α-helix (P82-I90) upon interaction with membranes, or under membrane-mimetic conditions. The T92 and T95 residues within the proline-rich region can be post-translationally modified through phosphorylation by mitogen-activated protein (MAP) kinases. Here, we have investigated the structure of the α-helical and proline-rich regions in dilute aqueous buffer, and have evaluated the effects of phosphorylation at T92 and T95 on the stability and dynamics of the α-helical region, by utilizing four 36-residue peptides (S72-S107) with differing phosphorylation status. Nuclear magnetic resonance spectroscopy reveals that both the α-helical and the proline-rich regions are disordered in aqueous buffer, whereas they are both structured in a lipid environment (cf., Ahmed et al., Biochemistry 51, 7475-9487, 2012). Thermodynamic analysis of trifluoroethanol-titration curves monitored by circular dichroism spectroscopy reveals that phosphorylation, especially at residue T92, impedes formation of the amphipathic α-helix. This conclusion is supported by molecular dynamics simulations, which further illustrate that phosphorylation reduces the folding reversibility of the α-helix upon temperature perturbation and affects the global structure

  19. Plant mitochondrial pyruvate dehydrogenase complex: purification and identification of catalytic components in potato.

    PubMed Central

    Millar, A H; Knorpp, C; Leaver, C J; Hill, S A

    1998-01-01

    The pyruvate dehydrogenase complex (mPDC) from potato (Solanum tuberosum cv. Romano) tuber mitochondria was purified 40-fold to a specific activity of 5.60 micromol/min per mg of protein. The activity of the complex depended on pyruvate, divalent cations, NAD+ and CoA and was competitively inhibited by both NADH and acetyl-CoA. SDS/PAGE revealed the complex consisted of seven polypeptide bands with apparent molecular masses of 78, 60, 58, 55, 43, 41 and 37 kDa. N-terminal sequencing revealed that the 78 kDa protein was dihydrolipoamide transacetylase (E2), the 58 kDa protein was dihydrolipoamide dehydrogenase (E3), the 43 and 41 kDa proteins were alpha subunits of pyruvate dehydrogenase, and the 37 kDa protein was the beta subunit of pyruvate dehydrogenase. N-terminal sequencing of the 55 kDa protein band yielded two protein sequences: one was another E3; the other was similar to the sequence of E2 from plant and yeast sources but was distinctly different from the sequence of the 78 kDa protein. Incubation of the mPDC with [2-14C]pyruvate resulted in the acetylation of both the 78 and 55 kDa proteins. PMID:9729464

  20. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.

    PubMed

    Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    The B and T lymphocytes are white blood cells playing a key role in adaptive immunity. A part of their DNA, called the V(D)J recombination, is specific to each lymphocyte and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes and to monitor more accurately the immune response as well as pathologies such as leukemia. Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application built on a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and requires only an internet connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.

  1. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing

    PubMed Central

    Duez, Marc; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    Background The B and T lymphocytes are white blood cells playing a key role in adaptive immunity. A part of their DNA, called the V(D)J recombination, is specific to each lymphocyte and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes and to monitor more accurately the immune response as well as pathologies such as leukemia. Methods and Results Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application built on a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and requires only an internet connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications. PMID:27835690
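
    Once each read has been assigned a junction, grouping reads into clonotypes and ranking them by abundance is essentially a counting step. The Python sketch below assumes junction extraction has already been done; the function name `clonotype_table` and the example junction strings are hypothetical, and the sketch does not reproduce Vidjil's own read-to-junction algorithm. Reads with no detected junction stay in the denominator, so percentages refer to all reads.

```python
from collections import Counter


def clonotype_table(junctions):
    """Rank clonotypes by read count.

    `junctions` holds one already-extracted junction string per read, or None
    when no junction was detected (hypothetical input format). Percentages are
    computed over all reads, including those without a junction.
    """
    counts = Counter(j for j in junctions if j is not None)
    total = len(junctions)
    return [(junction, n, 100.0 * n / total) for junction, n in counts.most_common()]


reads = ["TGTGCCAGCAGT", "TGTGCCAGCAGT", "TGTGCGAGAGA", None]
for junction, n, pct in clonotype_table(reads):
    print(junction, n, f"{pct:.1f}%")
```

    Counting exact junction strings is the simplest possible grouping; a real pipeline would also have to tolerate sequencing errors within a junction.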

  2. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    PubMed Central

    2012-01-01

    Background Multiplexing has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneous analysis of up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by combining 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation between different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions By employing the PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel, so that high-throughput sequencing economically meets the needs of samples with low sequencing-throughput demand. PMID:22276739
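
    The combinatorial idea behind pair-barcoding can be sketched in Python: 4 forward and 8 reverse barcodes give 4 × 8 = 32 distinct pairs, and a read is assigned to a sample only when both of its barcodes are recognized. The barcode sequences, the (forward, reverse, insert) read layout, and the function name `demultiplex` are hypothetical; real reads would first need barcode extraction and error-tolerant matching.

```python
from itertools import product

# Hypothetical barcode sequences: the abstract does not list the real ones.
FORWARD = ["ACGT", "TGCA", "GATC", "CTAG"]                  # 4 forward barcodes
REVERSE = ["AACC", "GGTT", "CATG", "GTAC",
           "TTAA", "CCGG", "AGCT", "TCGA"]                  # 8 reverse barcodes

# Each (forward, reverse) pair identifies one library: 4 x 8 = 32 samples.
SAMPLES = {pair: f"sample_{i + 1}"
           for i, pair in enumerate(product(FORWARD, REVERSE))}


def demultiplex(reads):
    """Assign (forward_tag, reverse_tag, insert) reads to samples.

    Reads whose tag pair is not a known barcode combination are returned
    separately as unassigned.
    """
    assigned, unassigned = {}, []
    for fwd, rev, insert in reads:
        sample = SAMPLES.get((fwd, rev))
        if sample is None:
            unassigned.append(insert)
        else:
            assigned.setdefault(sample, []).append(insert)
    return assigned, unassigned
```

    Because samples are keyed by the barcode pair rather than by a single barcode, adding one more forward barcode would add eight new sample slots at once, which is the economy the abstract describes.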

  3. Molecular characterization of the Serratia marcescens OmpF porin, and analysis of S. marcescens OmpF and OmpC osmoregulation.

    PubMed

    Hutsul, J A; Worobec, E

    1997-08-01

    Serratia marcescens is a nosocomial pathogen with a high incidence of beta-lactam resistance. Reduced amounts of outer-membrane porins have been correlated with increased resistance to beta-lactams but only one porin, OmpC, has been characterized at the molecular level. In this study we present the molecular characterization of a second porin, OmpF, and an analysis of the expression of S. marcescens porins in response to various environmental changes. Two porins were isolated from the outer membrane using urea-SDS-PAGE and the relative amounts were shown to be influenced by the osmolarity of the medium and the presence of salicylate. From a S. marcescens genomic DNA library an 8 kb EcoRI fragment was isolated that hybridized with an oligonucleotide encoding the published N-terminal amino acid sequence of the S. marcescens 41 kDa porin. A 41 kDa protein was detected in the outer membrane of Escherichia coli NM522 carrying the cloned S. marcescens DNA. The cloned gene was sequenced and shown to code for a protein that shared 60-70% identity with other known OmpF and OmpC sequences. The upstream DNA sequence of the S. marcescens gene was similar to the corresponding E. coli ompF sequence; however, a regulatory element important in repression of E. coli ompF at high osmolarity was absent. The cloned S. marcescens OmpF in E. coli increased in expression in conditions of high osmolarity. The potential involvement of micF in the observed osmoregulation of S. marcescens porins is discussed.

  4. [Cloning and bioinformatic analysis and expression analysis of beta-glucuronidase in Scutellaria baicalensis].

    PubMed

    Guo, Shuang-shuang; Cheng, Lin; Yang, Li-min; Han, Mei

    2015-11-01

    The β-glucuronidase gene (sbGUS) cDNA was cloned for the first time from Scutellaria baicalensis leaves by RT-PCR (GenBank accession number KR364726). The full-length sbGUS cDNA was 1,584 bp and contained an open reading frame (ORF) encoding an unstable protein of 527 amino acids. Bioinformatic analysis showed that the sbGUS-encoded protein had an isoelectric point (pI) of 5.55 and a calculated molecular weight of about 58.72 kDa, contained a transmembrane region and a signal peptide, and had conserved domains of the glycoside hydrolase superfamily and an unintegrated trans-glycosidase catalytic structure. In the secondary structure, alpha helix, extended strand, β-extended and random coil accounted for 25.62%, 28.84%, 13.28% and 32.26%, respectively. Homology analysis indicated 98.93% nucleotide sequence similarity and 98.29% amino acid sequence similarity with S. baicalensis (BAA97804.1), with nine amino acid positions differing. Real-time PCR analysis showed that sbGUS expression was highest in root, followed by flower and stem, and lowest in stem. The results provide a foundation for exploring the molecular function of sbGUS in baicalin biosynthesis through a synthetic biology approach in S. baicalensis plants.
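
    The figures in this abstract can be cross-checked with a little arithmetic: an ORF encoding 527 amino acids plus a stop codon spans (527 + 1) × 3 = 1,584 bp, which equals the reported full-length cDNA size, and nine differing positions out of 527 give 518/527 ≈ 98.29%, which equals the reported amino acid figure. The Python sketch below assumes the 98.29% value is a plain per-residue identity over a gap-free alignment; the abstract calls it "similarity" and does not name the measure, so this is an interpretation, not the authors' method.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Arithmetic checks against the abstract.
length, mismatches = 527, 9
print(round(100.0 * (length - mismatches) / length, 2))  # 98.29
print((length + 1) * 3)  # 527 codons + 1 stop codon, in bp: 1584
```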

  5. Repetitive sequence analysis and karyotyping reveals centromere-associated DNA sequences in radish (Raphanus sativus L.).

    PubMed

    He, Qunyan; Cai, Zexi; Hu, Tianhua; Liu, Huijun; Bao, Chonglai; Mao, Weihai; Jin, Weiwei

    2015-04-18

    Radish (Raphanus sativus L., 2n = 2x = 18) is a major root vegetable crop, especially in eastern Asia. Radish root contains various nutrients that play an important role in strengthening immunity. Repetitive elements are primary components of the genomic sequence and the most important factors in genome size variation in higher eukaryotes. To date, studies of repetitive elements in radish remain limited. To better understand the genome structure of radish, we evaluated the proportion of repetitive elements and their distribution in radish. We conducted genome-wide characterization of repetitive elements in radish using low-coverage genome sequencing followed by similarity-based cluster analysis. Results showed that about 31% of the genome was composed of repetitive sequences. Satellite repeats were the dominant elements of the genome. The distribution pattern of three satellite repeat sequences (CL1, CL25, and CL43) on radish chromosomes was characterized using fluorescence in situ hybridization (FISH). CL1 was predominantly located at the centromeric region of all chromosomes, CL25 at the subtelomeric region, and CL43 was a telomeric satellite. FISH signals of two satellite repeats, CL1 and CL25, together with 5S rDNA and 45S rDNA, provide useful cytogenetic markers to identify each individual somatic metaphase chromosome. The centromere-specific histone H3 (CENH3) has been used as a marker to identify centromere DNA sequences. One putative CENH3 (RsCENH3) was characterized and cloned from radish. Its deduced amino acid sequence shares high similarity with those of the CENH3s in Brassica species. An antibody against B. rapa CENH3 specifically stained radish centromeres. Immunostaining and chromatin immunoprecipitation (ChIP) tests with the anti-BrCENH3 antibody demonstrated that both the centromere-specific retrotransposon (CR-Radish) and the satellite repeat (CL1) are directly associated with RsCENH3 in radish. Proportions
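
    The strategy of low-coverage sequencing followed by similarity-based clustering can be illustrated with a toy Python sketch. At low genome coverage, reads from single-copy regions rarely overlap, whereas reads from high-copy repeats share sequence and collapse into large clusters, so the fraction of reads in large clusters approximates the repetitive fraction of the genome. Here shared exact k-mers stand in for real pairwise similarity searches, and `k` and `min_cluster_size` are arbitrary illustrative values, not the parameters used in the study.

```python
from collections import defaultdict


def cluster_reads(reads, k=5):
    """Group reads into clusters: two reads are linked if they share a k-mer.

    Uses union-find; exact shared k-mers are a stand-in for the similarity
    hits a real clustering pipeline would compute.
    """
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if kmer in first_seen:
                parent[find(idx)] = find(first_seen[kmer])  # merge clusters
            else:
                first_seen[kmer] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return list(clusters.values())


def repeat_fraction(reads, k=5, min_cluster_size=3):
    """Fraction of reads that fall in clusters of at least `min_cluster_size` reads."""
    clusters = cluster_reads(reads, k)
    in_repeats = sum(len(c) for c in clusters if len(c) >= min_cluster_size)
    return in_repeats / len(reads)
```

    With the hypothetical reads `["ACGTACGTAA", "TTACGTACGT", "GACGTACGTC", "GGGCCCTTTA", "CATGCATGCC"]`, the first three share the satellite-like k-mer `TACGT` or `ACGTA` and form one cluster, so `repeat_fraction` returns 0.6.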

  6. The complete nucleotide sequence of RNA 3 of a peach isolate of Prunus necrotic ringspot virus.

    PubMed

    Hammond, R W; Crosslin, J M

    1995-04-01

    The complete nucleotide sequence of RNA 3 of the PE-5 peach isolate of Prunus necrotic ringspot ilarvirus (PNRSV) was obtained from cloned cDNA. The RNA sequence is 1941 nucleotides and contains two open reading frames (ORFs). ORF 1 encodes 284 amino acids with a calculated molecular weight of 31,729 Da, and ORF 2 encodes 224 amino acids with a calculated molecular weight of 25,018 Da. ORF 2 corresponds to the coat protein gene. Expression of ORF 2 engineered into a pTrcHis vector in Escherichia coli results in a fusion polypeptide of approximately 28 kDa which cross-reacts with PNRSV polyclonal antiserum. Analysis of the coat protein amino acid sequence reveals a putative "zinc-finger" domain at the amino-terminal portion of the protein. Two tetranucleotide AUGC motifs occur in the 3'-UTR of the RNA and may function in coat protein binding and genome activation. ORF 1 homologies to other ilarviruses and alfalfa mosaic virus are confined to limited regions of conserved amino acids. The translated amino acid sequence of the coat protein gene shows 92% similarity to one isolate of apple mosaic virus, a closely related member of the ilarvirus group of plant viruses, but only 66% similarity to the amino acid sequence of the coat protein gene of a second isolate. These relationships are also reflected at the nucleotide sequence level. These results in one instance confirm the close similarities observed at the biophysical and serological levels between these two viruses, but on the other hand call into question the nomenclature used to describe these viruses.

  7. Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

    PubMed Central

    Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

    2011-01-01

    Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, in which relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns are being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite more than 140 million years of divergence. Our results lay the groundwork for further studies of interesting new findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non

  8. The Papillomavirus Episteme: a central resource for papillomavirus sequence data and analysis.

    PubMed

    Van Doorslaer, Koenraad; Tan, Qina; Xirasagar, Sandhya; Bandaru, Sandya; Gopalan, Vivek; Mohamoud, Yasmin; Huyen, Yentram; McBride, Alison A

    2013-01-01

    The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.

  9. Analysis of the 3’ untranslated regions of α-tubulin and S-crystallin mRNA and the identification of CPEB in dark- and light-adapted octopus retinas

    PubMed Central

    Kelly, Shannan; Yamamoto, Hideki

    2008-01-01

    Purpose We previously reported the differential expression and translation of mRNA and protein in dark- and light-adapted octopus retinas, which may result from cytoplasmic polyadenylation element (CPE)-dependent mRNA masking and unmasking. Here we investigate the presence of CPEs in α-tubulin and S-crystallin mRNA and report the identification of cytoplasmic polyadenylation element binding protein (CPEB) in light- and dark-adapted octopus retinas. Methods 3’-RACE and sequencing were used to isolate and analyze the 3’-UTRs of α-tubulin and S-crystallin mRNA. Total retinal protein isolated from light- and dark-adapted octopus retinas was subjected to western blot analysis followed by CPEB antibody detection, PEP-171 inhibition of CPEB, and dephosphorylation of CPEB. Results The following CPE-like sequence was detected in the 3’-UTR of isolated long S-crystallin mRNA variants: UUUAACA. No CPE or CPE-like sequences were detected in the 3’-UTRs of α-tubulin mRNA or of the short S-crystallin mRNA variants. Western blot analysis detected CPEB as two putative bands migrating between 60 and 80 kDa, while a third band migrated below 30 kDa in dark- and light-adapted retinas. Conclusions The detection of CPEB and the identification of the putative CPE-like sequences in the S-crystallin 3’-UTR suggest that CPEB may be involved in the activation of masked S-crystallin mRNA, but not in the regulation of α-tubulin mRNA, resulting in increased S-crystallin protein synthesis in dark-adapted octopus retinas. PMID:18682811
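
    Scanning 3’-UTR sequences for a CPE-like element such as UUUAACA is, at its core, a motif search. The Python sketch below reports exact matches only; the function name `find_motif` and the example sequences are hypothetical, and DNA-style input (as obtained from 3’-RACE cDNA) is converted to RNA letters before searching.

```python
import re


def find_motif(utr: str, motif: str = "UUUAACA"):
    """Return 0-based start positions of `motif` in a 3'-UTR (overlaps allowed)."""
    utr = utr.upper().replace("T", "U")  # accept DNA-style cDNA sequence
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", utr)]


print(find_motif("GCAUUUAACAGGUUUAACAUAAA"))  # [3, 12]
```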

  10. Prunus necrotic ringspot ilarvirus: nucleotide sequence of RNA3 and the relationship to other ilarviruses based on coat protein comparison.

    PubMed

    Guo, D; Maiss, E; Adam, G; Casper, R

    1995-05-01

    The RNA3 of Prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa which has homologies with the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP) and has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AlMV). In addition, it contained potential stem-loop structures with interspersed AUGC motifs characteristic of ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The CP gene of an ApMV isolate (ApMV-G) of 657 nt has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.
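
    ORF lengths in nucleotides and protein lengths in residues can be reconciled directly: if the 855-nt and 675-nt ORFs include their stop codons, they encode 855/3 − 1 = 284 and 675/3 − 1 = 224 residues, the same lengths reported for ORF 1 and ORF 2 of the PE-5 isolate (Hammond and Crosslin). The Python sketch below is a minimal forward-strand ORF finder written under that convention; the name `find_orfs`, the `min_aa` cutoff, and the test sequence are illustrative assumptions, not the tools used in either study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 50):
    """Find ATG-initiated ORFs on the forward strand in all three frames.

    Returns (start, end, protein_length) tuples with a 0-based start and an
    exclusive end; the end includes the stop codon, protein_length excludes it.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                protein_length = (pos + 3 - start) // 3 - 1
                if protein_length >= min_aa:
                    orfs.append((start, pos + 3, protein_length))
                start = None
    return orfs


print(855 // 3 - 1, 675 // 3 - 1)  # 284 224
```

    Only the forward strand is scanned because RNA3 is a plus-sense genomic RNA; a general-purpose finder would also scan the reverse complement.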

  11. VisRseq: R-based visual framework for analysis of sequencing data

    PubMed Central

    2015-01-01

    Background Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However, the computational tool set for further analyses often requires significant computational expertise to use, and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-automatically generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. Conclusions To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights. PMID:26328469

  12. VisRseq: R-based visual framework for analysis of sequencing data.

    PubMed

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M

    2015-01-01

    Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However, the computational tool set for further analyses often requires significant computational expertise to use, and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-automatically generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.

  13. Haplotag: Software for Haplotype-Based Genotyping-by-Sequencing Analysis

    PubMed Central

    Tinker, Nicholas A.; Bekele, Wubishet A.; Hattori, Jiro

    2016-01-01

    Genotyping-by-sequencing (GBS), and related methods, are based on high-throughput short-read sequencing of genomic complexity reductions followed by discovery of single nucleotide polymorphisms (SNPs) within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analyzing large data sets, applications of GBS may require substantial time, expertise, and computational resources. Haplotag, the novel GBS software described here, is freely available, and operates with minimal user-investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode, and a production mode; (4) discovers polymorphisms based on a model of tag-level haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; and (6) provides an intuitive visual “passport” for each inferred locus. Haplotag is optimized for use in a self-pollinating plant species. PMID:26818073
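
    The idea of calling polymorphisms from tag-level haplotypes, rather than from isolated SNPs, can be illustrated with a deliberately simplified Python sketch. It assumes the reads for one tag locus have already been grouped and trimmed to equal length, and `min_count` is a hypothetical error filter; this is not Haplotag's actual model, which the abstract does not describe in detail.

```python
from collections import Counter


def tag_haplotypes(tags, min_count=2):
    """Collapse identical reads of one tag locus into haplotypes with read counts.

    Haplotypes seen fewer than `min_count` times are dropped as likely
    sequencing errors (an illustrative rule, not Haplotag's).
    """
    counts = Counter(tags)
    return {hap: n for hap, n in counts.items() if n >= min_count}


def snp_positions(haplotypes):
    """0-based columns at which equal-length haplotypes disagree."""
    haps = list(haplotypes)
    if not haps:
        return []
    length = len(haps[0])
    if any(len(h) != length for h in haps):
        raise ValueError("haplotypes must have equal length")
    return [i for i in range(length) if len({h[i] for h in haps}) > 1]


reads = ["ACGTTA", "ACGTTA", "ACCTTA", "ACCTTA", "ACCTTA", "TCGTTA"]
haps = tag_haplotypes(reads)
print(haps, snp_positions(haps))  # {'ACGTTA': 2, 'ACCTTA': 3} [2]
```

    Keeping the whole haplotype as the allele means a locus with several linked SNPs can still be reported as one multi-allelic marker, while `snp_positions` recovers the individual SNPs when those are needed.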

  14. Massively parallel sequencing-enabled mixture analysis of mitochondrial DNA samples.

    PubMed

    Churchill, Jennifer D; Stoljarova, Monika; King, Jonathan L; Budowle, Bruce

    2018-02-22

    The mitochondrial genome has a number of characteristics that provide useful information to forensic investigations. Massively parallel sequencing (MPS) technologies offer improvements to the quantitative analysis of the mitochondrial genome, specifically the interpretation of mixed mitochondrial samples. Two-person mixtures with nuclear DNA ratios of 1:1, 5:1, 10:1, and 20:1 of individuals from different and similar phylogenetic backgrounds and three-person mixtures with nuclear DNA ratios of 1:1:1 and 5:1:1 were prepared using the Precision ID mtDNA Whole Genome Panel and Ion Chef, and sequenced on the Ion PGM or Ion S5 sequencer (Thermo Fisher Scientific, Waltham, MA, USA). These data were used to evaluate whether and to what degree MPS mixtures could be deconvolved. Analysis was effective in identifying the major contributor in each instance, while SNPs from the minor contributor's haplotype were identified only in the 1:1, 5:1, and 10:1 two-person mixtures. While the major contributor was identified from the 5:1:1 mixture, analysis of the three-person mixtures was more complex, and the mixed haplotypes could not be completely parsed. These results indicate that mixed mitochondrial DNA samples may be interpreted with the use of MPS technologies.
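
    To see how small the minor contributor's signal becomes, one can compute the expected fraction of reads carrying the minor base at a position where two haplotypes differ. The Python sketch below assumes, hypothetically, that each contributor's share of mtDNA equals its share of the nuclear DNA ratio; real mtDNA copy numbers vary between individuals and tissues. The `threshold` in `call_bases` is an illustrative cutoff, not the study's analytical threshold.

```python
def expected_minor_fraction(major: float, minor: float) -> float:
    """Expected fraction of reads carrying the minor contributor's base at a
    position where the two haplotypes differ.

    Assumes (hypothetically) that mtDNA contribution is proportional to the
    nuclear DNA mixing ratio.
    """
    return minor / (major + minor)


def call_bases(base_counts: dict, threshold: float = 0.03) -> list:
    """Bases whose read fraction at one position reaches `threshold`."""
    total = sum(base_counts.values())
    return sorted(b for b, n in base_counts.items() if n / total >= threshold)


for ratio in (1, 5, 10, 20):
    print(f"{ratio}:1 -> {expected_minor_fraction(ratio, 1):.3f}")
# 1:1 -> 0.500, 5:1 -> 0.167, 10:1 -> 0.091, 20:1 -> 0.048
```

    Under this assumption, the minor base at 20:1 is expected in under 5% of reads, close to typical noise-level cutoffs, so small deviations in mtDNA copy number or sequencing error can decide whether it is called.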

  15. Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling.

    PubMed

    Tao, Ran; Zeng, Donglin; Franceschini, Nora; North, Kari E; Boerwinkle, Eric; Lin, Dan-Yu

    2015-06-01

    High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.
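
    The sampling design itself, selecting for sequencing the individuals who are extreme on at least one of several quantitative traits, can be sketched in a few lines of Python. This covers only the selection step; it does not implement the authors' likelihood or EM estimator, which is what corrects for the bias that such selection introduces. The function name, the tail fractions, and the example trait values are hypothetical.

```python
def extreme_sample(traits, lower_q=0.1, upper_q=0.9):
    """Indices of individuals whose value on ANY trait falls in an outer tail.

    `traits` maps trait name -> list of values (same individuals, same order).
    Cutoffs are simple order statistics of each trait; illustrative only.
    """
    selected = set()
    for values in traits.values():
        ordered = sorted(values)
        n = len(ordered)
        lo = ordered[int(lower_q * (n - 1))]
        hi = ordered[int(upper_q * (n - 1))]
        selected.update(i for i, v in enumerate(values) if v <= lo or v >= hi)
    return sorted(selected)


traits = {
    "trait_1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "trait_2": [50, 0, 55, 60, 52, 58, 54, 56, 53, 57],
}
print(extreme_sample(traits))  # [0, 1, 3, 5, 8, 9]
```

    Because selection depends on the traits, the selected individuals no longer form a random sample of the cohort, which is why standard linear regression on them can be biased.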

  16. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

    PubMed Central

    Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

    2004-01-01

    Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich resource for molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051

  17. VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

    PubMed

    Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

    2018-01-01

    Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.

  18. Isolation and determination of the primary structure of a lectin protein from the serum of the American alligator (Alligator mississippiensis).

    PubMed

    Darville, Lancia N F; Merchant, Mark E; Maccha, Venkata; Siddavarapu, Vivekananda Reddy; Hasan, Azeem; Murray, Kermit K

    2012-02-01

    Mass spectrometry in conjunction with de novo sequencing was used to determine the amino acid sequence of a 35 kDa lectin protein isolated from the serum of the American alligator that exhibits binding to mannose. The protein N-terminal sequence was determined using Edman degradation, and enzymatic digestion with different proteases was used to generate peptide fragments for analysis by liquid chromatography tandem mass spectrometry (LC MS/MS). Separate analysis of the protein digests with multiple enzymes enhanced the protein sequence coverage. De novo sequencing was accomplished using MASCOT Distiller and PEAKS software and the sequences were searched against the NCBI database using MASCOT and BLAST to identify homologous peptides. MS analysis of the intact protein indicated that it is present primarily as monomer and dimer in vitro. The isolated 35 kDa protein was ~98% sequenced and found to have 313 amino acids and nine cysteine residues and was identified as an alligator lectin. The alligator lectin sequence was aligned with other lectin sequences using DIALIGN and ClustalW software and was found to exhibit 58% and 59% similarity to human and mouse intelectin-1. The alligator lectin exhibited strong binding affinities toward mannan and mannose compared with other tested carbohydrates. Copyright © 2011 Elsevier Inc. All rights reserved.
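
    Combining digests from several proteases raises sequence coverage because peptides from different enzymes fill each other's gaps; coverage is simply the fraction of residues covered by at least one identified peptide. The Python sketch below assumes peptides have already been mapped to 1-based, inclusive positions on the protein; the function name and the example spans are hypothetical, not the peptides reported in the study.

```python
def sequence_coverage(protein_length, peptide_spans):
    """Percent of residues covered by at least one identified peptide.

    `peptide_spans` holds 1-based inclusive (start, end) positions, e.g. from
    digests with different proteases; overlapping residues are counted once.
    """
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Hypothetical spans from two digests on a 313-residue protein.
print(round(sequence_coverage(313, [(1, 150), (140, 300)]), 1))  # 95.8
```

    For scale, about 98% coverage of a 313-residue protein corresponds to roughly 307 residues.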

  19. Microparticles prepared with 50-190kDa chitosan as promising non-toxic carriers for pulmonary delivery of isoniazid.

    PubMed

    Oliveira, Paula M; Matos, Breno N; Pereira, Priscilla A T; Gratieri, Taís; Faccioli, Lucia H; Cunha-Filho, Marcílio S S; Gelfuso, Guilherme M

    2017-10-15

    The biocompatibility and mucoadhesiveness of chitosan make it an ideal polymer for the microencapsulation of antituberculotic drugs for pulmonary delivery. However, a previous study indicated toxicity problems in J-774.1 cells treated with some medium molecular weight (190-310 kDa) chitosan microparticles. Because polymer molecular weight is a crucial factor, this paper describes the preparation and characterization of chitosan (50-190 kDa) microparticles containing isoniazid (INH). Cytotoxicity assays were also performed on murine peritoneal (J-774.1) and alveolar (AMJ2-C11) macrophage cell lines, followed by cytokine detection in AMJ2-C11 cells. The spray-drying process produced mucoadhesive microparticles of 3.2 μm to 3.9 μm that entrapped more than 89% of the drug and preserved its chemical stability. Drug release behavior could be controlled by using cross-linked or uncross-linked chitosan, the latter leading to rapid drug release. The mucoadhesive potential of the microparticles was characterized in in vitro and ex vivo assays. Finally, use of these microparticles significantly reduced toxicity toward peritoneal macrophages and had no toxic effect on alveolar macrophages. In conclusion, 50-190 kDa chitosan microparticles may act as promising non-cytotoxic carriers for pulmonary delivery of INH, showing marked alveolar macrophage activation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

    USGS Publications Warehouse

    Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

    2004-01-01

    The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3' N-U-P-M-F-HN-L 5', coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown) encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and no similarity to sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.

  1. The purification and characterization of an 88-kDa Porphyromonas endodontalis 35406 protease.

    PubMed

    Rosen, G; Shoshani, M; Naor, R; Sela, M N

    2001-12-01

    A Porphyromonas endodontalis ATCC 35406 protease was purified from Triton X-114 cell extracts by preparative SDS-PAGE followed by electroelution. The purified enzyme has a molecular size of 88 kDa and dissociated into two polypeptides of 43 and 41 kDa upon heating in the presence of sodium dodecyl sulfate, with or without a reducing agent. The protease (pH optimum 7.5-8.0) degraded the extracellular matrix proteins fibrinogen and fibronectin. Collagen IV was also degraded at 37 degrees C but not at 28 degrees C. The protease also cleaved the bioactive peptide angiotensin at residues phenylalanine-8 and tyrosine-4 but failed to hydrolyze bradykinin, vasopressin, and synthetic chromogenic substrates with phenylalanine or tyrosine at the P1 position. In addition, two peptidases were detected in P. endodontalis cells: a proline aminopeptidase that remained associated with the cell pellet after detergent extraction, and one or more peptidases that partitioned into the Triton X-114 phase after phase separation and degraded the bioactive peptides bradykinin and vasopressin. These P. endodontalis peptidases and proteases may play an important role in both the nutrition and the pathogenicity of these asaccharolytic microorganisms. The inactivation of bioactive peptides and degradation of extracellular matrix proteins by bacterial enzymes may contribute to the host tissue damage associated with endodontic infections.

  2. Human Ro60 (SSA2) genomic organization and sequence alterations, examined in cutaneous lupus erythematosus.

    PubMed

    Millard, T P; Ashton, G H S; Kondeatis, E; Vaughan, R W; Hughes, G R V; Khamashta, M A; Hawk, J L M; McGregor, J M; McGrath, J A

    2002-02-01

    The Ro 60 kDa protein (Ro60 or SSA2) is the major component of the Ro ribonucleoprotein (Ro RNP) complex, an immune response to which is a specific feature of several autoimmune diseases. The genomic organization of the DNA encoding Ro60, and any sequence variation within it, are unknown. The aims of this study were to characterize the Ro60 gene structure and to assess whether any sequence alterations might be associated with serum anti-Ro antibody in subacute cutaneous lupus erythematosus (SCLE), thus potentially providing new insight into disease pathogenesis. The cDNA sequence for Ro60 was obtained from the NCBI database and used in a BLAST search for a clone containing the entire genomic sequence. The intron-exon borders were confirmed by designing intronic primer pairs flanking each exon, which were then used to amplify genomic DNA for automated sequencing from 36 Caucasian patients with SCLE (anti-Ro positive) and 49 with discoid LE (DLE, anti-Ro negative), in addition to 36 healthy Caucasian controls. Heteroduplex analysis of polymerase chain reaction (PCR) products from patients and controls spanning all Ro60 exons (1-8) revealed a common bandshift in the PCR products spanning exon 7. Sequencing of the corresponding PCR products demonstrated an A > G substitution at nucleotide position 1318-7, within the consensus acceptor splice site of exon 7 (GenBank XM001901). The allele frequencies in 72 control chromosomes were 0.71 for the major allele A and 0.29 for the minor allele G, with no significant differences between SCLE patients, DLE patients and controls. The genomic organization of the DNA encoding the Ro60 protein is described, including a common polymorphism within the consensus acceptor splice site of exon 7. Our delineation of a strategy for the genomic amplification of Ro60 forms a basis for further examination of the pathological functions of the Ro RNP in autoimmune disease.

  3. Activation of the EBV/C3d receptor (CR2, CD21) on human B lymphocyte surface triggers tyrosine phosphorylation of the 95-kDa nucleolin and its interaction with phosphatidylinositol 3 kinase.

    PubMed

    Barel, M; Le Romancer, M; Frade, R

    2001-03-01

    We previously demonstrated that CR2 activation on the human B lymphocyte surface triggered tyrosine phosphorylation of a p95 component and its interaction with the p85 subunit of phosphatidylinositol 3' (PI 3) kinase. Despite an identical molecular mass of 95 kDa, this tyrosine-phosphorylated p95 molecule was not CD19, the proto-oncogene Vav, or the adaptor Gab1. To identify this tyrosine-phosphorylated p95 component, we first purified it by affinity chromatography on anti-phosphotyrosine mAb covalently linked to Sepharose 4B, followed by polyacrylamide gel electrophoresis. The isolated 95-kDa tyrosine-phosphorylated band was then subjected to amino acid analysis by mass spectrometry; the two isolated peptides had amino acid sequences 100% identical to two different domains of nucleolin, located between aa 411-420 and 611-624. Anti-nucleolin mAb was used to confirm the antigenic properties of this p95 component. Functional studies demonstrated that CR2 activation induced, within 2 min, tyrosine phosphorylation of nucleolin and its interaction with the Src homology 2 domains of the p85 subunit of PI 3 kinase and of 3BP2 and Grb2, but not with the Src homology 2 domains of Fyn and Gap. These properties of nucleolin were identical to those of the p95 previously described and induced by CR2 activation. Furthermore, tyrosine phosphorylation of nucleolin was also induced in normal B lymphocytes by CR2 activation, but not by CD19 or BCR activation. These data suggest that tyrosine phosphorylation of nucleolin and its interaction with the PI 3 kinase p85 subunit constitute one of the early steps in the specific intracellular signaling pathway of CR2.

  4. Sequential recognition of the pre-mRNA branch point by U2AF65 and a novel spliceosome-associated 28-kDa protein.

    PubMed Central

    Gaur, R K; Valcárcel, J; Green, M R

    1995-01-01

    Splicing of pre-mRNAs occurs via a lariat intermediate in which an intronic adenosine, embedded within a branch point sequence, forms a 2',5'-phosphodiester bond (RNA branch) with the 5' end of the intron. How the branch point is recognized and activated remains largely unknown. Using site-specific photochemical cross-linking, we have identified two proteins that specifically interact with the branch point during the splicing reaction. U2AF65, an essential splicing factor that binds to the adjacent polypyrimidine tract, crosslinks to the branch point at the earliest stage of spliceosome formation in an ATP-independent manner. A novel 28-kDa protein, which is a constituent of the mature spliceosome, contacts the branch point after the first catalytic step. Our results indicate that the branch point is sequentially recognized by distinct splicing factors in the course of the splicing reaction. PMID:7493318

  5. Sequence analysis of Leukemia DNA

    NASA Astrophysics Data System (ADS)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, and leukemia, better known as blood cancer, is one form of it. Cancer cells can be detected by laboratory testing of DNA. This study focused on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. It seeks the most similar region between a pair of sequences by assigning a negative score to each unequal base pair (mismatch) and a positive score to each identical base pair (match). The cell with the maximum score marks the end of the best local alignment, and tracing back from it to a cell with the minimum value (zero) gives the start of the alignment. This study uses leukemia sequences and three non-leukemia sequences.
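
    The scoring and traceback described above can be sketched in Python as follows. Besides the positive match and negative mismatch scores, the sketch assumes a linear gap penalty; the values match = 2, mismatch = -1 and gap = -2 are illustrative assumptions, not the parameters used in the study, and the sequences are toy examples rather than NCBI leukemia data.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    def s(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,                                        # start a new alignment
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match or mismatch
                          H[i - 1][j] + gap,                        # gap in b
                          H[i][j - 1] + gap)                        # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum cell (end of alignment) until a zero cell (its start).
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTAA", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```

    Clamping every cell at zero is what makes the alignment local: poorly matching flanks reset the score instead of dragging it down.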

  6. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research on carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
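
    N50 is the length L such that contigs (or scaffolds) of length at least L together contain at least half of the assembled bases, and GC content is the fraction of G and C bases. A short Python sketch of both statistics follows; the inputs are hypothetical toy values, not the carnation assembly, and counting ambiguous bases such as N in the GC denominator is an assumed convention.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return 0  # empty input


def gc_content(seq):
    """Fraction of G and C among all characters of seq (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Hypothetical toy contig lengths and sequence.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(gc_content("ATGC"))                 # 0.5
```

    In the toy case the lengths sum to 54; the three longest contigs (10 + 9 + 8 = 27) reach half of that, so the N50 is 8.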

  7. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    PubMed

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand for bioinformatics tools that can cope with the several gigabytes of sequence data generated in each deep-sequencing experiment. In this context, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is especially important because many species have very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, with links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
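
    A list of unique reads with copy numbers, as required above, can be derived from raw reads by counting identical sequences. This Python sketch assumes a simple tab-separated sequence/count layout for illustration only; it is not necessarily the exact file format miRanalyzer expects, and the helper names `collapse_reads` and `to_tsv` are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw reads into (unique_sequence, copy_number) pairs, most abundant first."""
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_tsv(pairs):
    """Hypothetical layout: one 'sequence<TAB>count' line per unique read."""
    return "\n".join(f"{seq}\t{n}" for seq, n in pairs)


reads = ["ACGT", "acgt ", "TTTT", ""]
print(to_tsv(collapse_reads(reads)))  # ACGT with count 2, then TTTT with count 1
```

    Collapsing before upload shrinks the input considerably, since highly expressed small RNAs appear as many identical reads.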

  8. Complete Genome Sequence and Immunoproteomic Analyses of the Bacterial Fish Pathogen Streptococcus parauberis▿†

    PubMed Central

    Nho, Seong Won; Hikima, Jun-ichi; Cha, In Seok; Park, Seong Bin; Jang, Ho Bin; del Castillo, Carmelo S.; Kondo, Hidehiro; Hirono, Ikuo; Aoki, Takashi; Jung, Tae Sung

    2011-01-01

    Although Streptococcus parauberis is known as a bacterial pathogen associated with bovine udder mastitis, it has recently become one of the major causative agents of olive flounder (Paralichthys olivaceus) streptococcosis in northeast Asia, causing massive mortality and severe economic losses. S. parauberis comprises two serotypes, and capsular polysaccharide antigens likely serve to differentiate them. In the present study, the complete genome sequence of S. parauberis (serotype I) was determined using the GS-FLX system to investigate its phylogeny, virulence factors, and antigenic proteins. S. parauberis possesses a single chromosome of 2,143,887 bp containing 1,868 predicted coding sequences (CDSs), with an average GC content of 35.6%. Whole-genome dot plot analysis and phylogenetic analysis of a 60-kDa chaperonin-encoding gene and the glyceraldehyde-3-phosphate dehydrogenase (GAPDH)-encoding gene showed that the strain was evolutionarily closely related to Streptococcus uberis. S. parauberis antigenic proteins were analyzed using an immunoproteomic technique. Twenty-one antigenic protein spots were identified in S. parauberis by reaction with an antiserum obtained from S. parauberis-challenged olive flounder. This work provides the foundation needed to understand more clearly the relationship between pathogen and host and to develop new prophylactic and therapeutic strategies against streptococcosis in fish. The work also provides a better understanding of the physiology and evolution of a significant representative of the Streptococcaceae. PMID:21531805

  9. A survey of tools for variant analysis of next-generation genome sequencing data

    PubMed Central

    Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes

    2014-01-01

    Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and has led to the development of a plethora of tools that support specific parts of the analysis workflow or provide a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494
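
    As a small illustration of the quality assessment step, the sketch below decodes FASTQ quality strings, assuming the Phred+33 encoding used by Sanger and Illumina 1.8+ reads, converts each Phred score Q to an error probability of 10^(-Q/10), and applies a mean-quality filter. The threshold of 20 is an assumed example, not a recommendation from the survey, and the function names are hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probabilities(quality: str, offset: int = 33) -> list:
    """Per-base error probability P = 10 ** (-Q / 10)."""
    return [10 ** (-q / 10) for q in phred_scores(quality, offset)]


def passes_mean_quality(quality: str, min_mean_q: float = 20, offset: int = 33) -> bool:
    """Keep a read only if its mean Phred score reaches the (assumed) threshold."""
    scores = phred_scores(quality, offset)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


print(phred_scores("I5+"))         # [40, 20, 10]
print(passes_mean_quality("I5+"))  # True (mean is about 23.3)
```

    Dedicated quality-control tools report many more metrics (per-position quality, adapter content, duplication), but they rest on the same score decoding.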

  10. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against a publicly available papaya genome sequence database. The transgenic construct- and event-specific sequences identified were those of a GM papaya developed to resist Papaya ringspot virus infection. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled the identification of unknown transgenic construct- and event-specific sequences in GM papaya and the development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

    PubMed

    Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders, respectively. Accurate characterization of metagenomic samples depends on accurate measurement of genetic variation between isolates, with resolution down to the strain level. The sensitivity of a single biomarker is often augmented by the use of multiple biomarkers or biomarker panels. In parallel with single-biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome that then serves as the biomarker for genome identification. However, genome variation and population diversity complicate the use of direct sequencing, as do differences caused by sample preparation protocols, including sequencing artifacts and errors. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness, and sought to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate the use of sequencing to provide genetic characterization of populations: direct sequencing of populations.

  12. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

    PubMed Central

    Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders, respectively. Accurate characterization of metagenomic samples depends on accurate measurement of genetic variation between isolates, with resolution down to the strain level. The sensitivity of a single biomarker is often augmented by the use of multiple biomarkers or biomarker panels. In parallel with single-biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome that then serves as the biomarker for genome identification. However, genome variation and population diversity complicate the use of direct sequencing, as do differences caused by sample preparation protocols, including sequencing artifacts and errors. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness, and sought to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate the use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204

  13. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  14. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    PubMed

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three budgerigars (Melopsittacus undulatus) and four barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at the four loci distinguish G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and the Giardia duodenalis assemblages. Furthermore, G. psittaci was more closely related to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals, regarded as "double peaks", were found at the same nucleotide positions of ef1α in all isolates. However, at the other three loci, including gdh and β-giardin, which are known to be highly variable, the sequences from all isolates were identical and showed no double peaks. These results suggest that the double peaks in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found among the G. psittaci isolates at gdh and β-giardin, suggesting that G. psittaci is not genetically more diverse than other Giardia species. This report is the first to provide evidence on the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. Therapeutic change in interaction: conversation analysis of a transforming sequence.

    PubMed

    Voutilainen, Liisa; Perakyla, Anssi; Ruusuvuori, Johanna

    2011-05-01

    A process of change within a single case of cognitive-constructivist therapy is analyzed by means of conversation analysis (CA). The focus is on a process of change in the sequences of interaction, which consist of the therapist's conclusion and the patient's response to it. In the conclusions, the therapist investigates and challenges the patient's tendency to transform her feelings of disappointment and anger into self-blame. Over the course of the therapy, the patient's responses to these conclusions are recast: from the patient first rejecting the conclusion, to then being ambivalent, and finally to agreeing with the therapist. On the basis of this case study, we suggest that an analysis that focuses on sequences of talk that are interactionally similar offers a sensitive method to investigate the manifestation of therapeutic change. It is suggested that this line of research can complement assimilation analysis and other methods of analyzing changes in a client's talk.

  16. Investigating the long-term course of schizophrenia by sequence analysis.

    PubMed

    An der Heiden, Wolfram; Häfner, Heinz

    2015-08-30

    In the present study we set out to explore the long-term clinical course of schizophrenia in a holistic manner by adopting sequence analysis. Our aim was to identify course types of illness by means of cluster analysis. The study was based on course and outcome data for 107 patients followed up over 134 months after first admission in the ABC Schizophrenia Study. Focusing on the main syndromes (positive, negative, depressive and unspecific symptoms) and their combinations, we looked for similarities in individual illness courses using the 'optimal matching' method. A cluster analysis performed on the resulting similarity matrix yielded two main groups (an 'improving' and a 'chronic' group), which together comprised six different types of illness course. The course types differed in both quantitative (frequency of syndromes and syndrome combinations) and qualitative terms (clinical presentation, sequence of syndromes). Cluster membership was only rarely, but clearly, associated with sociodemographic characteristics, treatment data and other illness variables. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
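
    Optimal matching treats each patient's course as a sequence of states and defines the dissimilarity between two courses as the minimum total cost of insertions, deletions and substitutions needed to turn one into the other. The Python sketch below assumes hypothetical one-letter state codes per observation period and illustrative costs (indel = 1, substitution = 2); the study's actual state coding and cost settings are not given in the abstract.

```python
def om_distance(s, t, indel=1.0, sub=2.0):
    """Optimal matching distance: minimum cost of indels and substitutions turning s into t."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete all of s[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,     # deletion
                          d[i][j - 1] + indel,     # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[n][m]


def distance_matrix(courses):
    """Pairwise OM distances, e.g. as input to a hierarchical cluster analysis."""
    return [[om_distance(a, b) for b in courses] for a in courses]


# Hypothetical codes per period: P positive, N negative, D depressive, U unspecific.
courses = ["PPNN", "PPND", "UUUU"]
print(distance_matrix(courses))
```

    With a substitution cost of twice the indel cost, a substitution is never cheaper than a deletion plus an insertion, so the relative weighting of the two operations largely determines which courses end up clustered together.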

  17. Determination and analysis of the complete genome sequence of Paralichthys olivaceus rhabdovirus (PORV).

    PubMed

    Zhu, Ruo-Lin; Zhang, Qi-Ya

    2014-04-01

    Paralichthys olivaceus rhabdovirus (PORV), which is associated with high mortality rates in flounder, was isolated in China in 2005. Here, we provide an annotated sequence record of PORV, the genome of which comprises 11,182 nucleotides and contains six genes in the order 3'-N-P-M-G-NV-L-5'. Phylogenetic analysis based on glycoprotein sequences of PORV and other rhabdoviruses showed that PORV clusters with viral haemorrhagic septicemia virus (VHSV), genus Novirhabdovirus, family Rhabdoviridae. Further phylogenetic analysis of the combined amino acid sequences of six proteins of PORV and VHSV strains showed that PORV clusters with Korean strains and is closely related to Asian strains, all of which were isolated from flounder. In a comparison in which the sequences of the six proteins were combined, PORV shared the highest identity (98.3 %) with VHSV strain KJ2008 from Korea.
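
    Percent identity between aligned sequences is the number of identical positions divided by the number of compared positions. Conventions differ in how gaps are counted; the Python sketch below, which is an assumption rather than the method used for PORV, skips any column containing a gap. The aligned fragments are hypothetical, not PORV or VHSV data.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap are skipped (one of several conventions).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned protein fragments.
print(percent_identity("MKV-LA", "MKVQLT"))  # 80.0
```

    For a combined comparison like the six-protein one above, the individually aligned proteins would be concatenated in the same order for both viruses before calling the function.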

  18. Transcriptome analysis by strand-specific sequencing of complementary DNA

    PubMed Central

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-01-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212

  19. Transcriptome analysis by strand-specific sequencing of complementary DNA.

    PubMed

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-10-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.

  20. High-Throughput Single-Cell RNA Sequencing and Data Analysis.

    PubMed

    Sagar; Herman, Josip Stefan; Pospisilik, John Andrew; Grün, Dominic

    2018-01-01

    Understanding biological systems at single-cell resolution may reveal novel insights that remain masked by conventional population-based techniques, which provide an average readout of the behavior of cells. Single-cell transcriptome sequencing holds the potential to identify novel cell types and to characterize the cellular composition of any organ or tissue in health and disease. Here, we describe a customized high-throughput protocol for single-cell RNA sequencing (scRNA-seq) that combines flow cytometry and a nanoliter-scale robotic system. Because scRNA-seq requires amplification of a small amount of endogenous cellular RNA, which introduces substantial technical noise into the dataset, downstream data filtering and analysis require special care. We therefore also briefly describe in-house state-of-the-art data analysis algorithms developed to identify cellular subpopulations, including rare cell types, and to derive lineage trees by ordering the identified subpopulations along the inferred differentiation trajectories.

  1. Assessment of sequence variability in a p23 gene region within and among three genotypes of the Theileria orientalis complex from south-eastern Australia.

    PubMed

    Perera, Piyumali K; Gasser, Robin B; Jabbar, Abdul

    2015-03-01

    Oriental theileriosis is a tick-borne protozoan disease of cattle caused by one or more genotypes of the Theileria orientalis complex. In this study, we assessed sequence variability in a region of the 23 kDa piroplasm membrane protein (p23) gene within and among three T. orientalis genotypes (designated buffeli, chitose and ikeda) in south-eastern Australia. Genomic DNA (n = 100) was extracted from the blood of infected cattle from various locations endemic for oriental theileriosis and tested by polymerase chain reaction (PCR)-coupled mutation scanning (single-strand conformation polymorphism, SSCP) and targeted sequencing analysis. Eight distinct sequences represented all DNA samples and fell into the three genotypes: buffeli (n = 3 sequences), chitose (3) and ikeda (2). Pairwise nucleotide comparisons among these eight sequences revealed considerably higher variability among the genotypes (6.6-11.7%) than within them (0-1.9%), indicating that the p23 gene region allows accurate identification of T. orientalis genotypes. In the future, we will combine this gene with other molecular markers to study the genetic structure of T. orientalis populations in Australasia, which will pave the way for a highly sensitive and specific PCR-based assay for genotypic diagnosis of infection and for assessing levels of parasitaemia in cattle. Copyright © 2014 Elsevier GmbH. All rights reserved.
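    Pairwise percentage differences like those reported above can be sketched as an uncorrected p-distance over pre-aligned sequences. This is a minimal illustration, not the authors' pipeline: the gap handling (sites with a gap in either sequence are skipped) and the example fragments are assumptions made for the sketch.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Percent of differing sites between two aligned sequences of equal length.

    Assumption: positions with a gap ('-') in either sequence are excluded.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    differences = sum(1 for x, y in sites if x != y)
    return 100.0 * differences / len(sites)


def pairwise_p_distances(seqs: dict) -> dict:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(sorted(seqs), 2)}


# Hypothetical aligned fragments, for illustration only
example = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACGTTCGTAA"}
print(pairwise_p_distances(example))
# {('s1', 's2'): 10.0, ('s1', 's3'): 20.0, ('s2', 's3'): 10.0}
```

    Percent identity is the complement (100 minus this value) under the same site conventions; within-genotype and among-genotype ranges would then be the minimum and maximum over the relevant pairs.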

  2. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    PubMed

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we examined how the results of HIV cluster analysis are associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
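    The sliding-window step can be sketched as cutting every aligned sequence at the same coordinates. This is a minimal sketch under stated assumptions: the abstract does not give the alignment length or the window step, so the 9,000-position length and 100-position step below are hypothetical and do not reproduce the 99 and 45 windows reported; tree building and bootstrapping per window are not shown.

```python
def sliding_windows(length: int, width: int, step: int) -> list:
    """Half-open (start, end) coordinates of every window that fits inside the alignment."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [(start, start + width) for start in range(0, length - width + 1, step)]


def window_slices(alignment: dict, width: int, step: int) -> list:
    """Cut every aligned sequence (all the same length) at the same window coordinates."""
    if not alignment:
        return []
    length = len(next(iter(alignment.values())))
    return [
        {name: seq[start:end] for name, seq in alignment.items()}
        for start, end in sliding_windows(length, width, step)
    ]


# Hypothetical alignment length and step, for illustration only
print(len(sliding_windows(9000, 1000, 100)))  # 81
print(len(sliding_windows(9000, 2000, 100)))  # 71
```

    Each sub-alignment returned by `window_slices` would then be analyzed separately, so the window count follows directly from (length - width) // step + 1.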

  3. Functional sequencing read annotation for high precision microbiome analysis

    PubMed Central

    Zhu, Chengsheng; Miller, Maximilian; Marpaka, Srinayani; Vaysberg, Pavel; Rühlemann, Malte C; Wu, Guojun; Heinsen, Femke-Anouska; Tempel, Marie; Zhao, Liping; Lieb, Wolfgang; Franke, Andre; Bromberg, Yana

    2018-01-01

    The vast majority of microorganisms on Earth reside in often-inseparable environment-specific communities: microbiomes. Meta-genomic/-transcriptomic sequencing could reveal the otherwise inaccessible functionality of microbiomes. However, existing analytical approaches focus on attributing sequencing reads to known genes/genomes, often failing to make maximal use of available data. We created faser (functional annotation of sequencing reads), an algorithm that is optimized to map reads to molecular functions encoded by the read-correspondent genes. The mi-faser microbiome analysis pipeline, combining faser with our manually curated reference database of protein functions, accurately annotates microbiome molecular functionality. mi-faser’s minutes-per-microbiome processing speed is significantly faster than that of other methods, allowing for large-scale comparisons. Microbiome function vectors can be compared between different conditions to highlight environment-specific and/or time-dependent changes in functionality. Here, we identified previously unseen oil degradation-specific functions in BP oil-spill data, as well as functional signatures of individual-specific gut microbiome responses to a dietary intervention in children with Prader–Willi syndrome. Our method also revealed variability in the microbiomes of patients with Crohn's disease (CD) and clearly distinguished them from those of related healthy individuals. Our analysis highlighted the microbiome's role in CD pathogenicity, demonstrating enrichment of patient microbiomes in functions that promote inflammation and that help bacteria survive it. PMID:29194524

  4. Multifractal analysis of 2001 Mw 7.7 Bhuj earthquake sequence in Gujarat, Western India

    NASA Astrophysics Data System (ADS)

    Aggarwal, Sandeep Kumar; Pastén, Denisse; Khan, Prosanta Kumar

    2017-12-01

    The 2001 Mw 7.7 Bhuj mainshock seismic sequence in the Kachchh area, spanning 2001 to 2012, has been analyzed using mono-fractal and multi-fractal dimension spectrum analysis techniques. This region was characterized by frequent moderate shocks of Mw ≥ 5.0 for more than a decade after the 2001 Bhuj earthquake, which makes this sequence important for precursory analysis. The selected long sequence has been investigated for the first time for a completeness magnitude Mc of 3.0 using the maximum curvature method. Multi-fractal Dq spectrum (Dq ∼ q) analysis was carried out using an effective window length of 200 earthquakes, moved by 20 events so that consecutive windows overlap by 180 events. The robustness of the analysis was tested by applying a magnitude completeness correction term of 0.2 to Mc 3.0 (i.e., Mc 3.2), and the error in the calculation of Dq was estimated for each magnitude threshold. In addition, the stability of the analysis was investigated down to a minimum magnitude of Mw ≥ 2.6 in the sequence. The analysis shows that the multi-fractal dimension spectrum Dq decreases with increasing clustering of events over time before a moderate-magnitude earthquake in the sequence, which reflects non-randomness in the spatial distribution of epicenters and self-organized criticality. Similar behavior is ubiquitous elsewhere around the globe and warns of the proximity of a damaging seismic event in an area.
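    The windowing described above (200 events per window, advanced by 20 events, hence 180 shared events) can be sketched as follows. This is a minimal sketch assuming the catalogue is already sorted by origin time; the Dq computation applied to the epicenters of each window is not shown, and the function name is hypothetical.

```python
def event_windows(magnitudes: list, mc: float = 3.0, width: int = 200, step: int = 20) -> list:
    """Group catalogue events at or above the completeness magnitude into moving windows.

    Each window holds `width` consecutive retained events (catalogue order) and
    starts `step` events after the previous one, so consecutive windows share
    width - step events (180 for the defaults).
    """
    kept = [i for i, m in enumerate(magnitudes) if m >= mc]
    return [kept[s:s + width] for s in range(0, len(kept) - width + 1, step)]


# Hypothetical catalogue of 260 events, all above Mc
windows = event_windows([3.5] * 260)
print(len(windows))  # 4
```

    Rerunning with `mc=3.2` mirrors the robustness check on the completeness magnitude, since only the retained events change.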

  5. Next Generation Sequence Analysis and Computational Genomics Using Graphical Pipeline Workflows

    PubMed Central

    Torri, Federica; Dinov, Ivo D.; Zamanyan, Alen; Hobel, Sam; Genco, Alex; Petrosyan, Petros; Clark, Andrew P.; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Knowles, James A.; Ames, Joseph; Kesselman, Carl; Toga, Arthur W.; Potkin, Steven G.; Vawter, Marquis P.; Macciardi, Fabio

    2012-01-01

    Whole-genome and exome sequencing have already proven to be essential and powerful methods to identify genes responsible for simple Mendelian inherited disorders. These methods can be applied to complex disorders as well, and have been adopted as one of the current mainstream approaches in population genetics. These achievements have been made possible by next generation sequencing (NGS) technologies, which require substantial bioinformatics resources to analyze the dense and complex sequence data. The huge analytical burden of data from genome sequencing might be seen as a bottleneck slowing the publication of NGS papers at this time, especially in psychiatric genetics. We review the existing methods for processing NGS data, to place into context the rationale for the design of a computational resource. We describe our method, the Graphical Pipeline for Computational Genomics (GPCG), to perform the computational steps required to analyze NGS data. The GPCG implements flexible workflows for basic sequence alignment, sequence data quality control, single nucleotide polymorphism analysis, copy number variant identification, annotation, and visualization of results. These workflows cover all the analytical steps required for NGS data, from processing the raw reads to variant calling and annotation. The current version of the pipeline is freely available at http://pipeline.loni.ucla.edu. These applications of NGS analysis may gain clinical utility in the near future (e.g., identifying miRNA signatures in diseases) when the bioinformatics approach is made feasible. Taken together, the annotation tools and strategies that have been developed to retrieve information and test hypotheses about the functional role of variants present in the human genome will help to pinpoint the genetic risk factors for psychiatric disorders. PMID:23139896

  6. Probabilistic topic modeling for the analysis and classification of genomic sequences

    PubMed Central

    2015-01-01

    Background: Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, which represent a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper, a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. Methods: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions: We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method achieves results very similar to those of the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions, the proposed method outperforms RDP and SVM with ultra-short sequences, and it exhibits a smooth decrease of performance, at every taxonomic level, as the sequence length is decreased. PMID:25916734
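    The document analogy above, in which a DNA sequence plays the role of a text and its fixed-length k-mers play the role of words, can be sketched as a bag of overlapping k-mers. This is a minimal sketch, not the authors' code: dropping k-mers with ambiguous bases is an assumption, and the example value k = 3 is hypothetical. Each sequence's counts would form one row of the document-term matrix on which an LDA model is trained; the model fitting itself is not shown.

```python
from collections import Counter


def kmer_document(seq: str, k: int) -> Counter:
    """Represent a DNA sequence as a bag of overlapping k-mers ("words").

    Assumption: k-mers containing anything other than A, C, G or T are dropped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in kmers if set(w) <= set("ACGT"))


print(kmer_document("ACGTACGT", 3))
# Counter({'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1})
```

    Short snippets simply yield fewer words per document, which is one intuition for why performance can degrade gradually rather than abruptly as sequence length decreases.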

  7. Solid-state NMR spectroscopy of 18.5 kDa myelin basic protein reconstituted with lipid vesicles: spectroscopic characterisation and spectral assignments of solvent-exposed protein fragments.

    PubMed

    Zhong, Ligang; Bamm, Vladimir V; Ahmed, Mumdooh A M; Harauz, George; Ladizhansky, Vladimir

    2007-12-01

    Myelin basic protein (MBP, 18.5 kDa isoform) is a peripheral membrane protein that is essential for maintaining the structural integrity of the multilamellar myelin sheath of the central nervous system. Reconstitution of the most abundant 18.5 kDa MBP isoform with lipid vesicles yields an aggregated assembly mimicking the protein's natural environment, but which is not amenable to standard solution NMR spectroscopy. On the other hand, the mobility of MBP in such a system is variable, depends on the local strength of the protein-lipid interaction, and in general is of such a time scale that the dipolar interactions are averaged out. Here, we used a combination of solution and solid-state NMR (ssNMR) approaches: J-coupling-driven polarization transfers were combined with magic angle spinning and high-power decoupling to yield high-resolution spectra of the mobile fragments of 18.5 kDa murine MBP in membrane-associated form. To partially circumvent the problem of short transverse relaxation, we implemented three-dimensional constant-time correlation experiments (NCOCX, NCACX, CONCACX, and CAN(CO)CX) that were able to provide interresidue and intraresidue backbone correlations. These experiments resulted in partial spectral assignments for mobile fragments of the protein. Additional nuclear Overhauser effect spectroscopy (NOESY)-based experiments revealed that the mobile fragments were exposed to solvent and were likely located outside the lipid bilayer, or in its hydrophilic portion. Chemical shift index analysis showed that the fragments were largely disordered under these conditions. These combined approaches are applicable to ssNMR investigations of other peripheral membrane proteins reconstituted with lipids.

  8. Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

    PubMed Central

    Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

    2000-01-01

    The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionarily conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11 (AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409

  9. Comparative analysis and molecular characterization of genomic sequences and proteins of FABP4 and FABP5 from the giant panda (Ailuropoda melanoleuca).

    PubMed

    Song, B; Hou, Y L; Ding, X; Wang, T; Wang, F; Zhong, J C; Xu, T; Zhong, J; Hou, W R; Shuai, S R

    2014-02-20

    Fatty acid binding proteins (FABPs) are a family of small, highly conserved cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. In this study, cDNA and genomic sequences of FABP4 and FABP5 were cloned successfully from the giant panda (Ailuropoda melanoleuca) using reverse transcription polymerase chain reaction (RT-PCR) technology and touchdown-PCR. The cDNAs of FABP4 and FABP5 cloned from the giant panda were 400 and 413 bp in length, containing open reading frames of 399 and 408 bp that encode 132 and 135 amino acids, respectively. The genomic sequences of FABP4 and FABP5 were 3976 and 3962 bp, respectively, and each contained four exons and three introns. Sequence alignment indicated a high degree of homology with reported FABP sequences of other mammals at both the amino acid and DNA levels. Topology prediction revealed seven protein kinase C phosphorylation sites, two casein kinase II phosphorylation sites, two N-myristoylation sites, and one cytosolic fatty acid-binding protein signature in the FABP4 protein, and three N-glycosylation sites, three protein kinase C phosphorylation sites, one casein kinase II phosphorylation site, one N-myristoylation site, one amidation site, and one cytosolic fatty acid-binding protein signature in the FABP5 protein. The FABP4 and FABP5 genes were overexpressed in Escherichia coli BL21 and produced the expected 16.8- and 17.0-kDa polypeptides. The results obtained in this study provide information for further in-depth research on this system, which is of both theoretical and practical significance.

  10. Porins of Pseudomonas fluorescens MFO as fibronectin-binding proteins.

    PubMed

    Rebière-Huët, J; Guérillon, J; Pimenta, A L; Di Martino, P; Orange, N; Hulen, C

    2002-09-24

    Bacterial adherence is a complex phenomenon involving specific interactions between receptors, including matricial fibronectin, and bacterial ligands. We show here that fibronectin and outer membrane proteins of Pseudomonas fluorescens were able to inhibit adherence of P. fluorescens to fibronectin-coated wells. We identified at least six fibronectin-binding proteins with molecular masses of 70, 55, 44, 37, 32 and 28 kDa. The presence of native (32 kDa) and heat-modified forms (37 kDa) of OprF was revealed by immuno-analysis and the 44-kDa band was composed of three proteins, their N-terminal sequences showing homologies with Pseudomonas aeruginosa porins (OprD, OprE1 and OprE3).

  11. Analysis of plant microbe interactions in the era of next generation sequencing technologies

    PubMed Central

    Knief, Claudia

    2014-01-01

    Next generation sequencing (NGS) technologies have impressively accelerated research in biological science in recent years by enabling the production of large volumes of sequence data at a drastically lower price per base than traditional sequencing methods. Recent and ongoing developments in the field make it possible to address research questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed, and methods that help to overcome these shortcomings are presented. A focus is on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant-associated microbiota to demonstrate the value of the new methods. PMID:24904612

  12. Cooperative Interactions between 480 kDa Ankyrin-G and EB Proteins Assemble the Axon Initial Segment.

    PubMed

    Fréal, Amélie; Fassier, Coralie; Le Bras, Barbara; Bullier, Erika; De Gois, Stéphanie; Hazan, Jamilé; Hoogenraad, Casper C; Couraud, François

    2016-04-20

    The axon initial segment (AIS) is required for generating action potentials and maintaining neuronal polarity. Significant progress has been made in deciphering the basic building blocks composing the AIS, but the underlying mechanisms required for AIS formation remain unclear. The scaffolding protein ankyrin-G is the master organizer of the AIS. Microtubules and their interactors, particularly end-binding proteins (EBs), have emerged as potential key players in AIS formation. Here, we show that the longest isoform of ankyrin-G (480AnkG) selectively associates with EBs via its specific tail domain and that this interaction is crucial for AIS formation and neuronal polarity in cultured rodent hippocampal neurons. EBs are essential for 480AnkG localization and stabilization at the AIS, whereas 480AnkG is required for the specific accumulation of EBs in the proximal axon. Our findings thus provide a conceptual framework for understanding how the cooperative relationship between 480AnkG and EBs induces the assembly of microtubule-AIS structures in the proximal axon. Neuronal polarity is crucial for the proper function of neurons. The assembly of the axon initial segment (AIS), which is the hallmark of early neuronal polarization, relies on the longest, 480 kDa ankyrin-G isoform. The microtubule cytoskeleton and its interacting proteins were suggested to be early key players in the process of AIS formation. In this study, we show that the crosstalk between 480 kDa ankyrin-G and the microtubule plus-end tracking proteins, EBs, at the proximal axon is decisive for AIS assembly and neuronal polarity. Our work thus provides insight into the functional mechanisms used by 480 kDa ankyrin-G to drive AIS formation and thereby to establish neuronal polarity. Copyright © 2016 the authors.

  13. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    PubMed

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and the spread of antimicrobial-resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study, we describe novel molecular typing schemes for S. haemolyticus using multi-locus sequence typing (MLST) and multi-locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20-year period from eight countries. MLST resolved 17 sequence types (Simpson's index of diversity [SID] = 0.877) and MLVF resolved 14 repeat types (SID = 0.831). Sequence diversity was low. Phylogenetic analysis clustered the isolates into three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from the addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.
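    The abstract reports Simpson's index of diversity (SID) without giving the formula. The sketch below assumes the Hunter-Gaston form commonly used to compare typing schemes, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates assigned to type j and N is the total number of isolates. The example type assignments are hypothetical; the reported values of 0.877 and 0.831 cannot be reproduced without the per-type counts.

```python
from collections import Counter


def simpsons_index_of_diversity(types: list) -> float:
    """Simpson's index of diversity for a typing scheme (Hunter-Gaston form).

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), where n_j is the number of
    isolates assigned to type j and N is the total number of isolates.
    """
    counts = Counter(types).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two isolates are required")
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical sequence types for six isolates
print(round(simpsons_index_of_diversity(["ST1", "ST1", "ST2", "ST3", "ST3", "ST3"]), 3))  # 0.733
```

    The index is the probability that two isolates drawn at random without replacement are assigned different types, so a scheme that splits isolates into more, evenly sized types scores closer to 1.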

  14. Task analysis in curriculum design: a hierarchically sequenced introductory mathematics curriculum

    PubMed Central

    Resnick, Lauren B.; Wang, Margaret C.; Kaplan, Jerome

    1973-01-01

    A method of systematic task analysis is applied to the problem of designing a sequence of learning objectives that will provide an optimal match for the child's natural sequence of acquisition of mathematical skills and concepts. The authors begin by proposing an operational definition of the number concept in the form of a set of behaviors which, taken together, permit the inference that the child has an abstract concept of “number”. These are the “objectives” of the curriculum. Each behavior in the defining set is then subjected to an analysis that identifies hypothesized components of skilled performance and prerequisites for learning these components. On the basis of these analyses, specific sequences of learning objectives are proposed. The proposed sequences are hypothesized to be those that will best facilitate learning, by maximizing transfer from earlier to later objectives. Relevant literature on early learning and cognitive development is considered in conjunction with the analyses and the resulting sequences. The paper concludes with a discussion of the ways in which the curriculum can be implemented and studied in schools. Examples of data on individual children are presented, and the use of such data for improving the curriculum itself, as well as for examining the effects of other treatment variables, is considered. PMID:16795452

  15. Now and Next-Generation Sequencing Techniques: Future of Sequence Analysis Using Cloud Computing

    PubMed Central

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed “cloud computing”) has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows. PMID:23248640

  16. Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.

    PubMed

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.

  17. in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Xiaofan; Peris, David; Kominek, Jacek

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.

  18. in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

    DOE PAGES

    Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

    2016-09-16

    The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to give the user great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.

  19. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1,050, 13, 92, and 143 genes for tRNAs, rRNAs, snoRNAs, and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures, excluding those in transposable elements, were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research on carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
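
    The N50 values quoted above follow the usual definition. Sort contigs (or scaffolds) by length, longest first, and report the length at which the running total first reaches half of all assembled bases. The Python sketch below is a generic illustration of that definition, not the pipeline used for the carnation assembly. The function name `n50` and the toy lengths are assumptions for illustration. The sketch also checks the stated 91% figure as the ratio of the 568,887,315-bp assembly to the 622-Mb k-mer estimate, taking 622 Mb as 622,000,000 bp.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequences supplied")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids float rounding
            return length


# Toy contig lengths (hypothetical, not from the carnation data).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8

# Fraction of the k-mer-estimated genome covered by the assembly.
assembly_bp = 568_887_315
genome_bp = 622_000_000
print(f"{100 * assembly_bp / genome_bp:.0f}%")  # 91%
```
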

  20. [Stimulation of cell culture recovery after cryopreservation by the cattle cord blood fraction (below 5 kDa) or Actovegin].

    PubMed

    Gulevskiĭ, A K; Trifonova, A V; Lavrik, A A

    2013-01-01

    The capacities of the cattle cord blood low-molecular-weight fraction (below 5 kDa) and Actovegin (a calf blood fraction below 5 kDa) to restore the functions of cell cultures after cryopreservation were compared. Their influence on the proliferation of frozen-thawed cell cultures, on certain stages of their growth, on cell attachment, on the rate of cell spreading, and on the mitotic regime was studied. Both the cord blood low-molecular-weight fraction and Actovegin stimulated growth of the cell cultures after cryopreservation most efficiently at a concentration of 224 μg/ml. However, despite this stimulating effect, neither restored the proliferative indices at the 1st passage after cryopreservation to the values of the native culture. The effects of the cord blood low-molecular-weight fraction and Actovegin on the human fibroblast culture were identical in the following parameters: cell attachment, rate of cell spreading, and proliferation. In the BHK-21 clone 13/04 culture, Actovegin showed low efficiency, whereas the cord blood low-molecular-weight fraction had a conspicuous stimulating effect on adhesion and proliferation. These investigations can serve as a basis for developing regenerative media containing the cattle cord blood low-molecular-weight fraction (below 5 kDa) or Actovegin as active components at a concentration of 224 μg/ml, for fast recovery of culture proliferative properties after cryopreservation.

  1. Genome sequencing and analysis of the model grass Brachypodium distachyon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  2. Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

    PubMed Central

    2010-01-01

    Background: Expressed sequence tags (ESTs) have been a cost-effective tool in molecular biology and represent an abundant, valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results: In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality EST sequences. The ESTs were assembled into 4,473 unigenes, composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047); of these, 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs; 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test transferability to peach and plum. Nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in plum (65%) and a lower level in peach (46%), and clustering analysis of the three species indicated that these SSR markers were useful for evaluating genetic relationships and diversity between and within Prunus species. Conclusions: We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further studies such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution, and comparative genomics between Prunus species. PMID:20626882
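
    The abstract does not state the criteria used to call the 1,233 putative SSRs. As a rough illustration only, the Python sketch below scans a sequence for perfect tandem repeats of 1- to 6-nt motifs using a regular-expression backreference. The minimum copy numbers in `MIN_REPEATS` are assumed thresholds of the kind often used by SSR-mining tools, not values from the study. Imperfect and compound repeats are ignored, and motifs that are themselves repeats of a shorter unit (such as `AA` or `ATAT`) are skipped so each repeat is reported once at its shortest period.

```python
import re

# Assumed minimum copy numbers per motif length (1-6 nt); the study's
# actual SSR-calling criteria are not given in the abstract.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """False if motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # ([ACGT]{k}) captures a motif; \1{n,} demands n or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("C" + "AG" * 7 + "C"))  # [(1, 'AG', 7)]
print(find_ssrs("G" + "T" * 12 + "G"))  # [(1, 'T', 12)]
```
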

  3. Molecular analysis of Toxoplasma gondii Surface Antigen 1 (SAG1) gene cloned from Toxoplasma gondii DNA isolated from Javanese acute toxoplasmosis

    NASA Astrophysics Data System (ADS)

    Haryati, Sri; Agung Prasetyo, Afiono; Sari, Yulia; Dharmawan, Ruben

    2018-05-01

    Toxoplasma gondii Surface Antigen 1 (SAG1) is often used as a diagnostic tool because it is an immunodominant, specific antigen. However, data on the Toxoplasma gondii SAG1 protein from Indonesian isolates are limited. To study the protein, genomic DNA was isolated from blood samples of a Javanese patient with acute toxoplasmosis. The complete coding sequence of Toxoplasma gondii SAG1 was cloned, inserted into an Escherichia coli expression plasmid, and sequenced. The sequencing results were subjected to bioinformatics analysis. The complete Toxoplasma gondii SAG1 coding sequence was successfully cloned. Physicochemical analysis revealed that the 336-aa SAG1 protein has a molecular weight of 34.7 kDa. The isoelectric point and aliphatic index were 8.4 and 78.4, respectively. With an N-terminal methionine, the estimated half-life in Escherichia coli was more than 10 hours. The antigenicity, secondary structure, and HLA binding motifs are also discussed. The results of this study contribute information about Toxoplasma gondii SAG1 that may benefit further work aimed at developing diagnostic and therapeutic strategies against the parasite.
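
    The aliphatic index reported above is commonly computed with Ikai's (1980) formula, as implemented in tools such as ExPASy ProtParam: AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The abstract does not name the tool used, so the Python sketch below should be read as an illustration under that assumption. Because the SAG1 sequence is not given here, the sketch cannot reproduce the reported value of 78.4.

```python
def aliphatic_index(protein: str) -> float:
    """Aliphatic index after Ikai (1980): relative volume of aliphatic side chains."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    # Side-chain volumes of Val (a) and Ile/Leu (b) relative to Ala.
    a, b = 2.9, 3.9
    return (mole_percent("A") + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


print(aliphatic_index("AVIL"))  # 292.5 (toy sequence, not SAG1)
```
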

  4. Lactobacillus strain diversity based on partial hsp60 gene sequences and design of PCR-restriction fragment length polymorphism assays for species identification and differentiation.

    PubMed

    Blaiotta, Giuseppe; Fusco, Vincenzina; Ercolini, Danilo; Aponte, Maria; Pepe, Olimpia; Villani, Francesco

    2008-01-01

    A phylogenetic tree showing diversities among 116 partial (499-bp) Lactobacillus hsp60 (groEL, encoding a 60-kDa heat shock protein) nucleotide sequences was obtained and compared to those previously described for 16S rRNA and tuf gene sequences. The topology of the tree produced in this study showed a Lactobacillus species distribution similar, but not identical, to those previously reported. However, according to the most recent systematic studies, 43 clearly differentiated single-species clusters were identified among the sequences analyzed. The slightly higher variability of hsp60 nucleotide sequences compared with 16S rRNA sequences offers better opportunities to design or develop molecular assays allowing identification and differentiation of either distant or very closely related Lactobacillus species. Therefore, our results suggest that hsp60 can be considered an excellent molecular marker for inferring the taxonomy and phylogeny of members of the genus Lactobacillus and that the chosen primers can be used in a simple PCR procedure allowing the direct sequencing of the hsp60 fragments. Moreover, in this study we performed a computer-aided restriction endonuclease analysis of all 499-bp hsp60 partial sequences and we showed that the PCR-restriction fragment length polymorphism (RFLP) patterns obtainable by using both endonucleases AluI and TacI (in separate reactions) can allow identification and differentiation of all 43 Lactobacillus species considered, with the exception of the pair L. plantarum/L. pentosus. However, the latter species can be differentiated by further analysis with Sau3AI or MseI. The hsp60 PCR-RFLP approach was efficiently applied to identify and to differentiate a total of 110 wild Lactobacillus strains (including closely related species, such as L. casei and L. rhamnosus or L. plantarum and L. pentosus) isolated from cheese and dry-fermented sausages.
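
    The computer-aided restriction analysis described above amounts to locating each enzyme's recognition site in the 499-bp fragments and reporting the resulting fragment sizes. The Python sketch below illustrates this. Its site table is an assumption limited to enzymes with well-established sites (AluI AG^CT, Sau3AI ^GATC, MseI T^TAA). TacI is not included; its site would need to be added from a reference such as REBASE. The sequences are toy examples, not hsp60 data. Because these sites are palindromic, scanning the top strand is sufficient.

```python
import re

# Recognition site and top-strand cut offset within the site.
# Limited to well-known sites; TacI is not included here.
ENZYMES = {
    "AluI": ("AGCT", 2),    # AG^CT
    "Sau3AI": ("GATC", 0),  # ^GATC
    "MseI": ("TTAA", 1),    # T^TAA
}


def digest(seq, enzyme):
    """Fragment lengths, in sequence order, after complete digestion."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping site occurrences.
    cuts = [m.start() + offset for m in re.finditer("(?=%s)" % site, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def rflp_pattern(seq, enzymes):
    """One separate single-enzyme digest per enzyme, sizes largest first."""
    return {name: sorted(digest(seq, name), reverse=True) for name in enzymes}


print(digest("AAAGCTTT", "AluI"))                      # [4, 4]
print(rflp_pattern("GATCAAGATC", ["Sau3AI", "MseI"]))  # {'Sau3AI': [6, 4], 'MseI': [10]}
```

    Running each enzyme in a separate digest mirrors the separate AluI and TacI reactions described in the abstract. The resulting size lists are what would be compared between species.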

  5. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    PubMed Central

    2011-01-01

    Background: Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research, including BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Results: To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. Coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate similarity to the genome of the closely related zebrafish, the common carp BES were aligned against the zebrafish genome. A total of 39,335 common carp BES have conserved homologs in the zebrafish genome, demonstrating the high similarity between the zebrafish and common carp genomes and indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion: BAC end sequences are great resources for the first genome-wide survey of common carp. Repetitive DNA was estimated to account for approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of

  6. Platelet cytosolic 44-kDa protein is a substrate of cholera toxin-induced ADP-ribosylation and is not recognized by antisera against the α subunit of the stimulatory guanine nucleotide-binding regulatory protein

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Molina Y Vedia, L.M.; Reep, B.R.; Lapetina, E.G.

    1988-08-01

    ADP-ribosylation induced by cholera toxin and pertussis toxin was studied in particulate and cytosolic fractions of human platelets. Platelets were disrupted by a cycle of freezing and thawing in the presence of a hyposmotic buffer containing protease inhibitors. In both fractions, the A subunit of cholera toxin ADP-ribosylates two proteins with molecular masses of 42 and 44 kDa, whereas pertussis toxin ADP-ribosylates a 41-kDa polypeptide. Two antisera against the α subunit of the stimulatory guanine nucleotide-binding regulatory protein recognize only the 42-kDa polypeptide. Cholera toxin-induced ADP-ribosylation of the 42- and 44-kDa proteins is reduced by pretreatment of platelets with iloprost, a prostacyclin analog. The 44-kDa protein, which is a substrate of cholera toxin, could be extracted completely from the membrane and recovered in the cytosolic fraction when the cells were disrupted by Dounce homogenization and the pellet was extensively washed. A 44-kDa protein can also be labeled with 8-azidoguanosine 5′-(α-³²P)triphosphate in the cytosol and membranes. These findings indicate that cholera and pertussis toxins produce covalent modifications of proteins present in particulate and cytosolic platelet fractions. Moreover, the 44-kDa protein might be an α subunit of a guanine nucleotide-binding regulatory protein that is not recognized by available antisera.

  7. Sequence analysis of cultivated strawberry (Fragaria × ananassa Duch.) using microdissected single somatic chromosomes.

    PubMed

    Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko

    2017-01-01

    Cultivated strawberry (Fragaria × ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. Homoeologous chromosomes belonging to different sub-genomes of an allopolyploid have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and to demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments and, on the basis of mapping, identified DNA sequences from 144 samples that were best matched to the reference genomes of the octoploid strawberry, F. × ananassa, and the diploid strawberry, F. vesca. The 144 samples were classified into seven pseudo-molecules of F. vesca. The coverage rates of the DNA sequences from single chromosomes onto all pseudo-molecule sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.

  8. Cloning, sequencing, and expression of the Pseudomonas testosteroni gene encoding 3-oxosteroid delta 1-dehydrogenase.

    PubMed Central

    Plesiat, P; Grandguillot, M; Harayama, S; Vragar, S; Michel-Briand, Y

    1991-01-01

    Pseudomonas testosteroni ATCC 17410 is able to grow on testosterone. This strain was mutagenized with Tn5, and 41 mutants defective in the utilization of testosterone were isolated. One of them, called mutant 06, expressed 3-oxosteroid delta 1- and 3-oxosteroid delta 4-5 alpha-dehydrogenases only at low levels. The DNA region around the Tn5 insertion in mutant 06 was cloned into pUC19, and the 1-kbp EcoRI-BamHI segment adjacent to the Tn5 insertion was used to probe DNA from the wild-type strain. The probe hybridized to a 7.8-kbp SalI fragment. Plasmid pTES5, a pUC19 derivative containing this 7.8-kbp SalI fragment, was isolated by screening with the 1-kbp EcoRI-BamHI probe. This plasmid expressed delta 1-dehydrogenase in Escherichia coli cells. The 2.2-kbp KpnI-KpnI segment of pTES5 was subcloned into pUC18 to construct pTEK21. In E. coli containing the lacIq plasmid pRG1 and pTEK21, the expression of delta 1-dehydrogenase was induced by isopropyl-beta-D-thiogalactopyranoside (IPTG). The induced level was about 40 times higher than the induced level in P. testosteroni. Delta 1-dehydrogenase synthesized in E. coli was localized in the inner membrane fraction. Minicell experiments showed that a 59-kDa polypeptide was synthesized from pTEK21 and was located in the inner membrane fraction. The complete nucleotide sequence of the 2.2-kbp KpnI-KpnI segment of pTEK21 was determined. An open reading frame that encodes a 62.4-kDa polypeptide and is preceded by a Shine-Dalgarno-like sequence was identified. The first 44 amino acids of the putative product exhibited significant sequence similarity to the N-terminal sequences of lipoamide dehydrogenases. Images FIG. 4 PMID:1657885
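
    Identifying an open reading frame preceded by a Shine-Dalgarno-like sequence, as described above, can be sketched in two steps. First, scan the three forward reading frames for ATG-to-stop stretches. Then search just upstream of each start codon for a purine-rich motif. The Python below is a simplified, hypothetical illustration that uses only the forward strand and the standard stop codons. The `GGAG` motif, the 4-15-nt upstream window and the `min_codons` cutoff are assumptions, not parameters from the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Forward-strand ORFs as (start, end) pairs; end is exclusive and
    includes the stop codon. Only the first ATG before each stop is used."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def has_sd_like(seq, orf_start, motif="GGAG", window=(4, 15)):
    """True if motif lies wholly between window[1] and window[0] nt upstream of the start codon."""
    lo = max(0, orf_start - window[1])
    hi = max(0, orf_start - window[0])
    return motif in seq.upper()[lo:hi]


demo = "AAGGAGGTAAAC" + "ATG" + "GCT" * 5 + "TAA" + "C"  # toy sequence
for start, end in find_orfs(demo, min_codons=5):
    print(start, end, has_sd_like(demo, start))  # 12 33 True
```
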

  9. PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.

    PubMed

    Wimmer, Katharina; Wernstedt, Annekatrin

    2014-01-01

    The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification must be effectively prevented. This is often achieved by using primers designed to be parental-gene specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, the PMS2 gene has been shown to exchange sequences with one of its pseudogenes, PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3'-end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by direct cDNA sequencing, which is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method also has the advantage of effectively identifying deletions, splice mutations, and de novo retrotransposon insertions that escape detection by most DNA-based mutation analysis protocols.

  10. ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

    PubMed

    He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

    2013-12-04

    Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input and output, saving storage space and enabling faster, more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers many practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and numerous sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.

  11. Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae

    PubMed Central

    Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira

    2011-01-01

    Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. Comparative analysis of the genomes of a virulent strain, Lg2, and a non-virulent strain, ATCC 49156, of L. garvieae revealed that the two strains share a high degree of sequence identity, but Lg2 has a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster is composed of 15 genes, eight of which are highly conserved with genes in the exopolysaccharide biosynthesis gene clusters often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent, Lg2-derived strain L. garvieae Lg2-S showed that two conserved genes were each disrupted by a single-base-pair deletion. These results strongly suggest that the capsule is crucial for the virulence of Lg2. Several features suggest that the capsule gene cluster of Lg2 may be a genomic island: insertion sequences flanking both ends, a GC content different from the chromosomal average, integration into a locus syntenic to other lactococcal genome sequences, and its distribution in human gut microbiomes. The analysis also predicted other potential virulence factors, such as haemolysin. The present study provides new insights into the virulence mechanisms of L. garvieae in fish. PMID:21829716

  12. The Design and Analysis of Transposon-Insertion Sequencing Experiments

    PubMed Central

    Chao, Michael C.; Abel, Sören; Davis, Brigid M.; Waldor, Matthew K.

    2016-01-01

    Transposon-insertion sequencing (TIS) is a powerful approach that can be widely applied to genome-wide definition of loci that are required for growth in diverse conditions. However, experimental design choices and stochastic biological processes can heavily influence the results of TIS experiments and affect downstream statistical analysis. Here, we discuss TIS experimental parameters and how these factors relate to the benefits and limitations of the various statistical frameworks that can be applied to computational analysis of TIS data. PMID:26775926

  13. Examining inter-family differences in intra-family (parent-adolescent) dynamics using grid-sequence analysis.

    PubMed

    Brinberg, Miriam; Fosco, Gregory M; Ram, Nilam

    2017-12-01

    Family systems theorists have put forward a set of theoretical principles meant to guide family scientists and practitioners in their conceptualization of patterns of family interaction (intra-family dynamics) that, over time, give rise to family and individual dysfunction and/or adaptation. In this article, we present an analytic approach that merges state space grid methods adapted from the dynamic systems literature with sequence analysis methods adapted from molecular biology into a "grid-sequence" method for studying inter-family differences in intra-family dynamics. Using dyadic data from 86 parent-adolescent dyads who provided up to 21 daily reports about connectedness, we illustrate how grid-sequence analysis can be used to identify a typology of intra-family dynamics and to inform theory about how specific types of intra-family dynamics contribute to adolescent behavior problems and family members' mental health. Methodologically, grid-sequence analysis extends the toolbox of techniques for analyzing family experience sampling and daily diary data. Substantively, we identify patterns of family-level microdynamics that may serve as new markers of risk/protective factors and potential points for intervention in families. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  14. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences.

    PubMed

    Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques

    2008-01-01

    This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
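
    The binomial significance mentioned above can be illustrated with a small Python sketch modelled loosely on the oligo-analysis idea, not taken from RSAT's code. The sketch proceeds in four steps:
    - Count the overlapping occurrences of a word.
    - Compute the probability of seeing at least that many occurrences under a binomial model.
    - Multiply that probability by the number of words tested to obtain an E-value.
    - Report -log10(E-value) as a significance score.
    The uniform background probability (0.25 per base) and the exact-sum tail are simplifying assumptions. RSAT supports background models estimated from genomic sequences, and the exact sum is only practical for small inputs.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by exact summation (small n only)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def count_overlapping(seq, word):
    """Occurrences of word in seq, allowing overlaps (e.g. 'AA' in 'AAAA' -> 3)."""
    w = len(word)
    return sum(1 for i in range(len(seq) - w + 1) if seq[i:i + w] == word)


def word_significance(seqs, word, p_word, n_tested):
    """Return (observed, p_value, e_value, sig) for one word across seqs."""
    positions = sum(max(0, len(s) - len(word) + 1) for s in seqs)
    observed = sum(count_overlapping(s.upper(), word.upper()) for s in seqs)
    p_value = binomial_upper_tail(observed, positions, p_word)
    e_value = p_value * n_tested  # correct for the number of words tested
    sig = -math.log10(e_value) if e_value > 0 else math.inf
    return observed, p_value, e_value, sig


# Toy example: one 4-letter word under a uniform background (assumption).
k = 4
obs, p, e, sig = word_significance(["GATAGATA", "TTGATACC"], "GATA", 0.25**k, 4**k)
print(obs, f"{p:.2e}", f"{e:.3f}", f"{sig:.2f}")
```

    Higher `sig` values indicate stronger over-representation. Testing under-representation would use the lower tail, P(X ≤ k), instead.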

  15. Analysis of loss of decay-heat-removal sequences at Browns Ferry Unit One

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harrington, R.M.

    1983-01-01

    This paper summarizes the Oak Ridge National Laboratory (ORNL) report Loss of DHR Sequences at Browns Ferry Unit One - Accident Sequence Analysis (NUREG/CR-2973). The Loss of DHR investigation is the third in a series of accident studies concerning the BWR 4 - MK I containment plant design. These studies, sponsored by the Nuclear Regulatory Commission Severe Accident Sequence Analysis (SASA) program, have been conducted at ORNL with the full cooperation of the Tennessee Valley Authority (TVA). The purpose of the SASA studies is to predetermine the probable course of postulated severe accidents so as to establish the timing and the sequence of events. The SASA studies also produce recommendations concerning the implementation of better system design and better emergency operating instructions and operator training. The ORNL studies also include a detailed, best-estimate calculation of the release and transport of radioactive fission products following postulated severe accidents.

  16. [Genome-scale sequence data processing and epigenetic analysis of DNA methylation].

    PubMed

    Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong

    2013-06-01

    BS-Seq, a recently developed approach for detecting cytosine DNA methylation (mC) and profiling genome-scale DNA methylation, is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide insight into differences in genome-scale DNA methylation among organisms, but also reveal the conservation of DNA methylation in all sequence contexts and nucleotide preferences in different genomic regions, including genes, exons, and repetitive DNA sequences. It will help in understanding the epigenetic impact of cytosine DNA methylation on the regulation of gene expression and on the silencing of repetitive sequences such as transposable elements. In this paper, we introduce the preprocessing steps for DNA methylation data, in which cytosine (C) and guanine (G) in the reference sequence are converted to thymine (T) and adenine (A), respectively, and cytosine in reads is converted to thymine. We also comprehensively review the main content of genome-scale DNA methylation analysis: (1) cytosine methylation in different sequence contexts; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and nucleotide preference; (4) DNA-protein interaction sites of DNA methylation; (5) the degree of cytosine methylation in different structural elements of genes. DNA methylation analysis provides a powerful tool for epigenome studies in humans and other species and for studying gene-environment interactions, and lays the theoretical basis for further development of disease diagnostics and therapeutics in humans.
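
    The preprocessing described above converts C to T and G to A in the reference, and C to T in reads. That step and the subsequent methylation calls can be sketched in Python as follows. This is a simplified, hypothetical illustration, not the pipeline of any particular BS-Seq tool. It assumes reads already aligned without gaps to the forward strand at a known offset, and it ignores base qualities. Bisulfite converts unmethylated cytosine to uracil, which is sequenced as T, but leaves methylated cytosine intact. A reference C read as C is therefore counted as methylated, and one read as T as unmethylated. Each cytosine is also labeled with its CG, CHG or CHH context.

```python
def convert_reference(ref):
    """Two in silico bisulfite-converted reference copies:
    C->T (converted forward strand) and G->A (converted reverse strand)."""
    ref = ref.upper()
    return ref.replace("C", "T"), ref.replace("G", "A")


def convert_read(read):
    """C->T conversion applied to reads before alignment."""
    return read.upper().replace("C", "T")


def cytosine_context(ref, pos):
    """Classify the C at ref[pos] (forward strand) as CG, CHG or CHH (H = A, C or T)."""
    nxt = ref[pos + 1:pos + 3].upper()
    if nxt.startswith("G"):
        return "CG"
    if len(nxt) < 2:
        return None  # too close to the end to classify
    return "CHG" if nxt[1] == "G" else "CHH"


def call_methylation(ref, read, offset):
    """(position, context, methylated) for each reference C covered by an
    unconverted read aligned without gaps at `offset` on the forward strand."""
    ref = ref.upper()
    calls = []
    for i, base in enumerate(read.upper()):
        pos = offset + i
        if pos >= len(ref):
            break
        if ref[pos] == "C" and base in "CT":
            calls.append((pos, cytosine_context(ref, pos), base == "C"))
    return calls


ref = "ACGTCAGCTT"
read = "ATGTCAGTTT"  # toy read: only the C at position 4 stayed a C
# The converted read matches the C->T converted reference exactly.
print(convert_read(read) == convert_reference(ref)[0])  # True
print(call_methylation(ref, read, 0))
# [(1, 'CG', False), (4, 'CHG', True), (7, 'CHH', False)]
```
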

  17. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    PubMed

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.

  18. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
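    The abstract does not spell out how variable and informative sites were counted or how far each sliding window advanced. The Python sketch below uses the conventional definitions (a variable site has at least two different bases; a parsimony-informative site has at least two different bases that each occur in at least two sequences) and a caller-chosen step. The function names, the step value, and the treatment of gaps and `N` as missing data are assumptions.

```python
from collections import Counter


def site_stats(alignment: list[str]) -> tuple[int, int]:
    """Return (variable, parsimony-informative) column counts.

    Assumption: gaps ('-') and 'N' are treated as missing data, and
    sequences are upper-case and of equal aligned length.
    """
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base not in "-N")
        if len(counts) >= 2:  # at least two different bases
            variable += 1
            # at least two of those bases must each occur in >= 2 sequences
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


def sliding_windows(length: int, size: int, step: int) -> list[tuple[int, int]]:
    """Half-open [start, end) windows of `size` columns, advanced by `step`."""
    return [(s, s + size) for s in range(0, length - size + 1, step)]
```

    Per-window counts would then be `[site_stats([s[a:b] for s in aln]) for a, b in sliding_windows(len(aln[0]), 1000, step)]` for 1,000-bp windows, where `aln` and `step` are placeholders.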

  19. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single

  20. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S r

  1. SIMPLEX: Cloud-Enabled Pipeline for the Comprehensive Analysis of Exome Sequencing Data

    PubMed Central

    Fischer, Maria; Snajder, Rene; Pabinger, Stephan; Dander, Andreas; Schossig, Anna; Zschocke, Johannes; Trajanoski, Zlatko; Stocker, Gernot

    2012-01-01

    In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome. PMID:22870267

  2. Interuser Interference Analysis for Direct-Sequence Spread-Spectrum Systems Part I: Partial-Period Cross-Correlation

    NASA Technical Reports Server (NTRS)

    Ni, Jianjun (David)

    2012-01-01

    This presentation discusses an analysis approach for evaluating interuser interference in Direct-Sequence Spread-Spectrum (DSSS) systems for Space Network (SN) users. Part I of this analysis shows that the correlation property of the pseudo-noise (PN) sequences is the critical factor that determines the interuser interference performance of a DSSS system. For non-standard DSSS systems in which the PN sequence's period is much longer than one data symbol duration, it is the partial-period cross-correlation that determines system performance. This study shows through an example that a well-designed PN sequence set (e.g. Gold sequences, whose whole-period cross-correlation is well controlled) may have uncontrolled partial-period cross-correlation, which could cause severe interuser interference in a DSSS system. Because analytically deriving a performance metric (bit error rate or signal-to-noise ratio) from the partial-period cross-correlation is prohibitively difficult, the performance degradation due to partial-period cross-correlation will be evaluated by simulation in Part II of this analysis.
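    The distinction between full-period and partial-period cross-correlation can be made concrete with a short Python sketch over ±1 chip sequences. The function names and toy sequences are hypothetical, and this is not the presentation's simulation code. For a = [1, 1, -1, -1] and b = [1, 1, 1, 1], the full-period correlation at zero shift is 0, yet a two-chip window starting at chip 0 gives 2, the largest possible value for that window length.

```python
def periodic_xcorr(a: list[int], b: list[int], shift: int) -> int:
    """Full-period cross-correlation of two ±1 sequences sharing one period."""
    n = len(a)
    return sum(a[i] * b[(i + shift) % n] for i in range(n))


def partial_xcorr(a: list[int], b: list[int], shift: int, start: int, length: int) -> int:
    """Cross-correlation over only `length` chips starting at chip `start`,
    as when one data symbol spans a fraction of the PN period."""
    n = len(a)
    return sum(a[(start + i) % n] * b[(start + i + shift) % n] for i in range(length))


def worst_partial(a: list[int], b: list[int], length: int) -> int:
    """Largest |partial-period correlation| over all shifts and window starts."""
    n = len(a)
    return max(
        abs(partial_xcorr(a, b, s, t, length)) for s in range(n) for t in range(n)
    )
```

    Scanning every shift and window start, as `worst_partial` does, is the kind of check that a sequence set chosen only for its full-period properties need not pass.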

  3. Lactobacillus Strain Diversity Based on Partial hsp60 Gene Sequences and Design of PCR-Restriction Fragment Length Polymorphism Assays for Species Identification and Differentiation

    PubMed Central

    Blaiotta, Giuseppe; Fusco, Vincenzina; Ercolini, Danilo; Aponte, Maria; Pepe, Olimpia; Villani, Francesco

    2008-01-01

    A phylogenetic tree showing diversities among 116 partial (499-bp) Lactobacillus hsp60 (groEL, encoding a 60-kDa heat shock protein) nucleotide sequences was obtained and compared to those previously described for 16S rRNA and tuf gene sequences. The topology of the tree produced in this study showed a Lactobacillus species distribution similar, but not identical, to those previously reported. However, in agreement with the most recent systematic studies, 43 clearly differentiated single-species clusters were identified among the sequences analyzed. The slightly higher variability of the hsp60 nucleotide sequences compared with the 16S rRNA sequences offers better opportunities to design molecular assays for identifying and differentiating both distant and very closely related Lactobacillus species. Therefore, our results suggest that hsp60 can be considered an excellent molecular marker for inferring the taxonomy and phylogeny of members of the genus Lactobacillus and that the chosen primers can be used in a simple PCR procedure allowing direct sequencing of the hsp60 fragments. Moreover, in this study we performed a computer-aided restriction endonuclease analysis of all 499-bp hsp60 partial sequences and showed that the PCR-restriction fragment length polymorphism (RFLP) patterns obtained with the endonucleases AluI and TacI (in separate reactions) allow identification and differentiation of all 43 Lactobacillus species considered, with the exception of the pair L. plantarum/L. pentosus. However, the latter species can be differentiated by further analysis with Sau3AI or MseI. The hsp60 PCR-RFLP approach was successfully applied to identify and differentiate a total of 110 wild Lactobacillus strains (including closely related species, such as L. casei and L. rhamnosus or L. plantarum and L. pentosus) isolated from cheese and dry-fermented sausages. PMID:17993558
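    A computer-aided restriction analysis of this kind can be sketched in Python as below. The sketch is hypothetical, not the authors' software: it scans one strand of a linear sequence, which is adequate for palindromic sites such as AluI (AG^CT), Sau3AI (^GATC) and MseI (T^TAA). The recognition site and cut offset for any other enzyme, including TacI, must be supplied by the user and are not assumed here.

```python
import re


def digest(seq: str, site: str, cut: int) -> list[int]:
    """Fragment lengths from cutting a linear sequence at every `site`.

    `cut` is the top-strand cut offset within the site (e.g. AluI AG^CT ->
    site "AGCT", cut 2). A lookahead regex finds overlapping occurrences.
    Assumes a palindromic site, so only one strand is scanned.
    """
    seq = seq.upper()
    cuts = [m.start() + cut for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0, *cuts, len(seq)]
    # drop zero-length pieces (a cut exactly at either end of the sequence)
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]
```

    Sorting the fragment lengths from each enzyme, e.g. `sorted(digest(seq, "AGCT", 2))`, gives a pattern that can be compared between species.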

  4. Molecular cloning of a cDNA encoding the glycoprotein of hen oviduct microsomal signal peptidase.

    PubMed Central

    Newsome, A L; McLean, J W; Lively, M O

    1992-01-01

    Detergent-solubilized hen oviduct signal peptidase has been characterized previously as an apparent complex of a 19 kDa protein and a 23 kDa glycoprotein (GP23) [Baker & Lively (1987) Biochemistry 26, 8561-8567]. A cDNA clone encoding GP23 from a chicken oviduct lambda gt11 cDNA library has now been characterized. The cDNA encodes a protein of 180 amino acid residues with a single site for asparagine-linked glycosylation that has been directly identified by amino acid sequence analysis of a tryptic-digest peptide containing the glycosylated site. Immunoblot analysis reveals cross-reactivity with a dog pancreas protein. Comparison of the deduced amino acid sequence of GP23 with the 22/23 kDa glycoprotein of dog microsomal signal peptidase [Shelness, Kanwar & Blobel (1988) J. Biol. Chem. 263, 17063-17070], one of five proteins associated with this enzyme, reveals that the amino acid sequences are 90% identical. Thus the signal peptidase glycoprotein is as highly conserved as the sequences of cytochromes c and b from these same species and is likely to be found in a similar form in many, if not all, vertebrate species. The data also show conclusively that the dog and avian signal peptidases have at least one protein subunit in common. PMID:1546959
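    The abstract reports 90% identity without stating how it was computed. One common convention, sketched below in Python under that assumption (hypothetical function name), counts identical residues over aligned columns in which neither sequence has a gap; other conventions, such as dividing by the length of the shorter sequence, give different values.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: columns where either sequence has a gap ('-')
    are excluded from both numerator and denominator.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)
```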

  5. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database contains only proteins that most likely perform the same cellular function. Recently, the COG database was extended by assigning to every protein both its amino acid sequence and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC content in a species- or function-specific context. With respect to amino acids, we at least partly confirm the cognate bias hypothesis, using ANCAC's NUCOCOG dataset, the largest available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining of sequence bias. To our knowledge, it is thereby the only tool for analyzing sequence composition in the light of physiological roles and phylogenetic context that does not require substantial programming skills.
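    ANCAC itself is a web tool backed by a MySQL database; the Python below is not its code, only a minimal sketch of per-sequence quantities of the kind it tabulates (GC content and codon frequencies). GC3, the GC content at third codon positions, is included as an extra, commonly used measure that the abstract does not name. All function names are hypothetical.

```python
from collections import Counter


def in_frame(cds: str) -> str:
    """Upper-case coding sequence trimmed to a whole number of codons."""
    cds = cds.upper()
    return cds[: len(cds) - len(cds) % 3]


def gc_content(seq: str) -> float:
    """Percentage of G + C in a nucleotide sequence (0.0 for empty input)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def codon_usage(cds: str) -> Counter:
    """Counts of each in-frame codon; a trailing partial codon is ignored."""
    cds = in_frame(cds)
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def gc3(cds: str) -> float:
    """GC content at third codon positions only (extra measure, see above)."""
    return gc_content(in_frame(cds)[2::3])
```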

  6. Comparative analysis of the prion protein gene sequences in African lion.

    PubMed

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of the African lion (Panthera leo) was cloned for the first time and screened for polymorphisms. The results suggest that the prion protein gene is highly homogeneous among the eight African lions examined. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) were found in the prion protein gene (Prnp) of the African lion, but none caused an amino acid substitution. Sequence analysis showed the highest homology to Felis catus (GenBank AF003087; 96.7%) and to sheep (GenBank M31313.1; 96.2%). Compared with all the mammalian prion protein sequences examined, the African lion prion protein sequence has three amino acid substitutions. These substitutions might in turn affect the potential intermolecular interactions critical for cross-species transmission of prion disease.
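    SNP names such as C42T give the reference base, its position, and the observed base. The Python sketch below (hypothetical helper names) derives such labels from two aligned, ungapped sequences and maps a position to its codon. Assuming, as is conventional but not stated in the abstract, that positions are numbered from the first base of the coding sequence, all four reported SNPs fall on third codon positions (codons 14, 27, 140 and 200), where substitutions are often synonymous; this is consistent with the report that no amino acids changed.

```python
def snp_labels(ref: str, alt: str) -> list[str]:
    """Label substitutions between two aligned, ungapped sequences as
    REF + 1-based position + ALT, e.g. 'C42T'."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length")
    return [
        f"{r}{i}{a}"
        for i, (r, a) in enumerate(zip(ref.upper(), alt.upper()), start=1)
        if r != a
    ]


def codon_position(pos: int) -> tuple[int, int]:
    """Map a 1-based CDS position to (codon number, position 1-3 in codon).

    Assumes numbering starts at the first base of the start codon.
    """
    return (pos - 1) // 3 + 1, (pos - 1) % 3 + 1
```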

  7. Streaming support for data intensive cloud-based sequence analysis.

    PubMed

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
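    The key requirement, that sequences can be processed independently, is what makes streaming possible: each record can be handed to an analysis step as soon as it has fully arrived. The generator below is a minimal Python illustration of that idea for FASTA text arriving in arbitrarily split chunks (for example from a network transfer). It is not the elastream package's interface, and the names are hypothetical.

```python
from typing import Iterable, Iterator


def _lines(chunks: Iterable[str]) -> Iterator[str]:
    """Reassemble complete lines from text chunks split at arbitrary points."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        *complete, pending = pending.split("\n")  # last piece may be partial
        yield from complete
    if pending:
        yield pending


def stream_fasta(chunks: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) records lazily while chunks keep arriving.

    A record is emitted once the next header (or the end of the stream)
    shows it is complete; sequence lines before any header are ignored.
    """
    header, parts = None, []
    for raw in _lines(chunks):
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)
```

    Because `stream_fasta` is lazy, a loop such as `for name, seq in stream_fasta(chunks): process(name, seq)` starts work on the first record while later chunks are still in transit; `process` is a placeholder.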

  8. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    PubMed Central

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461

  9. Characterization of a species-specific repetitive DNA from a highly endangered wild animal, Rhinoceros unicornis, and assessment of genetic polymorphism by microsatellite associated sequence amplification (MASA).

    PubMed

    Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S

    1999-03-04

    We have cloned and sequenced a 906-bp EcoRI repeat DNA fraction from the Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich, with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains a MALT box, NF-E1, a poly-A signal, lariat consensus sequences, a TATA box, translational initiation sequences and several stop codons. Translation of the contig revealed seven different types of protein motifs, among which EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements and protein signatures, together with analysis of subset sequences in the 5' region (nt 1 to 165) indicating coding potential (test code value = 0.97), suggests possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from nt 906 to 706 and from nt 190 to 2 yielded putative proteins of more than 7 kDa rich in non-polar residues. This suggests that pSS(R)2 is either part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units of 2 to 17 nt with varying frequencies, of which four-base motifs were predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to the R. unicornis genome, because they do not cross-hybridize even with the genomic DNA of the South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has potential applications in conservation biology for the unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
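    The composition figures are easy to check. The Python sketch below recomputes the percentages from the reported counts, which sum to the stated 906 bp and give an A+T content of 60.26%. A reverse-complement helper (hypothetical name) is included because the complementary strand must be reverse-complemented before it can be translated.

```python
def composition(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each base, rounded to two decimals as in the abstract."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


# complement table; characters other than ACGT/acgt pass through unchanged
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.translate(_COMPLEMENT)[::-1]


counts = {"A": 340, "C": 187, "G": 173, "T": 206}  # reported for pSS(R)2
total = sum(counts.values())
pct = composition(counts)
at_pct = round(100 * (counts["A"] + counts["T"]) / total, 2)
```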

  10. Release of carrot plasma membrane-associated phosphatidylinositol kinase by phospholipase A2 and activation by a 70 kDa protein.

    PubMed

    Gross, W; Yang, W; Boss, W F

    1992-02-19

    Plasma membranes were isolated from carrot (Daucus carota L.) cells grown in suspension culture and treated with phospholipase A2 from snake or bee venom for 10 min. As a result of this treatment, phosphatidylinositol kinase activity was recovered in the soluble fraction. There was no detectable diacylglycerol kinase or phosphatidylinositol monophosphate kinase activity released from the membranes after the phospholipase A2 treatment. Treating the plasma membranes with phospholipase C or D did not release PI kinase activity. The phospholipase A2-released PI kinase was activated over 2-fold by a heat stable, soluble 70 kDa protein. The partially purified 70 kDa activator increases the Vmax but does not affect the Km of the phospholipase A2-released PI kinase.

  11. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    PubMed

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has traditionally been performed on partial genome sequences, such as the N, G, and P genes; however, this approach raises concerns about its discriminatory power compared with whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico, by comparing their whole genome sequences with those of strains from the American, European and Asian continents. Four cattle brain samples positive for rabies and characterized as AgV11, genotype 1, were used in the study. cDNA was generated by reverse transcription PCR (RT-PCR) using oligo dT. The cDNA samples were sequenced on an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed from the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil and French Guiana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting a common source of infection. The phylogenetic analysis shows that the type of host differs among regions of the American continent; rabies is more closely associated with bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.
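    Bootstrapping with 1000 replicates means resampling alignment columns with replacement 1000 times and rebuilding a tree from each replicate. MEGA does this internally; the Python below is only a minimal sketch of the resampling step (hypothetical function name and toy alignment), with tree building by Neighbor-Joining left out.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One nonparametric bootstrap replicate: resample columns with replacement.

    Every sequence uses the same sampled column indices, so columns stay intact.
    """
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


rng = random.Random(2017)  # fixed seed so the replicates are reproducible
alignment = ["ACGTAC", "ACGTTC", "ATGTTC"]  # toy data, not from the study
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]
```

    Support for a cluster is then the fraction of replicate trees that contain it; a cluster found in 800 of 1000 trees has support 0.80.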

  12. Tn5401, a new class II transposable element from Bacillus thuringiensis.

    PubMed Central

    Baum, J A

    1994-01-01

    A new class II (Tn3-like) transposable element, designated Tn5401, was recovered from a sporulation-deficient variant of Bacillus thuringiensis subsp. morrisoni EG2158 following its insertion into a recombinant plasmid. Sequence analysis of the insert revealed a 4,837-bp transposon with two large open reading frames, in the same orientation, encoding proteins of 36 kDa (306 residues) and 116 kDa (1,005 residues) and 53-bp terminal inverted repeats. The deduced amino acid sequence for the 36-kDa protein shows 24% sequence identity with the TnpI recombinase of the B. thuringiensis transposon Tn4430, a member of the phage integrase family of site-specific recombinases. The deduced amino acid sequence for the 116-kDa protein shows 42% sequence identity with the transposase of Tn3 but only 28% identity with the TnpA transposase of Tn4430. Two small open reading frames of unknown function, designated orf1 (85 residues) and orf2 (74 residues), were also identified. Southern blot analysis indicated that Tn5401, in contrast to Tn4430, is not commonly found among different subspecies of B. thuringiensis and is not typically associated with known insecticidal crystal protein genes. Transposition was studied with B. thuringiensis by using plasmid pEG922, a temperature-sensitive shuttle vector containing Tn5401. Tn5401 transposed to both chromosomal and plasmid target sites but displayed an apparent preference for plasmid sites. Transposition was replicative and resulted in the generation of a 5-bp duplication at the target site. Transcriptional start sites within Tn5401 were mapped by primer extension analysis. Two promoters, designated PL and PR, direct the transcription of orf1-orf2 and tnpI-tnpA, respectively, and are negatively regulated by TnpI. Sequence comparison of the promoter regions of Tn5401 and Tn4430 suggests that the conserved sequence element ATGTCCRCTAAY mediates TnpI binding and cointegrate resolution. 
The same element is contained within the 53-bp terminal
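    Open reading frames like those reported for Tn5401 are found by scanning each reading frame for a start codon followed in frame by a stop codon. The Python sketch below (hypothetical names) does this on the forward strand only, which would miss genes on the opposite strand in general; it takes the first ATG in each stretch, so nested starts are not reported separately. An ORF encoding 306 residues, like the 36-kDa protein, spans 307 codons, or 921 bp including the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_residues: int = 1) -> list[tuple[int, int, int]]:
    """Forward-strand ORFs as (start, end, frame).

    Coordinates are 0-based and half-open, running from ATG through the
    stop codon; `min_residues` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon of this stretch
            elif start is not None and codon in STOP_CODONS:
                residues = (i - start) // 3  # codons before the stop
                if residues >= min_residues:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```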

  13. Genome sequence analysis of dengue virus 1 isolated in Key West, Florida.

    PubMed

    Shin, Dongyoung; Richards, Stephanie L; Alto, Barry W; Bettinardi, David J; Smartt, Chelsea T

    2013-01-01

    Dengue virus (DENV) is transmitted to humans through the bite of mosquitoes. In November 2010, a dengue outbreak was reported in Monroe County in southern Florida (FL), including more than 20 confirmed human cases. The virus collected from the human cases was verified as DENV serotype 1 (DENV-1), and one isolate was provided for sequence analysis. RNA was extracted from the DENV-1 isolate and used in reverse transcription polymerase chain reaction (RT-PCR) to amplify fragments for sequencing. Primers were designed to generate overlapping PCR fragments that covered the entire genome. The DENV-1 isolate found in Key West (KW), FL was sequenced for whole genome characterization. Sequence assembly, GenBank searches, and recombination analyses were performed to verify the identity of the genome sequences and to determine percent similarity to known DENV-1 sequences. We show that the KW DENV-1 strain is 99% identical to Nicaraguan and Mexican DENV-1 strains. Phylogenetic and recombination analyses suggest that the DENV-1 isolated in KW originated from Nicaragua (NI) and that the KW strain may circulate in KW. Recombination analysis also detected recombination events in the KW strain relative to DENV-1 strains from Puerto Rico. We evaluate the relative growth of the KW strain of DENV-1 compared with other dengue viruses to determine whether the underlying genetics of the strain are associated with a replicative advantage, an important consideration because domestic tourism can spread DENVs and may thereby result in local transmission.

  14. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    USDA-ARS's Scientific Manuscript database

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  15. Characterization of photosystem 1 chlorophyll a/b-binding apoprotein accumulation in developing soybean using type-specific antibodies

    NASA Technical Reports Server (NTRS)

    Henry, R. L.; Armbrust, T.; Gallegos, G.; Guikema, J. A.; Spooner, B. S. (Principal Investigator)

    1992-01-01

    The structure and supramolecular assembly of the soybean photosystem 1 (PS 1) chlorophyll a/b-binding antenna (LHC 1) was examined. We identified the subunit composition of LHC 1 in soybean and followed the accumulation of individual subunits during light-induced assembly. We observed four LHC 1 subunits, at 23, 22, 21 and 20.5 kDa, obtained partial sequence information by amino-terminal sequence analysis, and classified the 20.5, 22, and 21 kDa subunits as being encoded by type I, II, and IV chlorophyll a/b binding protein genes, respectively. Antisera against LHC 1 subunits were used to follow the accumulation of individual subunits during the light-initiated transition from etioplast to chloroplast. Several points are noteworthy. First, monospecific antibody against the 22 kDa subunit decorated a 25 kDa peptide in etiolated tissue, which declined during maturation. This decline correlated with the light-induced appearance of mature 22 kDa peptide, suggesting a precursor/product relationship. Second, the same antibody identified a 22 kDa protein in mature corn, but not a larger band in etiolated corn, suggesting that LHC 1 accumulation is regulated differently between species before the onset of chlorophyll biosynthesis. Third, the mature 22 kDa subunit appeared somewhat later than the other LHC 1 peptides during greening, implying that this subunit is less intimately associated with the PS1 core than are the subunits appearing earlier in development.

  16. Integrative analysis of environmental sequences using MEGAN4.

    PubMed

    Huson, Daniel H; Mitra, Suparna; Ruscheweyh, Hans-Joachim; Weber, Nico; Schuster, Stephan C

    2011-09-01

    A major challenge in the analysis of environmental sequences is data integration. The question is how to analyze different types of data in a unified approach, addressing both the taxonomic and functional aspects. To facilitate such analyses, we have substantially extended MEGAN, a widely used taxonomic analysis program. The new program, MEGAN4, provides an integrated approach to the taxonomic and functional analysis of metagenomic, metatranscriptomic, metaproteomic, and rRNA data. While taxonomic analysis is performed based on the NCBI taxonomy, functional analysis is performed using the SEED classification of subsystems and functional roles or the KEGG classification of pathways and enzymes. A number of examples illustrate how such analyses can be performed, and show that one can also import and compare classification results obtained using others' tools. MEGAN4 is freely available for academic purposes, and installers for all three major operating systems can be downloaded from www-ab.informatik.uni-tuebingen.de/software/megan.

  17. The stylar 120 kDa glycoprotein is required for S-specific pollen rejection in Nicotiana.

    PubMed

    Hancock, C Nathan; Kent, Lia; McClure, Bruce A

    2005-09-01

    S-RNase participates in at least three mechanisms of pollen rejection. It functions in S-specific pollen rejection (self-incompatibility) and in at least two distinct interspecific mechanisms of pollen rejection in Nicotiana. S-specific pollen rejection and rejection of pollen from Nicotiana plumbaginifolia also require additional stylar proteins. Transmitting-tract-specific (TTS) protein, 120 kDa glycoprotein (120K) and pistil extensin-like protein III (PELP III) are stylar glycoproteins that bind S-RNase in vitro and are also known to interact with pollen. Here we tested whether these glycoproteins have a direct role in pollen rejection. Of these, 120K shows the greatest size polymorphism among Nicotiana species, and larger 120K-like proteins are often correlated with S-specific pollen rejection. Sequencing results suggest that the polymorphism primarily reflects differences in glycosylation, although indels also occur in the predicted polypeptides. Using RNA interference (RNAi), we suppressed expression of 120K to determine whether it is required for S-specific pollen rejection. Transgenic self-compatible (SC) N. plumbaginifolia × self-incompatible (SI) Nicotiana alata (S105S105 or SC10SC10) hybrids with no detectable 120K were unable to perform S-specific pollen rejection. Thus, 120K has a direct role in S-specific pollen rejection. However, suppression of 120K had no effect on rejection of N. plumbaginifolia pollen. In contrast, suppression of HT-B, a factor previously implicated in S-specific pollen rejection, disrupts rejection of N. plumbaginifolia pollen. Thus, S-specific pollen rejection and rejection of N. plumbaginifolia pollen are mechanistically distinct, because they require different non-S-RNase factors.

  18. The Swiss-Army-Knife Approach to the Nearly Automatic Analysis for Microearthquake Sequences.

    NASA Astrophysics Data System (ADS)

    Kraft, T.; Simon, V.; Tormann, T.; Diehl, T.; Herrmann, M.

    2017-12-01

    Many Swiss earthquake sequences have been studied using relative location techniques, which often made it possible to constrain the active fault planes and shed light on the tectonic processes that drove the seismicity. Yet, in the majority of cases the number of located earthquakes was too small to infer the details of the space-time evolution of the sequences, or their statistical properties. Therefore, it has mostly been impossible to resolve clear patterns in the seismicity of individual sequences, which are needed to improve our understanding of the mechanisms behind them. Here we present a nearly automatic workflow that combines well-established seismological analysis techniques and significantly improves the completeness of detected and located earthquakes in a sequence. We start from the manually timed routine catalog of the Swiss Seismological Service (SED), which contains the larger events of a sequence. From these well-analyzed earthquakes we dynamically assemble a template set and perform a matched filter analysis on the station that has both the best signal-to-noise ratio (SNR) for the sequence and a recording history of at least 10-15 years, our typical analysis period. This usually allows us to detect events several orders of magnitude below the SED catalog detection threshold. The waveform similarity of the events is then further exploited to derive accurate and consistent magnitudes. The enhanced catalog is then analyzed statistically to derive high-resolution time-lines of the a- and b-value and, consequently, of the occurrence probability of larger events. Many of the detected events are strong enough to be located using double-differences. No further manual interaction is needed; we simply time-shift the arrival-time pattern of the detecting template to the associated detection. Waveform similarity ensures a good approximation of the expected arrival times, which we use to calculate event-pair arrival-time differences by cross correlation. After an SNR and cycle-skipping quality
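    The matched-filter step described in this record can be sketched as normalized cross-correlation of a template waveform against a continuous trace. This is a minimal single-template, single-channel illustration under stated assumptions: the `detect` helper, its 0.9 threshold and the local-maximum rule are hypothetical choices, not the SED workflow, which the abstract does not describe at this level of detail.

```python
import math


def normalized_cross_correlation(template, trace):
    """Pearson correlation between `template` and every equal-length window of `trace`."""
    m = len(template)
    t_mean = sum(template) / m
    t_dev = [v - t_mean for v in template]
    t_norm = math.sqrt(sum(v * v for v in t_dev))
    ccs = []
    for lag in range(len(trace) - m + 1):
        window = trace[lag:lag + m]
        w_mean = sum(window) / m
        w_dev = [v - w_mean for v in window]
        w_norm = math.sqrt(sum(v * v for v in w_dev))
        if t_norm == 0 or w_norm == 0:
            ccs.append(0.0)  # flat signal: correlation undefined, treated as no match
        else:
            ccs.append(sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm))
    return ccs


def detect(template, trace, threshold=0.9):
    """Return lags where the correlation reaches `threshold` and is a local maximum."""
    ccs = normalized_cross_correlation(template, trace)
    hits = []
    for i, c in enumerate(ccs):
        left = ccs[i - 1] if i > 0 else -math.inf
        right = ccs[i + 1] if i + 1 < len(ccs) else -math.inf
        if c >= threshold and c >= left and c >= right:
            hits.append(i)
    return hits


# Toy trace: a copy at twice the template amplitude starting at sample 10
# and a copy at half the template amplitude starting at sample 25.
template = [0, 1, 0, -1, 0]
trace = [0] * 10 + [0, 2, 0, -2, 0] + [0] * 10 + [0, 0.5, 0, -0.5, 0] + [0] * 5
print(detect(template, trace))  # -> [10, 25]
```

    Because the correlation coefficient ignores amplitude, the small copy is detected as strongly as the large one, which is why template matching can find events well below a catalog's detection threshold.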

  19. Purification of a 6.5 kDa protease inhibitor from Amazon Inga umbratica seeds effective against serine proteases of the boll weevil Anthonomus grandis.

    PubMed

    Calderon, L A; Teles, R C L; Leite, J R S A; Franco, O L; Grossi-de-Sá, M F; Medrano, F J; Bloch, C; Freitas, S M

    2005-08-01

    A 6.5 kDa serine protease inhibitor was purified by anion-exchange chromatography from a crude extract of Inga umbratica seeds, which contained inhibitor isoforms ranging from 6.3 to 6.7 kDa as well as protease inhibitors of approximately 19 kDa. The purified protein was characterized as a potent inhibitor of trypsin and chymotrypsin and was named I. umbratica trypsin and chymotrypsin inhibitor (IUTCI). MALDI-TOF spectra of IUTCI in the presence of DTT indicated six disulfide bonds, suggesting that this inhibitor belongs to the Bowman-Birk family. Circular dichroism spectroscopy indicates that IUTCI consists predominantly of unordered and beta-sheet secondary structure. Fluorescence spectroscopy further showed that it is stable over the pH range 5.0 to 7.0. Moreover, at a concentration of 75 microM this inhibitor showed remarkable inhibitory activity (60%) against digestive serine proteases from the boll weevil Anthonomus grandis, an economically important cotton pest.
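    The disulfide count inferred from the DTT experiment rests on simple mass arithmetic: reducing one disulfide bridge to two free thiols adds two hydrogen atoms, about 2.016 Da. A minimal sketch, assuming the average hydrogen mass and a spectrum accurate to well under 1 Da; the abstract reports no masses, so the values in the example are hypothetical.

```python
H_ATOM_DA = 1.008  # average mass of one hydrogen atom, in daltons


def disulfides_from_shift(mass_unreduced, mass_reduced):
    """Estimate the number of disulfide bonds from the mass gained on reduction.

    Each reduced S-S bridge gains two hydrogen atoms (about 2.016 Da)."""
    shift = mass_reduced - mass_unreduced
    return round(shift / (2 * H_ATOM_DA))


# Hypothetical masses for a ~6.5 kDa inhibitor before and after DTT treatment
print(disulfides_from_shift(6500.0, 6512.1))  # -> 6
```

    Six bridges shift the mass by only about 12 Da on a 6.5 kDa protein, so the conclusion depends on resolving a difference of roughly 0.2% of the total mass.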

  20. Cost analysis of whole genome sequencing in German clinical practice.

    PubMed

    Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf

    2017-06-01

    Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, they were quantified and monetarily evaluated using data and information from expert interviews with clinical geneticists and with personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed for two sequencing platforms, namely the HiSeq 2500 and the HiSeq Xten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess how different levels of platform utilization and different coverage values affect fixed-cost degression. In the base-case scenario, which features 80% utilization and 30-fold coverage, the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs (€396.94) and acquisition/maintenance costs (€607.39) were also identified. In comparison, sequencing with the latest technology (i.e., the HiSeq Xten) was approximately 63% cheaper, at €1411.20. The estimated costs of WGS currently exceed the predicted 'US$1000 per genome' by more than a factor of 3.8. In particular, the material costs alone exceed this predicted cost.
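    The headline ratios in this abstract can be checked directly from the quoted figures. The short script below does that; it compares euros with the US$1000 target at par, which appears to be how the "factor of 3.8" was obtained, and the variable names are illustrative only.

```python
# Figures quoted in the abstract (euros, base-case scenario)
hiseq_2500_total = 3858.06
components = {"materials": 2848.08, "personnel": 396.94, "acquisition/maintenance": 607.39}
hiseq_xten_total = 1411.20

component_sum = round(sum(components.values()), 2)
gap = round(hiseq_2500_total - component_sum, 2)        # part of the total not itemized
saving = 1 - hiseq_xten_total / hiseq_2500_total        # relative saving of the X Ten
factor_vs_1000 = hiseq_2500_total / 1000                # EUR compared with USD at par

print(f"components sum to EUR {component_sum:.2f} (gap EUR {gap:.2f})")
print(f"X Ten saving: {saving:.1%}")
print(f"HiSeq 2500 total / 1000: {factor_vs_1000:.2f}")
```

    The saving works out to about 63.4% and the factor to about 3.86, consistent with the text. The three itemized costs sum to €3852.41, €5.65 short of the quoted total; the abstract does not say what covers the remainder.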

  1. Euglena gracilis chloroplast DNA: analysis of a 1.6 kb intron of the psb C gene containing an open reading frame of 458 codons.

    PubMed

    Montandon, P E; Vasserot, A; Stutz, E

    1986-01-01

    We retrieved a 1.6 kbp intron separating two exons of the psb C gene, which codes for the 44 kDa reaction center protein of photosystem II. This intron is 3 to 4 times the size of all previously sequenced Euglena gracilis chloroplast introns. It contains an open reading frame of 458 codons potentially coding for a basic protein of 54 kDa of as yet unknown function. The intron boundaries follow consensus sequences established for chloroplast introns related to class II and nuclear pre-mRNA introns. Its 3'-terminal segment has structural features similar to class II mitochondrial introns, with an invariant base A as a possible branch point for lariat formation.

  2. Investigation of the human disease osteogenesis imperfecta: a research-based introduction to concepts and skills in biomolecular analysis.

    PubMed

    Mate, Karen; Sim, Alistair; Weidenhofer, Judith; Milward, Liz; Scott, Judith

    2013-01-01

    A blended approach encompassing problem-based learning (PBL) and structured inquiry was used in this laboratory exercise, based on the congenital disease osteogenesis imperfecta (OI), to introduce commonly used techniques in biomolecular analysis within a clinical context. During a series of PBL sessions, students were presented with several scenarios involving a 2-year-old child who had experienced numerous fractures. Key learning goals related to both the theoretical and practical aspects of the course, covering biomolecular analysis and functional genomics, were identified in successive PBL sessions. The laboratory exercises were conducted in 3-hour blocks over six weeks and focused first on protein analysis, followed by nucleic acids. Students isolated collagen from normal and OI-affected fibroblast cultures. SDS-PAGE analysis showed the α1 and α2 chains of type I collagen at approximately 95 kDa and 92 kDa, respectively. Some students observed subtle differences in protein mobility between the control and OI samples, but most considered this method inconclusive as a diagnostic tool. The nucleic acid module involved isolation of RNA from OI-affected fibroblasts. The RNA was reverse transcribed and used as a template to amplify a 354 bp COL1A1 fragment. Students were provided with the sequence of the OI-affected COL1A1 PCR product aligned with the normal COL1A1 sequence, allowing them to identify the mutation as a substitution of Arg for Gly(976) in the triple-helical region. Our experience with student cohorts over several years is that presenting this laboratory exercise within a relevant clinical context, together with the opportunity for active engagement with the experimental procedures via PBL sessions, supported the learning of basic theory and practical techniques of biomolecular analysis. Copyright © 2013 International Union of Biochemistry and Molecular Biology, Inc.
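    Students in this exercise identified the mutation by reading aligned sequences. The same comparison can be sketched in code: translate both aligned, in-frame coding sequences codon by codon and report residues whose amino acid changes. This is a minimal sketch assuming no indels and the standard genetic code; the sequences and the `first_residue` offset in the example are made up for illustration and are not the actual COL1A1 fragment.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def missense_changes(reference, sample, first_residue=1):
    """Compare two aligned, in-frame coding sequences codon by codon.

    Returns (residue_number, reference_aa, sample_aa) for every codon whose
    translation differs. `first_residue` is the protein residue number of the
    first codon, so fragment positions can be mapped onto the protein."""
    if len(reference) != len(sample) or len(reference) % 3:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    changes = []
    for i in range(0, len(reference), 3):
        ref_aa = CODON_TABLE[reference[i:i + 3].upper()]
        alt_aa = CODON_TABLE[sample[i:i + 3].upper()]
        if ref_aa != alt_aa:  # synonymous codon changes are ignored
            changes.append((first_residue + i // 3, ref_aa, alt_aa))
    return changes


# Made-up three-codon stretch; first_residue=974 is chosen only so that the
# toy change lands on residue 976.
print(missense_changes("GGTCCTGGC", "GGTCCTCGC", first_residue=974))  # -> [(976, 'G', 'R')]
```

    A single G-to-C change in the first position of a GGN glycine codon gives CGN, an arginine codon, which is one way a Gly-to-Arg substitution can arise.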

  3. On avoided words, absent words, and their application to biological sequence analysis.

    PubMed

    Almirantis, Yannis; Charalampopoulos, Panagiotis; Gao, Jia; Iliopoulos, Costas S; Mohamed, Manal; Pissis, Solon P; Polychronopoulos, Dimitris

    2017-01-01

    The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguistic analysis. The value of the deviation of w, denoted by [Formula: see text], effectively characterises the extent of a word by its edge contrast in the context in which it occurs. A word w of length [Formula: see text] is a [Formula: see text]-avoided word in x if [Formula: see text], for a given threshold [Formula: see text]. Notice that such a word may be completely absent from x. Hence, computing all such words naïvely can be a very time-consuming procedure, in particular for large k. In this article, we propose an [Formula: see text]-time and [Formula: see text]-space algorithm to compute all [Formula: see text]-avoided words of length k in a given sequence of length n over a fixed-sized alphabet. We also present a time-optimal [Formula: see text]-time algorithm to compute all [Formula: see text]-avoided words (of any length) in a sequence of length n over an integer alphabet of size [Formula: see text]. In addition, we provide a tight asymptotic upper bound for the number of [Formula: see text]-avoided words over an integer alphabet and the expected length of the longest one. We make available an implementation of our algorithm. Experimental results, using both real and synthetic data, show the efficiency and applicability of our implementation in biological sequence analysis. The systematic search for avoided words is particularly useful for biological sequence analysis. We present a linear-time and linear-space algorithm for the computation of avoided words of length k in a given sequence x. We suggest a modification to this algorithm so that it computes all avoided words of x, irrespective of their length, within the same time complexity. We also present combinatorial results with regard to avoided words and absent words.
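    The abstract replaces its definitions with formula placeholders, so the sketch below rests on an assumption: that the expected frequency of a word is estimated from the counts of its (k-1)-length prefix and suffix and its (k-2)-length infix, E(w) = f(prefix) f(suffix) / f(infix), with dev(w) = (f(w) - E(w)) / max(sqrt(E(w)), 1). It is deliberately the naive approach the abstract warns about: it enumerates every possible word of length k, including absent ones, so it is only usable for small k and is not the authors' linear-time algorithm.

```python
import math
from collections import Counter
from itertools import product


def substring_counts(x, length):
    """Count overlapping occurrences of every substring of `x` with the given length."""
    return Counter(x[i:i + length] for i in range(len(x) - length + 1))


def avoided_words(x, k, alphabet, rho):
    """Naively list every length-k word over `alphabet` whose deviation is <= rho.

    ASSUMED definitions (the abstract elides the formulas):
      E(w)   = f(w[:-1]) * f(w[1:]) / f(w[1:-1])   (0 if the infix never occurs)
      dev(w) = (f(w) - E(w)) / max(sqrt(E(w)), 1)
    Enumerates all len(alphabet)**k candidates, so words absent from x are
    included, but the running time grows exponentially with k."""
    if k < 3:
        raise ValueError("k must be at least 3")
    f_k, f_k1, f_k2 = (substring_counts(x, n) for n in (k, k - 1, k - 2))
    avoided = []
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        infix = f_k2[w[1:-1]]
        expected = f_k1[w[:-1]] * f_k1[w[1:]] / infix if infix else 0.0
        dev = (f_k[w] - expected) / max(math.sqrt(expected), 1.0)
        if dev <= rho:
            avoided.append(w)
    return avoided


print(avoided_words("ABBAABBAABBA", 3, "AB", -1.0))  # -> ['ABA', 'BAB', 'BBB']
```

    In the toy string, ABA, BAB and BBB never occur even though the two-letter words they are built from occur three times each, giving an expected count of 1.5 and a deviation of about -1.22; that contrast between frequent parts and an absent whole is what a negative deviation captures.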

  4. Functional analysis of Pacific oyster (Crassostrea gigas) β-thymosin: Focus on antimicrobial activity.

    PubMed

    Nam, Bo-Hye; Seo, Jung-Kil; Lee, Min Jeong; Kim, Young-Ok; Kim, Dong-Gyun; An, Cheul Min; Park, Nam Gyu

    2015-07-01

    An antimicrobial peptide, ∼5 kDa in size, was isolated and purified in its active form from the mantle of the Pacific oyster Crassostrea gigas by C18 reversed-phase high-performance liquid chromatography. Matrix-assisted laser desorption ionisation time-of-flight analysis gave a mass of 4656.4 Da for the purified, unreduced peptide. A comparison of the N-terminal amino acid sequence of the oyster antimicrobial peptide with deduced amino acid sequences in our local expressed sequence tag (EST) database of C. gigas (unpublished data) revealed that the peptide sequence entirely matched the deduced amino acid sequence of an EST clone (HM-8_A04), which was highly homologous to the β-thymosins of other species. The cDNA possessed a 126-bp open reading frame that encoded a protein of 41 amino acids. To confirm the antimicrobial activity of C. gigas β-thymosin, we overexpressed a recombinant β-thymosin (rcgTβ) using a pET22 expression plasmid in an Escherichia coli system. The antimicrobial activity of rcgTβ was evaluated and demonstrated using a bacterial growth inhibition test in both liquid and solid cultures. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Characterization of a GHF45 cellulase, AkEG21, from the common sea hare Aplysia kurodai

    NASA Astrophysics Data System (ADS)

    Rahman, Mohammad; Inoue, Akira; Ojima, Takao

    2014-08-01

    The common sea hare Aplysia kurodai is known to be a good source of enzymes that degrade seaweed polysaccharides. Recently, four cellulases, i.e., 95 kDa, 66 kDa, 45 kDa and 21 kDa enzymes, were isolated from A. kurodai (Tsuji et al., PLoS ONE, 8, e65418, 2013). The former three cellulases were regarded as glycosyl-hydrolase-family 9 (GHF9) enzymes, while the 21 kDa cellulase was suggested to be a GHF45 enzyme. The 21 kDa cellulase was significantly heat-stable and appeared to be advantageous for heterologous expression and protein-engineering studies. In the present study, we determined some enzymatic properties of the 21 kDa cellulase and cloned its cDNA to provide the basis for protein engineering of this cellulase. The purified 21 kDa enzyme, termed AkEG21 in the present study, hydrolyzed carboxymethyl cellulose with an optimal pH and temperature of 4.5 and 40 °C, respectively. AkEG21 was considerably heat-stable, i.e., it was not inactivated by incubation at 55 °C for 30 min. AkEG21 degraded phosphoric-acid-swollen cellulose, producing cellotriose and cellobiose as major end products, but hardly degraded oligosaccharides smaller than a tetrasaccharide. This indicated that AkEG21 is an endolytic β-1,4-glucanase (EC 3.2.1.4). A cDNA of 1,013 bp encoding AkEG21 was amplified by PCR, and an amino-acid sequence of 197 residues was deduced. The sequence comprised the initiation Met, a putative signal peptide of 16 residues for secretion, and a catalytic domain of 180 residues, arranged in this order from the N-terminus. The sequence of the catalytic domain showed 47-62% amino-acid identity to those of GHF45 cellulases reported in other mollusks. Both the catalytic residues and the N-glycosylation residues known in other GHF45 cellulases were conserved in AkEG21. Phylogenetic analysis of the amino-acid sequences suggested a close relationship between AkEG21 and fungal GHF45 cellulases.

  6. Evaluation of next generation sequencing for the analysis of Eimeria communities in wildlife.

    PubMed

    Vermeulen, Elke T; Lott, Matthew J; Eldridge, Mark D B; Power, Michelle L

    2016-05-01

    Next-generation sequencing (NGS) techniques are well established for studying bacterial communities but not yet for microbial eukaryotes. Parasite communities remain poorly studied, due in part to the lack of reliable and accessible molecular methods to analyse eukaryotic communities. We aimed to develop and evaluate a methodology to analyse communities of the protozoan parasite Eimeria from populations of the Australian marsupial Petrogale penicillata (brush-tailed rock-wallaby) using NGS. An oocyst purification method for small sample sizes and a polymerase chain reaction (PCR) protocol for the 18S rRNA locus targeting Eimeria were developed and optimised prior to sequencing on the Illumina MiSeq platform. A data analysis approach was developed by modifying methods from bacterial metagenomics and utilising existing Eimeria sequences in GenBank. Operational taxonomic unit (OTU) assignment at a high similarity threshold (97%) was more accurate at assigning Eimeria contigs to Eimeria OTUs, but at a lower threshold (95%) there was greater resolution between OTU consensus sequences. A comparison of two PCR amplification methods prior to Illumina MiSeq sequencing, single and nested PCR, showed that single PCR was more sensitive to Eimeria, as more Eimeria OTUs were detected in single amplicons. We have developed a simple and cost-effective data analysis pipeline for community analysis of eukaryotic organisms, using Eimeria communities as a model. The pipeline provides a basis for evaluation using other eukaryotic organisms and potential for diverse community analysis studies. Copyright © 2016 Elsevier B.V. All rights reserved.
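    The effect of the 97% versus 95% threshold can be illustrated with a simple greedy clustering of aligned reads, in which each read joins the first OTU centroid it matches at or above the threshold. This is an assumed, simplified stand-in, not the pipeline used in the study: it ignores indels, abundance ordering and chimera checks, and the toy 20-base sequences are invented.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold identity;
    otherwise it founds a new OTU. Returns one list of sequence indices per OTU."""
    centroids = []  # index of the sequence that founded each OTU
    otus = []
    for i, s in enumerate(seqs):
        for otu, c in zip(otus, centroids):
            if identity(s, seqs[c]) >= threshold:
                otu.append(i)
                break
        else:  # no existing centroid was close enough
            centroids.append(i)
            otus.append([i])
    return otus


seqs = ["A" * 20, "A" * 19 + "C", "CC" + "A" * 18]
print(greedy_otus(seqs, 0.97))  # -> [[0], [1], [2]]
print(greedy_otus(seqs, 0.95))  # -> [[0, 1], [2]]
```

    At 97% all three toy reads stay apart, while at 95% the first two (19 of 20 positions identical) merge. The result also depends on input order, because centroids are chosen first-come.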

  7. Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.

    PubMed

    Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; 
Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R

    1999-12-16

    The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.

  8. Suitability of partial 16S ribosomal RNA gene sequence analysis for the identification of dangerous bacterial pathogens.

    PubMed

    Ruppitsch, W; Stöger, A; Indra, A; Grif, K; Schabereiter-Gurtner, C; Hirschl, A; Allerberger, F

    2007-03-01

    In a bioterrorism event a rapid tool is needed to identify relevant dangerous bacteria. The aim of the study was to assess the usefulness of partial 16S rRNA gene sequence analysis and the suitability of diverse databases for identifying dangerous bacterial pathogens. For rapid identification purposes a 500-bp fragment of the 16S rRNA gene of 28 isolates comprising Bacillus anthracis, Brucella melitensis, Burkholderia mallei, Burkholderia pseudomallei, Francisella tularensis, Yersinia pestis, and eight genus-related and unrelated control strains was amplified and sequenced. The obtained sequence data were submitted to three public and two commercial sequence databases for species identification. The most frequent reason for incorrect identification was the lack of the respective 16S rRNA gene sequences in the database. Sequence analysis of a 500-bp 16S rDNA fragment allows the rapid identification of dangerous bacterial species. However, for discrimination of closely related species sequencing of the entire 16S rRNA gene, additional sequencing of the 23S rRNA gene or sequencing of the 16S-23S rRNA intergenic spacer is essential. This work provides comprehensive information on the suitability of partial 16S rDNA analysis and diverse databases for rapid and accurate identification of dangerous bacterial pathogens.

  9. MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing.

    PubMed

    Zackay, Arie; Steinhoff, Christine

    2010-12-15

    Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously, there is a growing need for tools to process and analyse the data together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org.
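    MethVisual itself is an R package, and the Python sketch below is not its API. It only illustrates the binary data model the abstract describes (one row per sequenced clone, one column per CpG site, 1 for methylated) and two of the listed statistics: per-site methylation level and pairwise co-occurrence of methylation. The function name and the example matrix are hypothetical.

```python
from itertools import combinations


def methylation_summary(matrix):
    """Summarize a binary clone x CpG-site methylation matrix.

    matrix[r][c] is 1 if CpG site c is methylated in clone r, else 0.
    Returns per-site methylation fractions and, for every pair of sites,
    the number of clones in which both sites are methylated."""
    n_clones = len(matrix)
    n_sites = len(matrix[0])
    levels = [sum(row[c] for row in matrix) / n_clones for c in range(n_sites)]
    co_occurrence = {
        (i, j): sum(row[i] and row[j] for row in matrix)
        for i, j in combinations(range(n_sites), 2)
    }
    return levels, co_occurrence


matrix = [
    [1, 0, 1],  # clone 1
    [1, 1, 1],  # clone 2
    [0, 0, 1],  # clone 3
]
levels, pairs = methylation_summary(matrix)
print([round(v, 2) for v in levels])  # -> [0.67, 0.33, 1.0]
print(pairs)  # -> {(0, 1): 1, (0, 2): 2, (1, 2): 1}
```

    The same binary representation is what makes the package usable for any data that can be coded as present/absent, as the abstract notes.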

  10. MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing

    PubMed Central

    2010-01-01

    Background Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously, there is a growing need for tools to process and analyse the data together with statistical investigation and visualisation. Findings MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. Conclusions The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org. PMID:21159174

  11. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis

    PubMed Central

    Steele, Joe; Bastola, Dhundy

    2014-01-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base–base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel–Ziv techniques from data compression. PMID:23904502
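    The D2 statistic mentioned in this review is the simplest of the word-frequency measures: count every overlapping k-word in both sequences and sum the products of the counts of shared words. A minimal sketch of the raw statistic; the choice of k is left to the caller, and the normalized variants found in the alignment-free literature are not shown.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-words in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """Raw D2 statistic: sum over shared k-words of the product of their counts."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items() if w in cy)


print(d2("ACGT", "ACGA", 2))  # shared 2-words AC and CG, once each -> 2
print(d2("AAAA", "AA", 2))    # AA occurs 3 times in x and once in y -> 3
```

    No alignment is computed at any point: the cost is linear in the sequence lengths for a fixed k, which is the property the review contrasts with dynamic programming.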

  12. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    PubMed

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  13. Conjugation of 10 kDa Linear PEG onto Trastuzumab Fab' Is Sufficient to Significantly Enhance Lymphatic Exposure while Preserving in Vitro Biological Activity.

    PubMed

    Chan, Linda J; Ascher, David B; Yadav, Rajbharan; Bulitta, Jürgen B; Williams, Charlotte C; Porter, Christopher J H; Landersdorfer, Cornelia B; Kaminskas, Lisa M

    2016-04-04

    The lymphatic system is a major conduit by which many diseases spread and proliferate. There is therefore increasing interest in promoting better lymphatic drug targeting. Further, antibody fragments such as Fabs have several advantages over full-length monoclonal antibodies but are subject to rapid plasma clearance, which can limit the lymphatic exposure and activity of Fabs against lymph-resident diseases. This study therefore explored optimal PEGylation strategies to maximize biological activity and lymphatic exposure, using trastuzumab Fab' as a model. Specifically, the Fab' was conjugated with a single linear 10 or 40 kDa PEG chain at the hinge region. PEGylation led to a 3-4-fold reduction in binding affinity to HER2, but antiproliferative activity against HER2-expressing BT474 cells was preserved. Lymphatic pharmacokinetics were then examined in thoracic lymph duct cannulated rats after intravenous and subcutaneous dosing at 2 mg/kg, and the data were evaluated via population pharmacokinetic modeling. The Fab' displayed limited lymphatic exposure, but conjugation of 10 kDa PEG improved exposure by approximately 11- and 5-fold after intravenous (15% of the dose collected in thoracic lymph over 30 h) and subcutaneous (9%) administration, respectively. Increasing the molecular weight of the PEG to 40 kDa, however, had no significant impact on lymphatic exposure after intravenous (14%) administration and only doubled lymphatic exposure after subcutaneous administration (18%) when compared to 10 kDa PEG-Fab'. The data therefore suggest that minimal PEGylation has the potential to enhance the exposure and activity of Fab' fragments against lymph-resident diseases, while no significant benefit is achieved with very large PEGs.
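    The fold changes and percentages in this abstract can be combined to back-calculate the lymphatic exposure of the unmodified Fab', which the abstract does not state directly. A rough sketch using only the quoted figures; because the fold changes are given as approximate, the implied values are estimates, not reported results.

```python
# Figures quoted in the abstract: percent of dose recovered in thoracic lymph
peg10 = {"iv": 15.0, "sc": 9.0}
fold_vs_fab = {"iv": 11.0, "sc": 5.0}  # "approximately 11- and 5-fold"
peg40 = {"iv": 14.0, "sc": 18.0}

# Implied exposure of the unmodified Fab' (derived estimate, not reported)
implied_fab = {route: peg10[route] / fold_vs_fab[route] for route in peg10}
# Exposure of the 40 kDa conjugate relative to the 10 kDa conjugate
peg40_vs_peg10 = {route: peg40[route] / peg10[route] for route in peg10}

for route in ("iv", "sc"):
    print(f"{route}: Fab' ~{implied_fab[route]:.1f}% of dose; "
          f"40/10 kDa PEG ratio {peg40_vs_peg10[route]:.2f}")
```

    This implies roughly 1.4% (intravenous) and 1.8% (subcutaneous) of the dose for the unmodified Fab', and shows the 40 kDa conjugate at about 0.93 times (intravenous) and exactly 2 times (subcutaneous) the exposure of the 10 kDa conjugate.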

  14. Analysis of simulated image sequences from sensors for restricted-visibility operations

    NASA Technical Reports Server (NTRS)

    Kasturi, Rangachar

    1991-01-01

    A real-time model of the visible output from a 94 GHz sensor, based on a radiometric simulation of the sensor, was developed. This model was used to simulate a sequence of images as seen from an aircraft approaching for landing. Thirty frames from this sequence of 200 × 200-pixel images were analyzed to identify and track objects using the Cantata image processing package within the visual programming environment provided by the Khoros software system. The image analysis operations are described.

  15. Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1.

    PubMed Central

    Le Gall, O; Candresse, T; Brault, V; Dunez, J

    1989-01-01

    The nucleotide sequence of RNA1 of Hungarian grapevine chrome mosaic virus, a nepovirus very closely related to tomato black ring virus, has been determined from cDNA clones. It is 7212 nucleotides long, excluding the 3'-terminal poly(A) tail, and contains a large open reading frame extending from nucleotide 216 to 6971. The putative encoded polyprotein is 2252 amino acids long, with a molecular weight of 250 kDa. Comparison of its primary structure with those of other viral polyproteins revealed the same general genetic organization as in other picorna-like viruses (comoviruses, potyviruses and picornaviruses), except that an additional protein is suspected to occupy the N-terminus of the polyprotein. PMID:2798128
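
    The ORF coordinates can be checked with simple arithmetic: nucleotides 216 to 6971 inclusive span 6971 − 216 + 1 = 6756 nt, exactly 2252 codons, which matches the stated polyprotein length. Spreading 250 kDa over 2252 residues gives about 111 Da per residue, a typical average residue mass. A minimal sketch, assuming 1-based inclusive coordinates (the helper name is hypothetical):

```python
def orf_stats(start, end, mass_kda):
    """Length arithmetic for an ORF given 1-based, inclusive nucleotide coordinates."""
    length_nt = end - start + 1
    codons, leftover_nt = divmod(length_nt, 3)
    # Mean mass per residue, assuming the stated mass covers one residue per codon.
    mean_residue_da = mass_kda * 1000 / codons
    return length_nt, codons, leftover_nt, mean_residue_da


length_nt, codons, leftover_nt, mean_residue_da = orf_stats(216, 6971, 250)
print(length_nt, codons, leftover_nt, round(mean_residue_da, 1))  # prints: 6756 2252 0 111.0
```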

  16. Novel primer-specific false terminations during DNA sequencing reactions: danger of inaccuracy of mutation analysis in molecular diagnostics

    PubMed Central

    Anwar, R; Booth, A; Churchill, A J; Markham, A F

    1996-01-01

    The determination of nucleotide sequence is fundamental to the identification and molecular analysis of genes. Direct sequencing of PCR products is becoming a commonplace procedure for haplotype analysis and for defining mutations and polymorphisms within genes, particularly for diagnostic purposes. A previously unrecognised phenomenon, primer-related variability, observed in sequence data generated using Taq cycle sequencing and T7 Sequenase sequencing, is reported. This suggests that caution is necessary when interpreting DNA sequence data, particularly in situations where treatment may depend on the accuracy of the molecular diagnosis. PMID:16696096

  17. Quantitative analysis of the anti-noise performance of an m-sequence in an electromagnetic method

    NASA Astrophysics Data System (ADS)

    Yuan, Zhe; Zhang, Yiming; Zheng, Qijia

    2018-02-01

    An electromagnetic method whose transmitted waveform is coded by an m-sequence achieves better anti-noise performance than the conventional square-wave method. The anti-noise performance of the m-sequence varies with multiple coding parameters, so a quantitative analysis of the anti-noise performance of m-sequences with different coding parameters is required to optimize them. This paper proposes the concept of an identification system, in which the Earth impulse response is identified by measuring the system output with the voltage response as the input. The anti-noise performance of the m-sequence is then quantified by analyzing the amplitude-frequency response of the corresponding identification system. The effects of the coding parameters on anti-noise performance are summarized through numerical simulation, their optimization is discussed in the conclusions, and the validity of the conclusions is verified by a field experiment. The proposed quantitative analysis method provides new insight into the anti-noise mechanism of the m-sequence and could be used to evaluate the anti-noise performance of artificial sources in other time-domain exploration methods, such as the seismic method.
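
    An m-sequence is a maximal-length binary sequence, conventionally generated by a linear feedback shift register (LFSR). The sketch below assumes a Fibonacci LFSR of order 4 with the primitive polynomial x^4 + x + 1, chosen only for illustration; the paper's actual coding parameters are not reproduced here. One period has 2^4 − 1 = 15 chips, and its periodic autocorrelation in ±1 form is 15 at zero lag and −1 at every other lag. This nearly impulse-like autocorrelation is the general property that makes correlation-based recovery of a response robust to uncorrelated noise. The function names are hypothetical.

```python
def m_sequence(n, taps, seed=None):
    """One period (2**n - 1 chips) of a binary sequence from a Fibonacci LFSR.

    Recurrence: s[k + n] = XOR of s[k + j] for j in taps. The result is an
    m-sequence only if the matching polynomial is primitive over GF(2).
    """
    state = list(seed) if seed is not None else [1] + [0] * (n - 1)
    if len(state) != n or not any(state):
        raise ValueError("seed must have n bits and must not be all zero")
    period = 2 ** n - 1
    seq = state[:]
    while len(seq) < period:
        k = len(seq) - n
        bit = 0
        for j in taps:
            bit ^= seq[k + j]
        seq.append(bit)
    return seq[:period]


def periodic_autocorrelation(bits):
    """Circular autocorrelation of the bipolar version (0 -> +1, 1 -> -1)."""
    x = [1 - 2 * b for b in bits]
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


# s[k+4] = s[k+1] XOR s[k] corresponds to the primitive polynomial x^4 + x + 1.
seq = m_sequence(4, taps=[0, 1])
acf = periodic_autocorrelation(seq)
print(seq)  # [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print(acf)  # [15, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
```

    The balance property (8 ones and 7 zeros per period) and the two-valued autocorrelation both follow from the polynomial being primitive; a non-primitive tap choice would give a shorter cycle.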

  18. Comparative sequence analysis suggests a conserved gating mechanism for TRP channels

    PubMed Central

    Palovcak, Eugene; Delemotte, Lucie; Klein, Michael L.

    2015-01-01

    The transient receptor potential (TRP) channel superfamily plays a central role in transducing diverse sensory stimuli in eukaryotes. Although dissimilar in sequence and domain organization, all known TRP channels act as polymodal cellular sensors and form tetrameric assemblies similar to those of their distant relatives, the voltage-gated potassium (Kv) channels. Here, we investigated the related questions of whether the allosteric mechanism underlying polymodal gating is common to all TRP channels, and how this mechanism differs from that underpinning Kv channel voltage sensitivity. To provide insight into these questions, we performed comparative sequence analysis on large, comprehensive ensembles of TRP and Kv channel sequences, contextualizing the patterns of conservation and correlation observed in the TRP channel sequences in light of the well-studied Kv channels. We report sequence features that are specific to TRP channels and, based on insight from recent TRPV1 structures, we suggest a model of TRP channel gating that differs substantially from the one mediating voltage sensitivity in Kv channels. The common mechanism underlying polymodal gating involves the displacement of a defect in the H-bond network of S6 that changes the orientation of the pore-lining residues at the hydrophobic gate. PMID:26078053

  19. Improved serial analysis of V1 ribosomal sequence tags (SARST-V1) provides a rapid, comprehensive, sequence-based characterization of bacterial diversity and community composition.

    PubMed

    Yu, Zhongtang; Yu, Marie; Morrison, Mark

    2006-04-01

    Serial analysis of ribosomal sequence tags (SARST) is a recently developed technology that can generate large 16S rRNA gene (rrs) sequence data sets from microbiomes, but numerous enzymatic and purification steps are required to construct the ribosomal sequence tag (RST) clone libraries. We report here an improved SARST method, which still targets the V1 hypervariable region of rrs genes but reduces the number of enzymes, oligonucleotides, reagents, and technical steps needed to produce the RST clone libraries. The new method, hereafter referred to as SARST-V1, was used to examine the eubacterial diversity present in community DNA recovered from the microbiome resident in the ovine rumen. The 190 sequenced clones contained 1055 RSTs and at least 236 unique phylotypes (based on ≥ 95% sequence identity) that were assigned to eight different eubacterial phyla. Rarefaction and monomolecular curve analyses predicted that the complete RST clone library contains 99% of the 353 unique phylotypes predicted to exist in this microbiome. When compared with ribosomal intergenic spacer analysis (RISA) of the same community DNA sample, as well as a compilation of nine previously published conventional rrs clone libraries prepared from the same type of samples, the RST clone library provided a more comprehensive characterization of the eubacterial diversity present in rumen microbiomes. As such, SARST-V1 should be a useful tool for comprehensive examination of diversity and composition in microbiomes and offers an affordable, sequence-based method for diversity analysis.
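
    The abstract does not say which rarefaction estimator was used. A common choice is individual-based rarefaction (Hurlbert's formula): for a library of N tags with N_i tags in phylotype i, the expected number of distinct phylotypes in a random subsample of n tags drawn without replacement is E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)]. A minimal sketch assuming that formula, with toy counts rather than the study's data (function names are hypothetical):

```python
from math import comb


def rarefy(counts, n):
    """Expected number of distinct phylotypes in a random subsample of n tags."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the library size")
    denom = comb(total, n)
    # comb(a, b) returns 0 when b > a, so such a phylotype is always observed.
    # int / int true division stays accurate even when the binomials exceed float range.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step):
    """(subsample size, expected richness) pairs for a rarefaction curve."""
    total = sum(counts)
    return [(n, rarefy(counts, n)) for n in range(step, total + 1, step)]


toy_counts = [5, 3, 1, 1]  # hypothetical tags per phylotype
print(rarefaction_curve(toy_counts, 5))
```

    At the full library size the curve reaches the observed richness (4 here); a curve that is still rising at that point indicates undetected phylotypes, which is what curve-fitting extrapolations such as the monomolecular model try to estimate.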

  20. BioPig: Developing Cloud Computing Applications for Next-Generation Sequence Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhatia, Karan; Wang, Zhong

    Next-generation sequencing is producing ever-larger data sets, with a growth rate outpacing Moore's law. The data deluge has made many current sequence-analysis tools obsolete because they do not scale with data size. Here we present BioPig, a collection of cloud computing tools for scaling data analysis and management. Pig is a flexible data scripting language that uses Apache Hadoop's data structures and MapReduce framework to process very large data files in parallel and combine the results. BioPig extends Pig with sequence-analysis capabilities. We will show the performance of BioPig on a variety of bioinformatics tasks, including screening sequence contaminants, Illumina QA/QC, and gene discovery from metagenome data sets, using the rumen metagenome as an example.
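
    Pig scripts are compiled into Hadoop MapReduce jobs. The map, shuffle, and reduce phases can be illustrated in plain Python with a k-mer count, a common building block in sequence screening. This is a hypothetical single-process sketch; it is not BioPig's or Hadoop's API, and in Hadoop the shuffle step runs across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map phase: emit (k-mer, 1) for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


counts = count_kmers(["ACGTAC", "GTACGT"], 3)
print(counts)
```

    Because each read is mapped independently and each key is reduced independently, both phases parallelize across files and nodes, which is what lets this pattern scale with data size.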

  1. PHASTpep: Analysis Software for Discovery of Cell-Selective Peptides via Phage Display and Next-Generation Sequencing

    PubMed Central

    Dasa, Siva Sai Krishna; Kelly, Kimberly A.

    2016-01-01

    Next-generation sequencing has enhanced the phage display process, allowing the quantification of millions of sequences resulting from the biopanning process. In response, many valuable analysis programs were developed that focus on specificity and on finding targeted motifs or consensus sequences. For targeted drug delivery and molecular imaging, it is also necessary to find peptides that are selective, targeting only the cell type or tissue of interest. We present a new analysis strategy and accompanying software, PHage Analysis for Selective Targeted PEPtides (PHASTpep), which identifies highly specific and selective peptides. Using this process, we discovered and validated, both in vitro and in vivo in mice, two sequences (HTTIPKV and APPIMSV) targeted to pancreatic cancer-associated fibroblasts that escaped identification using previously existing software. Our selectivity analysis makes it possible to discover peptides that target a specific cell type and avoid other cell types, enhancing clinical translatability by circumventing complications with systemic use. PMID:27186887

  2. A Rapid Whole Genome Sequencing and Analysis System Supporting Genomic Epidemiology (7th Annual SFAF Meeting, 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  3. A Rapid Whole Genome Sequencing and Analysis System Supporting Genomic Epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael

    2018-01-11

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  4. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.

    PubMed

    Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E

    1995-05-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences, each longer than 512 base pairs (bp), in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent β = 0.00 ± 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent β is positive (0.16 ± 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and find that it is less than 10⁻¹⁰. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases confidence in the reliability of our conclusion.
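
    DFA can be sketched in a few lines. The abstract does not give implementation details, so this sketch assumes a purine/pyrimidine mapping (A, G → +1; C, T → −1) to build a "DNA walk", first-order (linear) detrending in non-overlapping boxes, and the standard relation β = 2α − 1 between the DFA exponent α and the spectral exponent β for power-law correlated series. Under that relation, β ≈ 0 for coding DNA corresponds to α ≈ 0.5 and β ≈ 0.16 to α ≈ 0.58. Function names are hypothetical.

```python
import math
import random


def dna_walk_profile(seq):
    """Map A/G -> +1 and C/T -> -1, then return the mean-subtracted cumulative sum."""
    steps = [1 if b in "AG" else -1 for b in seq.upper() if b in "ACGT"]
    mean = sum(steps) / len(steps)
    profile, y = [], 0.0
    for s in steps:
        y += s - mean
        profile.append(y)
    return profile


def fluctuation(profile, n):
    """F(n): RMS residual after a least-squares line fit in each box of n >= 3 points."""
    boxes = len(profile) // n
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sq = 0.0
    for b in range(boxes):
        seg = profile[b * n:(b + 1) * n]
        y_mean = sum(seg) / n
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(seg)) / sxx
        sq += sum((y - y_mean - slope * (x - x_mean)) ** 2 for x, y in enumerate(seg))
    return math.sqrt(sq / (boxes * n))


def dfa_exponent(seq, box_sizes):
    """alpha: least-squares slope of log F(n) versus log n."""
    profile = dna_walk_profile(seq)
    pts = [(math.log(n), math.log(fluctuation(profile, n))) for n in box_sizes]
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    return (sum((p - mx) * (q - my) for p, q in pts)
            / sum((p - mx) ** 2 for p, _ in pts))


rng = random.Random(1)
random_dna = "".join(rng.choice("ACGT") for _ in range(2 ** 14))
alpha = dfa_exponent(random_dna, [16, 32, 64, 128, 256, 512])
beta = 2 * alpha - 1  # close to 0 for an uncorrelated sequence
print(round(alpha, 2), round(beta, 2))
```

    Detrending inside each box is what lets DFA tolerate the slowly varying composition ("patchiness") mentioned above, since local linear drifts are removed before the fluctuations are measured.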

  5. Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences, each longer than 512 base pairs (bp), in the present release of GenBank to determine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent β = 0.00 ± 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent β is positive (0.16 ± 0.05), which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and find that it is less than 10⁻¹⁰. We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases confidence in the reliability of our conclusion.

  6. Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.

    PubMed

    Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng

    2017-07-01

    The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity in Anhui Province, China. The TV2 genome comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best of our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.

  7. A prokaryotic viral sequence is expressed and conserved in mammalian brain.

    PubMed

    Yeh, Yang-Hui; Gunasekharan, Vignesh; Manuelidis, Laura

    2017-07-03

    A natural and permanent transfer of prokaryotic viral sequences to mammals has not been reported by others. Circular "SPHINX" DNAs <5 kb were previously isolated from nuclease-protected cytoplasmic particles in rodent neuronal cell lines and brain. Two of these DNAs were sequenced after Φ29 polymerase amplification, and they revealed significant but imperfect homology to segments of commensal Acinetobacter phage viruses. These findings were surprising because the brain is isolated from environmental microorganisms. The 1.76-kb DNA sequence (SPHINX 1.8), with an iteron before its ORF, was evaluated here for its expression in neural cells and brain. A rabbit affinity purified antibody generated against a peptide without homology to mammalian sequences labeled a nonglycosylated ∼41-kDa protein (spx1) on Western blots, and the signal was efficiently blocked by the competing peptide. Spx1 was resistant to limited proteinase K digestion, but was unrelated to the expression of host prion protein or its pathologic amyloid form. Remarkably, spx1 concentrated in selected brain synapses, such as those on anterior motor horn neurons that integrate many complex neural inputs. SPHINX 1.8 appears to be involved in tissue-specific differentiation, including essential functions that preserve its propagation during mammalian evolution, possibly via maternal inheritance. The data here indicate that mammals can share and exchange a larger world of prokaryotic viruses than previously envisioned.

  8. Order within disorder: Aggrecan chondroitin sulphate-attachment region provides new structural insights into protein sequences classified as disordered

    PubMed Central

    Jowitt, Thomas A; Murdoch, Alan D; Baldock, Clair; Berry, Richard; Day, Joanna M; Hardingham, Timothy E

    2010-01-01

    Structural investigation of proteins containing large stretches of sequence without predicted secondary structure is the focus of much increased attention. Here, we have produced an unglycosylated 30 kDa peptide from the chondroitin sulphate (CS)-attachment region of human aggrecan (CS-peptide), which was predicted to be intrinsically disordered, and compared its structure with the adjacent aggrecan G3 domain. Biophysical analyses, including analytical ultracentrifugation, light scattering, and circular dichroism, showed that the CS-peptide had an elongated and stiffened conformation in contrast to the globular G3 domain. The results suggested that it contained significant secondary structure, which was sensitive to urea, and we propose that the CS-peptide forms an elongated wormlike molecule based on a dynamic range of energetically equivalent secondary structures stabilized by hydrogen bonds. The dimensions of the structure predicted from small-angle X-ray scattering analysis were compatible with EM images of fully glycosylated aggrecan and a partly glycosylated aggrecan CS2-G3 construct. The semiordered structure identified in CS-peptide was not predicted by common structural algorithms and identified a potentially distinct class of semiordered structure within sequences currently classified as disordered. Sequence comparisons suggested some evidence for comparable structures in proteins encoded by other genes (PRG4, MUC5B, and CBP). These semiordered sequences may serve to spatially position attached folded modules and/or to present polypeptides for modification, such as glycosylation, and to provide templates for the multiple pleiotropic interactions proposed for disordered proteins. Proteins 2010. © 2010 Wiley-Liss, Inc. PMID:20806220

  9. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. The Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence, and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow begins by sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system consisting of scripts written in Perl, C shell and awk. Once the start process is complete, the system checks for errors and aborts if there are any; otherwise, it converts the commands to binary and sends the resulting information to be radiated to the spacecraft.

  10. Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA.

    PubMed

    Dong, F; Allawi, H T; Anderson, T; Neri, B P; Lyamichev, V I

    2001-08-01

    DNA sequence analysis by oligonucleotide binding is often affected by interference with the secondary structure of the target DNA. Here we describe an approach that improves DNA secondary structure prediction by combining enzymatic probing of DNA by structure-specific 5'-nucleases with an energy minimization algorithm that utilizes the 5'-nuclease cleavage sites as constraints. The method can identify structural differences between two DNA molecules caused by minor sequence variations such as a single nucleotide mutation. It also demonstrates the existence of long-range interactions between DNA regions separated by >300 nt and the formation of multiple alternative structures by a 244 nt DNA molecule. The differences in the secondary structure of DNA molecules revealed by 5'-nuclease probing were used to design structure-specific probes for mutation discrimination that target the regions of structural, rather than sequence, differences. We also demonstrate the performance of structure-specific 'bridge' probes complementary to non-contiguous regions of the target molecule. The structure-specific probes do not require the high stringency binding conditions necessary for methods based on mismatch formation and permit mutation detection at temperatures from 4 to 37 degrees C. Structure-specific sequence analysis is applied for mutation detection in the Mycobacterium tuberculosis katG gene and for genotyping of the hepatitis C virus.

  11. A newly constructed primer pair for the PCR amplification, cloning and sequencing of the flagellin (flaA) gene from isolates of urease-negative Campylobacter lari.

    PubMed

    Sekizuka, Tsuyoshi; Yokoi, Taeko; Murayama, Ohoshi; Millar, B Cherie; Moore, Johne; Matsuda, Motoo

    2005-08-01

    A newly constructed primer pair (lari-Af/lari-Ar) designed to amplify the flagellin (flaA) gene of urease-negative Campylobacter lari produced a PCR amplicon of about 1700 bp for 16 isolates from 7 seagulls, 5 humans, 3 food animals and one mussel in Japan and Northern Ireland. Nucleotide sequencing and alignment of the flaA amplicons from these isolates showed that the deduced amino acid sequences of the putative open reading frame (ORF) were 564-572 amino acid residues long, with calculated molecular weights of 58,804 to 59,463. Similarity analysis of the deduced amino acid sequences strongly suggested that the flaA ORFs from the 16 isolates share 70-75% sequence similarity with those of Campylobacter jejuni isolates. The approximate Mr of the flagellin purified from some of the urease-negative C. lari isolates was estimated to range from 59.6 to 61.8 kDa. Thus, flagellin from urease-negative C. lari isolates was shown for the first time to have a molecular size similar to those of C. jejuni and Campylobacter coli isolates, but different from the shorter flaA and smaller flagellin of urease-positive thermophilic Campylobacter (UPTC) isolates. Flagellins from C. lari spp., comprising the two representative taxa of urease-negative C. lari and UPTC, thus show genotypic and phenotypic diversity.

  12. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    PubMed

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. 
Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  13. Genetic mutation analysis of human gastric adenocarcinomas using ion torrent sequencing platform.

    PubMed

    Xu, Zhi; Huo, Xinying; Ye, Hua; Tang, Chuanning; Nandakumar, Vijayalakshmi; Lou, Feng; Zhang, Dandan; Dong, Haichao; Sun, Hong; Jiang, Shouwen; Zhang, Guangchun; Liu, Zhiyuan; Dong, Zhishou; Guo, Baishuai; He, Yan; Yan, Chaowei; Wang, Lu; Su, Ziyi; Li, Yangyang; Gu, Dongying; Zhang, Xiaojing; Wu, Xiaomin; Wei, Xiaowei; Hong, Lingzhi; Zhang, Yangmei; Yang, Jinsong; Gong, Yonglin; Tang, Cuiju; Jones, Lindsey; Huang, Xue F; Chen, Si-Yi; Chen, Jinfei

    2014-01-01

    Gastric cancer is one of the major causes of cancer-related death, especially in Asia. Gastric adenocarcinoma, the most common type of gastric cancer, is heterogeneous, and its incidence and causes vary widely with geographical region, gender, ethnicity, and diet. Because unique mutations have been observed in individual human cancer samples, identifying and characterizing the molecular alterations underlying individual gastric adenocarcinomas is a critical step toward developing more effective, personalized therapies. Until recently, identifying genetic mutations on an individual basis by DNA sequencing remained a daunting task. Recent advances in next-generation DNA sequencing technologies, such as the semiconductor-based Ion Torrent sequencing platform, make DNA sequencing cheaper, faster, and more reliable. In this study, we aimed to use Ion Torrent sequencing to identify, in individual human gastric adenocarcinoma samples, genetic mutations in genes that are targeted by drugs in clinical use or under development. We sequenced 737 loci from 45 cancer-related genes in 238 human gastric adenocarcinoma samples using the Ion Torrent AmpliSeq Cancer Panel. The sequencing analysis revealed a high occurrence of mutations at the TP53 locus (9.7%) in our sample set. Thus, this study indicates the utility of a cost- and time-efficient tool such as Ion Torrent sequencing for screening cancer mutations in the development of personalized cancer therapy.

  14. GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

    PubMed Central

    Dorff, Kevin C.; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien

    2013-01-01

    We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced representation bisulfite sequencing and whole-genome methyl-seq), and detection of pathogens in sequenced data. In contrast to previous pipelines developed for the analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, and scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by any researcher comfortable with a web browser. We conducted performance evaluations of the software and found that it either outperforms or performs similarly to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state-of-the-art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. PMID:23936070

  15. Molecular cloning and expression analysis of annexin A2 gene in sika deer antler tip.

    PubMed

    Xia, Yanling; Qu, Haomiao; Lu, Binshan; Zhang, Qiang; Li, Heping

    2018-04-01

    Molecular cloning and bioinformatics analysis of the annexin A2 (ANXA2) gene in sika deer antler tip were conducted, and the role of the ANXA2 gene in antler growth and development was analyzed initially. Reverse transcriptase polymerase chain reaction (RT-PCR) was used to clone the cDNA sequence of the ANXA2 gene from the antler tip of sika deer (Cervus nippon hortulorum), and bioinformatics methods were applied to analyze the amino acid sequence of the Anxa2 protein. The mRNA expression levels of the ANXA2 gene at different growth stages were examined by real-time RT-PCR. Nucleotide sequence analysis revealed a 1,020-bp open reading frame encoding a 339-amino-acid protein with a calculated molecular weight of 38.6 kDa and an isoelectric point of 6.09. Homologous sequence alignment and phylogenetic analysis indicated that the sika deer Anxa2 mature protein had the closest genetic distance to Cervus elaphus and Bos mutus. Real-time RT-PCR showed that the gene was differentially expressed across growth stages, with the highest expression at metaphase (the rapid growth period). The ANXA2 gene may promote cell proliferation, and these findings suggest Anxa2 as an important candidate for regulating the growth and development of deer antler.
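
    As a consistency check, a 1,020-bp ORF contains 1,020 / 3 = 340 codons, which matches 339 amino acids plus a stop codon, and 38.6 kDa / 339 ≈ 114 Da per residue. A calculated molecular weight of this kind is usually obtained by summing average residue masses and adding one water molecule for the free termini. A minimal sketch, assuming the commonly tabulated average residue masses below; the authors' tool and mass table are not stated, and the 339-residue sequence itself is not given here, so the example uses short peptides.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
AVERAGE_RESIDUE_MASS_DA = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.01528  # added once for the N- and C-termini


def protein_mass_kda(sequence):
    """Average molecular mass (kDa) of an unmodified polypeptide."""
    residues = sequence.upper().rstrip("*")  # drop a trailing stop symbol
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS_DA)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    mass_da = WATER_DA + sum(AVERAGE_RESIDUE_MASS_DA[aa] for aa in residues)
    return mass_da / 1000


print(round(protein_mass_kda("GG") * 1000, 2))  # glycylglycine: prints 132.12
```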

  16. Sequence analysis of Jembrana disease virus strains reveals a genetically stable lentivirus.

    PubMed

    Desport, Moira; Stewart, Meredith E; Mikosza, Andrew S; Sheridan, Carol A; Peterson, Shane E; Chavand, Olivier; Hartaningsih, Nining; Wilcox, Graham E

    2007-06-01

    Jembrana disease virus (JDV) is a lentivirus associated with an acute disease syndrome with a 20% case fatality rate in Bos javanicus (Bali cattle) in Indonesia; the disease occurs after a short incubation period and does not recur after recovery. Partial regions of gag and pol and the entire env were examined for sequence variation in DNA samples from cases of Jembrana disease obtained from Bali, Sumatra and South Kalimantan in Indonesian Borneo. A high level of nucleotide conservation (97-100%) was observed in gag sequences from samples taken in Bali and Sumatra, indicating that JDV in Sumatra most likely originated from Bali. The pol sequences and, unexpectedly, the env sequences from Bali samples were also well conserved, with 96-99% nucleotide identity and 95-99% amino acid identity. However, the sample from South Kalimantan (JDV(KAL/01)) contained more divergent sequences, particularly in env (88% identity). Phylogenetic analysis revealed that the JDV(KAL/01) env sequences clustered with the sequence from the Pulukan sample (Bali) from 2001. JDV appears to be remarkably stable genetically and has undergone only minor genetic changes over a period of nearly 20 years in Bali, despite becoming endemic in the cattle population of the island.
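
    Identity percentages like these depend on how identity is counted over an alignment. A minimal sketch, assuming the two sequences are already aligned to equal length and that columns containing a gap are excluded from the denominator; the authors may have used a different convention (for example, dividing by the full alignment length). The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGTACGT", "ACGTACGA"))  # prints 87.5
print(percent_identity("AC-GT", "ACAGA"))        # prints 75.0
```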

  17. Antimicrobial activity of a 48-kDa protease (AMP48) from Artocarpus heterophyllus latex.

    PubMed

    Siritapetawee, J; Thammasirirak, S; Samosornsuk, W

    2012-01-01

    Artocarpus heterophyllus (jackfruit) is a latex-producing plant. Plant latex is produced by secretory cells, contains many components and has been used in folk medicine. This study aimed to purify and characterize the biological activities of a protease from jackfruit latex. A protease was isolated and purified from the crude latex of a jackfruit tree by acid precipitation and ion exchange chromatography. The proteolytic activity of the protein was tested using gelatin and casein zymography. The molecular weight and isoelectric point (pI) of the protein were analysed by SDS/12.5% PAGE and 2D-PAGE, respectively. The antimicrobial activity of the protein was analysed by the broth microdilution method. In addition, the antibacterial activity of the protein against Pseudomonas aeruginosa ATCC 27853 was observed and measured using atomic force microscopy (AFM). The purified protein showed protease activity, digesting gelatin and casein substrates. The protease was designated antimicrobial protease-48 kDa, or AMP48, because its molecular mass on SDS-PAGE was approximately 48 kDa. The pI of AMP48 was approximately 4.2. AMP48 also showed antimicrobial activity, inhibiting the growth of Pseudomonas aeruginosa ATCC 27853 and clinically isolated Candida albicans with a minimum inhibitory concentration (MIC) of 2.2 mg/ml and a minimum microbicidal concentration (MMC) of 8.8 mg/ml. AFM images also supported the antimicrobial activity of AMP48: the morphology and size of treated bacteria were altered from normal.
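
    In broth microdilution the MIC is read as the lowest concentration in a serial dilution series that shows no visible growth; the reported MIC (2.2 mg/ml) and MMC (8.8 mg/ml) are two two-fold steps apart. This minimal sketch of reading an MIC assumes a two-fold series from a hypothetical top concentration of 17.6 mg/ml, and the growth readings are hypothetical, not the study's data.

```python
from typing import Optional


def twofold_series(top: float, n: int) -> list[float]:
    """Concentrations of an n-well two-fold serial dilution, highest first."""
    return [top / 2 ** i for i in range(n)]


def mic(concentrations: list[float], growth: list[bool]) -> Optional[float]:
    """Lowest concentration with no visible growth, or None if every well grew.

    Assumes a monotonic readout: wells below the MIC all show growth.
    """
    inhibited = [c for c, grew in zip(concentrations, growth) if not grew]
    return min(inhibited) if inhibited else None


# Hypothetical readout (not data from the study): no growth down to 2.2 mg/ml.
concs = twofold_series(17.6, 6)  # 17.6, 8.8, 4.4, 2.2, 1.1, 0.55 mg/ml
growth = [False, False, False, False, True, True]
print(mic(concs, growth))  # 2.2
```

    The MMC is typically read in a similar way after subculturing the inhibited wells onto agar, taking the lowest concentration with no surviving organisms.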

  18. Initial steps towards a production platform for DNA sequence analysis on the grid.

    PubMed

    Luyf, Angela C M; van Schaik, Barbera D C; de Vries, Michel; Baas, Frank; van Kampen, Antoine H C; Olabarriaga, Silvia D

    2010-12-14

    Bioinformatics is confronted with a new data explosion due to the availability of high-throughput DNA sequencers. Data storage and analysis are becoming a problem on local servers, making it necessary to switch to other IT infrastructures. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users. In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available to members of the Dutch Life Science Grid and the VL-e Medical virtual organizations. All components are open source and can be transported to other grid infrastructures. The availability of in-house expertise and tools facilitates the use of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to the capacity and collaboration issues raised by the deployment of next-generation sequencers. We currently apply this methodology on a daily basis for DNA sequencing and other applications. More information and source code are available via http://www.bioinformaticslaboratory.nl/
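
    A common way to spread BLAST or BLAT searches over grid nodes is to split the query FASTA into chunks, run one job per chunk and merge the outputs. This minimal sketch shows only the splitting step and assumes plain FASTA input. Job submission and merging, which the study's platform handles through its workflow engine, are not shown, and the function names are hypothetical rather than the authors' workflow code.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)  # a sequence may span several lines
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records: Iterable[tuple[str, str]], n_chunks: int) -> list[list[tuple[str, str]]]:
    """Distribute records round-robin into n_chunks lists, one per grid job."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: list[list[tuple[str, str]]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


queries = ">q1\nACGT\n>q2\nGGCC\nAA\n>q3\nTTTT\n".splitlines()
for job, chunk in enumerate(chunk_records(read_fasta(queries), 2)):
    print(job, [name for name, _ in chunk])  # 0 ['q1', 'q3'], then 1 ['q2']
```

    Round-robin assignment keeps chunk sizes within one record of each other. Chunking by total sequence length would balance run times better when query lengths vary widely.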

  19. A 21-35 kDa Mixed Protein Component from Helicobacter pylori Activates Mast Cells Effectively in Chronic Spontaneous Urticaria.

    PubMed

    Tan, Ran-Jing; Sun, He-Qiang; Zhang, Wei; Yuan, Han-Mei; Li, Bin; Yan, Hong-Tao; Lan, Chun-Hui; Yang, Jun; Zhao, Zhuo; Wu, Jin-Jin; Wu, Chao

    2016-12-01

    Helicobacter pylori (H. pylori) appears to be involved in the etiology of chronic spontaneous urticaria (CSU), but its pathogenic mechanism has been little studied. In this study, we detected serum-specific anti-H. pylori IgG and IgE antibodies in 211 CSU patients and 137 normal subjects by enzyme-linked immunosorbent assay (ELISA), evaluated the direct activation effects of H. pylori preparations and their protein components on the human LAD 2 mast cell line in vitro, and analyzed the specific protein constituents and functions of the most effective H. pylori mixed protein component using liquid chromatography-mass spectrometry and ELISA. In CSU patients, the anti-H. pylori IgG positive rate was significantly higher than in normal controls, whereas anti-H. pylori IgE levels did not differ significantly between H. pylori-infected patients with and without CSU. Further studies suggested that H. pylori preparations can directly activate the human LAD 2 mast cell line in a dose-dependent manner, and that the most potent protein component was a mixture of 21-35 kDa proteins. Moreover, the 21-35 kDa mixed protein component mainly contained 23 kinds of proteins, which can stimulate the release of histamine, TNF-α, IL-3, IFN-γ, and LTB4 by LAD 2 cells in a dose- or time-dependent manner. The 21-35 kDa mixed protein component should be regarded as the most promising pathogenic factor contributing to CSU associated with H. pylori infection. © 2016 John Wiley & Sons Ltd.
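
    Comparing antibody positive rates between 211 CSU patients and 137 controls is a 2×2 contingency problem. This minimal sketch implements Pearson's chi-square test for such a table, without continuity correction, using only the standard library. The positive counts in the example are hypothetical because the abstract does not report them, and the abstract does not state which statistical test the study actually used.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows are groups (patients, controls); columns are antibody-positive and
    antibody-negative counts. Returns (statistic, two-sided p-value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))  # P(X > chi2) for a chi-square with 1 df
    return chi2, p


# Hypothetical counts: 120 of 211 patients and 55 of 137 controls IgG-positive.
stat, p = chi_square_2x2(120, 211 - 120, 55, 137 - 55)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

    With small expected cell counts (below about 5), Fisher's exact test is usually preferred over this approximation.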

  20. Cloning, analysis and functional annotation of expressed sequence tags from the Earthworm Eisenia fetida

    PubMed Central

    Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping

    2007-01-01

    Background Eisenia fetida, commonly known as the red wiggler or compost worm, belongs to the Lumbricidae family of the phylum Annelida. Little is known about its genome sequence, although it has been extensively used as a test organism in terrestrial ecotoxicology. To understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance-related compounds using suppression subtractive hybridization-PCR. Results A total of 3144 good-quality ESTs (GenBank dbEST accession numbers EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences, including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743, or 33%, of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional functional annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms: Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored and retrievable in a high-performance, web-based, user-friendly relational database called the EST model database, or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730

  1. Extracellular proteins of Vibrio cholerae: molecular cloning, nucleotide sequence and characterization of the deoxyribonuclease (DNase) together with its periplasmic localization in Escherichia coli K-12.

    PubMed

    Focareta, T; Manning, P A

    1987-01-01

    The gene encoding the extracellular DNase of Vibrio cholerae was cloned into Escherichia coli K-12. A maximal coding region of 1.2 kb and a minimal region of 0.6 kb were determined by transposon mutagenesis and deletion analysis. The nucleotide sequence of this region contained a single open reading frame of 690 bp, corresponding to a protein of Mr 26,389 with a typical N-terminal signal sequence of 18 aa which, when removed, would give a mature protein of Mr 24,163. This is in good agreement with the size of 24 kDa estimated directly by Coomassie blue staining following sodium dodecyl sulphate-polyacrylamide gel electrophoresis and indirectly via a DNA-hydrolysis assay. The protein is located in the periplasmic space of E. coli K-12, whereas in V. cholerae it is excreted into the extracellular medium. Introducing the DNase gene into a periplasmic-leaky (tolA) mutant of E. coli K-12 facilitates the release of the protein, further confirming its periplasmic location.
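
    The gap between the precursor (Mr 26,389) and the mature protein (Mr 24,163) is the mass of the 18-residue signal peptide, about 2,226 Da or roughly 124 Da per residue. This minimal sketch computes the average polypeptide mass from a sequence before and after removing an N-terminal signal peptide, using the standard average residue masses tabulated by tools such as ExPASy ProtParam. The example sequence is hypothetical because the DNase sequence itself is not reproduced in the abstract.

```python
# Average residue masses (Da) of amino acids within a chain (free amino acid minus water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Average mass (Da) of a linear polypeptide: sum of residues plus one water."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def mature_mass(precursor: str, signal_len: int) -> float:
    """Average mass after cleaving an N-terminal signal peptide of signal_len residues."""
    return average_mass(precursor[signal_len:])


# Hypothetical precursor with a 3-residue "signal peptide" (not the DNase sequence).
precursor = "MKASGLVE"
print(round(average_mass(precursor), 2), round(mature_mass(precursor, 3), 2))
```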

  2. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data that will soon need to be analyzed by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used, but with some caveats. Seminal dynamic-programming techniques and methods are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are also prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools derived from alignment-free methods based on the statistical analysis of word frequencies. We provide several clear examples to demonstrate applications and their interpretation across different areas of alignment-free analysis, such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic. Additionally, we provide a detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. © The Author 2013. Published by Oxford University Press.
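
    The D2 statistic mentioned above is one of the simplest word-frequency measures. It counts every k-letter word in each sequence and sums the products of matching counts, which equals the number of k-word matches between the two sequences. This minimal sketch assumes overlapping words and a user-chosen k. The standardized D2 variants, which correct for sequence composition, are not shown.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of all overlapping k-letter words in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """D2 statistic: sum over shared words w of count_a(w) * count_b(w)."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(n * cb[w] for w, n in ca.items() if w in cb)


# ACGTAC has AC x2, CG, GT, TA; ACGG has AC, CG, GG -> 2*1 + 1*1
print(d2("ACGTAC", "ACGG", 2))  # 3
```

    No alignment is computed, so the cost is linear in sequence length, which is the main attraction of word-based methods for large data sets.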

  3. Analysis of DNA Sequences by An Optical Time-Integrating Correlator: Proof-Of-Concept Experiments.

    DTIC Science & Technology

    1992-05-01

    Excerpt from the report's front matter: List of Tables; List of Abbreviations; 1.0 Introduction; 2.0 DNA Analysis Strategy (2.1 Representation of DNA Bases; 2.2 DNA Analysis Strategy); 3.0 ... Figure captions include "...Zehnder architecture" and "Figure 3: Short representations of the DNA bases, where each base is represented by a 7-bit pseudorandom sequence". Table captions include "...DNA bases, where each base is represented by 7-bit pseudorandom sequences" and "Table 2: Long representations of the DNA bases with 255-bit maximum ...".
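
    The idea of representing each DNA base by a 7-bit pseudorandom sequence can be illustrated numerically. If the four bases are assigned different cyclic shifts of one 7-chip maximal-length sequence (chips mapped to ±1), matching codes correlate to 7 and mismatching codes to -1, so a correlation peak marks agreement. This minimal sketch assumes that particular code assignment, since the report's actual codes are not reproduced here. It computes digitally what the optical correlator performs in hardware.

```python
# A 7-chip maximal-length (m-)sequence; its distinct cyclic shifts correlate to -1.
M_SEQUENCE = [1, 1, 1, 0, 1, 0, 0]


def shifted(code: list[int], k: int) -> list[int]:
    """Cyclic right shift of code by k chips."""
    return code[-k:] + code[:-k] if k else list(code)


# Hypothetical assignment of bases to cyclic shifts, mapped from {0, 1} to {-1, +1}.
BASE_CODES = {
    base: [1 if chip else -1 for chip in shifted(M_SEQUENCE, k)]
    for k, base in enumerate("ACGT")
}


def encode(seq: str) -> list[int]:
    """Concatenate the 7-chip code of every base."""
    return [chip for base in seq.upper() for chip in BASE_CODES[base]]


def correlate(seq_a: str, seq_b: str) -> int:
    """Zero-lag correlation of two equal-length encoded sequences.

    Each matching base contributes +7 and each mismatch -1,
    so the score is 8 * matches - length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(x * y for x, y in zip(encode(seq_a), encode(seq_b)))


print(correlate("ACGT", "ACGT"), correlate("ACGT", "ACGA"))  # 28 20
```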

  4. Nucleotide sequence of RNA2 of Lettuce big-vein virus and evidence for a possible transcription termination/initiation strategy similar to that of rhabdoviruses.

    PubMed

    Sasaya, Takahide; Kusaba, Shinnosuke; Ishikawa, Koichi; Koganezawa, Hiroki

    2004-09-01

    Lettuce big-vein virus (LBVV) is the type species of the genus Varicosavirus and is a two-segmented negative-sense single-stranded RNA virus. The larger LBVV genome segment (RNA1) consists of 6797 nt and encodes an L polymerase that resembles that of rhabdoviruses. Here, the nucleotide sequence of the second LBVV genome segment (RNA2) is reported. LBVV RNA2 consists of 6081 nt and contains antisense information for five major ORFs: ORF1 (nt 210-1403 on the viral RNA), ORF2 (nt 1493-2494), ORF3 (nt 2617-3489), ORF4 (nt 3843-4337) and ORF5 (nt 4530-5636), which have coding capacities of 44, 36, 32, 19 and 41 kDa, respectively. The gene at the 3' end of the viral RNA encodes a coat protein, while the other four genes encode proteins of unknown function. The 3'-terminal 11 nt of LBVV RNA2 are identical to those of LBVV RNA1, and the 5'-terminal regions of LBVV RNA1 and RNA2 contain a long common nucleotide stretch of about 100 nt. Northern blot analysis using probes specific to the individual ORFs revealed that LBVV transcribes monocistronic RNAs. Analysis of the terminal sequences, together with primer extension and RNase H digestion analysis of LBVV mRNAs, suggested that LBVV uses a transcription termination/initiation strategy comparable with that of rhabdoviruses.
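
    The coding capacities follow approximately from the ORF coordinates. The number of codons is (end - start + 1) / 3, and at roughly 110 Da per residue the protein mass can be estimated. This minimal sketch assumes each coordinate range includes its stop codon. The 110 Da average is a rule of thumb, so these estimates differ by up to about 1 kDa from the values computed from the actual sequences.

```python
ORFS = {  # (start, end) nucleotide positions on the viral RNA, as reported for LBVV RNA2
    "ORF1": (210, 1403), "ORF2": (1493, 2494), "ORF3": (2617, 3489),
    "ORF4": (3843, 4337), "ORF5": (4530, 5636),
}
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass


def residues(start: int, end: int) -> int:
    """Amino acids encoded by an inclusive nt range that ends with a stop codon."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3 - 1


for name, (start, end) in ORFS.items():
    n = residues(start, end)
    print(f"{name}: {n} aa, ~{n * AVG_RESIDUE_DA / 1000:.1f} kDa")
```

    Every range divides evenly into codons, which is a quick consistency check on the reported coordinates.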

  5. Phylogenetic sequence analysis, recombinant expression, and tissue distribution of a channel catfish estrogen receptor beta

    USGS Publications Warehouse

    Xia, Zhenfang; Gale, William L.; Chang, Xiaotian; Langenau, David; Patino, Reynaldo; Maule, Alec G.; Densmore, Llewellyn D.

    2000-01-01

    An estrogen receptor β (ERβ) cDNA fragment was amplified by RT-PCR of total RNA extracted from the liver and ovary of immature channel catfish. This cDNA fragment was used to screen an ovarian cDNA library made from an immature female fish. A clone was obtained that contained an open reading frame encoding a 575-amino-acid protein with a deduced molecular weight of 63.9 kDa. Maximum parsimony and neighbor-joining analyses were used to generate a phylogenetic classification of channel catfish ERβ on the basis of 25 full-length teleost and tetrapod ER sequences. The consensus tree obtained indicated the existence of two major vertebrate ER subtypes, α and β. Within each subtype, and in accordance with established phylogenetic relationships, teleost and tetrapod ER were monophyletic, confirming the results of a previous analysis (Z. Xia et al., 1999, Gen. Comp. Endocrinol. 113, 360–368). Extracts of COS-7 cells transfected with channel catfish ERβ cDNA bound estrogen with high affinity (Kd = 0.21 nM) and specificity. The affinity of channel catfish ERβ for estrogen was higher than previously reported for channel catfish ERα. As determined by qualitative RT-PCR, the tissue distributions of ERα and ERβ were similar but not identical. Both ER subtypes were present in ovary and testis. ERα was found in all other tissues examined from juvenile and mature fish of both sexes. ERβ was also found in most tissues except, in most cases, whole blood and head kidney. Interestingly, the pattern of expression of ER subtypes in head kidney always corresponded to the pattern in whole blood. In conclusion, we isolated a channel catfish ERβ with ligand-binding affinity and tissue expression patterns different from those of ERα. We also confirmed the validity of our previously proposed general classification scheme for vertebrate ER into α and β subtypes and, within each subtype, into teleost and tetrapod clades.
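
    A dissociation constant describes a one-site saturation curve, B = Bmax·L / (Kd + L), so half of the receptor sites are occupied when the free ligand concentration equals Kd. This minimal sketch generates an ideal curve with the reported Kd of 0.21 nM. It then recovers Kd by a Scatchard transformation, plotting bound/free against bound, where the slope is -1/Kd. Bmax and the ligand concentrations are hypothetical, and the abstract does not describe the study's fitting procedure.

```python
def bound(free_nM: float, kd_nM: float, bmax: float) -> float:
    """Specific binding to a single class of sites."""
    return bmax * free_nM / (kd_nM + free_nM)


def scatchard_kd(free: list[float], bound_vals: list[float]) -> float:
    """Estimate Kd from a least-squares line through (B, B/F); slope = -1/Kd."""
    xs = bound_vals
    ys = [b / f for b, f in zip(bound_vals, free)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -1.0 / slope


KD, BMAX = 0.21, 100.0  # Kd from the abstract; Bmax hypothetical (arbitrary units)
free = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]  # hypothetical free-ligand concentrations, nM
b = [bound(f, KD, BMAX) for f in free]
print(f"estimated Kd = {scatchard_kd(free, b):.2f} nM")  # estimated Kd = 0.21 nM
```

    With noise-free data the Scatchard plot is exactly linear. With real, noisy data, nonlinear fitting of the saturation curve is usually preferred because the transformation distorts measurement errors.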

  6. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

    PubMed Central

    2012-01-01

    Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power, with respect to true and false positive rates, when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively compares contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to the testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates than through library (technical) replicates or sequencing depth. Strikingly, sequencing depth could be reduced to as little as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
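
    Negative binomial read counts, like those used above to simulate RNA-Seq data, can be generated as a gamma-Poisson mixture. First, draw a per-sample expression rate from a gamma distribution with mean μ; then draw the read count from a Poisson distribution with that rate. The resulting variance is μ + φμ², where φ is the dispersion. This minimal standard-library sketch uses hypothetical values for μ, φ and the seed, and it is not the simulation code used in the study.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def neg_binomial(mu: float, phi: float, rng: random.Random) -> int:
    """One NB draw with mean mu and variance mu + phi * mu**2 (gamma-Poisson mixture)."""
    rate = rng.gammavariate(1.0 / phi, mu * phi)  # shape 1/phi, scale mu*phi -> mean mu
    return poisson(rate, rng)


rng = random.Random(1)  # hypothetical seed for reproducibility
counts = [neg_binomial(50.0, 0.1, rng) for _ in range(20_000)]
mean, var = statistics.fmean(counts), statistics.variance(counts)
print(round(mean, 1), round(var, 1))  # close to 50 and 50 + 0.1 * 50**2 = 300
```

    Setting φ to zero would give Poisson counts. The extra φμ² term is what models the biological variability between replicates that the study found so valuable for statistical power.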

  7. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing.

    PubMed

    Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M

    2012-09-17

    RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power, with respect to true and false positive rates, when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively compares contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to the testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates than through library (technical) replicates or sequencing depth. Strikingly, sequencing depth could be reduced to as little as 15% without substantial impacts on false positive or true positive rates.

  8. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    NASA Technical Reports Server (NTRS)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 kbp). We find that for the three chromosomes we studied, the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents, and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced, and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
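
    The n-gram entropy used above can be computed from the frequencies of overlapping n-letter words as H_n = -Σ p(w) log2 p(w). For a four-letter alphabet the maximum is 2n bits, and one common normalization of redundancy is R_n = 1 - H_n / (2n). This minimal sketch omits the finite-sample bias correction that long words on real chromosomes require, and the paper's exact estimator may differ.

```python
import math
from collections import Counter


def ngram_entropy(seq: str, n: int) -> float:
    """Shannon entropy (bits) of the overlapping n-letter words in seq."""
    words = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(words.values())
    return -sum((c / total) * math.log2(c / total) for c in words.values())


def ngram_redundancy(seq: str, n: int) -> float:
    """1 - H_n / (2n): 0 for maximally varied DNA words, 1 for one repeated word."""
    return 1.0 - ngram_entropy(seq, n) / (2 * n)


print(ngram_entropy("ACGT" * 100, 1))  # 2.0
print(ngram_redundancy("A" * 50, 2))  # 1.0
```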

  9. Lessons learnt on the analysis of large sequence data in animal genomics.

    PubMed

    Biscarini, F; Cozzi, P; Orozco-Ter Wengel, P

    2018-04-06

    The 'omics revolution has made a large amount of sequence data available to researchers and industry. This has had a profound impact on the field of bioinformatics, stimulating unprecedented advances in this discipline. This impact is mostly viewed from the perspective of human 'omics, in particular human genomics. Plant and animal genomics, however, have also been deeply influenced by next-generation sequencing technologies, with several genomics applications now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in the biological sciences, due largely to decreasing sequencing and genotyping costs and to large-scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists, as data gathering currently takes place at a faster pace than data processing and analysis, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization and transfer of data a cumbersome operation. The time consumed by processing and analysing huge data sets may come at the expense of data quality assessment and critical interpretation. Additionally, when analysing lots of data, something is likely to go awry (the software may crash or stop), and it can be very frustrating to track down the error. We herein review the most relevant issues related to tackling these challenges and problems from the perspective of animal genomics, and provide researchers who lack extensive computing experience with guidelines that will help when processing large genomic data sets. © 2018 Stichting International Foundation for Animal Genetics.
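
    One practical guideline for large sequence files is to stream records one at a time instead of loading a whole file into memory, and to fail loudly on malformed input so errors are easy to track down. This minimal sketch assumes standard four-line FASTQ records, Phred+33 quality encoding and optional gzip compression. The function names are hypothetical and not taken from the review.

```python
import gzip
import io
from typing import Iterator, TextIO


def fastq_records(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) one record at a time."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        plus = handle.readline()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        yield header[1:].strip(), seq, qual


def summarize(path: str) -> tuple[int, float]:
    """Read count and mean Phred+33 base quality, using constant memory."""
    opener = gzip.open if path.endswith(".gz") else open
    reads = bases = qual_sum = 0
    with opener(path, "rt") as handle:
        for _, seq, qual in fastq_records(handle):
            reads += 1
            bases += len(seq)
            qual_sum += sum(ord(ch) - 33 for ch in qual)
    return reads, (qual_sum / bases if bases else 0.0)


# In-memory demo; summarize() would be called with a real (possibly .gz) path.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n##\n")
print([read_id for read_id, _, _ in fastq_records(demo)])  # ['r1', 'r2']
```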

  10. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    PubMed Central

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121
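
    Filtering a whole-DNA read set for plastome-related reads can be approximated by keeping reads that share enough k-mers with a reference plastome. This minimal sketch assumes a single reference sequence held in memory, indexes k-mers from both strands, and uses hypothetical values for k and the minimum shared fraction. The abstract does not describe the filtering method the study actually used.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(reference: str, k: int) -> set[str]:
    """k-mers from both strands of the reference plastome."""
    ref = reference.upper()
    return kmers(ref, k) | kmers(revcomp(ref), k)


def is_plastome_read(read: str, index: set[str], k: int, min_fraction: float = 0.5) -> bool:
    """True if at least min_fraction of the read's k-mers occur in the reference index."""
    read_kmers = kmers(read.upper(), k)
    if not read_kmers:
        return False  # read shorter than k
    return len(read_kmers & index) / len(read_kmers) >= min_fraction


reference = "ACGTTGCAAGGCTTAACGGA"  # toy stand-in for a full plastome sequence
index = build_index(reference, 5)
for read in ["TTGCAAGGCTTA", revcomp("TTGCAAGGCTTA"), "TTTTTTTTTT"]:
    print(read, is_plastome_read(read, index, 5))  # True, True, False
```

    Plastomes share sequence with mitochondrial and nuclear plastid-derived DNA, so a k-mer filter like this may retain some non-plastid reads. A stricter threshold or a mapping step would reduce that.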

  11. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

    PubMed

    Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

    2011-08-30

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can use cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole-genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, uses local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports the use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier to entry for using complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.

  12. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    PubMed

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  13. A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis

    PubMed Central

    2017-01-01

    Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods, such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), that currently are assayable using various instrumental analysis methods, including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described. PMID:28542338
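
    One common way to derive an analytical threshold from background noise is to place it a fixed number of standard deviations above the mean noise level, so that weaker signals are not called. This minimal sketch applies that generic rule to read counts observed at noise positions. The multiplier k = 3, the noise counts and the allele counts are hypothetical, and the noise definition and threshold proposed in the paper may differ.

```python
import statistics


def analytical_threshold(noise_reads: list[int], k: float = 3.0) -> float:
    """Mean + k * sample standard deviation of background-noise read counts."""
    if len(noise_reads) < 2:
        raise ValueError("need at least two noise observations")
    return statistics.mean(noise_reads) + k * statistics.stdev(noise_reads)


def call_alleles(read_counts: dict[str, int], threshold: float) -> list[str]:
    """Keep sequence variants whose read count exceeds the threshold."""
    return [allele for allele, n in read_counts.items() if n > threshold]


noise = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical reads at noise positions
thr = analytical_threshold(noise)
print(round(thr, 2), call_alleles({"12": 850, "13": 790, "11": 9}, thr))  # 11.41 ['12', '13']
```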

  14. Empirical analysis of RNA robustness and evolution using high-throughput sequencing of ribozyme reactions.

    PubMed

    Hayden, Eric J

    2016-08-15

    RNA molecules provide a realistic but tractable model of a genotype-to-phenotype relationship. This relationship has been extensively investigated computationally using secondary structure prediction algorithms. Enzymatic RNA molecules, or ribozymes, offer access to genotypic and phenotypic information in the laboratory. Advances in high-throughput sequencing technologies have enabled laboratory analysis of sequences on a scale that now rivals what can be accomplished computationally. This has motivated a resurgence of in vitro selection experiments and opened new doors for the analysis of the distribution of RNA functions in genotype space. A body of computational experiments has investigated the persistence of specific RNA structures despite changes in the primary sequence, and how this mutational robustness can promote adaptation. This article summarizes recent approaches designed to investigate the role of mutational robustness during the evolution of RNA molecules in the laboratory, and presents theoretical motivations, experimental methods and approaches to data analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
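
    Mutational robustness is often quantified as the fraction of a genotype's single-point mutants that retain function. This minimal sketch assumes that a measured activity is available for each variant, for example one derived from sequencing counts after a ribozyme reaction, and applies a hypothetical activity cutoff. The three-nucleotide genotype and the activity table are illustrative only.

```python
from typing import Callable, Iterator

NUCLEOTIDES = "ACGU"


def point_mutants(genotype: str) -> Iterator[str]:
    """All sequences differing from genotype at exactly one position."""
    for i, base in enumerate(genotype):
        for alt in NUCLEOTIDES:
            if alt != base:
                yield genotype[:i] + alt + genotype[i + 1:]


def robustness(genotype: str, activity: Callable[[str], float], cutoff: float) -> float:
    """Fraction of single mutants whose activity is at least cutoff."""
    mutants = list(point_mutants(genotype))
    return sum(activity(m) >= cutoff for m in mutants) / len(mutants)


# Hypothetical activities; unlisted mutants are treated as inactive.
measured = {"AAC": 0.9, "CAC": 0.8, "GGC": 0.7}
print(robustness("GAC", lambda s: measured.get(s, 0.0), cutoff=0.5))  # 3 of 9 mutants
```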

  15. A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis.

    PubMed

    Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi

    2017-01-01

    Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods, such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), that currently are assayable using various instrumental analysis methods, including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described.

  16. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    PubMed

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-02-17

    The catalog of genetic variants in the horse genome originates from a few select animals, most of them from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs), in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare, resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled, resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
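
    Average depth of coverage is the total number of sequenced bases divided by the genome length, C = N·L / G, where N·L is the sequence yield. Rearranged, the reported 59.6 Gb and 24.7X imply an effective genome length of about 2.41 Gb. This minimal sketch shows that arithmetic. The abstract does not state which genome length, or whether only mapped bases, was used, so the implied value is an inference. The Lander-Waterman expectation of uncovered bases, e^(-C), assumes uniform random read placement, which real data do not achieve.

```python
import math


def coverage(total_bases: float, genome_length: float) -> float:
    """Mean sequencing depth C = total sequenced bases / genome length."""
    return total_bases / genome_length


def implied_genome_length(total_bases: float, depth: float) -> float:
    """Genome length that would give the stated mean depth."""
    return total_bases / depth


def expected_uncovered_fraction(depth: float) -> float:
    """Poisson (Lander-Waterman) expectation of bases with zero coverage."""
    return math.exp(-depth)


g = implied_genome_length(59.6e9, 24.7)
print(f"implied genome length: {g / 1e9:.2f} Gb")  # implied genome length: 2.41 Gb
```

    Under the idealized model, essentially no bases would be uncovered at 24.7X. The roughly 3% of the reference left unmapped in the study therefore reflects repeats, divergence and reference gaps rather than insufficient depth.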

  17. A generalized analysis of hydrophobic and loop clusters within globular protein sequences

    PubMed Central

    Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

    2007-01-01

    Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need for user expertise. To help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 most frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large proportion (60%) show preferences for α-helices or β-strands. Moreover, analysis of the amino acid composition of hydrophobic clusters generally allows finer prediction of the regular secondary structure associated with a given cluster within a cluster species. We also investigated the behavior of loop-forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The
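
The hydrophobic/non-hydrophobic dichotomy that defines a cluster species, and the "PGDNS" loop alphabet, can be illustrated with a short Python sketch. It is a simplified one-dimensional approximation, not the authors' procedure: HCA itself delineates clusters on a two-dimensional helical net. The hydrophobic set (V, I, L, F, M, Y, W), the break rule (a proline or more than three consecutive non-hydrophobic residues) and the minimum loop-run length are assumptions.

```python
import re

# Assumption: the classic HCA hydrophobic set; the paper's exact alphabet may differ.
HYDROPHOBIC = set("VILFMYW")


def hydrophobic_clusters(seq: str, max_gap: int = 3) -> list[str]:
    """Binary codes (1 = hydrophobic, 0 = other) of hydrophobic clusters.

    Simplified 1-D approximation of HCA: a cluster ends at a proline or after
    more than `max_gap` consecutive non-hydrophobic residues.
    """
    clusters: list[str] = []
    current = ""  # binary code of the cluster being built
    gap = 0       # non-hydrophobic residues since the last hydrophobic one
    for aa in seq.upper():
        if aa == "P":
            if current:
                clusters.append(current)
            current, gap = "", 0
        elif aa in HYDROPHOBIC:
            # Fill the internal gap with zeros, then record this hydrophobic residue.
            current += "0" * gap + "1"
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:
                clusters.append(current)
                current, gap = "", 0
    if current:
        clusters.append(current)
    return clusters


def loop_clusters(seq: str, min_len: int = 2) -> list[str]:
    """Runs of loop-alphabet residues (P, G, D, N, S); `min_len` is a hypothetical cutoff."""
    return re.findall(r"[PGDNS]{%d,}" % min_len, seq.upper())


print(hydrophobic_clusters("AVLKEIFGGGGGLWP"))  # ['110011', '11']
print(loop_clusters("AVLKEIFGGGGGLWP"))         # ['GGGGG']
```

Each binary code (for example `110011`) is the kind of "cluster species" that can then be tallied against the secondary structures observed in experimental structures.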

  18. Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain*

    PubMed Central

    Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

    2013-01-01

    Decorin proteoglycan consists of a core protein carrying a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding it, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycan, suggests that it has a single or small number of defined sequences. On this basis, a similar approach was undertaken to sequence the much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain of porcine skin decorin. This approach yielded information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381

  19. Masking as an effective quality control method for next-generation sequencing data analysis.

    PubMed

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next-generation sequencing produces base calls with low quality scores that can affect the accuracy of simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low-quality base calls with 'N's (undetermined bases), whereas trimming removes low-quality bases, resulting in shorter reads. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method significantly affected the false-negative rate in SNP calling compared with data analysis without preprocessing. False-positive and false-negative rates for small insertions and deletions did not differ between masking and trimming. We recommend masking over trimming as a preprocessing method for next-generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without increasing the false-negative rate, although trimming is currently more commonly used in the field. The Perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
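
The contrast between the two preprocessing methods can be sketched in a few lines of Python. This is not the authors' Perl script: the Q20 cutoff, Phred+33 encoding and 3'-end-only trimming are assumptions chosen for illustration, and the read is hypothetical.

```python
def phred33_to_scores(qual: str) -> list[int]:
    """Decode a Phred+33 (Sanger / Illumina 1.8+) FASTQ quality string."""
    return [ord(c) - 33 for c in qual]


def mask_low_quality(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Masking: replace each base below `min_q` with 'N'; read length is unchanged."""
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, quals))


def trim_3prime(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Trimming: drop low-quality bases from the 3' end, shortening the read."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


seq = "ACGTACGT"
quals = phred33_to_scores("??+???0-")  # [30, 30, 10, 30, 30, 30, 15, 12]
print(mask_low_quality(seq, quals))  # ACNTACNN
print(trim_3prime(seq, quals))       # ACGTAC
```

Note that the internal low-quality G (Q10) survives 3' trimming but is masked, so a variant caller can no longer count it as evidence for a SNP; this is one plausible reason masking lowers false positives.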

  20. Correlation between phosphorylation level of a hippocampal 86kDa protein and extinction of a behaviour in a model of Wernicke-Korsakoff syndrome.

    PubMed

    Pires, Rita G W; Pereira, Sílvia R C; Carvalho, Fabiana M; Oliveira-Silva, Ieda F; Ferraz, Vany P; Ribeiro, Angela M

    2007-06-04

    The effects of chronic ethanol and thiamine deficiency, alone or combined, on hippocampal protein phosphorylation profiles (proteins of 30 to 250 kDa) under stimulated (high K(+) concentration) and unstimulated (basal) conditions were investigated. These treatments significantly changed the phosphorylation level of an 86 kDa phosphoprotein. Thiamine deficiency, but not chronic ethanol, induced a decrease in a behavioural extinction index, which was significantly correlated with the phosphorylation level of the p86 protein. These data add to and extend previous findings by our laboratory implicating hippocampal neurotransmission components in the extinction of a behaviour that involves learning of environmental spatial cues.

  1. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After sequence processing, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% had no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed 22% redundancy in the first assembly, suggesting that approximately 33,620 unique genes had been identified and that >90% of the expressed sugarcane genes were tagged. PMID:14613979

  2. Achieving an empathic stance: dialogical sequence analysis of a change episode.

    PubMed

    Tikkanen, Soile; Stiles, William B; Leiman, Mikael

    2013-01-01

    This study examined a client's therapeutic progress within one session of an 18-session child neurological assessment. The analysis focused on a parent-psychologist dialogue in one session of the assessment process. Dialogical sequence analysis (DSA; Leiman, 2004, 2012) was used as a micro-analytic method to examine the developing discourse. The analysis traced the mother's development of a reflective stance toward herself and her problematic ways of interacting with her daughter, who was the client. During the dialogue, the mother began to recognize her own contribution to maintaining the problematic pattern. Her gradual acknowledgment of the child's perspective and her growing sense of the child's otherness were mediated by an observer position (third-person view) toward the problematic pattern, which allowed a flexible exchange between the perspectives of self and the other. The results demonstrate the parallel development of intrapersonal and interpersonal empathy shown previously to characterize the transition from stage 3 (problem statement/clarification) to stage 4 (understanding/insight) in the assimilation of problematic experiences sequence (Brinegar, Salvi, Stiles, & Greenberg, 2006).

  3. Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.

    PubMed

    Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2014-05-23

    The introduction of next-generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges still limit the widespread adoption of NGS testing in clinical practice. One such challenge is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive for most clinical diagnostic laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline that relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides the additional flexibility needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require Amazon Elastic Block Store (EBS) disks to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in the mutations identified. This pipeline is currently used in the clinic, and all identified pathogenic variants have been confirmed by Sanger sequencing, further validating the software.
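
The abstract reports 100% concordance between duplicate runs without defining the measure. One simple definition, assumed here, is the number of variants called in both replicates divided by the number called in either. In the sketch, a variant is represented as a hypothetical `(chromosome, position, ref, alt)` tuple.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def concordance(calls_a: set[Variant], calls_b: set[Variant]) -> float:
    """Variants called in both replicates / variants called in either (assumed definition)."""
    union = calls_a | calls_b
    if not union:
        return 1.0  # two empty call sets agree trivially
    return len(calls_a & calls_b) / len(union)


# Hypothetical duplicate runs of one sample.
run1 = {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")}
run2 = {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")}
print(concordance(run1, run2))  # 1.0
```

Using the union as the denominator penalizes variants missed by either replicate, so 100% requires identical call sets.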

  4. Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications.

    PubMed

    Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H; Lee, Doheon

    2007-01-01

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, these sequences have attracted relatively little attention compared with other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as the results of their analysis. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated using the RefSeq database. The second is an association step in which the sequences were linked to the Entrez Gene, OMIM and GO databases, and the results were saved as a gene-patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene-patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at http://www.patome.org/; the information is updated bimonthly.

  5. Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications

    PubMed Central

    Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H.; Lee, Doheon

    2007-01-01

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, these sequences have attracted relatively little attention compared with other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as the results of their analysis. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated using the RefSeq database. The second is an association step in which the sequences were linked to the Entrez Gene, OMIM and GO databases, and the results were saved as a gene–patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene–patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at ; the information is updated bimonthly. PMID:17085479
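
A gene-patent table of the kind described above can be sketched as a mapping from gene to patents, from which the share of patented genes follows directly. This assumes the annotation step has already produced `(patent ID, gene ID)` pairs; the identifiers are hypothetical and the structure is not Patome's actual schema.

```python
from collections import defaultdict


def build_gene_patent_table(hits: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each gene ID to the set of patents whose sequences were linked to it."""
    table: dict[str, set[str]] = defaultdict(set)
    for patent_id, gene_id in hits:
        table[gene_id].add(patent_id)
    return dict(table)


def fraction_of_genes_patented(table: dict[str, set[str]], all_genes: set[str]) -> float:
    """Share of genes in `all_genes` linked to at least one patent."""
    if not all_genes:
        return 0.0
    return len(all_genes & set(table)) / len(all_genes)


# Hypothetical annotation output: (patent, gene) pairs.
hits = [("PAT-1", "GENE_A"), ("PAT-2", "GENE_A"), ("PAT-2", "GENE_B")]
table = build_gene_patent_table(hits)
print(fraction_of_genes_patented(table, {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}))  # 0.5
```

The same table answers the reverse question in the abstract: whether a particular gene is related to patenting is simply whether it appears as a key.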

  6. CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota

    PubMed Central

    2013-01-01

    Background Beyond the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there is a growing need for protocols emphasizing alternative phylogenetic markers, such as those representing eukaryotic organisms. Results Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes, but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org).
Conclusion The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis allowing for more comprehensive studies of environmental and host-associated microbial
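
The OTU clustering step can be illustrated with a greedy centroid scheme: each read joins the first existing centroid it matches at or above an identity threshold, otherwise it founds a new OTU. The abstract does not name the clustering tool CloVR-ITS uses; the 97% threshold is a common convention assumed here, and `difflib`'s ratio is only a crude stand-in for alignment identity.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude identity proxy: difflib ratio 2*M/T (not a true alignment identity)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[int]]:
    """Greedy centroid clustering; returns lists of read indices, one list per OTU."""
    centroids: list[str] = []
    otus: list[list[int]] = []
    for i, s in enumerate(seqs):
        for c_idx, centroid in enumerate(centroids):
            if similarity(s, centroid) >= threshold:
                otus[c_idx].append(i)
                break
        else:  # no centroid matched: this read starts a new OTU
            centroids.append(s)
            otus.append([i])
    return otus


reads = ["A" * 100, "A" * 99 + "C", "C" * 100]  # hypothetical reads
print(greedy_otus(reads))  # [[0, 1], [2]]
```

Because the result depends on input order, greedy tools usually sort reads (for example by abundance) before clustering; the sketch keeps the input order for simplicity.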

  7. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    PubMed

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing (HTS), including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation allows biologists to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework allows developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  8. Development of self-compressing BLSOM for comprehensive analysis of big sequence data.

    PubMed

    Kikuchi, Akihito; Ikemura, Toshimichi; Abe, Takashi

    2015-01-01

    With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of the available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which clusters genomic fragment sequences by phylotype based solely on oligonucleotide composition and has been applied to genome and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM requires large computational resources. We have developed a Self-Compressing BLSOM (SC-BLSOM) to reduce computation time, which allows comprehensive analysis of big sequence data without high-performance supercomputers. The strategy of SC-BLSOM is to construct BLSOMs hierarchically according to data class, such as phylotype. The first-layer BLSOMs were constructed from each of the divided input data pieces representing a data subclass, such as a phylotype division, which compresses the number of data pieces. The second-layer BLSOM was constructed from all the weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and clustered the sequences by phylotype with high accuracy, showing the method's suitability for efficient knowledge discovery from big sequence data.
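
The input to a BLSOM is an oligonucleotide-composition vector for each genomic fragment. A minimal Python sketch of that vector follows; the choice of tetranucleotides, the absence of reverse-complement merging and the fragment itself are assumptions, and the map training is not shown.

```python
from itertools import product


def oligo_composition(seq: str, k: int = 4) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic ACGT order.

    k-mers containing characters other than A, C, G, T (e.g. 'N') are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


vec = oligo_composition("ACGTACGTTTGCA")  # hypothetical fragment
print(len(vec))  # 256 tetranucleotide frequencies
```

Each fragment becomes a fixed-length vector regardless of its sequence length, which is what lets fragments from many genomes be placed on a single self-organizing map.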

  9. Multilocus Sequence Typing Analysis of Staphylococcus lugdunensis Implies a Clonal Population Structure

    PubMed Central

    Chassain, Benoît; Lemée, Ludovic; Didi, Jennifer; Thiberge, Jean-Michel; Brisse, Sylvain; Pons, Jean-Louis

    2012-01-01

    Staphylococcus lugdunensis is recognized as one of the major pathogenic species within the genus Staphylococcus, even though it belongs to the coagulase-negative group. A multilocus sequence typing (MLST) scheme was developed to study the genetic relationships and population structure of 87 S. lugdunensis isolates from various clinical and geographic sources by DNA sequence analysis of seven housekeeping genes (aroE, dat, ddl, gmk, ldh, recA, and yqiL). The number of alleles ranged from four (gmk and ldh) to nine (yqiL). Allelic profiles allowed the definition of 20 different sequence types (STs) and five clonal complexes. The 20 STs lacked correlation with geographic source. Isolates recovered from hematogenic infections (blood or osteoarticular isolates) or from skin and soft tissue infections did not cluster in separate lineages. Penicillin-resistant isolates clustered mainly in one clonal complex, unlike glycopeptide-tolerant isolates, which did not constitute a distinct subpopulation within S. lugdunensis. Phylogenies from the sequences of the seven individual housekeeping genes were congruent, indicating a predominantly mutational evolution of these genes. Quantitative analysis of the linkages between alleles from the seven loci revealed a significant linkage disequilibrium, thus confirming a clonal population structure for S. lugdunensis. This first MLST scheme for S. lugdunensis provides a new tool for investigating the macroepidemiology and phylogeny of this unusually virulent coagulase-negative Staphylococcus. PMID:22785196
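
In an MLST scheme, each distinct combination of allele numbers at the seven loci defines a sequence type (ST), and STs differing at a single locus are related. The sketch below numbers profiles in order of first appearance (hypothetical ST IDs, not those of the published scheme) and groups STs linked by single-locus variants. This single-linkage grouping is a simplification assumed for illustration; eBURST-style clonal complexes apply additional founder rules.

```python
from collections import deque

LOCI = ("aroE", "dat", "ddl", "gmk", "ldh", "recA", "yqiL")
Profile = tuple[int, ...]  # one allele number per locus, in LOCI order


def assign_sequence_types(profiles: list[Profile]) -> dict[Profile, int]:
    """Number each distinct allelic profile in order of first appearance (hypothetical ST IDs)."""
    st_of: dict[Profile, int] = {}
    for p in profiles:
        st_of.setdefault(p, len(st_of) + 1)
    return st_of


def slv_groups(st_of: dict[Profile, int]) -> list[set[int]]:
    """Connected groups of STs linked by single-locus variants (exactly one differing allele)."""
    profiles = list(st_of)
    seen: set[Profile] = set()
    groups: list[set[int]] = []
    for start in profiles:
        if start in seen:
            continue
        seen.add(start)
        group, queue = set(), deque([start])
        while queue:  # breadth-first search over SLV links
            p = queue.popleft()
            group.add(st_of[p])
            for q in profiles:
                if q not in seen and sum(a != b for a, b in zip(p, q)) == 1:
                    seen.add(q)
                    queue.append(q)
        groups.append(group)
    return groups


isolates = [(1,) * 7, (1, 1, 1, 1, 1, 1, 2), (2,) * 7, (1,) * 7]  # hypothetical profiles
sts = assign_sequence_types(isolates)
print(slv_groups(sts))  # [{1, 2}, {3}]
```

Linkage disequilibrium between loci, as reported in the abstract, shows up in such data as a small number of profiles recurring far more often than random allele reassortment would predict.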

  10. Expressed sequence tag analysis of guinea pig (Cavia porcellus) eye tissues for NEIBank

    PubMed Central

    Simpanya, Mukoma F.; Wistow, Graeme; Gao, James; David, Larry L.; Giblin, Frank J.

    2008-01-01

    Purpose To characterize gene expression patterns in guinea pig ocular tissues and identify orthologs of human genes from NEIBank expressed sequence tags. Methods RNA was extracted from dissected eye tissues of 2.5-month-old guinea pigs to make three unamplified and unnormalized cDNA libraries in the pCMVSport-6 vector for the lens, retina, and eye minus lens and retina. Over 4,000 clones were sequenced from each library and were analyzed using GRIST for clustering and gene identification. Lens crystallin EST data were validated using two-dimensional electrophoresis (2-DE), matrix-assisted laser desorption/ionization (MALDI), and electrospray ionization mass spectrometry (ESIMS). Results Combined data from the three libraries generated a total of 6,694 distinct gene clusters, with each library having between 1,000 and 3,000 clusters. Approximately 60% of the total gene clusters were novel cDNA sequences and had significant homologies to other mammalian sequences in GenBank. Complete cDNA sequences were obtained for many guinea pig lens proteins, including αA/αAinsert-, γN-, and γS-crystallins, lengsin and GRIFIN. The ratio of αA- to αB-crystallin on 2-DE gels was 8:1 in the lens nucleus and 6.5:1 in the cortex. Analysis of ESTs, genome sequence, and proteins (by MALDI) did not reveal any evidence for the presence of γD-, γE-, and γF-crystallin in the guinea pig. Predicted masses of many guinea pig lens crystallins were confirmed by ESIMS analysis. For the retina, orthologs of human phototransduction genes were found, such as Rhodopsin, S-antigen (Sag, Arrestin), and Transducin. The guinea pig ortholog of NRL, a key rod photoreceptor-specific transcription factor, was also represented in the EST data. In the ‘rest-of-eye’ library, the most abundant transcripts included decorin and keratin 12, representative of the cornea. Conclusions Genomic analysis of guinea pig eye tissues provides sequence-verified clones for future studies. Guinea pig orthologs of many human

  11. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both its amino acid sequence and the encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC content in a species- or function-specific context. With respect to amino acids, we at least partly confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset, the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. To our knowledge, it is the only tool for analyzing sequence composition in the light of physiological roles and phylogenetic context without requiring substantial programming skills. PMID:22958836
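
The frequencies ANCAC tabulates per COG (codon usage and GC content) are straightforward to compute for a single coding sequence. The following is a minimal Python sketch, not ANCAC's own code; the coding sequence is hypothetical, and GC3 (GC at third codon positions) is included as one commonly reported variant of GC content.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G + C among all bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def codons(cds: str) -> list[str]:
    """Split a coding sequence into in-frame codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def codon_usage(cds: str) -> Counter:
    """Count occurrences of each codon."""
    return Counter(codons(cds))


def gc3(cds: str) -> float:
    """GC content at third codon positions."""
    thirds = [c[2] for c in codons(cds)]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else 0.0


cds = "ATGGCCTAA"  # hypothetical coding sequence
print(codon_usage(cds))              # Counter({'ATG': 1, 'GCC': 1, 'TAA': 1})
print(round(gc_content(cds), 3))     # 0.444
print(round(gc3(cds), 3))            # 0.667
```

Summing these counts over all members of a COG, restricted to a chosen phylogenetic pattern, gives the kind of species- or function-specific comparison the abstract describes.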

  12. Ixodes ricinus Tick Lipocalins: Identification, Cloning, Phylogenetic Analysis and Biochemical Characterization

    PubMed Central

    Beaufays, Jérôme; Adam, Benoît; Decrem, Yves; Prévôt, Pierre-Paul; Santini, Sébastien; Brasseur, Robert; Brossard, Michel; Lins, Laurence

    2008-01-01

    Background During their blood meal, ticks secrete a wide variety of proteins that interfere with their host's defense mechanisms. Among these proteins, lipocalins play a major role in the modulation of the inflammatory response. Methodology/Principal Findings Screening a cDNA library in association with RT-PCR and RACE methodologies allowed us to identify 14 new lipocalin genes in the salivary glands of the Ixodes ricinus hard tick. A computational in-depth structural analysis confirmed that LIRs belong to the lipocalin family. These proteins were called LIR for “Lipocalin from I. ricinus” and numbered from 1 to 14 (LIR1 to LIR14). According to their percentage identity/similarity, LIR proteins may be assigned to 6 distinct phylogenetic groups. The mature proteins have calculated molecular masses (pM) ranging from 21.8 to 37.2 kDa and isoelectric points (pI) ranging from 4.45 to 9.57. In a western blot analysis, all recombinant LIRs appeared as a series of thin bands at 50–70 kDa, suggesting extensive glycosylation, which was experimentally confirmed by treatment with N-glycosidase F. In addition, the in vivo expression analysis of LIRs in I. ricinus, examined by RT-PCR, showed homogeneous expression profiles for certain phylogenetic groups and relatively heterogeneous profiles for other groups. Finally, we demonstrated that LIR6 codes for a protein that specifically binds leukotriene B4. Conclusions/Significance This work confirms that, regarding their biochemical properties, expression profile, and sequence signature, lipocalins in the Ixodes hard tick genus, and more specifically in Ixodes ricinus, are segregated into distinct phylogenetic groups, suggesting potentially distinct functions. This was particularly demonstrated by the ability of LIR6 to scavenge leukotriene B4. The other LIRs did not bind any of the ligands tested, including 5-hydroxytryptamine, ADP, norepinephrine, platelet-activating factor, prostaglandins D2 and E2, and leukotrienes B4 and C4. PMID:19096708
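
A calculated molecular mass such as the 21.8 to 37.2 kDa range above is obtained by summing average residue masses and adding one water molecule for the free termini. The Python sketch below uses commonly tabulated average residue masses (worth checking against a reference tool such as ExPASy ProtParam) and ignores glycosylation and other modifications, which is why such values can fall well below the 50–70 kDa apparent masses seen on western blots. The isoelectric point is not computed here.

```python
# Average amino acid residue masses in daltons (residue = free amino acid minus water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # average mass of H2O added for the free N- and C-termini


def average_mass_kda(protein: str) -> float:
    """Average mass of an unmodified polypeptide in kDa (no glycans, no disulfide correction)."""
    return (sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER) / 1000


print(round(average_mass_kda("GA") * 1000, 2))  # glycylalanine, about 146.15 Da
```

A rough rule of thumb follows from the same table: at roughly 110 Da per residue, a 200-residue mature lipocalin would be expected near 22 kDa before glycosylation.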

  13. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    PubMed Central

    2012-01-01

    Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating these genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar full-length protein, cDNA and rRNA sequences by integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes are identified using tRNAscan and ARAGORN, and inverted repeats (IR) using vmatch. Third, it calculates summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS for updating and repeated re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while offering several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
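
Because the annotation round-trip relies on GFF3, a minimal parser for one GFF3 feature line shows what such an edited file contains: nine tab-separated columns with 1-based inclusive coordinates, and an attribute column of `tag=value` pairs separated by semicolons, with multiple values separated by commas and reserved characters percent-encoded. This is an illustrative sketch, not CPGAVAS code; comment and directive lines (starting with `#`) are assumed to be filtered out by the caller, and the example line is hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_line(line: str) -> dict:
    """Parse one GFF3 feature line (9 tab-separated columns) into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_field = fields
    attributes: dict[str, list[str]] = {}
    if attr_field != ".":
        for item in attr_field.split(";"):
            if not item:
                continue  # tolerate a trailing semicolon
            key, _, value = item.partition("=")
            # Multiple values are comma-separated; reserved characters are %-encoded.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": unquote(seqid), "source": source, "type": ftype,
        "start": int(start), "end": int(end),  # 1-based, inclusive coordinates
        "score": None if score == "." else float(score),
        "strand": strand, "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }


rec = parse_gff3_line("chr_example\texample\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=rbcL")
print(rec["type"], rec["start"], rec["end"], rec["attributes"]["Name"])  # gene 100 900 ['rbcL']
```

Keeping each attribute as a list avoids losing information when a tag such as `Parent` carries several comma-separated values.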

  14. Size-Sorting Combined with Improved Nanocapillary-LC-MS for Identification of Intact Proteins up to 80 kDa

    PubMed Central

    Vellaichamy, Adaikkalam; Tran, John C.; Catherman, Adam D.; Lee, Ji Eun; Kellie, John F.; Sweet, Steve M.M.; Zamdborg, Leonid; Thomas, Paul M.; Ahlf, Dorothy R.; Durbin, Kenneth R.; Valaskovic, Gary A.; Kelleher, Neil L.

    2010-01-01

    Despite the availability of ultra-high resolution mass spectrometers, methods for separation and detection of intact proteins for proteome-scale analyses are still in a developmental phase. Here we report robust protocols for on-line LC-MS to drive high-throughput top-down proteomics in a fashion similar to bottom-up. Comparative work on protein standards showed that a polymeric stationary phase led to superior sensitivity over a silica-based medium in reversed-phase nanocapillary-LC, with detection of proteins >50 kDa routinely accomplished in the linear ion trap of a hybrid Fourier-Transform mass spectrometer. Protein identification was enabled by nozzle-skimmer dissociation (NSD) and detection of fragment ions with <5 ppm mass accuracy for highly-specific database searching using custom software. This overall approach led to identification of proteins up to 80 kDa, with 10-60 proteins identified in single LC-MS runs of samples from yeast and human cell lines pre-fractionated by their molecular weight using a gel-based sieving system. PMID:20073486
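
The "<5 ppm mass accuracy" criterion for fragment ions is a relative mass error. The Python sketch below shows the standard calculation and a tolerance check; the masses are hypothetical and this is not the authors' custom search software.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million: (observed - theoretical) / theoretical * 1e6."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float = 5.0) -> bool:
    """True if the absolute ppm error is within the tolerance (5 ppm as in the abstract)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical fragment: theoretical m/z 1000.0000, observed 1000.0040.
print(round(ppm_error(1000.004, 1000.0), 3))  # 4.0
print(within_tolerance(1000.004, 1000.0))      # True
```

Because the tolerance is relative, 5 ppm corresponds to 0.005 at m/z 1000 but 0.05 at m/z 10,000, which is why a ppm window suits fragments spanning a wide mass range.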

  15. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    PubMed

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to consider what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this, the paper uses stakeholder theory, a common method in business ethics. Several scenarios connecting stakeholder concerns and WGS are also posited and analyzed. The paper concludes by developing criteria, composed of a series of questions, that researchers can answer to address requests for further analysis of stored sequences more effectively.

  16. Student Initiatives and Missed Learning Opportunities in an IRF Sequence: A Single Case Analysis

    ERIC Educational Resources Information Center

    Li, Houxiang

    2013-01-01

    Most conversation analysis (CA) studies of the initiation-response-feedback (IRF; Sinclair & Coulthard, 1975) sequence have focused on teacher actions in the feedback move. In this article, I use CA to analyze student initiatives (Waring, 2011) within an IRF sequence in one excerpt from a Chinese as a foreign language class. The excerpt…

  17. An atypical topoisomerase II sequence from the slime mold Physarum polycephalum.

    PubMed

    Hugodot, Yannick; Dutertre, Murielle; Duguet, Michel

    2004-01-21

    We have determined the complete nucleotide sequence of the cDNA encoding DNA topoisomerase II from Physarum polycephalum. Using degenerate primers based on the conserved amino acid sequences of other eukaryotic enzymes, a 250-bp fragment was amplified by polymerase chain reaction (PCR). This fragment was used as a probe to screen a Physarum cDNA library. A partial cDNA clone was isolated that was truncated at the 3' end. Rapid amplification of cDNA ends (RACE)-PCR was employed to isolate the remaining portion of the gene. The complete sequence of 4613 bp contains an open reading frame of 4494 bp that codes for 1498 amino acid residues with a theoretical molecular weight of 167 kDa. The predicted amino acid sequence shares similarity with those of other eukaryotes and shows the highest degree of identity with the enzyme of Dictyostelium discoideum. However, the enzyme of P. polycephalum contains an atypical amino-terminal domain very rich in serine and proline, whose function is unknown. Remarkably, a mitochondrial targeting sequence and a nuclear localization signal were predicted in the amino and carboxy termini of the protein, respectively, as in the case of human topoisomerase III alpha. At the Physarum genomic level, the topoisomerase II gene encompasses a region of about 16 kbp, suggesting a large proportion of intronic sequences, an unusual situation for a gene of a lower eukaryote, since such genes are often free of introns. Finally, expression of topoisomerase II mRNA does not appear to depend significantly on the plasmodium cycle stage, possibly due to the lack of a G1 phase and/or to a mitochondrial localization of the enzyme.

  18. Molecular variability analysis of five new complete cacao swollen shoot virus genomic sequences.

    PubMed

    Muller, E; Sackey, S

    2005-01-01

    Cacao swollen shoot virus (CSSV), a member of the family Caulimoviridae, genus Badnavirus, occurs in all the main cacao-growing areas of West Africa. We amplified, cloned and sequenced the complete genomes of five new isolates, two originating from Togo and three from Ghana. The genomes of these five newly sequenced isolates all contain the five putative open reading frames (I, II, III, X and Y) described for the first sequenced CSSV isolate, Agou1, originating from Togo. Their genomes were aligned with the genome of Agou1. The nucleotide and amino acid sequence identities between isolates were calculated, and a phylogenetic analysis was performed that included other pararetroviruses. Maximum nucleotide sequence variability between complete genomes of CSSV isolates was 29.4%. Geographical differentiation between isolates appears more important than differentiation between mild and severe isolates. ORF X differs greatly in size and sequence between the Togolese isolates Nyongbo2 and Agou1 and the four other isolates; its functional role is therefore questionable.

  19. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Sexton, David

    2018-01-22

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  20. Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sexton, David

    2012-06-01

    David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  1. Automated Sanger Analysis Pipeline (ASAP): A Tool for Rapidly Analyzing Sanger Sequencing Data with Minimum User Interference.

    PubMed

    Singh, Aditya; Bhatia, Prateek

    2016-12-01

    Sanger sequencing platforms, such as Applied Biosystems instruments, generate chromatogram files. Generally, a region of interest is sequenced with both forward and reverse primers, which yields two sequences that must be aligned and merged into a consensus before mutation detection studies. This work is cumbersome and time-consuming, especially if the gene is large and has many exons. Hence, we devised a rapid automated command-line system that filters, builds and aligns consensus sequences and, optionally, extracts exonic regions, translates them in all frames and performs an amino acid alignment, starting from raw sequence data, within a very short time. In its full capabilities, the Automated Sanger Analysis Pipeline (ASAP) reads "*.ab1" chromatogram files through a command-line interface, converts them to FASTQ format, trims low-quality regions, reverse-complements the reverse sequence, creates a consensus sequence, extracts the exonic regions using a reference exonic sequence, translates the sequence in all frames, and aligns the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All generated files are retained and can be used for further analysis. ASAP is available as a Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
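    ASAP's own code is not reproduced here. As a rough illustration of three of the steps listed above (quality trimming, reverse-complementing the reverse read, and translating in all frames), the following minimal Python sketch uses only the standard library. The function names, the Phred cutoff of 20 and the simple end-trimming rule are assumptions for illustration, not ASAP's actual implementation or defaults.

```python
from typing import Dict, List

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE: Dict[str, str] = dict(zip(_CODONS, _AMINO_ACIDS))

# Complement map; N (ambiguous base) stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (as done for the reverse-primer read)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def trim_low_quality(seq: str, quals: List[int], min_q: int = 20) -> str:
    """Trim bases from both ends while their Phred quality is below min_q (hypothetical rule)."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame; codons not in the table (e.g. containing N) become X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Translate the three forward and three reverse-complement frames."""
    rc = reverse_complement(seq)
    frames = {f"+{f + 1}": translate(seq, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames


if __name__ == "__main__":
    read = trim_low_quality("GATGGCCTAAC", [5] + [30] * 9 + [8])
    print(read)  # ATGGCCTAA
    print(six_frame_translation(read))
```

    The codon table is built from the compact 64-letter string rather than typed out codon by codon; building a consensus from the forward and reverse reads, which ASAP also does, is left out of this sketch.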

  2. Indel variant analysis of short-read sequencing data with Scalpel

    PubMed Central

    Fang, Han; Bergmann, Ewa A; Arora, Kanika; Vacic, Vladimir; Zody, Michael C; Iossifov, Ivan; O’Rawe, Jason A; Wu, Yiyang; Barron, Laura T Jimenez; Rosenbaum, Julie; Ronemus, Michael; Lee, Yoon-ha; Wang, Zihua; Dikoglu, Esra; Jobanputra, Vaidehi; Lyon, Gholson J; Wigler, Michael; Schatz, Michael C; Narzisi, Giuseppe

    2017-01-01

    As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also describe the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ~5 h after read mapping. PMID:27854363
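    The abstract describes Scalpel's microassembly only at a high level. Microassembly is commonly built on a de Bruijn graph of the reads covering a small region, so the toy Python sketch below illustrates that general idea: nodes are (k-1)-mers, and a contig is extended while the path is unambiguous. It is a hypothetical example, not Scalpel's code, and it omits error pruning, repeat handling and the alignment of assembled contigs back to the reference that a real indel caller needs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unique_path(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend a contig from `start` while the current node has exactly one successor.

    The walk stops at a branch (e.g. where a variant splits the paths),
    at a dead end, or when it would revisit a node (a cycle, e.g. a repeat).
    """
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ATGCGT", "GCGTAA"]
    print(walk_unique_path(build_de_bruijn(reads, k=4), "ATG"))  # ATGCGTAA
```

    Adding a read that carries a different allele (for example "GCGTTT") creates a second successor for the node "CGT", and the walk stops there. Branch points like this are where an assembler would enumerate alternative paths and compare them with the reference.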

  3. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    PubMed Central

    2011-01-01

    Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports the use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105

  4. SeqHBase: a big data toolset for family based sequencing data analysis.

    PubMed

    He, Min; Person, Thomas N; Hebbring, Scott J; Heinzen, Ethan; Ye, Zhan; Schrodi, Steven J; McPherson, Elizabeth W; Lin, Simon M; Peissig, Peggy L; Brilliant, Murray H; O'Rawe, Jason; Robison, Reid J; Lyon, Gholson J; Wang, Kai

    2015-04-01

    Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. Processing such data can be a significant challenge, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family-based sequencing data analysis. Hadoop is a framework for reliable, scalable, distributed processing of large data sets using the MapReduce programming model. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family-based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times scaled almost linearly with the number of data nodes. With 20 data nodes, SeqHBase took about 5 s to analyse WES familial data and approximately 1 min to analyse WGS familial data. These results demonstrate SeqHBase's high efficiency and scalability, which are necessary as WGS and WES rapidly become standard methods for studying the genetics of familial disorders. Published by the BMJ Publishing Group Limited.
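    The three inheritance patterns that SeqHBase screens for can be stated as simple per-trio rules. The Python sketch below is a hypothetical, in-memory illustration only; SeqHBase itself runs on Hadoop/HBase. The `TrioCall` record, the alt-allele-count encoding (0, 1 or 2) and the `min_depth=10` coverage threshold are all assumptions. The parental depth check stands in for the role of BAM coverage: a parent with no reads at a site should not be mistaken for a homozygous-reference carrier.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TrioCall(NamedTuple):
    variant_id: str
    gene: str
    child: int         # alt-allele count in the proband: 0, 1 or 2
    father: int
    mother: int
    father_depth: int  # read depth at the site (e.g. derived from the BAM)
    mother_depth: int


def is_de_novo(c: TrioCall, min_depth: int = 10) -> bool:
    """Child carries an alt allele that neither adequately covered parent carries."""
    return (
        c.child >= 1
        and c.father == 0
        and c.mother == 0
        and c.father_depth >= min_depth
        and c.mother_depth >= min_depth
    )


def is_inherited_homozygous(c: TrioCall) -> bool:
    """Child is homozygous alt and each parent carries at least one alt allele."""
    return c.child == 2 and c.father >= 1 and c.mother >= 1


def compound_het_genes(calls: List[TrioCall]) -> Dict[str, List[str]]:
    """Genes where the child has a het variant inherited only from the father
    and another het variant inherited only from the mother."""
    paternal: Dict[str, List[str]] = defaultdict(list)
    maternal: Dict[str, List[str]] = defaultdict(list)
    for c in calls:
        if c.child != 1:
            continue
        if c.father >= 1 and c.mother == 0:
            paternal[c.gene].append(c.variant_id)
        elif c.mother >= 1 and c.father == 0:
            maternal[c.gene].append(c.variant_id)
    return {g: paternal[g] + maternal[g] for g in paternal if g in maternal}
```

    In a distributed setting, rules like these would be evaluated per site or per gene across data nodes, which is what makes near-linear scaling with node count plausible.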

  5. Top-down analysis of protein samples by de novo sequencing techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vyatkina, Kira; Wu, Si; Dekker, Lennard J. M.

    MOTIVATION: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry and creating a need for efficient methods for analyzing this kind of data. RESULTS: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence and also bear information on post-translational modifications and fragmentation patterns.
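    The abstract does not spell out how the de novo strings are aggregated. To illustrate the general idea of merging overlapping fragments into one string while recording each fragment's offset, here is a greedy suffix-prefix merge in Python. It is a generic sketch under stated assumptions (exact overlaps of at least `min_overlap` residues, extension only to the right of the first fragment), not the published method, and it ignores mass offsets, post-translational modifications and sequencing errors.

```python
from typing import Dict, List, Tuple


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def aggregate(fragments: List[str], min_overlap: int = 3) -> Tuple[str, Dict[str, int]]:
    """Greedily extend the first fragment to the right; return the aggregate and offsets.

    Fragments that neither overlap nor are contained in the aggregate are left unplaced.
    """
    remaining = list(fragments[1:])
    current = fragments[0]
    offsets = {current: 0}
    extended = True
    while extended and remaining:
        extended = False
        for frag in remaining:
            if frag in current:  # fragment already contained in the aggregate
                offsets[frag] = current.index(frag)
            else:
                n = suffix_prefix_overlap(current, frag, min_overlap)
                if n == 0:
                    continue
                offsets[frag] = len(current) - n
                current += frag[n:]
            remaining.remove(frag)
            extended = True
            break  # restart the scan with the updated aggregate
    return current, offsets


if __name__ == "__main__":
    print(aggregate(["PEPTIDE", "TIDEKAR", "KARLS"]))
    # ('PEPTIDEKARLS', {'PEPTIDE': 0, 'TIDEKAR': 3, 'KARLS': 7})
```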

  6. Kickoff to Conflict: A Sequence Analysis of Intra-State Conflict-Preceding Event Structures

    PubMed Central

    D'Orazio, Vito; Yonamine, James E.

    2015-01-01

    While many studies have suggested or assumed that the periods preceding the onset of intra-state conflict are similar across time and space, few have empirically tested this proposition. Using the Integrated Crisis Early Warning System's domestic event data in Asia from 1998–2010, we subject this proposition to empirical analysis. We code the similarity of government-rebel interactions in sequences preceding the onset of intra-state conflict to those preceding further periods of peace using three different metrics: Euclidean, Levenshtein, and mutual information. These scores are then used as predictors in a bivariate logistic regression to forecast whether we are likely to observe conflict in neither, one, or both of the states. We find that our model accurately classifies cases where both sequences precede peace, but struggles to distinguish between cases in which one sequence escalates to conflict and where both sequences escalate to conflict. These findings empirically suggest that generalizable patterns exist between event sequences that precede peace. PMID:25951105
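    Two of the three similarity metrics named in the abstract can be illustrated for event sequences: Levenshtein edit distance on ordered sequences and Euclidean distance on event-type count vectors. The minimal Python sketch below assumes each sequence is a list of event-type codes. The codes in the example are hypothetical, not ICEWS categories, and how the authors encoded and normalized their events is not stated here. Mutual information is omitted.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def euclidean_counts(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Euclidean distance between event-type count vectors (ignores order)."""
    ca, cb = Counter(a), Counter(b)
    return math.sqrt(sum((ca[k] - cb[k]) ** 2 for k in set(ca) | set(cb)))


if __name__ == "__main__":
    s1 = ["GOV_ARREST", "REB_PROTEST", "GOV_FORCE"]
    s2 = ["GOV_ARREST", "REB_PROTEST", "REB_ATTACK", "GOV_FORCE"]
    print(levenshtein(s1, s2))       # 1
    print(euclidean_counts(s1, s2))  # 1.0
```

    The two metrics capture different things: the edit distance is sensitive to the order of interactions, whereas the count-vector distance only compares how often each kind of event occurred.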

  7. Tula hantavirus L protein is a 250 kDa perinuclear membrane-associated protein.

    PubMed

    Kukkonen, Sami K J; Vaheri, Antti; Plyusnin, Alexander

    2004-05-01

    The complete open reading frame of Tula hantavirus (TULV) L RNA was cloned in three parts. The middle third (nt 2191-4344) could be expressed in E. coli and was used to immunize rabbits. The resultant antiserum was then used to immunoblot concentrated TULV and infected Vero E6 cells. The L protein of a hantavirus was detected, for the first time, in infected cells and was found to be expressed as a single protein with an apparent molecular mass of 250 kDa in both virions and infected cells. Using the antiserum, the expression level of the L protein was followed and image analysis of immunoblots indicated that there were 10^4 copies per cell at the peak level of expression. The antiserum was also used to detect the L protein in cell fractionation studies. In cells infected with TULV and cells expressing recombinant L, the protein pelleted with the microsomal membrane fraction. The membrane association was confirmed with membrane flotation assays. To visualize L protein localization in cells, a fusion protein of L and enhanced green fluorescent protein, L-EGFP, was expressed in Vero E6 cells with a plasmid-driven T7 expression system. L-EGFP localized in the perinuclear region where it had partial co-localization with the Golgi matrix protein GM130 and the TULV nucleocapsid protein.

  8. Solution Structure of the 128 kDa Enzyme I Dimer from Escherichia coli and Its 146 kDa Complex with HPr Using Residual Dipolar Couplings and Small- and Wide-Angle X-ray Scattering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schwieters, Charles D.; Suh, Jeong-Yong; Grishaev, Alexander

    2010-09-17

    The solution structures of free Enzyme I (EI, ~128 kDa, 575 × 2 residues), the first enzyme in the bacterial phosphotransferase system, and its complex with HPr (~146 kDa) have been solved using novel methodology that makes use of prior structural knowledge (namely, the structures of the dimeric EIC domain and the isolated EIN domain both free and complexed to HPr), combined with residual dipolar coupling (RDC), small- (SAXS) and wide- (WAXS) angle X-ray scattering and small-angle neutron scattering (SANS) data. The calculational strategy employs conjoined rigid body/torsion/Cartesian simulated annealing, and incorporates improvements in calculating and refining against SAXS/WAXS data that take into account complex molecular shapes in the description of the solvent layer, resulting in a better representation of the SAXS/WAXS data. The RDC data orient the symmetrically related EIN domains relative to the C₂ symmetry axis of the EIC dimer, while translational, shape, and size information is provided by SAXS/WAXS. The resulting structures are independently validated by SANS. Comparison of the structures of the free EI and the EI-HPr complex with that of the crystal structure of a trapped phosphorylated EI intermediate reveals large (~70-90°) hinge body rotations of the two subdomains comprising the EIN domain, as well as of the EIN domain relative to the dimeric EIC domain. These large-scale interdomain motions shed light on the structural transitions that accompany the catalytic cycle of EI.

  9. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection

    PubMed Central

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike

    2018-01-01

    ABSTRACT Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank.
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have

  10. A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

    PubMed

    Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

    2018-01-01

    Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have
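    The redundancy-reduction step clusters entries at 98% identity with CD-HIT-EST. CD-HIT-EST's actual algorithm (short-word filtering, banded alignment, identity measured against the shorter sequence) is not reproduced here. The greedy, longest-first idea behind clustering at a fixed identity can be sketched in Python as follows. The edit-distance-based `identity` function is a simplifying assumption, and comparing every sequence against every representative this way would be far too slow for a GenBank-scale download.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Crude identity (assumption): 1 - edit distance / length of the longer sequence."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.98) -> Dict[str, List[str]]:
    """Longest-first greedy clustering: each sequence joins the first representative
    it matches at >= threshold identity; otherwise it becomes a new representative."""
    clusters: Dict[str, List[str]] = {}
    reps: List[str] = []
    for sid in sorted(seqs, key=lambda s: -len(seqs[s])):
        for rid in reps:
            if identity(seqs[sid], seqs[rid]) >= threshold:
                clusters[rid].append(sid)
                break
        else:
            reps.append(sid)
            clusters[sid] = [sid]
    return clusters


if __name__ == "__main__":
    demo = {"a": "A" * 100, "b": "A" * 99 + "C", "c": "C" * 100}
    print(greedy_cluster(demo))  # {'a': ['a', 'b'], 'c': ['c']}
```

    Processing the longest sequences first means each representative is typically the most complete member of its cluster, which matches the goal of keeping diversity while dropping near-duplicates.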

  11. Integrated protein analysis platform based on column switch recycling size exclusion chromatography, microenzymatic reactor and microRPLC-ESI-MS/MS.

    PubMed

    Yuan, Huiming; Zhou, Yuan; Zhang, Lihua; Liang, Zhen; Zhang, Yukui

    2009-10-30

    An integrated platform combining protein and peptide separation was established via an on-line protein digestion unit: proteins were sequentially separated by column switch recycling size exclusion chromatography (csrSEC) and digested on-line by an immobilized trypsin microreactor, and the resulting peptides were trapped and desalted by two parallel C8 precolumns, separated by microRPLC with a linear gradient of organic modifier concentration, and identified by ESI-MS/MS. A 6-protein mixture, with Mr ranging from 10 kDa to 80 kDa, was used to evaluate the performance of the integrated platform, and all proteins were identified with sequence coverage above 5.67%. Our experimental results demonstrate that such an integrated platform offers advantages such as good time compatibility, high peak capacity and facile automation, and it might be a promising approach for proteome studies.

  12. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of high-throughput sequencing, including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation allows biologists to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework enables developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  13. Sequence analysis of the msp4 gene of Anaplasma ovis strains

    USGS Publications Warehouse

    de la Fuente, J.; Atkinson, M.W.; Naranjo, V.; Fernandez de Mera, I. G.; Mangold, A.J.; Keating, K.A.; Kocan, K.M.

    2007-01-01

    Anaplasma ovis (Rickettsiales: Anaplasmataceae) is a tick-borne pathogen of sheep, goats and wild ruminants. The genetic diversity of A. ovis strains has not been well characterized due to the lack of sequence information. In this study, we evaluated bighorn sheep (Ovis canadensis) and mule deer (Odocoileus hemionus) from Montana for infection with A. ovis by serology and sequence analysis of the msp4 gene. Antibodies to Anaplasma spp. were detected in 37% and 39% of the bighorn sheep and mule deer analyzed, respectively. Four new msp4 genotypes were identified. The A. ovis msp4 sequences identified herein were analyzed together with previously reported sequences to characterize the genetic diversity of A. ovis strains in comparison with other Anaplasma spp. The results of these studies demonstrated that although A. ovis msp4 genotypes may vary among geographic regions and between sheep and deer hosts, the variation observed was less than the variation observed between A. marginale and A. phagocytophilum strains. The results reported herein further confirm that A. ovis infection occurs in natural wild ruminant populations in the western United States and that bighorn sheep and mule deer may serve as wildlife reservoirs of A. ovis. © 2006.

  14. Genetic diversity assessment of anoxygenic photosynthetic bacteria by distance-based grouping analysis of pufM sequences.

    PubMed

    Zeng, Y H; Chen, X H; Jiao, N Z

    2007-12-01

    This study aimed to assess how completely the diversity of anoxygenic phototrophic bacteria (APB) has been sampled in natural environments. All nucleotide sequences of the APB marker gene pufM from cultures and environmental clones were retrieved from the GenBank database. A set of cutoff values (sequence distances of 0.06, 0.15 and 0.48 for the species, genus and (sub)phylum levels, respectively) was established using a distance-based grouping program. Analysis of the environmental clones revealed that current efforts on APB isolation and sampling in natural environments are largely inadequate. Analysis of the average distance between each identified genus and an uncultured environmental pufM sequence indicated that the majority of cultured APB genera lack environmental representatives. The distance-based grouping method is fast and efficient for bulk analysis of functional gene sequences. The results clearly show that we are at a relatively early stage in sampling the global richness of APB species. Periodic assessment will facilitate in-depth analysis of potential biogeographical distribution patterns of APB. This is the first attempt to assess the present understanding of APB diversity in natural environments. The method used is also useful for assessing the diversity of other functional genes.
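    The abstract does not state how the distance-based grouping program computes distances or forms groups. The minimal Python sketch below assumes uncorrected p-distances on a pre-computed alignment and single-linkage grouping (both assumptions). It shows how a set of cutoffs such as 0.06, 0.15 and 0.48 turns one pairwise distance table into species-, genus- and (sub)phylum-level groups. The toy sequences are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def group_by_cutoff(seqs: Dict[str, str], cutoff: float) -> List[List[str]]:
    """Single-linkage grouping: a sequence joins a group if it is within `cutoff`
    of any member (union-find over all pairs)."""
    parent = {sid: sid for sid in seqs}

    def find(sid: str) -> str:
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]  # path halving
            sid = parent[sid]
        return sid

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = defaultdict(list)
    for sid in seqs:
        groups[find(sid)].append(sid)
    return sorted(groups.values())


if __name__ == "__main__":
    aligned = {"s1": "AAAAAAAAAA", "s2": "AAAAAAAAAT", "s3": "TTTTTAAAAA"}
    for level, cutoff in [("species", 0.06), ("genus", 0.15), ("(sub)phylum", 0.48)]:
        print(level, group_by_cutoff(aligned, cutoff))
```

    Because the grouping only needs the pairwise distance table, one alignment can be regrouped cheaply at every cutoff. That is one reason a distance-based approach suits bulk functional-gene analysis.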

  15. Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis

    PubMed Central

    Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E

    2017-01-01

    The accurate estimation of nucleotide variability using next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for nonmodel species, where reference sequences may not be available and read depth may be low due to limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to obtain high SNP recovery and a low false discovery rate, but they are not designed to account appropriately for the frequency of the variants. Instead, algorithms designed to account for the frequency of SNPs give precise results for estimating the levels and patterns of variability. These algorithms focus on the unbiased estimation of variability rather than on high recovery of SNPs. Here, we implemented a fast and optimized parallel algorithm that includes the method developed by Roesti et al and Lynch, which estimates the genotype of each individual at each site, with the possibility of calling both bases of the genotype, a single base, or none. This algorithm does not consider the reference and is therefore independent of biases related to the specified reference nucleotide. The pipeline starts from a BAM file converted to pileup or mpileup format, and the software outputs a FASTA file. The new program not only reduces running times but also, given its improved use of resources, can be run on both small computers and large parallel computers, extending its benefits to a wider range of researchers. The output file can be analyzed using software for population genetics analysis, such as the R library PopGenome, the software VariScan, and the program mstatspop for analysis considering positions with missing data. PMID:28894353
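    The pipeline consumes pileup/mpileup input. The following Python sketch shows how one site's read-base column from `samtools mpileup` can be tallied and turned into a call of two bases, one base or none. The fixed read-count thresholds (`min_depth=4`, `min_allele_reads=2`) are hypothetical and deliberately naive; they are not the Roesti et al. or Lynch likelihood methods the program implements. The reference base is used only to decode the `.`/`,` match symbols that the mpileup format requires.

```python
import re
from collections import Counter
from typing import Dict

_INDEL_LEN = re.compile(r"\d+")


def count_pileup_bases(ref_base: str, bases: str) -> Dict[str, int]:
    """Count A/C/G/T observations in one samtools mpileup read-base column."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":      # read start; the next char is its mapping quality
            i += 2
        elif ch == "$":    # read end marker
            i += 1
        elif ch in "+-":   # indel: sign, length, then the inserted/deleted bases
            m = _INDEL_LEN.match(bases, i + 1)
            i = m.end() + int(m.group())
        else:
            if ch in ".,":                # match to the reference (forward/reverse strand)
                counts[ref_base.upper()] += 1
            elif ch.upper() in "ACGT":    # mismatch; lowercase means reverse strand
                counts[ch.upper()] += 1
            i += 1                        # other symbols ('*', 'N', '>', '<', ...) are skipped
    return dict(counts)


def call_genotype(counts: Dict[str, int], min_depth: int = 4,
                  min_allele_reads: int = 2) -> str:
    """Naive call: '' (no call), one base (homozygous) or two bases (heterozygous)."""
    if sum(counts.values()) < min_depth:
        return ""
    supported = [b for b, n in counts.items() if n >= min_allele_reads]
    top_two = sorted(supported, key=lambda b: (-counts[b], b))[:2]
    return "".join(sorted(top_two))


if __name__ == "__main__":
    site = count_pileup_bases("A", "..,,^]T$t+2AG.")
    print(site, call_genotype(site))  # {'A': 5, 'T': 2} AT
```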

  16. Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

    PubMed Central

    Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar

    2018-01-01

    The use of high-throughput next-generation sequencing technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections in Egyptian patients using the Illumina MiSeq sequencing platform. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using a sterile # 15K file and paper points. DNA was extracted using the MO BIO PowerSoil DNA isolation kit, followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable regions of the 16S rRNA gene by paired-end sequencing on an Illumina MiSeq device. MOTHUR software was used for sequence filtration and analysis of the sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes, were predominant in all samples. At the genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq sequencing can be used to investigate the oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646

  17. PolyPhred analysis software for mutation detection from fluorescence-based sequence data.

    PubMed

    Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Loomis, Stephanie; Obourn, Vanessa; Kucherlapati, Raju

    2008-10-01

    The ability to search for genetic variants that may be related to human disease is one of the most exciting consequences of the availability of the sequence of the human genome. Large cohorts of individuals exhibiting certain phenotypes can be studied and candidate genes resequenced. However, the challenge of analyzing sequence data from many individuals with accuracy, speed, and economy is great. This unit describes one set of software tools: Phred, Phrap, PolyPhred, and Consed. Coverage includes the advantages and disadvantages of these analysis tools, details for obtaining and using the software, and the results one may expect. The software is being continually updated to permit further automation of mutation analysis. Currently, however, at least some manual review is required if one wishes to identify 100% of the variants in a sample set.

  18. Multi-objective Analysis for a Sequencing Planning of Mixed-model Assembly Line

    NASA Astrophysics Data System (ADS)

    Shimizu, Yoshiaki; Waki, Toshiya; Yoo, Jae Kyu

    Diversified customer demands are raising the importance of just-in-time and agile manufacturing more than ever. Accordingly, mixed-model assembly lines have become a popular way to realize small-lot, multi-product production. Because such a line produces various models on the same assembly line, rational management is especially important. From this point of view, this study focuses on the sequencing problem of a mixed-model assembly line that includes a paint line as its preceding process. When the paint line is taken into account, reducing work-in-process (WIP) inventory between these heterogeneous lines becomes a major concern of the sequencing problem, in addition to improving production efficiency. We formulated the sequencing problem as a bi-objective optimization problem that aims simultaneously to prevent various line stoppages and to reduce the volume of WIP inventory. We then proposed a practical method for the multi-objective analysis, applying the weighting method to derive the Pareto front. The resulting problem is solved with a metaheuristic such as simulated annealing (SA). Through numerical experiments, we verified the validity of the proposed approach and discussed the significance of trade-off analysis between the conflicting objectives.
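    In the paper, each weighted problem is solved by simulated annealing over line sequences. Leaving the metaheuristic aside, the weighting method itself can be sketched in Python over an already-evaluated set of candidate schedules. The schedule names and objective values below are hypothetical. In practice the two objectives would normally be normalized to comparable scales before weighting.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_sum_front(candidates: Dict[str, Tuple[float, float]],
                       weights: Sequence[float]) -> List[str]:
    """Weighting method: for each w, minimise w*f1 + (1-w)*f2 over the candidates.

    Both objectives are minimised. Ties are broken on the raw objective vector and
    then on the name, so the result is deterministic.
    """
    front: List[str] = []
    for w in weights:
        best = min(candidates,
                   key=lambda n: (w * candidates[n][0] + (1 - w) * candidates[n][1],
                                  candidates[n], n))
        if best not in front:
            front.append(best)
    return front


if __name__ == "__main__":
    # Hypothetical schedules: (line-stoppage score, WIP inventory), both to be minimised.
    schedules = {"s1": (1, 9), "s2": (3, 4), "s3": (9, 1), "s4": (6, 6)}
    print(weighted_sum_front(schedules, [0.0, 0.25, 0.5, 0.75, 1.0]))  # ['s3', 's2', 's1']
```

    The dominated schedule s4 is never selected. One known limitation of sweeping weights like this is that it only reaches Pareto-optimal points on the convex hull of the front, so trade-offs lying in non-convex regions can be missed.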

  19. Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.

    PubMed

    Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan

    2012-03-01

    Sika deer is one of the best-known and most highly valued animals in China. Despite its economic, cultural, and biological importance, there has not yet been a large-scale sequencing project for Sika deer. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, comprising 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin, and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. A total of 244 (21.09%) unigenes had no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes of unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the Gene Ontology hierarchy, the Clusters of Orthologous Groups of proteins, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. The EST analysis described in this paper provides an important resource for exploring the transcriptome of Sika deer and will also facilitate further studies on functional genomics, gene discovery, and genome annotation in this species.
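    Open reading frame analysis scans all six reading frames (three per strand) for stretches that begin with a start codon and end at an in-frame stop codon. The Python function below is a simplified stand-in for that kind of scan. It is not a reimplementation of EMBOSS getorf, which offers several output modes and alternative genetic-code tables. The sketch assumes the standard genetic code and reports only ATG-to-stop ORFs, starting from the first ATG in each open stretch. The `min_len` threshold and all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Return (strand, frame, start, end) for ATG...stop ORFs of at least min_len nt.

    Coordinates are 0-based and end-exclusive, measured on the strand searched
    (for '-', positions refer to the reverse-complement string).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # stop codon closes the ORF
    return orfs
```

    For instance, `find_orfs("ATGAAATAG", min_len=9)` finds a single forward-strand ORF, `("+", 0, 0, 9)`. Real EST reads often contain partial ORFs with no start or no stop codon. This is one reason tools like getorf also offer a "between stop codons" mode.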

  20. The expression of a novel stress protein '150-kDa oxygen regulated protein' in sudden infant death.

    PubMed

    Ikematsu, Kazuya; Tsuda, Ryouichi; Kondo, Toshikazu; Kondo, Hisayoshi; Ozawa, Kentaro; Ogawa, Satoshi; Nakasono, Ichiro

    2003-03-01

    The 150-kDa oxygen-regulated protein (ORP-150) is induced only under hypoxic conditions. We performed an immunohistochemical and morphometric study of ORP-150 expression in the brains of sudden infant death (SID) victims. The cerebral cortices of 18 infants were used for this study. Each tissue section was incubated with anti-ORP-150 polyclonal antibodies, and the number of ORP-150-positive cells was counted. In the cluster analysis, the 18 cases were classified into three groups (A-C). Group A comprised six sudden infant death syndrome (SIDS) cases, with a mean of 66.75 ± 3.44 ORP-150-positive cells. Group B comprised six cases of severe respiratory infectious disease, such as pneumonia and bronchitis, including sepsis (39.50 ± 2.52). Group C comprised five SIDS cases and one case of severe respiratory infectious disease (16.00 ± 2.92). These results might reflect a chronic hypoxic condition before death, because ORP-150 is induced only when a hypoxic condition exists and not by acute hypoxia. A chronic hypoxic state is likely to precede SIDS. Therefore, immunohistochemical analysis of ORP-150 in the brains of SID cases may be very useful for differentiating between SIDS and acute asphyxia.
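    The abstract does not state which clustering method was used to split the 18 cases into three groups. As one possible illustration only, the sketch below applies Lloyd's k-means with k = 3 to one-dimensional counts of positive cells. It then reports each cluster's size, mean, and sample standard deviation. K-means is a swapped-in technique here, and the function names are hypothetical. The abstract also does not say whether its ± values are standard deviations or standard errors.

```python
from statistics import mean, stdev


def kmeans_1d(values, k=3, iters=100):
    """Lloyd's k-means on one-dimensional data; returns (labels, centres)."""
    xs = sorted(values)
    # Deterministic initialisation: spread starting centres across the sorted data.
    centres = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: move each centre to the mean of its members.
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(mean(members) if members else centres[c])
        if new == centres:
            break  # converged: assignments no longer change the centres
        centres = new
    return labels, centres


def summarise(values, labels, k):
    """Return (size, mean, sample SD) for each cluster; empty clusters give NaN."""
    out = []
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        if not members:
            out.append((0, float("nan"), float("nan")))
            continue
        sd = stdev(members) if len(members) > 1 else 0.0
        out.append((len(members), mean(members), sd))
    return out
```

    One-dimensional data such as cell counts could also be clustered hierarchically or split at gaps in the sorted values. K-means is shown only because it is short and deterministic with the initialisation above.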