Science.gov

Sample records for acid sequence analysis

  1. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  2. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    David J. States

    1998-08-01

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  3. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  4. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken]; SNL

    2013-01-25

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.

  5. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    PubMed

    Wang, Xiaoyu; Chen, Meili; Xiao, Jingfa; Hao, Lirui; Crowley, David E; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  6. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  7. New monoclonal antibodies to the Ebola virus glycoprotein: Identification and analysis of the amino acid sequence of the variable domains.

    PubMed

    Panina, A A; Aliev, T K; Shemchukova, O B; Dement'yeva, I G; Varlamov, N E; Pozdnyakova, L P; Bokov, M N; Dolgikh, D A; Sveshnikov, P G; Kirpichnikov, M P

    2016-03-01

    We determined the nucleotide and amino acid sequences of variable domains of three new monoclonal antibodies to the glycoprotein of Ebola virus capsid. The framework and hypervariable regions of immunoglobulin heavy and light chains were identified. The primary structures were confirmed using mass spectrometry analysis. Immunoglobulin database search showed the uniqueness of the sequences obtained. PMID:27193713

  8. Amino acid sequence analysis and characterization of a ribonuclease from starfish Asterias amurensis.

    PubMed

    Motoyoshi, Naomi; Kobayashi, Hiroko; Itagaki, Tadashi; Inokuchi, Norio

    2016-09-01

    The aim of this study was to phylogenetically characterize the location of the RNase T2 enzyme in the starfish (Asterias amurensis). We isolated an RNase T2 ribonuclease (RNase Aa) from the ovaries of starfish and determined its amino acid sequence by protein chemistry and cloning cDNA encoding RNase Aa. The isolated protein had 231 amino acid residues, a predicted molecular mass of 25,906 Da, and an optimal pH of 5.0. RNase Aa preferentially released guanylic acid from the RNA. The catalytic sites of the RNase T2 family are conserved in RNase Aa; furthermore, the distribution of the cysteine residues in RNase Aa is similar to that in other animal and plant T2 RNases. RNase Aa is cleaved at two points: 21 residues from the N-terminus and 29 residues from the C-terminus; however, both fragments may remain attached to the protein via disulfide bridges, leading to the maintenance of its conformation, as suggested by circular dichroism spectrum analysis. The phylogenetic analysis revealed that starfish RNase Aa is evolutionarily an intermediate between protozoan and oyster RNases. PMID:26920046

  9. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and the 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in intron 1. In the 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was observed identically only in ovine, with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. In particular, the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences. PMID:18397498
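
    The motif comparison described above (an exact hexamer in every species, a heptamer with one or two mismatches in most) can be illustrated with a minimal sketch of a mismatch-tolerant motif scan. The function name `find_motif` and the toy sequence are hypothetical and are not taken from the study; the sequence is not real PRNP data.

```python
def find_motif(seq: str, motif: str, max_mismatches: int = 0):
    """Return (position, window, mismatches) for every window of seq
    that differs from motif at no more than max_mismatches positions."""
    seq = seq.upper()
    motif = motif.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        # Hamming distance between the window and the motif
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Hypothetical toy 3'-UTR fragment, invented for illustration only.
utr = "GCCATTAAAGGCTTTTAATCC"
exact = find_motif(utr, "ATTAAA")                     # exact matches only
near = find_motif(utr, "TTTTTAT", max_mismatches=2)   # allow up to two mismatches
```

    Positions are 0-based. A scan like this only reports candidate sites; whether a site is functional is a separate question that the code does not address.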

  10. Structural analysis of complementary DNA and amino acid sequences of human and rat androgen receptors

    SciTech Connect

    Chang, C.; Kokontis, J.; Liao, S.

    1988-10-01

    Structural analysis of cDNAs for human and rat androgen receptors (ARs) indicates that the amino-terminal regions of ARs are rich in oligo- and poly(amino acid) motifs as in some homeotic genes. The human AR has a long stretch of repeated glycines, whereas rat AR has a long stretch of glutamines. There is a considerable sequence similarity among ARs and the receptors for glucocorticoids, progestins, and mineralocorticoids within the steroid-binding domains. The cysteine-rich DNA-binding domains are well conserved. Translation of mRNA transcribed from AR cDNAs yielded 94- and 76-kDa proteins and smaller forms that bind to DNA and have high affinity toward androgens. These rat or human ARs were recognized by human autoantibodies to natural ARs. Molecular hybridization studies, using AR cDNAs as probes, indicated that the ventral prostate and other male accessory organs are rich in AR mRNA and that the production of AR mRNA in the target organs may be autoregulated by androgens.

  11. Nanopore Analysis of Nucleic Acids: Single-Molecule Studies of Molecular Dynamics, Structure, and Base Sequence

    NASA Astrophysics Data System (ADS)

    Olasagasti, Felix; Deamer, David W.

    Nucleic acids are linear polynucleotides in which each base is covalently linked to a pentose sugar and a phosphate group carrying a negative charge. If a pore having roughly the cross-sectional diameter of a single-stranded nucleic acid is embedded in a thin membrane and a voltage of 100 mV or more is applied, individual nucleic acids in solution can be captured by the electrical field in the pore and translocated through by single-molecule electrophoresis. The dimensions of the pore cannot accommodate anything larger than a single strand, so each base in the molecule passes through the pore in strict linear sequence. The nucleic acid strand occupies a large fraction of the pore's volume during translocation and therefore produces a transient blockade of the ionic current created by the applied voltage. If it could be demonstrated that each nucleotide in the polymer produced a characteristic modulation of the ionic current during its passage through the nanopore, the sequence of current modulations would reflect the sequence of bases in the polymer. According to this basic concept, nanopores are analogous to a Coulter counter that detects nanoscopic molecules rather than microscopic [1,2]. However, the advantage of nanopores is that individual macromolecules can be characterized because different chemical and physical properties affect their passage through the pore. Because macromolecules can be captured in the pore as well as translocated, the nanopore can be used to detect individual functional complexes that form between a nucleic acid and an enzyme. No other technique has this capability.

  12. [Partial sequence homology of FtsZ in phylogenetics analysis of lactic acid bacteria].

    PubMed

    Zhang, Bin; Dong, Xiu-zhu

    2005-10-01

    FtsZ is a structurally conserved protein, which is universal among the prokaryotes. It plays a key role in prokaryote cell division. A partial fragment of the ftsZ gene about 800 bp in length was amplified and sequenced and a partial FtsZ protein phylogenetic tree for the lactic acid bacteria was constructed. By comparing the FtsZ phylogenetic tree with the 16S rDNA tree, it was shown that the two trees were similar in topology. Both trees revealed that Pediococcus spp. were closely related with L. casei group of Lactobacillus spp., but less related with other lactic acid cocci such as Enterococcus and Streptococcus. The results also showed that the discriminative power of FtsZ was higher than that of 16S rDNA for either inter-species or inter-genus and could be a very useful tool in species identification of lactic acid bacteria. PMID:16342751

  13. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  14. Novel method for PIK3CA mutation analysis: locked nucleic acid--PCR sequencing.

    PubMed

    Ang, Daphne; O'Gara, Rebecca; Schilling, Amy; Beadling, Carol; Warrick, Andrea; Troxell, Megan L; Corless, Christopher L

    2013-05-01

    Somatic mutations in PIK3CA are commonly seen in invasive breast cancer and several other carcinomas, occurring in three hotspots: codons 542 and 545 of exon 9 and in codon 1047 of exon 20. We designed a locked nucleic acid (LNA)-PCR sequencing assay to detect low levels of mutant PIK3CA DNA with attention to avoiding amplification of a pseudogene on chromosome 22 that has >95% homology to exon 9 of PIK3CA. We tested 60 FFPE breast DNA samples with known PIK3CA mutation status (48 cases had one or more PIK3CA mutations, and 12 were wild type) as identified by PCR-mass spectrometry. PIK3CA exons 9 and 20 were amplified in the presence or absence of LNA-oligonucleotides designed to bind to the wild-type sequences for codons 542, 545, and 1047, and partially suppress their amplification. LNA-PCR sequencing confirmed all 51 PIK3CA mutations; however, the mutation detection rate by standard Sanger sequencing was only 69% (35 of 51). Of the 12 PIK3CA wild-type cases, LNA-PCR sequencing detected three additional H1047R mutations in "normal" breast tissue and one E545K in usual ductal hyperplasia. Histopathological review of these three normal breast specimens showed columnar cell change in two (both with known H1047R mutations) and apocrine metaplasia in one. The novel LNA-PCR shows higher sensitivity than standard Sanger sequencing and did not amplify the known pseudogene. PMID:23541593

  15. Jack bean α-mannosidase: amino acid sequencing and N-glycosylation analysis of a valuable glycomics tool.

    PubMed

    Gnanesh Kumar, B S; Pohlentz, Gottfried; Schulte, Mona; Mormann, Michael; Siva Kumar, Nadimpalli

    2014-03-01

    Jack bean (Canavalia ensiformis) seeds contain several biologically important proteins among which α-mannosidase (EC 3.2.1.24) has been purified, its biochemical properties studied and widely used in glycan analysis. In the present study, we have used the purified enzyme and derived its amino acid sequence covering both the known subunits (molecular mass of ∼66,000 and ∼44,000 Da) hitherto not known in its entirety. Peptide de novo sequencing and structural elucidation of N-glycopeptides obtained either directly from proteolytic digestion or after zwitterionic hydrophilic interaction liquid chromatography solid phase extraction-based separation were performed by use of nanoelectrospray ionization quadrupole time-of-flight mass spectrometry and low-energy collision-induced dissociation experiments. De novo sequencing provided new insights into the disulfide linkage organization, intersection of subunits and complete N-glycan structures along with site specificities. The primary sequence suggests that the enzyme belongs to glycosyl hydrolase family 38 and the N-glycan sequence analysis revealed high-mannose oligosaccharides, which were found to be heterogeneous with varying number of hexoses viz, Man8-9GlcNAc2 and Glc1Man9GlcNAc2 in an evolutionarily conserved N-glycosylation site. This site with two proximal cysteines is present in all the acidic α-mannosidases reported so far in eukaryotes. Further, a truncated paucimannose type was identified to be lacking terminal two mannose, Man1(Xyl)GlcNAc2 (Fuc). PMID:24295789

  16. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of proteomics era, there is a growing demand for analysis of huge amount of biological sequence information, and it has become necessary to have programs that would provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools viz FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This JAVA application offers the user choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools it has made analysis much faster and reduced manual intervention. PMID:17274766

  17. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  18. Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

    PubMed

    Xu, Chunrui; Sun, Dandan; Liu, Shenghui; Zhang, Yusen

    2016-10-01

    In this contribution we introduced a novel graphical method to compare protein sequences. By mapping a protein sequence into 3D space based on codons and physicochemical properties of 20 amino acids, we are able to get a unique P-vector from the 3D curve. This approach is consistent with wobble theory of amino acids. We compute the distance between sequences by their P-vectors to measure similarities/dissimilarities among protein sequences. Finally, we use our method to analyze four datasets and get better results compared with previous approaches. PMID:27375218

  19. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  20. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis

    PubMed Central

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P.; Marians, Kenneth J.

    2016-01-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  1. Species specific amino acid sequence-protein local structure relationships: An analysis in the light of a structural alphabet.

    PubMed

    de Brevern, Alexandre G; Joseph, Agnel Praveen

    2011-05-01

    Protein structure analysis and prediction methods are based on non-redundant data extracted from the available protein structures, regardless of the species from which the protein originates. Hence, these datasets represent the global knowledge on protein folds, which constitutes a generic distribution of amino acid sequence-protein structure (AAS-PS) relationships. In this study, we try to elucidate whether the AAS-PS relationship could possess specificities depending on the species. For this purpose, we have chosen three different species: Saccharomyces cerevisiae, Plasmodium falciparum and Arabidopsis thaliana. We analyzed the AAS-PS behaviors of the proteins from these three species and compared them to the "expected" distribution of a classical non-redundant databank. With the classical secondary structure description, only slight differences in amino acid preferences could be observed. With a more precise description of local protein structures (Protein Blocks), significant changes could be highlighted. S. cerevisiae's AAS-PS relationship is close to the general distribution, while striking differences are observed in the case of A. thaliana. P. falciparum is the most distant one. This study presents some interesting view-points on AAS-PS relationship. Certain species exhibit unique preferences for amino acids to be associated with protein local structural elements. Thus, AAS-PS relationships are species dependent. These results can give useful insights for improving prediction methodologies which take the species specific information into account. PMID:21333657

  2. Statistical analysis of nucleotide sequences.

    PubMed Central

    Stückle, E E; Emmrich, C; Grob, U; Nielsen, P J

    1990-01-01

    In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not equal to 0 as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites. PMID:2251125
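
    As a rough illustration of the kind of test described above, the sketch below enumerates reverse-complement palindromic hexamers and estimates a word's expected count from shorter-word counts under an order-k Markov chain. This is a common textbook estimator and is only assumed here; it is not necessarily the authors' improved model, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    return word.translate(COMPLEMENT)[::-1]


def is_palindromic(word: str) -> bool:
    """True if the word equals its own reverse complement (e.g. GAATTC)."""
    return word == reverse_complement(word)


def count_kmers(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq: str, word: str, order: int) -> float:
    """Expected count of word under an order-`order` Markov chain,
    estimated as a product of (order+1)-mer counts divided by the
    product of the overlapping order-mer counts."""
    m = len(word)
    if not 1 <= order <= m - 2:
        raise ValueError("order must satisfy 1 <= order <= len(word) - 2")
    longer = count_kmers(seq, order + 1)
    shorter = count_kmers(seq, order)
    numerator = 1.0
    for i in range(m - order):
        numerator *= longer[word[i:i + order + 1]]
    denominator = 1.0
    for i in range(1, m - order):
        denominator *= shorter[word[i:i + order]]
    return numerator / denominator if denominator else 0.0


# All hexamers that read the same on both strands.
PALINDROMIC_HEXAMERS = [
    "".join(p) for p in product("ACGT", repeat=6) if is_palindromic("".join(p))
]
```

    A bias against a hexamer would show up as an observed count, `count_kmers(seq, 6)[word]`, well below `expected_count(seq, word, order)`. Judging significance would additionally need a variance estimate, which this sketch omits.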

  3. The delta EEG (sleep)-inducing peptide (DSIP). XI. Amino-acid analysis, sequence, synthesis and activity of the nonapeptide.

    PubMed

    Schoenenberger, G A; Maier, P F; Tobler, H J; Wilson, K; Monnier, M

    1978-09-01

    A peptide which induces slow-wave EEG (sleep) after intraventricular infusion into the brain has been isolated from the extracorporeal dialysate of cerebral venous blood in rabbits submitted to hypnogenic electrical stimulation of the intralaminar thalamic area. It was shown by amino-acid analysis and sequence determination to be Trp-Ala-Gly-Gly-Asp-Ala-Ser-Gly-Glu and named "Delta Sleep-Inducing Peptide" (DSIP). This compound was synthesized as well as 5 possible metabolic products (1--8, 2--9, 2--8, 1--4 and 5--9), 2 nonapeptide analogues (with one and two amino-acids exchanged) and a related tripeptide (Trp-Ser-Glu). All 9 synthetic peptides were infused intraventricularly in rabbits (6 nmol/kg in 0.05 ml of CSF-like solution over 3.5 min) and tested under double-blind conditions. A total of 61 rabbits including controls were used. The EEG from the frontal neocortex and the limbic archicortex were subjected to direct fast-Fourier transformation and analyzed by an 1108 computer system. A highly specific delta and spindle EEG-enhancing effect of the synthetic DSIP could be demonstrated. The mean increase of EEG delta activity reached 35% in the neocortex and limbic cortex as compared to control animals receiving CSF-like solution or any of the other 8 peptides. The final chemical characterization of the synthetic DSIP revealed that only the pure alpha-aspartyl peptide is highly active in contrast to its beta-Asp isomer. A neurohumoral modulating and programming activity was suggested. PMID:568769
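
    The fragment labels in the abstract (1--8, 2--9, 2--8, 1--4 and 5--9) can be expanded into explicit sequences with a short sketch. It assumes the labels are 1-based, inclusive residue ranges of the nonapeptide, which is the usual convention but is not stated in the abstract; the helper name `fragment` is hypothetical.

```python
# Three-letter to one-letter codes for the residues that occur in DSIP.
THREE_TO_ONE = {"Trp": "W", "Ala": "A", "Gly": "G", "Asp": "D", "Ser": "S", "Glu": "E"}

# Sequence as reported in the abstract.
DSIP = "Trp-Ala-Gly-Gly-Asp-Ala-Ser-Gly-Glu".split("-")

# Residue ranges of the five synthesized fragments (assumed 1-based, inclusive).
FRAGMENT_RANGES = [(1, 8), (2, 9), (2, 8), (1, 4), (5, 9)]


def fragment(residues, start, end):
    """Return residues start..end, counting from 1 and including both ends."""
    return residues[start - 1:end]


one_letter = "".join(THREE_TO_ONE[r] for r in DSIP)
fragments = {
    f"{start}-{end}": "-".join(fragment(DSIP, start, end))
    for start, end in FRAGMENT_RANGES
}

for label, seq in fragments.items():
    print(label, seq)
```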

  4. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    SciTech Connect

    Myers, G.; Korber, B.; Wain-Hobson, S.; Smith, R.F.; Pavlakis, G.N.

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  5. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  6. Biological Sequence Analysis with Multivariate String Kernels.

    PubMed

    Kuksa, Pavel P

    2013-03-01

    String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on sequence analysis tasks such as biological sequence classification, remote homology detection, or protein superfamily and fold prediction. However, typical string kernel methods rely on analysis of discrete one-dimensional (1D) string data (e.g., DNA or amino acid sequences). In this work we address multi-class biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physico-chemical descriptors) and a class of multivariate string kernels that exploit these representations. On a number of protein sequence classification tasks, the proposed multivariate representations and kernels show significant 15-20% improvements compared to existing state-of-the-art sequence classification methods. PMID:23509193
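
    For orientation, the discrete 1D baseline that the abstract contrasts against can be illustrated with a k-mer spectrum kernel, one common string kernel. This is a minimal sketch of that baseline only, not the multivariate kernels proposed in the record; the function names and the choice of k = 3 are assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (empty if seq is shorter than k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of two discrete sequences."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


print(spectrum_kernel("ACGTACGT", "CGTACG", k=3))
```

    The kernel value grows with the number of shared k-mers, so it can be passed to any kernel-based classifier as a similarity between two sequences.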

  7. Image analysis for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Palaniappan, Kannappan; Huang, Thomas S.

    1991-07-01

    There is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols, autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles, waveform analysis is used to determine the location of each band, which represents one nucleotide in the sequence. Different classification strategies including a rule-based approach are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.
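
    The band-location step on a one-dimensional lane profile can be pictured as peak picking. The sketch below is a deliberately simple stand-in (local maxima above a fixed height), not the adaptive waveform analysis described in the record; the function name `find_bands` and the threshold parameter are hypothetical.

```python
def find_bands(profile: list[float], min_height: float) -> list[int]:
    """Return indices of local maxima at or above min_height.

    A flat-topped peak is reported once, at its left edge. This is an
    illustrative simplification, not the method used in the record.
    """
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] >= min_height
        and profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
    ]


# Toy intensity profile along one lane: two bands, one of them flat-topped.
print(find_bands([0, 1, 5, 1, 0, 4, 4, 0, 1], min_height=2))
```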

  8. Genome Sequence and Transcriptome Analysis of Meat-Spoilage-Associated Lactic Acid Bacterium Lactococcus piscium MKFS47

    PubMed Central

    Johansson, Per; Laine, Pia; Smolander, Olli-Pekka; Sonck, Matti; Rahkila, Riitta; Jääskeläinen, Elina; Paulin, Lars; Auvinen, Petri; Björkroth, Johanna

    2015-01-01

    Lactococcus piscium is a psychrotrophic lactic acid bacterium and is known to be one of the predominant species within spoilage microbial communities in cold-stored packaged foods, particularly in meat products. Its presence in such products has been associated with the formation of buttery and sour off-odors. Nevertheless, the spoilage potential of L. piscium varies dramatically depending on the strain and growth conditions. Additional knowledge about the genome is required to explain such variation, understand its phylogeny, and study gene functions. Here, we present the complete and annotated genomic sequence of L. piscium MKFS47, combined with a time course analysis of the glucose catabolism-based transcriptome. In addition, a comparative analysis of gene contents was done for L. piscium MKFS47 and 29 other lactococci, revealing three distinct clades within the genus. The genome of L. piscium MKFS47 consists of one chromosome, carrying 2,289 genes, and two plasmids. A wide range of carbohydrates was predicted to be fermented, and growth on glycerol was observed. Both carbohydrate and glycerol catabolic pathways were significantly upregulated in the course of time as a result of glucose exhaustion. At the same time, differential expression of the pyruvate utilization pathways, implicated in the formation of spoilage substances, switched the metabolism toward a heterofermentative mode. In agreement with data from previous inoculation studies, L. piscium MKFS47 was identified as an efficient producer of buttery-odor compounds under aerobic conditions. Finally, genes and pathways that may contribute to increased survival in meat environments were considered. PMID:25819958

  9. Genome Sequence and Transcriptome Analysis of Meat-Spoilage-Associated Lactic Acid Bacterium Lactococcus piscium MKFS47.

    PubMed

    Andreevskaya, Margarita; Johansson, Per; Laine, Pia; Smolander, Olli-Pekka; Sonck, Matti; Rahkila, Riitta; Jääskeläinen, Elina; Paulin, Lars; Auvinen, Petri; Björkroth, Johanna

    2015-06-01

    Lactococcus piscium is a psychrotrophic lactic acid bacterium and is known to be one of the predominant species within spoilage microbial communities in cold-stored packaged foods, particularly in meat products. Its presence in such products has been associated with the formation of buttery and sour off-odors. Nevertheless, the spoilage potential of L. piscium varies dramatically depending on the strain and growth conditions. Additional knowledge about the genome is required to explain such variation, understand its phylogeny, and study gene functions. Here, we present the complete and annotated genomic sequence of L. piscium MKFS47, combined with a time course analysis of the glucose catabolism-based transcriptome. In addition, a comparative analysis of gene contents was done for L. piscium MKFS47 and 29 other lactococci, revealing three distinct clades within the genus. The genome of L. piscium MKFS47 consists of one chromosome, carrying 2,289 genes, and two plasmids. A wide range of carbohydrates was predicted to be fermented, and growth on glycerol was observed. Both carbohydrate and glycerol catabolic pathways were significantly upregulated in the course of time as a result of glucose exhaustion. At the same time, differential expression of the pyruvate utilization pathways, implicated in the formation of spoilage substances, switched the metabolism toward a heterofermentative mode. In agreement with data from previous inoculation studies, L. piscium MKFS47 was identified as an efficient producer of buttery-odor compounds under aerobic conditions. Finally, genes and pathways that may contribute to increased survival in meat environments were considered. PMID:25819958

  10. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  11. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  12. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  13. Analysis of the functional domains of biosynthetic threonine deaminase by comparison of the amino acid sequences of three wild-type alleles to the amino acid sequence of biodegradative threonine deaminase.

    PubMed

    Taillon, B E; Little, R; Lawther, R P

    1988-03-31

    The nucleotide sequence of the gene, ilvA, for biosynthetic threonine deaminase (Tda) from Salmonella typhimurium was determined. The deduced amino acid sequence was compared with the deduced amino acid sequences of the biosynthetic Tda from Escherichia coli K-12 (ilvA) and Saccharomyces cerevisiae (ILV1) and the biodegradative Tda from E. coli K-12 (tdc). The comparison indicated the presence of two types of blocks of homologous amino acids. The first type of homology is in the N-terminal portion of all four isozymes of Tda and probably indicates amino acids involved in catalysis. The second type of homology is found in the C-terminal portion of the three biosynthetic isozymes and presumably is involved in either (i) the binding or interaction of the allosteric effector isoleucine with the enzyme, or (ii) subunit interactions. The sites of amino acid changes of two E. coli K-12 ilvA alleles with altered response to isoleucine are consistent with the conclusion that the C-terminal portion of biosynthetic Tda is involved in allosteric regulation. PMID:3290055

  14. Analysis of a nucleotide-binding site of 5-lipoxygenase by affinity labelling: binding characteristics and amino acid sequences.

    PubMed Central

    Zhang, Y Y; Hammarberg, T; Radmark, O; Samuelsson, B; Ng, C F; Funk, C D; Loscalzo, J

    2000-01-01

    5-Lipoxygenase (5LO) catalyses the first two steps in the biosynthesis of leukotrienes, which are inflammatory mediators derived from arachidonic acid. 5LO activity is stimulated by ATP; however, a consensus ATP-binding site or nucleotide-binding site has not been found in its protein sequence. In the present study, affinity and photoaffinity labelling of 5LO with 5'-p-fluorosulphonylbenzoyladenosine (FSBA) and 2-azido-ATP showed that 5LO bound to the ATP analogues quantitatively and specifically and that the incorporation of either analogue inhibited ATP stimulation of 5LO activity. The stoichiometry of the labelling was 1.4 mol of FSBA/mol of 5LO (of which ATP competed with 1 mol/mol) or 0.94 mol of 2-azido-ATP/mol of 5LO (of which ATP competed with 0.77 mol/mol). Labelling with FSBA prevented further labelling with 2-azido-ATP, indicating that the same binding site was occupied by both analogues. Other nucleotides (ADP, AMP, GTP, CTP and UTP) also competed with 2-azido-ATP labelling, suggesting that the site was a general nucleotide-binding site rather than a strict ATP-binding site. Ca(2+), which also stimulates 5LO activity, had no effect on the labelling of the nucleotide-binding site. Digestion with trypsin and peptide sequencing showed that two fragments of 5LO were labelled by 2-azido-ATP. These fragments correspond to residues 73-83 (KYWLNDDWYLK, in single-letter amino acid code) and 193-209 (FMHMFQSSWNDFADFEK) in the 5LO sequence. Trp-75 and Trp-201 in these peptides were modified by the labelling, suggesting that they were immediately adjacent to the C-2 position of the adenine ring of ATP. Given the stoichiometry of the labelling, the two peptide sequences of 5LO were probably near each other in the enzyme's tertiary structure, composing or surrounding the ATP-binding site of 5LO. PMID:11042125
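
    The residue numbering quoted in this abstract can be cross-checked directly from the two peptide strings and their stated start positions. The helper below is a hypothetical illustration; the peptide sequences and start residues (73 and 193) are taken from the abstract.

```python
def residue_numbers(peptide: str, start: int, residue: str) -> list[int]:
    """Protein-level positions of `residue` in a peptide whose first residue is `start`."""
    return [start + i for i, aa in enumerate(peptide) if aa == residue]


# Fragment 73-83: tryptophans fall at 75 and 80; the abstract reports Trp-75 as labelled.
print(residue_numbers("KYWLNDDWYLK", 73, "W"))

# Fragment 193-209: the single tryptophan falls at 201, matching Trp-201.
print(residue_numbers("FMHMFQSSWNDFADFEK", 193, "W"))
```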

  15. A chemically modified carbon paste electrode with d-lactate dehydrogenase and alanine aminotransferase enzyme sequences for d-lactic acid analysis.

    PubMed

    Shu, H C; Wu, N P

    2001-04-12

    An amperometric biosensor was constructed for the analysis of d-lactic acid based on immobilizing d-lactate dehydrogenase (d-LDH), alanine aminotransferase (ALT), NAD(+), a redox polymer and polyethylenimine in carbon paste. The effect of adding ALT to the paste, using the enzyme sequence ALT/d-LDH, was insignificant for d-lactic acid analysis. The responses to d-lactic acid of the ALT/d-LDH paste electrode were the same as those of the d-LDH paste electrode. However, the interference effect of pyruvate in the sample could be substantially reduced if sodium glutamate was applied in the carrier solution. When ALT immobilized on controlled-pore glass as an immobilized enzyme reactor (IMER) was mounted in a flow injection analysis system with the d-LDH paste electrode as detector for d-lactate analysis, the interference of pyruvate was largely eliminated. The adverse effect of pyruvate in the samples for d-lactic acid analysis was reduced more effectively with the ALT IMER and d-LDH electrode than with the ALT/d-LDH electrode. PMID:18968259

  16. Sequence analysis on microcomputers.

    PubMed

    Cannon, G C

    1987-10-01

    Overall, each of the program packages performed their tasks satisfactorily. For analyses where there was a well-defined answer, such as a search for a restriction site, there were few significant differences between the program sets. However, for tasks in which a degree of flexibility is desirable, such as homology or similarity determinations and database searches, DNASTAR consistently afforded the user more options in conducting the required analysis than did the other two packages. However, for laboratories where sequence analysis is not a major effort and the expense of a full sequence analysis workstation cannot be justified, MicroGenie and IBI-Pustell offer a satisfactory alternative. MicroGenie is a polished program system. Many may find that its user interface is more "user friendly" than the standard menu-driven interfaces. Its system of filing sequences under individual passwords facilitates use by more than one person. MicroGenie uses a hardware device for software protection that occupies a card slot in the computer on which it is used. Although I am sympathetic to the problem of software piracy, I feel that a less drastic solution is in order for a program likely to be sharing limited computer space with other software packages. The IBI-Pustell package performs the required analysis functions as accurately and quickly as MicroGenie but it lacks the clearness and ease of use. The menu system seems disjointed, and new or infrequent users often find themselves at apparent "dead-end menus" where the only clear alternative is to restart the entire program package. It is suggested from published accounts that the user interface is going to be upgraded and perhaps when that version is available, use of the system will be improved. The documentation accompanying each package was relatively clear as to how to run the programs, but all three packages assumed that the user was familiar with the computational techniques employed. MicroGenie and IBI-Pustell further

  17. Shared Segment Analysis and Next-Generation Sequencing Implicates the Retinoic Acid Signaling Pathway in Total Anomalous Pulmonary Venous Return (TAPVR)

    PubMed Central

    Nash, Dustin; Arrington, Cammon B.; Kennedy, Brett J.; Yandell, Mark; Wu, Wilfred; Zhang, Wenying; Ware, Stephanie; Jorde, Lynn B.; Gruber, Peter J.; Yost, H. Joseph

    2015-01-01

    Most isolated congenital heart defects are thought to be sporadic and are often ascribed to multifactorial mechanisms with poorly understood genetics. Total Anomalous Pulmonary Venous Return (TAPVR) occurs in 1 in 15,000 live-born infants and occurs either in isolation or as part of a syndrome involving aberrant left-right development. Previously, we reported causative links between TAPVR and the PDGFRA gene. TAPVR has also been linked to the ANKRD1/CARP genes. However, these genes only explain a small fraction of the heritability of the condition. By examination of phased single nucleotide polymorphism genotype data from 5 distantly related TAPVR patients we identified a single 25 cM identical-by-descent genomic segment on the short arm of chromosome 12 shared by 3 of the patients and their obligate-carrier parents. Whole genome sequence (WGS) analysis identified a non-synonymous variant within the shared segment in the retinol binding protein 5 (RBP5) gene. The RBP5 variant is predicted to be deleterious and is overrepresented in the TAPVR population. Gene expression and functional analysis of the zebrafish orthologue, rbp7, supports the notion that RBP5 is a TAPVR susceptibility gene. Additional sequence analysis also uncovered deleterious variants in genes associated with retinoic acid signaling, including NODAL and retinol dehydrogenase 10. These data indicate that genetic variation in the retinoic acid signaling pathway confers, in part, susceptibility to TAPVR. PMID:26121141

  18. Analysis of the complete sequences of two biologically distinct Zucchini yellow mosaic virus isolates further evidences the involvement of a single amino acid in the virus pathogenicity.

    PubMed

    Nováková, S; Svoboda, J; Glasa, M

    2014-01-01

    The complete genome sequences of two Slovak Zucchini yellow mosaic virus isolates (ZYMV-H and ZYMV-SE04T) were determined. These isolates differ significantly in their pathogenicity, producing either severe or very mild symptoms on susceptible cucurbit hosts. The viral genome of both isolates was 9593 nucleotides long and contained an open reading frame encoding a single polyprotein of 3080 amino acids. Despite their different biological properties, the isolates shared an extremely high nucleotide identity (99.8%), resulting in differences of only 5 aa, located in the HC-Pro, P3, and NIb. In silico analysis including 5 additional fully-sequenced and phylogenetically closely-related isolates known to induce different symptoms in cucurbits was performed. This suggested that the key single mutation responsible for virus pathogenicity is likely located in the N-terminal part of P3, adjacent to the PIPO. PMID:25518719

  19. Amino acid sequence and carbohydrate-binding analysis of the N-acetyl-D-galactosamine-specific C-type lectin, CEL-I, from the Holothuroidea, Cucumaria echinata.

    PubMed

    Hatakeyama, Tomomitsu; Matsuo, Noriaki; Shiba, Kouhei; Nishinohara, Shoichi; Yamasaki, Nobuyuki; Sugawara, Hajime; Aoyagi, Haruhiko

    2002-01-01

    CEL-I is one of the Ca2+-dependent lectins that has been isolated from the sea cucumber, Cucumaria echinata. This protein is composed of two identical subunits held by a single disulfide bond. The complete amino acid sequence of CEL-I was determined by sequencing the peptides produced by proteolytic fragmentation of S-pyridylethylated CEL-I. A subunit of CEL-I is composed of 140 amino acid residues. Two intrachain (Cys3-Cys14 and Cys31-Cys135) and one interchain (Cys36) disulfide bonds were also identified from an analysis of the cystine-containing peptides obtained from the intact protein. The similarity between the sequence of CEL-I and that of other C-type lectins was low, while the C-terminal region, including the putative Ca2+ and carbohydrate-binding sites, was relatively well conserved. When the carbohydrate-binding activity was examined by a solid-phase microplate assay, CEL-I showed much higher affinity for N-acetyl-D-galactosamine than for other galactose-related carbohydrates. The association constant of CEL-I for p-nitrophenyl N-acetyl-beta-D-galactosaminide (NP-GalNAc) was determined to be 2.3 x 10(4) M(-1), and the maximum number of bound NP-GalNAc was estimated to be 1.6 by an equilibrium dialysis experiment. PMID:11866098

  20. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
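
    A triplet-frequency scan of the kind described can be sketched as an observed/expected ratio per amino acid triplet. The abstract does not state how expected frequencies were computed, so the null model here (independent residues, expected frequency equal to the product of single-residue frequencies) is an assumption of this sketch, as is the function name `triplet_ratios`.

```python
from collections import Counter
from itertools import product


def triplet_ratios(proteins: list[str]) -> dict[str, float]:
    """Observed/expected count ratio for every amino acid triplet.

    Assumed null model (not taken from the record): residues are
    independent, so a triplet's expected frequency is the product of
    its three single-residue frequencies.
    """
    residue_counts = Counter("".join(proteins))
    total_residues = sum(residue_counts.values())
    freq = {aa: n / total_residues for aa, n in residue_counts.items()}

    # Overlapping triplets, counted within each protein only.
    observed = Counter(p[i:i + 3] for p in proteins for i in range(len(p) - 2))
    total_triplets = sum(observed.values())

    ratios = {}
    for combo in product(sorted(freq), repeat=3):
        expected = total_triplets * freq[combo[0]] * freq[combo[1]] * freq[combo[2]]
        if expected > 0:  # skipped when no protein is long enough to hold a triplet
            ratios["".join(combo)] = observed["".join(combo)] / expected
    return ratios


# Toy input; a real scan would read every protein sequence of a proteome.
ratios = triplet_ratios(["ABAB"])
print(ratios["ABA"], ratios["AAA"])
```

    Ratios well below 1 flag candidate underrepresented triplets; with a real proteome a significance test would be needed before drawing conclusions.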

  1. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  2. Amino-Acid Sequence of Porcine Pepsin

    PubMed Central

    Tang, J.; Sepulveda, P.; Marciniszyn, J.; Chen, K. C. S.; Huang, W-Y.; Tao, N.; Liu, D.; Lanier, J. P.

    1973-01-01

    As the culmination of several years of experiments, we propose a complete amino-acid sequence for porcine pepsin, an enzyme containing 327 amino-acid residues in a single polypeptide chain. In the sequence determination, the enzyme was treated with cyanogen bromide. Five resulting fragments were purified. The amino-acid sequence of four of the fragments accounted for 290 residues. Because the structure of a 37-residue carboxyl-terminal fragment was already known, it was not studied. The alignment of these fragments was determined from the sequence of methionyl-peptides we had previously reported. We also discovered the locations of active-site aspartyl residues, as well as the pairing of the three disulfide bridges. A minor component of commercial crystalline pepsin was found to contain two extra amino-acid residues, Ala-Leu-, at the amino-terminus of the molecule. This minor component was apparently derived from a different site of cleavage during the activation of porcine pepsinogen. PMID:4587252

  3. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  4. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  5. Analysis of key genes of jasmonic acid mediated signal pathway for defense against insect damages by comparative transcriptome sequencing.

    PubMed

    Yang, Fengshan; Zhang, Yuliang; Huang, Qixing; Yin, Guohua; Pennerman, Kayla K; Yu, Jiujiang; Liu, Zhixin; Li, Dafei; Guo, Anping

    2015-01-01

    Corn defense systems against insect herbivory involve activation of genes that lead to metabolic reconfigurations to produce toxic compounds, proteinase inhibitors, oxidative enzymes, and behavior-modifying volatiles. Similar responses occur when the plant is exposed to methyl jasmonate (MeJA). To compare the defense responses between stalk borer feeding and exogenous MeJA on a transcriptional level, we employed deep transcriptome sequencing methods following Ostrinia furnacalis leaf feeding and MeJA leaf treatment. 39,636 genes were found to be differentially expressed with O. furnacalis feeding, MeJA application, and O. furnacalis feeding and MeJA application. Following Gene Ontology enrichment analysis of the up- or down-regulated genes, many were implicated in metabolic processes, stimuli-responsive catalytic activity, and transfer activity. Fifteen genes showed significant changes in the O. furnacalis feeding group: LOX1, ASN1, eIF3, DXS, AOS, TIM, LOX5, BBTI2, BBTI11, BBTI12, BBTI13, Cl-1B, TPS10, DOX, and A20/AN1; almost all of them were found to be involved in jasmonate defense signaling pathways. All of the data demonstrate that the jasmonate defense signaling pathway is a major pathway in corn's defense against Asian corn borer herbivory. The transcriptome data are publicly available at NCBI SRA: SRS965087. PMID:26560755

  6. Analysis of key genes of jasmonic acid mediated signal pathway for defense against insect damages by comparative transcriptome sequencing

    PubMed Central

    Yang, Fengshan; Zhang, Yuliang; Huang, Qixing; Yin, Guohua; Pennerman, Kayla K.; Yu, Jiujiang; Liu, Zhixin; Li, Dafei; Guo, Anping

    2015-01-01

    Corn defense systems against insect herbivory involve activation of genes that lead to metabolic reconfigurations to produce toxic compounds, proteinase inhibitors, oxidative enzymes, and behavior-modifying volatiles. Similar responses occur when the plant is exposed to methyl jasmonate (MeJA). To compare the defense responses between stalk borer feeding and exogenous MeJA on a transcriptional level, we employed deep transcriptome sequencing methods following Ostrinia furnacalis leaf feeding and MeJA leaf treatment. 39,636 genes were found to be differentially expressed with O. furnacalis feeding, MeJA application, and O. furnacalis feeding and MeJA application. Following Gene Ontology enrichment analysis of the up- or down-regulated genes, many were implicated in metabolic processes, stimuli-responsive catalytic activity, and transfer activity. Fifteen genes showed significant changes in the O. furnacalis feeding group: LOX1, ASN1, eIF3, DXS, AOS, TIM, LOX5, BBTI2, BBTI11, BBTI12, BBTI13, Cl-1B, TPS10, DOX, and A20/AN1; almost all of them were found to be involved in jasmonate defense signaling pathways. All of the data demonstrate that the jasmonate defense signaling pathway is a major pathway in corn's defense against Asian corn borer herbivory. The transcriptome data are publicly available at NCBI SRA: SRS965087. PMID:26560755

  7. Nucleotide sequence analysis with polynucleotide kinase and nucleotide 'mapping' methods. 5′-Terminal sequence of deoxyribonucleic acid from bacteriophages λ and 424

    PubMed Central

    Murray, Kenneth

    1973-01-01

    The polynucleotide kinase reaction was used in analyses of complex mixtures of oligodeoxynucleotides which were fractionated by various two-dimensional nucleotide `mapping' procedures. Parallel ionophoretic analyses on DEAE-cellulose paper, pH2, and AE-cellulose paper, pH3.5, of venom phosphodiesterase partial digests of 5′-terminally labelled oligonucleotides enabled the sequence of the nucleotides to be deduced uniquely. A `diagonal ionophoresis' method has been used with mixtures of nucleotides. Application of these methods to 5′-terminally labelled DNA from bacteriophage λ gave the terminal sequences pA-G-G-T-C-G and pG-G-G-C-G. Identical 5′-terminal sequences were found with DNA from bacteriophage 424. PMID:4352720

  8. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  9. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  10. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  11. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  12. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  13. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  14. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. Labeled nucleotides or nucleotide analogs that become incorporated are distinguished from freely diffusing labels by the longer duration of their signal, because they are retained longer in the observation volume than the freely diffusing labels.

  15. Twin Mitochondrial Sequence Analysis.

    PubMed

    Bouhlal, Yosr; Martinez, Selena; Gong, Henry; Dumas, Kevin; Shieh, Joseph T C

    2013-09-01

    When applying genome-wide sequencing technologies to disease investigation, it is increasingly important to resolve sequence variation in regions of the genome that may have homologous sequences. The human mitochondrial genome challenges interpretation given the potential for heteroplasmy, somatic variation, and homologous nuclear mitochondrial sequences (numts). Identical twins share the same mitochondrial DNA (mtDNA) from early life, but whether the mitochondrial sequence remains similar is unclear. We compared an adult monozygotic twin pair using high-throughput sequencing and evaluated variants with primer extension and mitochondrial pre-enrichment. Thirty-seven variants were shared between the twin individuals, and the variants were verified on the original genomic DNA. These studies support highly identical genetic sequence in this case. Certain low-level variant calls were of high quality and homology to the mitochondrial DNA, and they were further evaluated. When we assessed calls in pre-enriched mitochondrial DNA templates, we found that these may represent numts, which can be differentiated from mtDNA variation. We conclude that twin identity extends to mitochondrial DNA, and it is critical to differentiate between numts and mtDNA in genome sequencing, particularly since significant heteroplasmy could influence genome interpretation. Further studies on mtDNA and numts will aid in understanding how variation occurs and persists. PMID:24040623

  16. E-probe Diagnostic Nucleic acid Analysis (EDNA): A theoretical approach for handling of next generation sequencing data for diagnostics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    There are many plant pathogen-specific diagnostic assays, based on PCR and immune-detection. However, the ability to test for large numbers of pathogens simultaneously is lacking. Next generation sequencing (NGS) allows one to detect all organisms within a given sample, but has computational limitat...

  17. Identification of Tuber borchii Vittad. mycelium proteins separated by two-dimensional polyacrylamide gel electrophoresis using amino acid analysis and sequence tagging.

    PubMed

    Vallorani, L; Bernardini, F; Sacconi, C; Pierleoni, R; Pieretti, B; Piccoli, G; Buffalini, M; Stocchi, V

    2000-11-01

    This paper reports the first results in the proteome analysis of Tuber borchii Vittad. mycelium, an ectomycorrhizal fungus poorly defined genetically, but known for its generation of edible fruit bodies known as white truffles. Employing isoelectric focusing on immobilized pH gradients, followed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, we obtained an electropherogram presenting over 800 spots within the window of isoelectric points (pI) 3.5-9 and a molecular mass of 10-200 kDa. Different reducing agents were tested in the sample preparation buffers, and the standard lysis buffer plus 2% w/v polyvinylpolypyrrolidone allowed the best solubilization and resolution of the proteins. The T. borchii proteins separated in micropreparative gels were electroblotted onto polyvinylidene difluoride membranes and visualized by Coomassie staining. Twenty-three proteins were excised and analyzed by the combination of amino acid and N-terminal analysis. One protein was identified by matching its amino acid composition, estimated isoelectric point and molecular mass against the SWISS-PROT and EMBL databases. Four spots were successfully tagged by Edman microsequencing but no homologous sequences were found in databases. PMID:11271490

  18. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  19. Uses of Phage Display in Agriculture: Sequence Analysis and Comparative Modeling of Late Embryogenesis Abundant Client Proteins Suggest Protein-Nucleic Acid Binding Functionality

    PubMed Central

    Kushwaha, Rekha; Downie, A. Bruce; Payne, Christina M.

    2013-01-01

    A group of intrinsically disordered, hydrophilic proteins—Late Embryogenesis Abundant (LEA) proteins—has been linked to survival in plants and animals in periods of stress, putatively through safeguarding enzymatic function and prevention of aggregation in times of dehydration/heat. Yet despite decades of effort, the molecular-level mechanisms defining this protective function remain unknown. A recent effort to understand LEA functionality began with the unique application of phage display, wherein phage display and biopanning over recombinant Seed Maturation Protein homologs from Arabidopsis thaliana and Glycine max were used to retrieve client proteins at two different temperatures, with one intended to represent heat stress. From this previous study, we identified 21 client proteins for which clones were recovered, sometimes repeatedly. Here, we use sequence analysis and homology modeling of the client proteins to ascertain common sequence and structural properties that may contribute to binding affinity with the protective LEA protein. Our methods uncover what appears to be a predilection for protein-nucleic acid interactions among LEA client proteins, which is suggestive of subcellular residence. The results from this initial computational study will guide future efforts to uncover the protein protective mechanisms during heat stress, potentially leading to phage-display-directed evolution of synthetic LEA molecules. PMID:23956788

  20. Isolation and identification of lactic acid bacteria from Tarag in Eastern Inner Mongolia of China by 16S rRNA sequences and DGGE analysis.

    PubMed

    Liu, Wenjun; Bao, Qiuhua; Jirimutu; Qing, Manjun; Siriguleng; Chen, Xia; Sun, Ting; Li, Meihua; Zhang, Jiachao; Yu, Jie; Bilige, Menghe; Sun, Tiansong; Zhang, Heping

    2012-01-20

    Tarag is a characteristic fermented dairy product with rich microflora (especially lactic acid bacteria), developed by the people of Mongolian nationality in Inner Mongolia of China and Mongolia throughout history. One hundred and ninety-eight samples of Tarag were collected from scattered households in Eastern Inner Mongolia, and a total of 790 isolates of lactic acid bacteria (LAB) were isolated by the traditional pure culture method. To identify these isolates and analyze their biodiversity, 16S rRNA gene sequence analysis and PCR-DGGE were performed, respectively. The results showed that the 790 isolates could be classified as 31 species and subspecies. Among these isolates, Lactobacillus helveticus (153 strains, about 19.4%), Lactococcus lactis subsp. lactis (132 strains, about 16.7%) and Lactobacillus casei (106 strains, about 13.4%) were considered the predominant species in the traditional fermented dairy products (Tarag) in Eastern Inner Mongolia. It was shown that the biodiversity of LAB in Tarag in Inner Mongolia was very abundant, and this traditional fermented dairy product could be considered a valuable resource for LAB isolation and probiotic selection. PMID:21689912

  1. The outer capsid protein VP4 of equine rotavirus strain H-2 represents a unique VP4 type by amino acid sequence analysis.

    PubMed

    Hardy, M E; Gorziglia, M; Woode, G N

    1993-03-01

    The nucleotide and deduced amino acid sequence of G serotype 3 equine rotavirus strain H-2 was determined. A predicted 776-amino-acid H-2 VP4 shows less than or equal to 85.3% identity to other rotavirus VP4 types sequenced to date and thus represents a new P serotype. A PCR-generated probe derived from a cDNA clone of H-2 gene 4 hybridized to gene 4 of several tissue-culture-adapted equine rotavirus isolates, demonstrating that the gene 4 allele present in the H-2 strain is present in the equine rotavirus population. PMID:8382410

  2. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  3. Amino acid analysis

    NASA Technical Reports Server (NTRS)

    Winitz, M.; Graff, J. (Inventor)

    1974-01-01

    The process and apparatus for qualitative and quantitative analysis of the amino acid content of a biological sample are presented. The sample is deposited on a cation exchange resin and then is washed with suitable solvents. The amino acids and various cations and organic material with a basic function remain on the resin. The resin is eluted with an acid eluant, and the eluate containing the amino acids is transferred to a reaction vessel where the eluant is removed. Final analysis of the purified acylated amino acid esters is accomplished by gas-liquid chromatographic techniques.

  4. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences, with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  5. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  6. Computer analysis between nucleotide and amino acid sequences of bean golden mosaic virus and those of maize streak, wheat dwarf, chloris striate mosaic, and beet curly top viruses.

    PubMed

    Ikegami, M

    1989-01-01

    Bean golden mosaic virus (BGMV) DNA 1 and 2 have little sequence homology with maize streak virus (MSV), wheat dwarf virus (WDV), and chloris striate mosaic virus (CSMV) DNAs. BGMV DNA 1 and beet curly top virus (BCTV) DNA are closely related, whereas BGMV DNA 2 and BCTV DNA are not related. Direct amino acid homologies of predicted proteins between BGMV ORFs and MSV ORFs, WDV ORFs or CSMV ORFs were 40-50%. BGMV 1L1 and BCTV L1, and BGMV 1L3 and BCTV L4 were highly conserved. The sequence TAATATTAC was detected in the loops of hairpin structures of 5 geminiviruses. PMID:2615677
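    As a rough illustration of how a conserved nonanucleotide such as TAATATTAC might be located in a sequence, the Python sketch below scans a DNA string for every occurrence of a motif. The function name `find_motif` and the toy sequence are hypothetical and are not taken from the paper; a real analysis would also need to check that each hit falls in the loop of a predicted hairpin.

```python
def find_motif(sequence: str, motif: str = "TAATATTAC") -> list[int]:
    """Return the 0-based start of every (possibly overlapping) motif hit."""
    sequence = sequence.upper()
    motif = motif.upper()
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        # Resume one base later so overlapping occurrences are not missed.
        start = sequence.find(motif, start + 1)
    return hits


# Hypothetical toy sequence (not a real geminivirus genome): the motif sits
# between two short flanks that could pair as a stem.
toy = "GGCCATCCGTTAATATTACCGGATGGCC"
print(find_motif(toy))
```

    The search is case-insensitive and returns an empty list when the motif is absent.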

  7. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of the sequence listing in accordance with the requirements in 37 CFR...

  8. [Multilocus sequence typing (MLST) analysis].

    PubMed

    Matsumura, Yasufumi

    2013-12-01

    Multilocus sequence typing (MLST) analysis has been emerging as a powerful tool for genotyping specific bacterial species. MLST utilizes internal fragments of multiple housekeeping genes and the combination of each allele defines the sequence type for each isolate. MLST databases contain reference data and are freely accessible via internet websites. The standard method for investigating short-term hospital outbreaks is still pulsed-field gel electrophoresis and MLST analysis is not a substitute. However, analysis of sequence types and clonal complexes (closely related sequence types) enables identification and understanding of a specific clone that is widely spreading among drug-resistant organisms, or a key clone that is important for evolution of the organism. In the case of Escherichia coli, CTX-M-15 or CTX-M-14 extended-spectrum beta-lactamase producing ST131 clone has emerged and spread globally in the last 10 years. MLST analysis is an unambiguous procedure and is becoming a common typing method to characterize isolates. PMID:24605545
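    The idea that a combination of allele numbers defines a sequence type can be sketched as a simple table lookup. The Python example below is a minimal illustration only: the locus names, allele numbers, and profile table are invented, and real schemes use the loci and profile definitions published in the MLST databases.

```python
from typing import Optional

# Hypothetical scheme: locus names and allele numbers are invented.
LOCI = ("locusA", "locusB", "locusC")

# Toy profile table: ordered allele combination -> sequence type (ST).
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 5): 3,
}


def sequence_type(alleles: dict[str, int]) -> Optional[int]:
    """Look up the ST of one isolate; None means the profile is not in the table."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


isolate = {"locusA": 1, "locusB": 2, "locusC": 1}
print(sequence_type(isolate))
```

    Because the key is the full ordered tuple of alleles, two isolates share a sequence type only when they match at every locus; an unseen combination returns `None` rather than a guess.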

  9. Structural gene and complete amino acid sequence of Pseudomonas aeruginosa IFO 3455 elastase.

    PubMed Central

    Fukushima, J; Yamamoto, S; Morihara, K; Atsumi, Y; Takeuchi, H; Kawamoto, S; Okuda, K

    1989-01-01

    The DNA encoding the elastase of Pseudomonas aeruginosa IFO 3455 was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited high levels of both elastase activity and elastase antigens. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature elastase consisted of 301 amino acids with a relative molecular mass of 32,926 daltons. The amino acid composition predicted from the DNA sequence was quite similar to the chemically determined composition of purified elastase reported previously. We also observed a nucleotide sequence encoding a signal peptide and a "pro" sequence consisting of 197 amino acids upstream from the mature elastase protein gene. The amino acid sequence analysis revealed that both the N-terminal sequence of the purified elastase and the N-terminal side sequences of the C-terminal tryptic peptide as well as the internal lysyl peptide fragment were completely identical to the deduced amino acid sequences. The pattern of identity of amino acid sequences was quite evident in the regions that include structurally and functionally important residues of Bacillus subtilis thermolysin. PMID:2493453

  10. Predicting intrinsic disorder from amino acid sequence.

    PubMed

    Obradovic, Zoran; Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J; Dunker, A Keith

    2003-01-01

    Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. PMID:14579347
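    The effect of averaging predictor outputs over a 61-residue window can be seen in a small Python sketch. It is an illustration under stated assumptions, not the predictors' actual code: the function name `window_average`, the toy scores, and the truncation of the window at the sequence ends are all hypothetical choices, since the abstract does not say how edges were handled.

```python
def window_average(scores: list[float], window: int = 61) -> list[float]:
    """Average each score with its neighbours inside a centred window.

    Near the ends the window is truncated to the residues that exist
    (an assumption made for this sketch).
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Toy per-residue disorder scores: a 5-residue spike inside 100 residues.
raw = [0.0] * 100
raw[50:55] = [1.0] * 5
smooth = window_average(raw)
print(max(raw), max(smooth) < 0.5)
```

    In this toy case the short spike is diluted well below a 0.5 cutoff after smoothing, which is consistent with the abstract's remark that wide averaging windows eliminate short predictions of order or disorder.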

  11. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published. PMID:18495751

  12. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca²⁺- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B₄. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the ³²P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A₄ hydrolase or Ca²⁺-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  13. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  14. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  15. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  16. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  17. Amino acid analysis.

    PubMed

    Crabb, J W; West, K A; Dodson, W S; Hulmes, J D

    2001-05-01

    Amino acid analysis (AAA) is one of the best methods to quantify peptides and proteins. Two general approaches to quantitative AAA exist, namely, classical postcolumn derivatization following ion-exchange chromatography and precolumn derivatization followed by reversed-phase HPLC (RP-HPLC). Excellent instrumentation and several specific methodologies are available for both approaches, and both have advantages and disadvantages. This unit focuses on picomole-level AAA of peptides and proteins using the most popular precolumn-derivatization method, namely, phenylthiocarbamyl amino acid analysis (PTC-AAA). It is directed primarily toward those interested in establishing the technology with a modest budget. PTC derivatization and analysis conditions are described, and support and alternate protocols describe additional techniques necessary or useful for most any AAA method--e.g., sample preparation, hydrolysis, instrument calibration, data interpretation, and analysis of difficult or unusual residues such as cysteine, tryptophan, phosphoamino acids, and hydroxyproline. PMID:18429107

  18. Integrative visual analysis of protein sequence mutations

    PubMed Central

    2014-01-01

    Background: An important aspect of studying the relationship between protein sequence, structure and function is the molecular characterization of the effect of protein mutations. To understand the functional impact of amino acid changes, the multiple biological properties of protein residues have to be considered together. Results: Here, we present a novel visual approach for analyzing residue mutations. It combines different biological visualizations and integrates them with molecular data derived from external resources. To show various aspects of the biological information on different scales, our approach includes one-dimensional sequence views, three-dimensional protein structure views and two-dimensional views of residue interaction networks as well as aggregated views. The views are linked tightly and synchronized to reduce the cognitive load of the user when switching between them. In particular, the protein mutations are mapped onto the views together with further functional and structural information. We also assess the impact of individual amino acid changes by the detailed analysis and visualization of the involved residue interactions. We demonstrate the effectiveness of our approach and the developed software on the data provided for the BioVis 2013 data contest. Conclusions: Our visual approach and software greatly facilitate the integrative and interactive analysis of protein mutations based on complementary visualizations. The different data views offered to the user are enriched with information about molecular properties of amino acid residues and further biological knowledge. PMID:25237389

  19. Automated carboxy-terminal sequence analysis of peptides.

    PubMed Central

    Bailey, J. M.; Shenoy, N. R.; Ronk, M.; Shively, J. E.

    1992-01-01

    Proteins and peptides can be sequenced from the carboxy-terminus with isothiocyanate reagents to produce amino acid thiohydantoin derivatives. Previous studies in our laboratory have focused on solution phase conditions for formation of the peptidylthiohydantoins with trimethylsilylisothiocyanate (TMS-ITC) and for hydrolysis of these peptidylthiohydantoins into an amino acid thiohydantoin derivative and a new shortened peptide capable of continued degradation (Bailey, J. M. & Shively, J. E., 1990, Biochemistry 29, 3145-3156). The current study is a continuation of this work and describes the construction of an instrument for automated C-terminal sequencing, the application of the thiocyanate chemistry to peptides covalently coupled to a novel polyethylene solid support (Shenoy, N. R., Bailey, J. M., & Shively, J. E., 1992, Protein Sci. 1, 58-67), the use of sodium trimethylsilanolate as a novel reagent for the specific cleavage of the derivatized C-terminal amino acid, and the development of methodology to sequence through the difficult amino acid, aspartate. Automated programs are described for the C-terminal sequencing of peptides covalently attached to carboxylic acid-modified polyethylene. The chemistry involves activation with acetic anhydride, derivatization with TMS-ITC, and cleavage of the derivatized C-terminal amino acid with sodium trimethylsilanolate. The thiohydantoin amino acid is identified by on-line high performance liquid chromatography using a Phenomenex Ultracarb 5 ODS(30) column and a triethylamine/phosphoric acid buffer system containing pentanesulfonic acid. The generality of our automated C-terminal sequencing methodology was examined by sequencing model peptides containing all 20 of the common amino acids. All of the amino acids were found to sequence in high yield (90% or greater) except for asparagine and aspartate, which could be only partially removed, and proline, which was found not to be capable of derivatization. In spite of these

  20. From Artificial Amino Acids to Sequence-Defined Targeted Oligoaminoamides.

    PubMed

    Morys, Stephan; Wagner, Ernst; Lächelt, Ulrich

    2016-01-01

    Artificial oligoamino acids with appropriate protecting groups can be used for the sequential assembly of oligoaminoamides on solid-phase. With the help of these oligoamino acids multifunctional nucleic acid (NA) carriers can be designed and produced in highly defined topologies. Here we describe the synthesis of the artificial oligoamino acid Fmoc-Stp(Boc3)-OH, the subsequent assembly into sequence-defined oligomers and the formulation of tumor-targeted plasmid DNA (pDNA) polyplexes. PMID:27436323

  1. Partial amino acid sequence of human factor D: homology with serine proteases.

    PubMed Central

    Volanakis, J E; Bhown, A; Bennett, J C; Mole, J E

    1980-01-01

    Human factor D purified to homogeneity by a modified procedure was subjected to NH2-terminal amino acid sequence analysis by using a modified automated Beckman sequencer. We identified 48 of the first 57 NH2-terminal amino acids in a single sequencer run, using microgram quantities of factor D. The deduced amino acid sequence represents approximately 25% of the primary structure of factor D. This extended NH2-terminal amino acid sequence of factor D was compared to that of other trypsin-related serine proteases. By visual inspection, strong homologies (33--50% identity) were observed with all the serine proteases included in the comparison. Interestingly, factor D showed a higher degree of homology to serine proteases of pancreatic origin than to those of serum origin. PMID:6987665

  2. Amino acid sequence of Japanese quail (Coturnix japonica) and northern bobwhite (Colinus virginianus) myoglobin.

    PubMed

    Goodson, John; Beckstead, Robert B; Payne, Jason; Singh, Rakesh K; Mohan, Anand

    2015-08-15

    Myoglobin has an important physiological role in vertebrates, and as the primary sarcoplasmic pigment in meat, influences quality perception and consumer acceptability. In this study, the amino acid sequences of Japanese quail and northern bobwhite myoglobin were deduced by cDNA cloning of the coding sequence from mRNA. Japanese quail myoglobin was isolated from quail cardiac muscles, purified using ammonium sulphate precipitation and gel-filtration, and subjected to multiple enzymatic digestions. Mass spectrometry corroborated the deduced protein amino acid sequence at the protein level. Sequence analysis revealed both species' myoglobin structures consist of 153 amino acids, differing at only three positions. When compared with chicken myoglobin, Japanese quail showed 98% sequence identity, and northern bobwhite 97% sequence identity. The myoglobin in both quail species contained eight histidine residues instead of the nine present in chicken and turkey. PMID:25794748

  3. De novo Sequencing and Transcriptome Analysis of Pinellia ternata Identify the Candidate Genes Involved in the Biosynthesis of Benzoic Acid and Ephedrine

    PubMed Central

    Zhang, Guang-hui; Jiang, Ni-hao; Song, Wan-ling; Ma, Chun-hua; Yang, Sheng-chao; Chen, Jun-wen

    2016-01-01

    Background: The medicinal herb, Pinellia ternata, is purported to be an anti-emetic with analgesic and sedative effects. Alkaloids are the main biologically active compounds in P. ternata, especially ephedrine that is a phenylpropylamino alkaloid specifically produced by Ephedra and Catha edulis. However, how ephedrine is synthesized in plants is uncertain. Only the phenylalanine ammonia lyase (PAL) and relevant genes in this pathway have been characterized. Genomic information of P. ternata is also unavailable. Results: We analyzed the transcriptome of the tuber of P. ternata with the Illumina HiSeq™ 2000 sequencing platform. 66,813,052 high-quality reads were generated, and these reads were assembled de novo into 89,068 unigenes. Most known genes involved in benzoic acid biosynthesis were identified in the unigene dataset of P. ternata, and the expression patterns of some ephedrine biosynthesis-related genes were analyzed by reverse transcription quantitative real-time PCR (RT-qPCR). Also, 14,468 simple sequence repeats (SSRs) were identified from 12,000 unigenes. Twenty primer pairs for SSRs were randomly selected for the validation of their amplification effect. Conclusion: RNA-seq data were used for the first time to provide comprehensive gene information on P. ternata at the transcriptional level. These data will advance molecular genetics in this valuable medicinal plant. PMID:27579029

  4. Identification and sequence analysis of pWcMBF8-1, a bacteriocin-encoding plasmid from the lactic acid bacterium Weissella confusa.

    PubMed

    Malik, Amarila; Sumayyah, Sumayyah; Yeh, Chia-Wen; Heng, Nicholas C K

    2016-04-01

    Members of the Gram-positive lactic acid bacteria (LAB) are well-known for their beneficial properties as starter cultures and probiotics. Many LAB species produce ribosomally synthesized proteinaceous antibiotics (bacteriocins). Weissella confusa MBF8-1 is a strain isolated from a fermented soybean product that not only produces useful exopolysaccharides but also exhibits bacteriocin activity, which we call weissellicin MBF. Here, we show that bacteriocin production by W. confusa MBF8-1 is specified by a large plasmid, pWcMBF8-1. Plasmid pWcMBF8-1 (GenBank accession number KR350502), which was identified from the W. confusa MBF8-1 draft genome sequence, is 17 643 bp in length with a G + C content of 34.8% and contains 25 open reading frames (ORFs). Six ORFs constitute the weissellicin MBF locus, encoding three putative double-glycine-motif peptides (Bac1, Bac2, Bac3), an ABC transporter complex (BacTE) and a putative immunity protein (BacI). Two ORFs encode plasmid partitioning and mobilization proteins, suggesting that pWcMBF8-1 is transferable to other hosts. To the best of our knowledge, plasmid pWcMBF8-1 not only represents the first large Weissella plasmid to be sequenced but also the first to be associated with bacteriocin production in W. confusa. PMID:26976853

  5. Detecting frame shifts by amino acid sequence comparison.

    PubMed

    Claverie, J M

    1993-12-20

    Various amino acid substitution scoring matrices are used in conjunction with local alignments programs to detect regions of similarity and infer potential common ancestry between proteins. The usual scoring schemes derive from the implicit hypothesis that related proteins evolve from a common ancestor by the accumulation of point mutations and that amino acids tend to be progressively substituted by others with similar properties. However, other frequent single mutation events, like nucleotide insertion or deletion and gene inversion, change the translation reading frame and cause previously encoded amino acid sequences to become unrecognizable at once. Here, I derive five new types of scoring matrix, each capable of detecting a specific frame shift (deletion, insertion and inversion in 3 frames) and use them with a regular local alignments program to detect amino acid sequences that may have derived from alternative reading frames of the same nucleotide sequence. Frame shifts are inferred from the sole comparison of the protein sequences. The five scoring matrices were used with the BLASTP program to compare all the protein sequences in the Swissprot database. Surprisingly, the searches revealed hundreds of highly significant frame shift matches, of which many are likely to represent sequencing errors. Others provide some evidence that frame shift mutations might be used in protein evolution as a way to create new amino acid sequences from pre-existing coding regions. PMID:7903399
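
The effect described above, where a single-nucleotide deletion makes the downstream protein sequence unrecognizable while it still matches an alternative reading frame of the original DNA, can be illustrated with a short sketch. This is not the scoring-matrix method of the abstract; the helper names `translate` and `delete_base` and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation tables are needed for this illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate complete codons of `dna`, starting at offset `frame`."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def delete_base(dna, pos):
    """Return `dna` with the nucleotide at index `pos` removed."""
    return dna[:pos] + dna[pos + 1:]


original = "ATGGCCAAGTTTGAC"          # hypothetical coding sequence
shifted = delete_base(original, 3)    # lose one base right after the start codon

print(translate(original))            # protein in the original frame
print(translate(shifted))             # protein after the frame shift
print(translate(original, frame=1))   # original DNA read in the +1 frame
```

After the deletion point, the shifted protein agrees with the +1 frame translation of the original DNA, which is the kind of relationship the frame-shift matrices are designed to score directly at the protein level.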

  6. Segments of amino acid sequence similarity in beta-amylases.

    PubMed

    Friedberg, F; Rhodes, C

    1988-01-01

    In alpha-amylases from animals, plants and bacteria and in beta-amylases from plants and bacteria a number of segments exhibit amino acid sequence similarity specific to the alpha or to the beta type, respectively. In the case of the beta-amylases the similar sequence regions are extensive and they are disrupted only by short interspersed dissimilar regions. Close to the C terminus, however, no such sequence similarity exists. PMID:2464171

  7. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  8. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  9. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  10. A method to find palindromes in nucleic acid sequences.

    PubMed

    Anjana, Ramnath; Shankar, Mani; Vaishnavi, Marthandan Kirti; Sekar, Kanagaraj

    2013-01-01

    Various types of sequences in the human genome are known to play important roles in different aspects of genomic functioning. Among these sequences, palindromic nucleic acid sequences are one such type that have been studied in detail and found to influence a wide variety of genomic characteristics. For a nucleotide sequence to be considered as a palindrome, its complementary strand must read the same in the opposite direction. For example, both the strands, i.e., the strand going from 5' to 3' and its complementary strand from 3' to 5', must be complementary. A typical nucleotide palindromic sequence would be TATA (5' to 3') and its complementary sequence from 3' to 5' would be ATAT. Thus, a new method has been developed using dynamic programming to fetch the palindromic nucleic acid sequences. The new method uses less memory and thereby it increases the overall speed and efficiency. The proposed method has been tested using the bacterial (3891 KB bases) and human chromosomal sequences (Chr-18: 74366 kb and Chr-Y: 25554 kb) and the computation time for finding the palindromic sequences is in milliseconds. PMID:23515654

  11. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsites A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly). PMID:9836434

  12. Cannabinoid acids analysis.

    PubMed

    Lercker, G; Bocci, F; Frega, N; Bortolomeazzi, R

    1992-03-01

    The cannabinoid pattern of plant preparations from Cannabis sativa (hashish, marijuana) makes it possible to recognize the phenotype of the plants, that is, whether they are intended for use as a drug or for fiber. Analytical determination of cannabinoids has been problematic because of the complex composition of the hexane extract. Capillary gas chromatography of the hexane extracts of plant samples shows the presence of rather polar constituents that elute, with noticeable interactions, only on a polar phase. The compounds can be methylated with diazomethane and silylated (TMS) with silylating reagents. The methyl and methyl-TMS derivatives are analyzed by high-resolution gas chromatography (HRGC) and by gas chromatography-mass spectrometry (GC-MS). Identification of the compounds shows that they are cannabinoid acids, of which cannabidiolic acid (CBDA) is quantitatively the main one. Cannabinoid acids are known to be thermally unstable and are converted into the corresponding cannabinoids by decarboxylation. This is of interest in forensic analysis, where the aim is to establish the total amount of THC, the active component, in Cannabis preparations. PMID:1503600

  13. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015

  14. N-terminal sequence of amino acids and some properties of an acid-stable alpha-amylase from citric acid-koji (Aspergillus usamii var.).

    PubMed

    Suganuma, T; Tahara, N; Kitahara, K; Nagahama, T; Inuzuka, K

    1996-01-01

    An acid-stable alpha-amylase (AA) was purified from an acidic extract of citric acid-koji (A. usamii var.). The N-terminal sequence of the first 20 amino acids of the enzyme was identical with that of AA from A. niger, but the two enzymes differed in molecular weight. HPLC analysis for identifying the anomers of products indicated that the AA hydrolyzed maltopentaose (G5) at the third glycoside bond predominantly, which differed from Taka-amylase A and the neutral alpha-amylase (NA) from the citric acid-koji. PMID:8824843

  15. Phylogenetic Analysis of Poliovirus Sequences.

    PubMed

    Jorba, Jaume

    2016-01-01

    Comparative genomic sequencing is a major surveillance tool in the Polio Laboratory Network. Due to the rapid evolution of polioviruses (~1 % per year), pathways of virus transmission can be reconstructed from the pathways of genomic evolution. Here, we describe three main phylogenetic methods; estimation of genetic distances, reconstruction of a maximum-likelihood (ML) tree, and estimation of substitution rates using Bayesian Markov chain Monte Carlo (MCMC). The data set used consists of complete capsid sequences from a survey of poliovirus sequences available in GenBank. PMID:26983737
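
The first of the three methods, estimation of genetic distances, can be sketched as follows. The abstract does not name a substitution model; the Jukes-Cantor correction is assumed here purely for illustration, and the function names and example sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3), in substitutions per site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")   # one difference in ten sites
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the raw proportion of differences, because it accounts for multiple substitutions at the same site.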

  16. On Quantum Algorithm for Multiple Alignment of Amino Acid Sequences

    NASA Astrophysics Data System (ADS)

    Iriyama, Satoshi; Ohya, Masanori

    2009-02-01

    The alignment of genome sequences or amino acid sequences is one of the fundamental operations for the study of life. Usual computational complexity for the multiple alignment of N sequences with common length L by dynamic programming is O(L^N). This alignment is considered as one of the NP problems, so that it is desirable to find a nice algorithm of the multiple alignment. Thus in this paper we propose the quantum algorithm for the multiple alignment based on the works [12, 1, 2] in which the NP complete problem was shown to be the P problem by means of quantum algorithm and chaos information dynamics.
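
To make the dynamic-programming cost concrete, the following sketch scores a classical global alignment of two sequences (the N = 2 case) and counts the table cells that the same approach would need for N sequences. It is not the quantum algorithm of the abstract; the scoring values and function names are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of `a` and `b` by dynamic programming.

    Fills an (len(a)+1) x (len(b)+1) table row by row, keeping only the
    previous row in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def dp_cells(length, n_sequences):
    """Number of table cells for N sequences of common length L: (L + 1)^N."""
    return (length + 1) ** n_sequences


print(global_alignment_score("ACGT", "AGT"))
print(dp_cells(100, 2), dp_cells(100, 5))
```

The cell count grows exponentially in the number of sequences, which is why exact multiple alignment by this route becomes impractical beyond a handful of sequences.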

  17. Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences

    PubMed Central

    Derr, Julien; Manapat, Michael L.; Rajamani, Sudha; Leu, Kevin; Xulvi-Brunet, Ramon; Joseph, Isaac; Nowak, Martin A.; Chen, Irene A.

    2012-01-01

    During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life. PMID:22319215

  18. Analysis of expressed sequence tags (ESTs) from Agrostis species obtained using sequence related amplified polymorphism.

    PubMed

    Dinler, Gizem; Budak, Hikmet

    2008-10-01

    Bentgrass (Agrostis spp.), a genus of the Poaceae family, consists of more than 200 species and is mainly used in athletic fields and golf courses. Creeping bentgrass (A. stolonifera L.) is the most commonly used species in maintaining golf courses, followed by colonial bentgrass (A. capillaris L.) and velvet bentgrass (A. canina L.). The presence and nature of sequence related amplified polymorphism (SRAP) at the cDNA level were investigated. We isolated 80 unique cDNA fragment bands from these species using 56 SRAP primer combinations. Sequence analysis of cDNA clones and analysis of putative translation products revealed that some encoded amino acid sequences were similar to proteins involved in DNA synthesis, transcription, and signal transduction. The cytosolic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene (GenBank accession no. EB812822) was also identified from velvet bentgrass, and the corresponding protein sequence is further analyzed due to its critical role in many cellular processes. The partial peptide sequence obtained was 112 amino acids long, presenting a high degree of homology to parts of the N-terminal and C-terminal regions of cytosolic phosphorylating GAPDH (GapC). The existence of common expressed sequence tags (ESTs) revealed by a minimum evolutionary dendrogram among the Agrostis ESTs indicated the usefulness of SRAP for comparative genome analysis of transcribed genes in the grass species. PMID:18726683

  19. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  20. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  1. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    PubMed

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  2. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  3. Amino acid sequence of Salmonella typhimurium branched-chain amino acid aminotransferase.

    PubMed

    Feild, M J; Nguyen, D C; Armstrong, F B

    1989-06-13

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase (transaminase B, EC 2.6.1.42) of Salmonella typhimurium was determined. An Escherichia coli recombinant containing the ilvGEDAY gene cluster of Salmonella was used as the source of the hexameric enzyme. The peptide fragments used for sequencing were generated by treatment with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. The enzyme subunit contains 308 residues and has a molecular weight of 33,920. To determine the coenzyme-binding site, the pyridoxal 5-phosphate containing enzyme was treated with tritiated sodium borohydride prior to trypsin digestion. Peptide map comparisons with an apoenzyme tryptic digest and monitoring radioactivity incorporation allowed identification of the pyridoxylated peptide, which was then isolated and sequenced. The coenzyme-binding site is the lysyl residue at position 159. The amino acid sequence of Salmonella transaminase B is 97.4% identical with that of Escherichia coli, differing in only eight amino acid positions. Sequence comparisons of transaminase B to other known aminotransferase sequences revealed limited sequence similarity (24-33%) when conserved amino acid substitutions are allowed and alignments were forced to occur on the coenzyme-binding site. PMID:2669973
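
The 97.4% identity quoted above follows from the stated counts (308 residues, eight differing positions). A minimal sketch of the calculation, with `percent_identity` as a hypothetical helper for pre-aligned sequences of equal length:

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Check against the numbers in the abstract: 308 residues, 8 differences.
residues, differences = 308, 8
print(round(100.0 * (residues - differences) / residues, 1))

# Toy example with two hypothetical five-residue peptides.
print(percent_identity("MKVLA", "MKILA"))
```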

  4. Fractal analysis of DNA sequence data

    SciTech Connect

    Berthelsen, C.L.

    1993-01-01

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method". Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.
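
    The random-walk representation mentioned in this record can be sketched in a few lines. The abstract does not say how bases are mapped to steps, so the purine/pyrimidine rule below (A and G step up, C and T step down) is an assumption chosen for illustration, and the function name is hypothetical. The sandbox estimate of the fractal dimension is not reproduced here.

```python
from itertools import accumulate
from typing import List

# Assumed step rule (not given in the abstract): purines +1, pyrimidines -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(sequence: str) -> List[int]:
    """Return the cumulative position of a 1-D walk driven by the sequence."""
    steps = []
    for base in sequence.upper():
        if base not in STEP:
            raise ValueError(f"unsupported base: {base!r}")
        steps.append(STEP[base])
    return list(accumulate(steps))


print(dna_walk("AGCTTA"))  # [1, 2, 1, 0, -1, 0]
```

    A one-dimensional walk keeps the example short; other mappings, such as two-dimensional walks with one direction per base, fit the same pattern.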

  5. Fractal Analysis of DNA Sequence Data

    NASA Astrophysics Data System (ADS)

    Berthelsen, Cheryl Lynn

    DNA sequence databases are growing at an almost exponential rate. New analysis methods are needed to extract knowledge about the organization of nucleotides from this vast amount of data. Fractal analysis is a new scientific paradigm that has been used successfully in many domains including the biological and physical sciences. Biological growth is a nonlinear dynamic process and some have suggested that to consider fractal geometry as a biological design principle may be most productive. This research is an exploratory study of the application of fractal analysis to DNA sequence data. A simple random fractal, the random walk, is used to represent DNA sequences. The fractal dimension of these walks is then estimated using the "sandbox method." Analysis of 164 human DNA sequences compared to three types of control sequences (random, base-content matched, and dimer-content matched) reveals that long-range correlations are present in DNA that are not explained by base or dimer frequencies. The study also revealed that the fractal dimension of coding sequences was significantly lower than sequences that were primarily noncoding, indicating the presence of longer-range correlations in functional sequences. The multifractal spectrum is used to analyze fractals that are heterogeneous and have a different fractal dimension for subsets with different scalings. The multifractal spectrum of the random walks of twelve mitochondrial genome sequences was estimated. Eight vertebrate mtDNA sequences had uniformly lower spectra values than did four invertebrate mtDNA sequences. Thus, vertebrate mitochondria show significantly longer-range correlations than do invertebrate mitochondria. The higher multifractal spectra values for invertebrate mitochondria suggest a more random organization of the sequences. This research also includes considerable theoretical work on the effects of finite size, embedding dimension, and scaling ranges.

  6. CROSS-DISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY: The structural analysis of protein sequences based on the quasi-amino acids code

    NASA Astrophysics Data System (ADS)

    Zhu, Ping; Tang, Xu-Qing; Xu, Zhen-Yuan

    2009-01-01

    Proteomics is the study of proteins and their interactions in a cell. With the successful completion of the Human Genome Project came the postgenome era, in which proteomics technology is emerging. This paper studies protein molecules from an algebraic point of view. The algebraic system (Σ, +, *) is introduced, where Σ is the set of 64 codons. According to the characteristics of (Σ, +, *), a novel quasi-amino acids code classification method is introduced and the corresponding algebraic operation table over the set ZU of the 16 kinds of quasi-amino acids is established. The internal relations among the quasi-amino acids are revealed. The results show that there exist some very close correlations between the properties of the quasi-amino acids and the codons. All these correlations may play an important part in establishing the logical relationship between codons and the quasi-amino acids during the origin of life. According to Ma F et al (2003 J. Anhui Agricultural University 30 439), the corresponding relation and the excellent properties of the amino acids code are very difficult to observe. The present paper shows that (ZU, ⊕, ⊗) is a field. Furthermore, the operational results show that the codon tga has a different property from the other stop codons. In fact, in the human and ox mitochondrial genetic codes, tga codes for tryptophan and is not a stop codon as in other genetic codes; this is the case described by Chen W C et al (2002 Acta Biophysica Sinica 18(1) 87). The present theory avoids some inexplicable events of the 20 kinds of amino acids code; in other words, it solves the problem of 'the 64 codon assignments of mRNA to amino acids is probably completely wrong' proposed by Yang (2006 Progress in Modern Biomedicine 6 3).
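
    The set of 64 codons and the tga exception described in this record can be made concrete. The sketch below is illustrative and does not reproduce the paper's algebraic operations, whose tables are not given in the abstract. It relies on two well-established facts only: three-letter words over four bases number 4^3 = 64, and tga is a stop codon in the standard genetic code but encodes tryptophan in the vertebrate mitochondrial code. The variable names are hypothetical.

```python
from itertools import product

BASES = "tcag"

# All three-letter words over the four bases: 4 ** 3 = 64 codons.
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Meaning of the single codon tga under two genetic codes.
# Only this codon is tabulated; a full translation table is out of scope.
TGA_MEANING = {
    "standard": "stop",
    "vertebrate mitochondrial": "Trp",
}

print(len(CODONS))  # 64
for code_name, meaning in TGA_MEANING.items():
    print(f"tga in the {code_name} code: {meaning}")
```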

  7. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  8. Inference of Splicing Regulatory Activities by Sequence Neighborhood Analysis

    PubMed Central

    Stadler, Michael B; Shomron, Noam; Yeo, Gene W; Schneider, Aniket; Xiao, Xinshu; Burge, Christopher B

    2006-01-01

    Sequence-specific recognition of nucleic-acid motifs is critical to many cellular processes. We have developed a new and general method called Neighborhood Inference (NI) that predicts sequences with activity in regulating a biochemical process based on the local density of known sites in sequence space. Applied to the problem of RNA splicing regulation, NI was used to predict hundreds of new exonic splicing enhancer (ESE) and silencer (ESS) hexanucleotides from known human ESEs and ESSs. These predictions were supported by cross-validation analysis, by analysis of published splicing regulatory activity data, by sequence-conservation analysis, and by measurement of the splicing regulatory activity of 24 novel predicted ESEs, ESSs, and neutral sequences using an in vivo splicing reporter assay. These results demonstrate the ability of NI to accurately predict splicing regulatory activity and show that the scope of exonic splicing regulatory elements is substantially larger than previously anticipated. Analysis of orthologous exons in four mammals showed that the NI score of ESEs, a measure of function, is much more highly conserved above background than ESE primary sequence. This observation indicates a high degree of selection for ESE activity in mammalian exons, with surprisingly frequent interchangeability between ESE sequences. PMID:17121466
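
    The idea of scoring a candidate by the local density of known sites in sequence space can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the published NI score. Here the neighborhood is taken to be all hexamers within Hamming distance 1, and the score is the number of known enhancers in the neighborhood minus the number of known silencers. The function names, the radius parameter, and the hexamers are hypothetical placeholders and are not taken from the paper.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_score(candidate: str,
                       positives: Iterable[str],
                       negatives: Iterable[str],
                       radius: int = 1) -> int:
    """Known positives minus known negatives within `radius` mismatches."""
    near_pos = sum(hamming(candidate, s) <= radius for s in positives)
    near_neg = sum(hamming(candidate, s) <= radius for s in negatives)
    return near_pos - near_neg


# Toy data for illustration only.
toy_enhancers = ["GAAGAA", "GAAGAT", "GAAGAC"]
toy_silencers = ["TTTGGG"]

print(neighborhood_score("GAAGAG", toy_enhancers, toy_silencers))  # 3
```

    A positive score places the candidate in a region dense with known enhancers and a negative score in a region dense with known silencers. Candidates near zero would be treated as neutral under this toy rule.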

  9. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  10. Multimodal phylogeny for taxonomy: integrating information from nucleotide and amino acid sequences.

    PubMed

    Bicego, Manuele; Dellaglio, Franco; Felis, Giovanna E

    2007-10-01

    The crucial role played by the analysis of microbial diversity in biotechnology-based innovations has increased the interest in the microbial taxonomy research area. Phylogenetic sequence analyses have contributed significantly to the advances in this field, also in the view of the large amount of sequence data collected in recent years. Phylogenetic analyses could be realized on the basis of protein-encoding nucleotide sequences or encoded amino acid molecules: these two mechanisms present different peculiarities, still starting from two alternative representations of the same information. This complementarity could be exploited to achieve a multimodal phylogenetic scheme that is able to integrate gene and protein information in order to realize a single final tree. This aspect has been poorly addressed in the literature. In this paper, we propose to integrate the two phylogenetic analyses using basic schemes derived from the multimodality fusion theory (or multiclassifier systems theory), a well-founded and rigorous branch for which its powerfulness has already been demonstrated in other pattern recognition contexts. The proposed approach could be applied to distance matrix-based phylogenetic techniques (like neighbor joining), resulting in a smart and fast method. The proposed methodology has been tested in a real case involving sequences of some species of lactic acid bacteria. With this dataset, both nucleotide sequence- and amino acid sequence-based phylogenetic analyses present some drawbacks, which are overcome with the multimodal analysis. PMID:17933011
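
    One simple way to combine a nucleotide-based and an amino acid-based distance matrix before tree building is to rescale each matrix and average them entry by entry. The sketch below assumes this mean-fusion rule for illustration. The abstract says only that basic fusion schemes were used, so the rule, the max-based normalization, the function names, and the toy matrices are all assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize(matrix: Matrix) -> Matrix:
    """Rescale a distance matrix so that its largest entry is 1."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [list(row) for row in matrix]
    return [[value / peak for value in row] for row in matrix]


def fuse_mean(d_nuc: Matrix, d_aa: Matrix) -> Matrix:
    """Entry-wise mean of two normalized distance matrices of equal shape."""
    if len(d_nuc) != len(d_aa) or any(
        len(ra) != len(rb) for ra, rb in zip(d_nuc, d_aa)
    ):
        raise ValueError("matrices must have the same shape")
    a, b = normalize(d_nuc), normalize(d_aa)
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy distances among three taxa (illustrative values only).
d_nuc = [[0.0, 2.0, 4.0], [2.0, 0.0, 4.0], [4.0, 4.0, 0.0]]
d_aa = [[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [1.0, 2.0, 0.0]]

fused = fuse_mean(d_nuc, d_aa)
print(fused[0])  # [0.0, 0.5, 0.75]
```

    The fused matrix could then be passed to any distance-based method such as neighbor joining. Normalizing first keeps the modality with the larger numeric range from dominating the average.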

  11. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in humans and animals associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that the polymorphisms within the open reading frame (ORF) of PrP are associated with the susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threonine, respectively. The tPrnp amino acid sequence is similar to house cat (Felis catus) and sheep, but differs significantly from the two other cat Prnp sequences that were previously deposited in GenBank. PMID:16780982
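
    The nucleotide and residue numbers in this record are consistent with ordinary codon arithmetic. The sketch below maps a nucleotide position to its codon number and to its position within that codon. It assumes that numbering starts at the first nucleotide of the ORF, which the abstract does not state, and the function name is hypothetical.

```python
from typing import Tuple


def codon_of(nt_pos: int) -> Tuple[int, int]:
    """Map a 1-based ORF nucleotide position to (codon number, position in codon)."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    codon_index, offset = divmod(nt_pos - 1, 3)
    return codon_index + 1, offset + 1


for snp, pos in [("T423C", 423), ("A501G", 501), ("C511A", 511), ("A610G", 610)]:
    codon, place = codon_of(pos)
    print(f"{snp}: codon {codon}, position {place}")
# T423C: codon 141, position 3
# A501G: codon 167, position 3
# C511A: codon 171, position 1
# A610G: codon 204, position 1
```

    Under this assumption, the two substitutions reported to change the amino acid fall on the first position of codons 171 and 204, matching the residue numbers in the abstract. The other two fall on third codon positions.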

  12. Analysis of Organic Acids.

    ERIC Educational Resources Information Center

    Griswold, John R.; Rauner, Richard A.

    1990-01-01

    Presented are the procedures and a discussion of the results for an experiment in which students select unknown carboxylic acids, determine their melting points, and investigate their solubility behavior in water and ethanol. A table of selected carboxylic acids is included. (CW)

  13. Expressed sequence tags: analysis and annotation.

    PubMed

    Parkinson, John; Blaxter, Mark

    2004-01-01

    Expressed sequence tags (ESTs) present a special set of problems for bioinformatic analysis. They are partial and error-prone, and large datasets can have significant internal redundancy. To facilitate analysis of small EST datasets from in-house projects, we present an integrated "pipeline" of tools that take EST data from sequence trace to database submission. These tools also can be used to provide clustering of ESTs into putative genes and to annotate these genes with preliminary sequence similarity searches. The systems are written to use the public-domain LINUX environment and other openly available analytical tools. PMID:15153624

  14. Mechanism Analysis of Acid Tolerance Response of Bifidobacterium longum subsp. longum BBMN 68 by Gene Expression Profile Using RNA-Sequencing

    PubMed Central

    Jin, Junhua; Zhang, Bing; Guo, Huiyuan; Cui, Jianyun; Jiang, Lu; Song, Shuhui; Sun, Min; Ren, Fazheng

    2012-01-01

    To analyze the mechanism of the acid tolerance response (ATR) in Bifidobacterium longum subsp. longum BBMN68, we optimized the acid-adaptation condition to stimulate ATR effectively and analyzed the change of gene expression profile after acid-adaptation using high-throughput RNA-Seq. After acid-adaptation at pH 4.5 for 2 hours, the survival rate of BBMN68 at lethal pH 3.5 for 120 min was increased by 70 fold and the expression of 293 genes was upregulated by more than 2 fold, and 245 genes were downregulated by more than 2 fold. Gene expression profiling of ATR in BBMN68 suggested that, when the bacteria faced acid stress, the cells strengthened the integrity of cell wall and changed the permeability of membrane to keep the H+ from entering. Once the H+ entered the cytoplasm, the cells showed four main responses: First, the F0F1-ATPase system was initiated to discharge H+. Second, the ability to produce NH3 by cysteine-cystathionine-cycle was strengthened to neutralize excess H+. Third, the cells started NER-UVR and NER-VSR systems to minimize the damage to DNA and upregulated HtpX, IbpA, and γ-glutamylcysteine production to protect proteins against damage. Fourth, the cells initiated global response signals ((p)ppGpp, polyP, and Sec-SRP) to bring the whole cell into a state of response to the stress. The cells also secreted the quorum sensing signal (AI-2) to communicate between intraspecies cells by the cellular signal system, such as two-component systems, to improve the overall survival rate. Besides, the cells varied the pathways of producing energy by shifting to BCAA metabolism and enhanced the ability to utilize sugar to supply sufficient energy for the operation of the mechanism mentioned above. Based on these results, it was inferred that, during industrial applications, the acid resistance of bifidobacteria could be improved by adding BCAA, γ-glutamylcysteine, cysteine, and cystathionine into the acid-stress environment. PMID:23236393

  15. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  16. A classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B

    1991-01-01

    The amino acid sequences of 301 glycosyl hydrolases and related enzymes have been compared. A total of 291 sequences corresponding to 39 EC entries could be classified into 35 families. Only ten sequences (less than 5% of the sample) could not be assigned to any family. With the sequences available for this analysis, 18 families were found to be monospecific (containing only one EC number) and 17 were found to be polyspecific (containing at least two EC numbers). Implications on the folding characteristics and mechanism of action of these enzymes and on the evolution of carbohydrate metabolism are discussed. With the steady increase in sequence and structural data, it is suggested that the enzyme classification system should perhaps be revised. PMID:1747104

  17. Correlation between fibroin amino acid sequence and physical silk properties.

    PubMed

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet. PMID:12816957

  18. Ab initio detection of fuzzy amino acid tandem repeats in protein sequences

    PubMed Central

    2012-01-01

    Background: Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the evolutionary mechanisms that proteins employ to adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detection of repeating structures based only on sequence analysis is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence (fuzzy tandem repeats). Results: In this paper we present PTRStalker, a new algorithm for ab-initio detection of fuzzy tandem repeats in protein amino acid sequences. In the reported results we show that by feeding PTRStalker with amino acid sequences from the UniProtKB/Swiss-Prot database we detect novel tandemly repeated structures not captured by other state-of-the-art tools. Experiments with membrane proteins indicate that PTRStalker can detect global symmetries in the primary structure which are then reflected in the tertiary structure. Conclusions: PTRStalker is able to detect fuzzy tandem repeating structures in protein sequences, with performance beyond the current state of the art. Such a tool may be a valuable support to investigating protein structural properties when tertiary X-ray data is not available. PMID:22536906
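
    To make the notion of a fuzzy tandem repeat concrete, the sketch below flags pairs of adjacent, equal-length units that differ in at most a given number of positions. It is a naive sliding-window check written for illustration and is not the PTRStalker algorithm, whose details are not given in the abstract. The function name, the parameters, and the toy sequence are hypothetical.

```python
from typing import List, Tuple


def fuzzy_tandem_pairs(seq: str, unit_len: int,
                       max_mismatch: int) -> List[Tuple[int, str, str, int]]:
    """Find adjacent units of length `unit_len` with at most `max_mismatch` differences.

    Returns tuples of (start index, first unit, second unit, mismatch count).
    """
    if unit_len < 1:
        raise ValueError("unit_len must be positive")
    hits = []
    for start in range(len(seq) - 2 * unit_len + 1):
        first = seq[start:start + unit_len]
        second = seq[start + unit_len:start + 2 * unit_len]
        mismatches = sum(a != b for a, b in zip(first, second))
        if mismatches <= max_mismatch:
            hits.append((start, first, second, mismatches))
    return hits


# Toy sequence containing the diverged repeat GSA/GST.
for hit in fuzzy_tandem_pairs("MKGSAGSTPL", unit_len=3, max_mismatch=1):
    print(hit)
# (1, 'KGS', 'AGS', 1)
# (2, 'GSA', 'GST', 1)
```

    The same repeat is reported in two overlapping frames, which is typical of a window-based check. A real tool would merge such hits and would also handle insertions, deletions, and unknown unit lengths.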

  19. Amino acid sequence of the nonsecretory ribonuclease of human urine.

    PubMed

    Beintema, J J; Hofsteenge, J; Iwama, M; Morita, T; Ohgi, K; Irie, M; Sugiyama, R H; Schieven, G L; Dekker, C A; Glitz, D G

    1988-06-14

    The amino acid sequence of a nonsecretory ribonuclease isolated from human urine was determined except for the identity of the residue at position 7. Sequence information indicates that the ribonucleases of human liver and spleen and an eosinophil-derived neurotoxin are identical or very closely related gene products. The sequence is identical at about 30% of the amino acid positions with those of all of the secreted mammalian ribonucleases for which information is available. Identical residues include active-site residues histidine-12, histidine-119, and lysine-41, other residues known to be important for substrate binding and catalytic activity, and all eight half-cystine residues common to these enzymes. Major differences include a deletion of six residues in the (so-called) S-peptide loop, insertions of two, and nine residues, respectively, in three other external loops of the molecule, and an addition of three residues at the amino terminus. The sequence shows the human nonsecretory ribonuclease to belong to the same ribonuclease superfamily as the mammalian secretory ribonucleases, turtle pancreatic ribonuclease, and human angiogenin. Sequence data suggest that a gene duplication occurred in an ancient vertebrate ancestor; one branch led to the nonsecretory ribonuclease, while the other branch led to a second duplication, with one line leading to the secretory ribonucleases (in mammals) and the second line leading to pancreatic ribonuclease in turtle and an angiogenic factor in mammals (human angiogenin). The nonsecretory ribonuclease has five short carbohydrate chains attached via asparagine residues at the surface of the molecule; these chains may have been shortened by exoglycosidase action.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:3166997

  20. Auditory sequence analysis and phonological skill.

    PubMed

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E; Turton, Stuart; Griffiths, Timothy D

    2012-11-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  1. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-05-15

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  2. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  3. Genomic sequence analysis tools: a user's guide.

    PubMed

    Fortna, A; Gardiner, K

    2001-03-01

    The wealth of information from various genome sequencing projects provides the biologist with a new perspective from which to analyze, and design experiments with, mammalian systems. The complexity of the information, however, requires new software tools, and numerous such tools are now available. Which type and which specific system is most effective depends, in part, upon how much sequence is to be analyzed and with what level of experimental support. Here we survey a number of mammalian genomic sequence analysis systems with respect to the data they provide and the ease of their use. The hope is to aid the experimental biologist in choosing the most appropriate tool for their analyses. PMID:11226611

  4. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  5. The amino acid sequence of rabbit muscle triose phosphate isomerase.

    PubMed Central

    Corran, P H; Waley, S G

    1975-01-01

    The amino acid sequence of rabbit muscle triose phosphate isomerase was deduced by characterizing peptides that overlap the tryptic peptides. Thiol groups were modified by oxidation, carboxymethylation or aminoethylation. About 50 peptides that provided information about overlaps were isolated; the peptides were mostly characterized by their compositions and N-terminal residues. The peptide chains contain 248 amino acid residues, and no evidence for dissimilarity of the two subunits that comprise the native enzyme was found. The sequence of the rabbit muscle enzyme may be compared with that of the coelacanth enzyme (Kolb et al., 1974): 84% of the residues are in identical positions. Similarly, comparison of the sequence with that inferred for the chicken enzyme (Furth et al., 1974) shows that 87% of the residues are in identical positions. Limited though these comparisons are, they suggest that triose phosphate isomerase has one of the lowest rates of evolutionary change. An extended version of the present paper has been deposited as Supplementary Publication SUP 50040 (42 pages) at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1171682

  6. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  7. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  8. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  9. The amino acid sequence of chymopapain from Carica papaya.

    PubMed

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-02-15

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  10. Improved Algorithm for Analysis of DNA Sequences Using Multiresolution Transformation

    PubMed Central

    Inbamalar, T. M.; Sivakumar, R.

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as the deoxyribonucleic acid (DNA), the ribonucleic acid (RNA), and the proteins. Fast and precise identification of the protein coding regions in DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solution with greater background noise. Hence, improvements in accuracy, computational complexity, and reduction in background noise are essential in identification of the protein coding regions in the DNA sequences. In this paper, a new DSP based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using electron ion interaction potential (EIIP) representation. Then discrete wavelet transformation is taken. Absolute value of the energy is found followed by proper threshold. The test is conducted using the data bases available in the National Centre for Biotechnology Information (NCBI) site. The comparative analysis is done and it ensures the efficiency of the proposed system. PMID:26000337
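
    The pipeline outlined in this abstract (EIIP encoding, wavelet transform, energy, threshold) can be sketched roughly as below. This is an illustration only: the abstract does not name the wavelet, the decomposition level or the threshold, so a single-level Haar transform and a caller-supplied threshold are assumed here, and the function names are hypothetical. The EIIP values are the commonly tabulated ones for the four nucleotides.

```python
import math

# Commonly tabulated electron-ion interaction potential (EIIP) values.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(dna):
    """Map each nucleotide to its EIIP value."""
    return [EIIP[base] for base in dna.upper()]


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform: (approximation, detail)."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last sample
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def flag_positions(dna, threshold):
    """Flag detail coefficients whose energy reaches the (assumed) threshold."""
    _, detail = haar_dwt(eiip_encode(dna))
    energy = [d * d for d in detail]
    return [e >= threshold for e in energy]


print(flag_positions("AAGGAGAG", threshold=1e-6))
```

    Each flag covers a pair of adjacent bases; a practical detector would use a deeper decomposition and a data-driven threshold, neither of which is specified in the abstract.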

  11. Improved algorithm for analysis of DNA sequences using multiresolution transformation.

    PubMed

    Inbamalar, T M; Sivakumar, R

    2015-01-01

    Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information allied with genetic materials such as the deoxyribonucleic acid (DNA), the ribonucleic acid (RNA), and the proteins. Fast and precise identification of the protein coding regions in DNA sequence is one of the most important tasks in analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solution with greater background noise. Hence, improvements in accuracy, computational complexity, and reduction in background noise are essential in identification of the protein coding regions in the DNA sequences. In this paper, a new DSP based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using electron ion interaction potential (EIIP) representation. Then discrete wavelet transformation is taken. Absolute value of the energy is found followed by proper threshold. The test is conducted using the data bases available in the National Centre for Biotechnology Information (NCBI) site. The comparative analysis is done and it ensures the efficiency of the proposed system. PMID:26000337

  12. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    PubMed

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  13. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization

    PubMed Central

    Anahtar, Melis N.; Bowman, Brittany A.; Kwon, Douglas S.

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  14. Structural characterization of blotting membranes and the influence of membrane parameters for electroblotting and subsequent amino acid sequence analysis of proteins.

    PubMed

    Eckerskorn, C; Lottspeich, F

    1993-09-01

    Various blotting membranes were evaluated and correlated with the efficiency of electroblotting and the performance in the sequencing process. Structural parameters including specific surface area, pore size distribution, pore volumes, and permeabilities of different solvents lead to discrimination of the membranes relative to their accessible surfaces and membrane densities. Protein binding capacities as well as protein recoveries in electroblotting correlate with the specific surface areas. Almost quantitative retention of proteins during electroblotting from gels was obtained for membranes with a high specific surface area and narrow pores (Trans-Blot, Immobilon PSQ, Fluorotrans), whereas membranes with a relatively low specific surface area (Immobilon P, Glassybond) showed reduced recoveries of between 10-20% for the tested proteins. Initial yields and repetitive yields were compared for radioiodinated standard proteins that have been either electroblotted or loaded by direct adsorption. The results showed that the different permeabilities for solutions of the Edman chemistry have a major influence on initial yields. The glass fiber-based membranes with an extremely low flow restriction produce consistently high initial yields independent of the application mode of the protein (spotted or electroblotted) or the application of the membranes into the cartridge (discs or small pieces). In contrast, the polymeric membranes showed decreasing initial yields with increasing membrane density for spotted and electroblotted proteins. Yields varied considerably when the membranes were applied as discs into the cartridge. This effect could be minimized by cutting the membranes into pieces as small as possible, as demonstrated for electroblotted proteins. PMID:8223390

  15. Phylogenetic analysis of adenovirus sequences.

    PubMed

    Harrach, Balázs; Benko, Mária

    2007-01-01

    Members of the family Adenoviridae have been isolated from a large variety of hosts, including representatives from every major vertebrate class from fish to mammals. The high prevalence, together with the fairly conserved organization of the central part of their genomes, make the adenoviruses one of (if not the) best models for studying viral evolution on a larger time scale. Phylogenetic calculation can infer the evolutionary distance among adenovirus strains on serotype, species, and genus levels, thus helping the establishment of a correct taxonomy on the one hand, and speeding up the process of typing new isolates on the other. Initially, four major lineages corresponding to four genera were recognized. Later, the demarcation criteria of lower taxon levels, such as species or types, could also be defined with phylogenetic calculations. A limited number of possible host switches have been hypothesized and convincingly supported. Application of the web-based BLAST and MultAlin programs and the freely available PHYLIP package, along with the TreeView program, enables everyone to make correct calculations. In addition to step-by-step instruction on how to perform phylogenetic analysis, critical points where typical mistakes or misinterpretation of the results might occur will be identified and hints for their avoidance will be provided. PMID:17656792

  16. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172
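
    The iterated map behind the Chaos Game Representation mentioned in this abstract can be sketched in a few lines: each base moves the current point halfway toward that base's corner of the unit square. The corner assignment and the starting point at the centre are assumptions chosen for illustration, and the function name is hypothetical.

```python
# Assumed corner layout of the unit square, one corner per nucleotide.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(dna):
    """Return the chaos-game coordinates visited while reading the sequence."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in dna.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


print(cgr_points("AC"))  # [(0.25, 0.25), (0.125, 0.625)]
```

    Because every step halves the distance to a corner, the quadrant of the last point identifies the last base, the sub-quadrant the last two bases, and so on, which is one way to see why the representation is independent of any fixed word length.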

  17. Microfluidics in amino acid analysis.

    PubMed

    Pumera, Martin

    2007-07-01

    Microfluidic devices have been widely used to derivatize, separate, and detect amino acids employing many different strategies. Virtually zero-dead volume interconnections and fast mass transfer in small volume microchannels enable dramatic increases in on-chip derivatization reaction speed, while only minute amounts of sample and reagent are needed. Due to short channel path, fast subsecond separations can be carried out. With sophisticated miniaturized detectors, the whole analytical process can be integrated on one platform. This article reviews developments of lab-on-chip technology in amino acid analysis, it shows important design features such as sample preconcentration, precolumn and postcolumn amino acid derivatization, and unlabeled and labeled amino acid detection with focus on advanced designs. The review also describes important biomedical and space exploration applications of amino acid analysis on microfluidic devices. PMID:17542043

  18. Amino acid sequence prerequisites for the formation of cn ions.

    PubMed

    Downard, K M; Biemann, K

    1993-11-01

    Amino acid sequence prerequisites are described for the formation of cn ions observed in high-energy collision-induced decomposition spectra of peptides. It is shown that the formation of cn ions is promoted by the nature of the amino acid C-terminal to the cleavage site. A propensity for cn cleavage preceding threonine, and to a lesser extent tryptophan, lysine, and serine, is demonstrated where fragmentation is directed N-terminally at these residues. In addition, the nature of the residue N-terminal to the cleavage site is shown to have little effect on cn ion formation. A mechanism for cn ion formation is proposed and its applicability to the results observed is discussed. PMID:24227531

  19. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are often present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  20. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences.

    PubMed

    Mirsky, Alexander; Kazandjian, Linda; Anisimova, Maria

    2015-03-01

    Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout all adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. As somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared with existing general amino acid models, we show that the AB model provides significantly better description for the somatic evolution of mice and human antibody sequences, as demonstrated on large next generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with most frequent amino acids exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models. 
    We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in

  1. Sequence analysis of the AAA protein family.

    PubMed Central

    Beyer, A.

    1997-01-01

    The AAA protein family, a recently recognized group of Walker-type ATPases, has been subjected to an extensive sequence analysis. Multiple sequence alignments revealed the existence of a region of sequence similarity, the so-called AAA cassette. The borders of this cassette were localized and within it, three boxes of a high degree of conservation were identified. Two of these boxes could be assigned to substantial parts of the ATP binding site (namely, to Walker motifs A and B); the third may be a portion of the catalytic center. Phylogenetic trees were calculated to obtain insights into the evolutionary history of the family. Subfamilies with varying degrees of intra-relatedness could be discriminated; these relationships are also supported by analysis of sequences outside the canonical AAA boxes: within the cassette are regions that are strongly conserved within each subfamily, whereas little or even no similarity between different subfamilies can be observed. These regions are well suited to define fingerprints for subfamilies. A secondary structure prediction utilizing all available sequence information was performed and the result was fitted to the general 3D structure of a Walker A/GTPase. The agreement was unexpectedly high and strongly supports the conclusion that the AAA family belongs to the Walker superfamily of A/GTPases. PMID:9336829

  2. On human disease-causing amino acid variants: statistical study of sequence and structural patterns

    PubMed Central

    Alexov, Emil

    2015-01-01

    Statistical analysis was carried out on a large set of naturally occurring human amino acid variations and it was demonstrated that there is a preference for some amino acid substitutions to be associated with diseases. At an amino acid sequence level, it was shown that the disease-causing variants frequently involve drastic changes of amino acid physico-chemical properties of proteins such as charge, hydrophobicity and geometry. Structural analysis of variants involved in diseases and being frequently observed in the human population showed similar trends: disease-causing variants tend to cause more changes of hydrogen bond network and salt bridges as compared with harmless amino acid mutations. Analysis of thermodynamics data reported in the literature, both experimental and computational, indicated that disease-causing variants tend to destabilize proteins and their interactions, which prompted us to investigate the effects of amino acid mutations on large databases of experimentally measured energy changes in unrelated proteins. Although the experimental datasets were neither linked to diseases nor exclusive to human proteins, the observed trends were the same: amino acid mutations tend to destabilize proteins and their interactions. Having in mind that structural and thermodynamics properties are interrelated, it is pointed out that any large change of any of them is anticipated to cause a disease. PMID:25689729

  3. Sequence analysis by iterated maps, a review

    PubMed Central

    2014-01-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, ‘Chaos Game Representation’. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results. PMID:24162172

  4. Comparative sequence-structure analysis of Aves insulin

    PubMed Central

    Islam, Md Mirazul; Aktaruzzaman, M; Mohamed, Zahurin

    2015-01-01

    Normal blood glucose level depends on the availability of insulin and its ability to bind the insulin receptor (IR), which regulates the downstream signaling pathway. Insulin sequence and blood glucose level usually vary among animals due to species specificity. The study of genetic variation of insulin, blood glucose level and diabetic symptom development in Aves is interesting because their optimal blood glucose level is higher than that of mammals. Therefore, it is of interest to study the evolutionary relationship of Aves insulin with that of mammals using sequence data. Hence, we compiled 32 Aves insulin sequences from GenBank to compare their sequence-structure features with phylogeny for evolutionary inference. The analysis shows long conserved motifs (about 14 residues) for functional inference. These sequences show high leucine content (20%) with a high instability index (>40). Amino acid positions 11, 14, 16 and 20 are variable and may contribute to binding to IR. We identified functionally critical variable residues in the dataset for possible genetic implication. Structural models of these sequences were developed for surface analysis towards functional representation. These data find application in the understanding of insulin function across species. PMID:25848166

  5. Nucleotide and predicted amino acid sequences of cloned human and mouse preprocathepsin B cDNAs.

    PubMed Central

    Chan, S J; San Segundo, B; McCormick, M B; Steiner, D F

    1986-01-01

    Cathepsin B is a lysosomal thiol proteinase that may have additional extralysosomal functions. To further our investigations on the structure, mode of biosynthesis, and intracellular sorting of this enzyme, we have determined the complete coding sequences for human and mouse preprocathepsin B by using cDNA clones isolated from human hepatoma and kidney phage libraries. The nucleotide sequences predict that the primary structure of preprocathepsin B contains 339 amino acids organized as follows: a 17-residue NH2-terminal prepeptide sequence followed by a 62-residue propeptide region, 254 residues in mature (single chain) cathepsin B, and a 6-residue extension at the COOH terminus. A comparison of procathepsin B sequences from three species (human, mouse, and rat) reveals that the homology between the propeptides is relatively conserved with a minimum of 68% sequence identity. In particular, two conserved sequences in the propeptide that may be functionally significant include a potential glycosylation site and the presence of a single cysteine at position 59. Comparative analysis of the three sequences also suggests that processing of procathepsin B is a multistep process, during which enzymatically active intermediate forms may be generated. The availability of the cDNA clones will facilitate the identification of possible active or inactive intermediate processive forms as well as studies on the transcriptional regulation of the cathepsin B gene. PMID:3463996
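
    The segment lengths quoted in this abstract add up to the stated total (17 + 62 + 254 + 6 = 339). A short sketch that checks the sum and derives 1-based residue ranges is given below; it assumes the four segments are contiguous and appear in the order listed, and the names used are illustrative.

```python
from itertools import accumulate

# Segment lengths as quoted in the abstract, in N- to C-terminal order.
SEGMENTS = [
    ("prepeptide", 17),
    ("propeptide", 62),
    ("mature cathepsin B", 254),
    ("C-terminal extension", 6),
]


def segment_ranges(segments):
    """Return (name, first_residue, last_residue) using 1-based numbering."""
    ends = list(accumulate(length for _, length in segments))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return [(name, s, e) for (name, _), s, e in zip(segments, starts, ends)]


for name, first, last in segment_ranges(SEGMENTS):
    print(f"{name}: residues {first}-{last}")
```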

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  7. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  8. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  9. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physiochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for the above task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion: We find that augmenting features derived from physiochemical properties of amino acids (such as hydrophobicity, complexity etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799
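
    Sequence-derived features of the kind this abstract mentions (hydrophobicity, complexity) might be computed per sliding window as sketched below. The abstract does not say which scales or window sizes the authors used, so the Kyte-Doolittle hydropathy scale, Shannon entropy as the complexity measure, and the default window of 5 are all assumptions made for illustration.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values (an assumed choice of hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(seq, window=5):
    """Return (mean hydropathy, Shannon entropy in bits) for each window."""
    features = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in chunk) / window
        counts = Counter(chunk)
        entropy = -sum((n / window) * math.log2(n / window) for n in counts.values())
        features.append((hydropathy, entropy))
    return features


# Hypothetical peptide used only to exercise the function.
for hyd, ent in window_features("MKSEEEEKKPLLV"):
    print(f"{hyd:6.2f} {ent:5.2f}")
```

    Windows with low hydropathy and low entropy are the sort of signal a disorder predictor could learn from; the feature vectors would then be passed to whatever classifier or ensemble is being trained.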

  10. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    A hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library and resulted in detection of cDNA clones with a predicted amino acid sequence closely matching the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide. SPL(pVal) results from proteolytic processing of a larger preprotein. Northern blot analysis detected a single 1.0 kilobase SPL(pVal) RNA which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display properties of rapid adsorption and surface tension lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency such as hyaline membrane disease in newborn infants.

  11. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution. PMID:27261456
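
    The automatic translation of imported nucleotide sequences in a selectable reading frame, as described above, can be sketched generically as follows. This is not ANTICALIgN code: the function names are hypothetical, only the standard genetic code is used, and the handling of suppressed stop codons or non-natural amino acids mentioned in the abstract is not attempted.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string in forward reading frame 0, 1 or 2.

    Stop codons are written as '*'; codons containing characters other
    than A, C, G or T are written as 'X'. A trailing partial codon is ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(dna):
    """Return the translations of all three forward reading frames."""
    return {frame: translate(dna, frame) for frame in range(3)}
```

    Showing the three frames side by side is one simple way to let a user choose the reading frame that matches the unmutated template.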

  12. Complete nucleic acid sequence of Penaeus stylirostris densovirus (PstDNV) from India.

    PubMed

    Rai, Praveen; Safeena, Muhammed P; Karunasagar, Iddya; Karunasagar, Indrani

    2011-06-01

    Infectious hypodermal and hematopoietic necrosis virus (IHHNV) of shrimp has recently been classified as Penaeus stylirostris densovirus (PstDNV). The complete nucleic acid sequence of PstDNV from India was obtained by cloning and sequencing different DNA fragments of the virus. The genome organisation of PstDNV revealed three major coding domains: a left ORF (NS1) of 2001 bp, a mid ORF (NS2) of 1092 bp and a right ORF (VP) of 990 bp. The complete genome and the amino acid sequences of the three proteins, viz. NS1, NS2 and VP, were compared with the genomes of the virus reported from Hawaii, China and Mexico and with partial sequences available from isolates from different regions. The phylogenetic analysis of shrimp, insect and vertebrate parvovirus sequences showed that the Indian PstDNV isolate is phylogenetically more closely related to one of the three isolates from Taiwan (AY355307), and two isolates (AY362547 and AY102034) from Thailand. PMID:21402111

  13. Nucleotide and derived amino acid sequences of the major porin of Comamonas acidovorans and comparison of porin primary structures.

    PubMed Central

    Gerbl-Rieger, S; Peters, J; Kellermann, J; Lottspeich, F; Baumeister, W

    1991-01-01

    The DNA sequence of the gene which codes for the major outer membrane porin (Omp32) of Comamonas acidovorans has been determined. The structural gene encodes a precursor consisting of 351 amino acid residues with a signal peptide of 19 amino acid residues. Comparisons with amino acid sequences of outer membrane proteins and porins from several other members of the class Proteobacteria and of the Chlamydia trachomatis porin and the Neurospora crassa mitochondrial porin revealed a motif of eight regions of local homology. The results of this analysis are discussed with regard to common structural features of porins. PMID:1848840

  14. NexGen Production – Sequencing and Analysis

    SciTech Connect

    Muzny, Donna

    2010-06-02

    Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  15. Partial amino acid sequence of apolipoprotein(a) shows that it is homologous to plasminogen

    SciTech Connect

    Eaton, D.L.; Fless, G.M.; Kohr, W.J.; McLean, J.W.; Xu, Q.T.; Miller, C.G.; Lawn, R.M.; Scanu, A.M.

    1987-05-01

    Apolipoprotein(a) (apo(a)) is a glycoprotein with Mr of approximately 280,000 that is disulfide linked to apolipoprotein B in lipoprotein(a) particles. Elevated plasma levels of lipoprotein(a) are correlated with atherosclerosis. Partial amino acid sequence of apo(a) shows that it has striking homology to plasminogen. Plasminogen is a plasma serine protease zymogen that consists of five homologous and tandemly repeated domains called kringles and a trypsin-like protease domain. The amino-terminal sequence obtained for apo(a) is homologous to the beginning of kringle 4 but not the amino terminus of plasminogen. Apo(a) was subjected to limited proteolysis by trypsin or V8 protease, and fragments generated were isolated and sequenced. Sequences obtained from several of these fragments are highly (77-100%) homologous to plasminogen residues 391-421, which reside within kringle 4. Analysis of these internal apo(a) sequences revealed that apo(a) may contain at least two kringle 4-like domains. A sequence obtained from another tryptic fragment also shows homology to the end of kringle 4 and the beginning of kringle 5. Sequence data obtained from the two tryptic fragments shows homology with the protease domain of plasminogen. One of these sequences is homologous to the sequences surrounding the activation site of plasminogen. Plasminogen is activated by the cleavage of a specific arginine residue by urokinase and tissue plasminogen activator; however, the corresponding site in apo(a) is a serine that would not be cleaved by tissue plasminogen activator or urokinase. Using a plasmin-specific assay, no proteolytic activity could be demonstrated for lipoprotein(a) particles. These results suggest that apo(a) contains kringle-like domains and an inactive protease domain.

  16. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  17. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to grow rapidly, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  18. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes by extracting selected reads from the available whole genome shotgun reads. With the chloroplast genome as a reference, we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers, and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade, and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  19. PSSARD: protein sequence-structure analysis relational database.

    PubMed

    Guruprasad, Kunchur; Srikanth, K; Babu, A V N

    2005-09-15

    We have implemented a relational database comprising a representative dataset of amino acid sequences and their associated secondary structure. The representative amino acid sequences were selected according to the PDB_SELECT program by choosing proteins corresponding to protein crystal structure data deposited in the protein data bank that share less than 25% overall pair-wise sequence identity. The secondary structure was extracted from the protein data bank website. The information content in the database includes the protein description, PDB code, crystal structure resolution, total number of amino acid residues in the protein chain, amino acid sequence, secondary structure conformation and its summary. The database is freely accessible from the website mentioned below and is useful to query on any of the above fields. The database is particularly useful to quickly retrieve amino acid sequences that are compatible to any super-secondary structure conformation from several proteins simultaneously. PMID:16054209

  20. Designing novel kinases using evolutionary sequence analysis

    NASA Astrophysics Data System (ADS)

    Mody, Areez; Weiner, Joan; Iyer, Lakshman; Ramanathan, Sharad

    2006-03-01

    Cellular pathways with new functions are thought to arise from the duplication and divergence of proteins in existing pathways. The MAP kinase pathways in eukaryotes provide one example of this. These pathways consist of the MAP kinase proteins which are responsible for evoking the correct response to external stimuli. In the yeast Saccharomyces cerevisiae these pathways detect pheromones, osmolar stresses and nutrient levels, leading the cell into dramatic changes of morphology. Despite being homologous to each other, the MAP kinase proteins show specificity of function. We investigate the nature of the amino acid sequences conferring this specificity. To this end, we i) search the sequences of similar proteins in other eukaryote species, ii) study simple theoretical models exploring the constraints felt by these protein segments and iii) experimentally construct a large suite of hybrid proteins made of segments taken from the homologous proteins. These are then expressed in yeast cells to see what function they are able to perform. In particular, we also ask whether it is possible to design a new kinase protein possessing new function and specificity.

  1. Integrating Sequence Evolution into Probabilistic Orthology Analysis.

    PubMed

    Ullah, Ikram; Sjöstrand, Joel; Andersson, Peter; Sennblad, Bengt; Lagergren, Jens

    2015-11-01

    Orthology analysis, that is, finding out whether a pair of homologous genes are orthologs (stemming from a speciation) or paralogs (stemming from a gene duplication), is of central importance in computational biology, genome annotation, and phylogenetic inference. In particular, an orthologous relationship makes functional equivalence of the two genes highly likely. A major approach to orthology analysis is to reconcile a gene tree to the corresponding species tree (most commonly performed using the most parsimonious reconciliation, MPR). However, most such phylogenetic orthology methods infer the gene tree without considering the constraints implied by the species tree and, perhaps even more importantly, only allow the gene sequences to influence the orthology analysis through the a priori reconstructed gene tree. We propose a sound, comprehensive Bayesian Markov chain Monte Carlo-based method, DLRSOrthology, to compute orthology probabilities. It efficiently sums over the possible gene trees and jointly takes into account the current gene tree, all possible reconciliations to the species tree, and the typically strong signal conveyed by the sequences. We compare our method with PrIME-GEM, a probabilistic orthology approach built on a probabilistic duplication-loss model, and MrBayesMPR, a probabilistic orthology approach that is based on conventional Bayesian inference coupled with MPR. We find that DLRSOrthology outperforms these competing approaches on synthetic data as well as on biological data sets and is robust to incomplete taxon sampling artifacts. PMID:26130236

  2. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe at present in nature. The question of which is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that a well-tuned amino acid sequence could reasonably be indistinguishable from a random one. In order to study the possibility of discriminating between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109
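
    One simple way to obtain random sequences that keep the length and amino acid composition of a natural protein is to shuffle its residues. The sketch below assumes that approach; the record does not describe the authors' actual generation procedure, and the function name, the seed handling and the default of ten controls per sequence (chosen only because 10,470 is ten times 1047) are illustrative assumptions.

```python
import random
from collections import Counter


def shuffled_controls(sequence, n_controls=10, seed=0):
    """Return n_controls random permutations of an amino acid sequence.

    Each control has exactly the same length and residue composition as
    the input, but residue order (and hence any pairwise association
    between positions) is randomized.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    controls = []
    for _ in range(n_controls):
        residues = list(sequence)
        rng.shuffle(residues)
        controls.append("".join(residues))
    return controls


def same_composition(a, b):
    """Check that two sequences contain the same residues with the same counts."""
    return Counter(a) == Counter(b)
```

    Any measure of association between residue pairs can then be computed on the natural sequence and on its controls and the two distributions compared.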

  3. Complete amino acid sequence of a histidine-rich proteolytic fragment of human ceruloplasmin.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1979-04-01

    The complete amino acid sequence has been determined for a fragment of human ceruloplasmin [ferroxidase; iron(II):oxygen oxidoreductase, EC 1.16.3.1]. The fragment (designated Cp F5) contains 159 amino acid residues and has a molecular weight of 18,650; it lacks carbohydrate, is rich in histidine, and contains one free cysteine that may be part of a copper-binding site. This fragment is present in most commercial preparations of ceruloplasmin, probably owing to proteolytic degradation, but can also be obtained by limited cleavage of single-chain ceruloplasmin with plasmin. Cp F5 probably is an intact domain attached to the COOH-terminal end of single-chain ceruloplasmin via a labile interdomain peptide bond. A model of the secondary structure predicted by empirical methods suggests that almost one-third of the amino acid residues are distributed in alpha helices, about a third in beta-sheet structure, and the remainder in beta turns and unidentified structures. Computer analysis of the amino acid sequence has not demonstrated a statistically significant relationship between this ceruloplasmin fragment and any other protein, but there is some evidence for an internal duplication. PMID:287005

  4. FAST: FAST Analysis of Sequences Toolbox.

    PubMed

    Lawrence, Travis J; Kauffman, Kyle T; Amrine, Katherine C H; Carper, Dana L; Lee, Raymond S; Becich, Peter J; Canales, Claudia J; Ardell, David H

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145
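
    The idea of a grep-like filter over sequence records can be illustrated in a few lines of Python. This sketch does not use or reproduce FAST, which the record describes as Perl-based, and it makes no claim about the options of fasgrep or any other FAST tool; the function names and the minimal Multi-FastA parsing are assumptions for illustration only.

```python
import re


def parse_fasta(text):
    """Parse Multi-FastA text into a list of (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def grep_records(records, pattern):
    """Keep records whose description line matches a regular expression."""
    regex = re.compile(pattern)
    return [(desc, seq) for desc, seq in records if regex.search(desc)]
```

    Because each step takes and returns whole records, filters of this kind compose in the same way that the Textutils-style tools named in the abstract compose in a shell pipeline.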

  5. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145

  6. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  7. Matrix genes of measles virus and canine distemper virus: cloning, nucleotide sequences, and deduced amino acid sequences.

    PubMed Central

    Bellini, W J; Englund, G; Richardson, C D; Rozenblatt, S; Lazzarini, R A

    1986-01-01

    The nucleotide sequences encoding the matrix (M) proteins of measles virus (MV) and canine distemper virus (CDV) were determined from cDNA clones containing these genes in their entirety. In both cases, single open reading frames specifying basic proteins of 335 amino acid residues were predicted from the nucleotide sequences. Both viral messages were composed of approximately 1,450 nucleotides and contained 400 nucleotides of presumptive noncoding sequences at their respective 3' ends. MV and CDV M-protein-coding regions were 67% homologous at the nucleotide level and 76% homologous at the amino acid level. Only chance homology was observed in the 400-nucleotide trailer sequences. Comparisons of the M protein sequences of MV and CDV with the sequence reported for Sendai virus (B. M. Blumberg, K. Rose, M. G. Simona, L. Roux, C. Giorgi, and D. Kolakofsky, J. Virol. 52:656-663; Y. Hidaka, T. Kanda, K. Iwasaki, A. Nomoto, T. Shioda, and H. Shibuta, Nucleic Acids Res. 12:7965-7973) indicated the greatest homology among these M proteins in the carboxyterminal third of the molecule. Secondary-structure analyses of this shared region indicated a structurally conserved, hydrophobic sequence which possibly interacted with the lipid bilayer. PMID:3754588
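
    Percentages such as the 67% and 76% quoted above are usually obtained by counting identical positions in a pairwise alignment. The sketch below shows one common convention, assuming two already-aligned sequences of equal length with "-" as the gap character and with gapped columns excluded from the count. The record does not state how the authors aligned the sequences or treated gaps, so the function name and these choices are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped columns of an alignment.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length. Columns where either sequence has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

    The same function works for nucleotide and amino acid alignments, which is how a single pair of genes can be given one value at each level.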

  8. Sequence analysis of the Choristoneura occidentalis granulovirus genome.

    PubMed

    Escasa, Shannon R; Lauzon, Hilary A M; Mathur, Amanda C; Krell, Peter J; Arif, Basil M

    2006-07-01

    The genome of the Choristoneura occidentalis granulovirus (ChocGV) isolated from the western spruce budworm, Choristoneura occidentalis, was sequenced completely. It was 104,710 bp long, with a 67.3% A+T content and contained 116 potential open reading frames (ORFs) covering 88.4% of the genome. Of these, 29 ORFs were conserved in all fully sequenced baculovirus genomes, 30 were GV-specific, 53 were present in some nucleopolyhedroviruses (NPVs) and/or GVs, three were common to ChocGV and Choristoneura fumiferana GV (ChfuGV) and one was so far unique. To date, ChocGV is the only GV identified that contains a homologue of the apoptosis inhibitor protein P35/P49, present in some group I NPVs. It is also the first GV without a Xestia c-nigrum GV ORF 26 homologue. Five homologous regions (hrs)/repeat regions, lacking typical NPV hr palindromes were identified. ChocGV hrs were similar to each other but not to other GV hrs. A 1.8 kb repeat region with a high A+T content (81%) and multiple repeats of 21-210 bp was found between choc36 and 37. This area resembled the non-homologous region origin of DNA replication (non-hr ori) identified in Cryptophlebia leucotreta GV (CrleGV) and Cydia pomonella GV (CpGV). Based on the mean amino acid identities of homologous proteins, ChocGV was closest to fully sequenced genomes CpGV (52.3%) and CrleGV (52.1%). The closest amino acid identity was to individual ORFs from the partially sequenced ChfuGV genome (97.2% in 38 ORFs). Phylogenetic analysis placed ChocGV in a clade with CrleGV and CpGV. PMID:16760394
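
    Base-composition figures such as the 67.3% A+T content of the genome and the 81% of the repeat region are straightforward to compute from sequence. A minimal sketch follows, assuming a plain DNA string and ignoring ambiguous bases such as N; the function name and that handling of ambiguity codes are assumptions, not details taken from the record.

```python
def at_content(sequence):
    """Percentage of A and T among the unambiguous bases of a DNA sequence."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for base in bases if base in "AT")
    return 100.0 * at_count / len(bases)
```

    Applying the function to a window that slides along the genome is one way to locate unusually A+T-rich stretches like the 1.8 kb repeat region.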

  9. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  10. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  11. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY: In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  12. The Role of HIV-1 gp41 Glycoprotein in Infectious Tropism Inferred from Physico-Chemical Properties of its Amino Acid Sequence

    NASA Astrophysics Data System (ADS)

    Figueroa, E.; Villarreal, C.; Huerta, L.; Cocho, G.

    2006-09-01

    We performed a statistical analysis of the amino acid sequence of the gp41 ectodomain of the Human Immunodeficiency Virus type 1. We found strong correlations between physicochemical properties of highly variable residues and the viral infectious tropism.

  13. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding of their dynamics and pattern distribution is of great importance in order to improve the resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, which is the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database) and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period of the raw data. This comparison enables estimating the degree of the deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking into account the phenomena, were also applied and adapted to detect time dependences in a measured variable (i.e. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
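
    As a rough illustration of one measure named in the record above, the Allan Factor can be estimated from a list of event times by comparing counts in adjacent time windows. The sketch below assumes the commonly used definition AF(T) = E[(N_{k+1} - N_k)^2] / (2 E[N_k]), where N_k is the number of events in the k-th window of length T. The function names, the simulated Poisson reference series, and every parameter value are hypothetical and are not taken from the study.

```python
import random


def allan_factor(event_times, window, t_start, t_end):
    """Estimate the Allan Factor for one window length.

    Counts events in consecutive windows of length `window` covering
    [t_start, t_end) and returns mean squared difference of adjacent
    counts divided by twice the mean count.
    """
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    if mean_count == 0:
        raise ValueError("no events inside the observation period")
    sq_diffs = [(counts[k + 1] - counts[k]) ** 2 for k in range(n_windows - 1)]
    return (sum(sq_diffs) / len(sq_diffs)) / (2 * mean_count)


def simulate_poisson(rate, t_end, rng):
    """Homogeneous Poisson process on [0, t_end) via exponential gaps."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t >= t_end:
            return times
        times.append(t)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed, illustrative only
    reference = simulate_poisson(rate=5.0, t_end=1000.0, rng=rng)
    for window in (1.0, 5.0, 20.0):
        # For a Poisson reference the estimate should stay near 1.
        print(window, round(allan_factor(reference, window, 0.0, 1000.0), 2))
```

    Values well above 1 at a given window length would point to clustering at that time scale, while values near 1 are what a Poisson reference series is expected to give.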

  14. Bacteria obtained from a sequencing batch reactor that are capable of growth on dehydroabietic acid.

    PubMed Central

    Mohn, W W

    1995-01-01

    Eleven isolates capable of growth on the resin acid dehydroabietic acid (DhA) were obtained from a sequencing batch reactor designed to treat a high-strength process stream from a paper mill. The isolates belonged to two groups, represented by strains DhA-33 and DhA-35, which were characterized. In the bioreactor, bacteria like DhA-35 were more abundant than those like DhA-33. The population in the bioreactor of organisms capable of growth on DhA was estimated to be 1.1 × 10^6 propagules per ml, based on a most-probable-number determination. Analysis of small-subunit rRNA partial sequences indicated that DhA-33 was most closely related to Sphingomonas yanoikuyae (Sab = 0.875) and that DhA-35 was most closely related to Zoogloea ramigera (Sab = 0.849). Both isolates additionally grew on other abietanes, i.e., abietic and palustric acids, but not on the pimaranes, pimaric and isopimaric acids. For DhA-33 and DhA-35 with DhA as the sole organic substrate, doubling times were 2.7 and 2.2 h, respectively, and growth yields were 0.30 and 0.25 g of protein per g of DhA, respectively. Glucose as a cosubstrate stimulated growth of DhA-33 on DhA and stimulated DhA degradation by the culture. Pyruvate as a cosubstrate did not stimulate growth of DhA-35 on DhA and reduced the specific rate of DhA degradation of the culture. DhA induced DhA and abietic acid degradation activities in both strains, and these activities were heat labile. Cell suspensions of both strains consumed DhA at a rate of 6 μmol mg of protein^-1 h^-1. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:7793937

  15. Analysis of human immunodeficiency virus type 1 nef gene sequences present in vivo.

    PubMed Central

    Shugars, D C; Smith, M S; Glueck, D H; Nantermet, P V; Seillier-Moiseiwitsch, F; Swanstrom, R

    1993-01-01

    The nef genes of the human immunodeficiency viruses type 1 and 2 (HIV-1 and HIV-2) and the related simian immunodeficiency viruses (SIVs) encode a protein (Nef) whose role in virus replication and cytopathicity remains uncertain. As an attempt to elucidate the function of nef, we characterized the nucleotide and corresponding protein sequences of naturally occurring nef genes obtained from several HIV-1-infected individuals. A consensus Nef sequence was derived and used to identify several features that were highly conserved among the Nef sequences. These features included a nearly invariant myristylation signal, regions of sequence polymorphism and variable duplication, a region with an acidic charge, a (Pxx)4 repeat sequence, and a potential protein kinase C phosphorylation site. Clustering of premature stop codons at position 124 was noted in 6 of the 54 Nef sequences. Further analysis revealed four stretches of residues that were highly conserved not only among the patient-derived HIV-1 Nef sequences, but also among the Nef sequences of HIV-2 and the SIVs, suggesting that Nef proteins expressed by these retroviruses are functionally equivalent. The "Nef-defining" sequences were used to evaluate the sequence alignments of known proteins reported to share sequence similarity with Nef sequences and to conduct additional computer-based searches for similar protein sequences. A gene encoding the consensus Nef sequence was also generated. This gene encodes a full-length Nef protein that should be a valuable tool in further studies of Nef function. PMID:8043040

  16. Cloning and sequence analysis of the muramidase-2 gene from Enterococcus hirae.

    PubMed Central

    Chu, C P; Kariyama, R; Daneo-Moore, L; Shockman, G D

    1992-01-01

    Extracellular muramidase-2 of Enterococcus hirae ATCC 9790 was purified to homogeneity by substrate binding, guanidine-HCl extraction, and reversed-phase chromatography. A monoclonal antibody, 2F8, which specifically recognizes muramidase-2, was used to screen a genomic library of E. hirae ATCC 9790 DNA in bacteriophage lambda gt11. A positive phage clone containing a 4.5-kb DNA insert was isolated and analyzed. The EcoRI-digested 4.5-kb fragment was cut into 2.3-, 1.0-, and 1.5-kb pieces by using restriction enzymes KpnI, Sau3AI, and PstI, and each fragment was subcloned into plasmid pJDC9 or pUC19. The nucleotide sequence of each subclone was determined. The sequence data indicated an open reading frame encoding a polypeptide of 666 amino acid residues, with a calculated molecular mass of 70,678 Da. The first 24 N-terminal amino acids of purified extracellular muramidase-2 were in very good agreement with the deduced amino acid sequence after a 49-amino-acid putative signal sequence. Analysis of the deduced amino acid sequence showed the presence at the C-terminal region of the protein of six highly homologous repeat units separated by nonhomologous intervening sequences that are highly enriched in serine and threonine. The overall sequence showed a high degree of homology with a recently cloned Streptococcus faecalis autolysin. PMID:1347040

  17. Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

    PubMed Central

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert James

    2014-01-01

    Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technology) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indels were more variable with different sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNP but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis. PMID:25329378

  18. Whole exome sequence analysis of Peters anomaly.

    PubMed

    Weh, Eric; Reis, Linda M; Happ, Hannah C; Levin, Alex V; Wheeler, Patricia G; David, Karen L; Carney, Erin; Angle, Brad; Hauser, Natalie; Semina, Elena V

    2014-12-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly. PMID:25182519

  19. Whole exome sequence analysis of Peters anomaly

    PubMed Central

    Weh, Eric; Reis, Linda M.; Happ, Hannah C.; Levin, Alex V.; Wheeler, Patricia G.; David, Karen L.; Carney, Erin; Angle, Brad; Hauser, Natalie

    2015-01-01

    Peters anomaly is a rare form of anterior segment ocular dysgenesis, which can also be associated with additional systemic defects. At this time, the majority of cases of Peters anomaly lack a genetic diagnosis. We performed whole exome sequencing of 27 patients with syndromic or isolated Peters anomaly to search for pathogenic mutations in currently known ocular genes. Among the eight previously recognized Peters anomaly genes, we identified a de novo missense mutation in PAX6, c.155G>A, p.(Cys52Tyr), in one patient. Analysis of 691 additional genes currently associated with a different ocular phenotype identified a heterozygous splicing mutation c.1025+2T>A in TFAP2A, a de novo heterozygous nonsense mutation c.715C>T, p.(Gln239*) in HCCS, a hemizygous mutation c.385G>A, p.(Glu129Lys) in NDP, a hemizygous mutation c.3446C>T, p.(Pro1149Leu) in FLNA, and compound heterozygous mutations c.1422T>A, p.(Tyr474*) and c.2544G>A, p.(Met848Ile) in SLC4A11; all mutations, except for the FLNA and SLC4A11 c.2544G>A alleles, are novel. This is the first study to use whole exome sequencing to discern the genetic etiology of a large cohort of patients with syndromic or isolated Peters anomaly. We report five new genes associated with this condition and suggest screening of TFAP2A and FLNA in patients with Peters anomaly and relevant syndromic features and HCCS, NDP and SLC4A11 in patients with isolated Peters anomaly. PMID:25182519

  20. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  1. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  2. tax and rex Sequences of bovine leukaemia virus from globally diverse isolates: rex amino acid sequence more variable than tax.

    PubMed

    McGirr, K M; Buehring, G C

    2005-02-01

    Bovine leukaemia virus (BLV) is an important agricultural problem with high costs to the dairy industry. Here, we examine the variation of the tax and rex genes of BLV. The tax and rex genes share 420 bases and have overlapping reading frames. The tax gene encodes a protein that functions as a transactivator of the BLV promoter, is required for viral replication, acts on cellular promoters, and is responsible for oncogenesis. The rex gene product facilitates the export of viral mRNAs from the nucleus and regulates transcription. We have sequenced five new isolates of the tax/rex gene. We examined the five new and three previously published tax/rex DNA and predicted amino acid sequences of BLV isolates from cattle in representative regions worldwide. The highest variation among nucleic acid sequences for tax and rex was 7% and 5%, respectively; among predicted amino acid sequences for Tax and Rex, 9% and 11%, respectively. Significantly more nucleotide changes resulted in predicted amino acid changes in the rex gene than in the tax gene (P ≤ 0.0006). This variability is higher than previously reported for any region of the viral genome. This research may also have implications for the development of Tax-based vaccines. PMID:15702995
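
    The "highest variation" figures quoted in the record above can be read as the largest pairwise percent difference within a set of isolates. The abstract does not say how the percentages were computed, so the sketch below is only one plausible reading: it assumes the sequences are already aligned, have equal length, and contain no gaps. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def percent_difference(seq_a, seq_b):
    """Percent of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return 100.0 * mismatches / len(seq_a)


def max_pairwise_difference(sequences):
    """Largest percent difference over all pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    return max(percent_difference(a, b) for a, b in combinations(sequences, 2))


if __name__ == "__main__":
    # Toy aligned nucleotide sequences, invented for illustration.
    isolates = ["ATGGCCAAGT", "ATGGCTAAGT", "ATGACTAAGT"]
    print(max_pairwise_difference(isolates))
```

    The same two functions work on predicted amino acid sequences, since they only compare characters position by position.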

  3. Complete amino acid sequence of the medium-chain S-acyl fatty acid synthetase thio ester hydrolase from rat mammary gland

    SciTech Connect

    Randhawa, Z.I.; Smith, S.

    1987-03-10

    The complete amino acid sequence of the medium-chain S-acyl fatty acid synthetase thio ester hydrolase (thioesterase II) from rat mammary gland is presented. Most of the sequence was derived by analysis of [14C]-labelled peptide fragments produced by cleavage at methionyl, glutamyl, lysyl, arginyl, and tryptophanyl residues. A small section of the sequence was deduced from a previously analyzed cDNA clone. The protein consists of 260 residues and has a blocked amino-terminal methionine and calculated M_r of 29,212. The carboxy-terminal sequence, verified by Edman degradation of the carboxy-terminal cyanogen bromide fragment and carboxypeptidase Y digestion of the intact thioesterase II, terminates with a serine residue and lacks three additional residues predicted by the cDNA sequence. The native enzyme contains three cysteine residues but no disulfide bridges. The active site serine residue is located at position 101. The rat mammary gland thioesterase II exhibits approximately 40% homology with a thioesterase from mallard uropygial gland, the sequence of which was recently determined by cDNA analysis. Thus the two enzymes may share similar structural features and a common evolutionary origin. The location of the active site in these thioesterases differs from that of other serine active site esterases; indeed, the enzymes do not exhibit any significant homology with other serine esterases, suggesting that they may constitute a separate new family of serine active site enzymes.

  4. The amino acid sequence of protein CM-3 from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J

    1985-01-01

    Protein CM-3 from Dendroaspis polylepis polylepis venom was purified by gel filtration and ion exchange chromatography. It comprises 65 amino acids including eight half-cystines. The complete amino acid sequence of protein CM-3 has been elucidated. The sequence (residues 1-50) resembles that of the N-terminal sequence of the subunits of a synergistic type protein and residues 51-65 that of the C-terminal sequence of an angusticeps type protein. Mixtures of protein CM-3 and angusticeps type proteins showed no apparent synergistic effect, in that their toxicity in combination was no greater than the sum of their individual toxicities. PMID:4029488

  5. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals have an important role in environmental issues not only on the Earth but also on other terrestrial planets. Iron mineral species related to alteration products of primary minerals with surface or subsurface fluids are characterized by temperature, acidity and redox conditions of the fluids. We can see various iron-bearing alteration products in alteration products around fumaroles in geothermal/volcanic areas. In this study, zonal structures of iron minerals in alteration products of the geothermal area are observed to elucidate temporal and spatial variation of hydrothermal fluids. Alteration of the pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan occurs by the acidic hydrothermal fluid to form cristobalite leaching out elements other than Si. Hand specimens with unaltered or weakly altered core and cristobalite crust show various sequences of layers. XRD analysis revealed that the alteration degree is represented by abundance of cristobalite. Intermediately altered layers are characterized by occurrence including alunite, pyrite, kaolinite, goethite and hematite. A specimen with reddish brown core surrounded by cristobalite-rich white crust has brown colored layers at the boundary of core and the crust. Reddish core is characterized by occurrence of crystalline hematite by XRD. Another hand specimen has light gray core, which represents reduced conditions, and white cristobalite crust with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with strongly decolorized cristobalite-rich groundmass. Hydrothermal alteration experiments of iron-rich basaltic material show that iron mineral species depend on acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by the acidity and redox conditions. 
Variations of alteration

  6. Time-dependent accident sequence analysis

    SciTech Connect

    Chu, T.L.

    1983-01-01

    One problem of the current event tree methodology is that the transitions between accident sequences are not modeled. The causes of transitions are mostly due to operator actions during an accident. A model for such transitions is presented. A generalized algorithm is used for quantification. In the more realistic accident analysis, the progression of the physical processes, which determines the time available for proper operator response, is modeled. Furthermore, the uncertainty associated with the physical modeling is considered. As an example, the approach is applied to analyze TMI-type accidents. Statistical evidence is collected and used in assessing the frequency of a stuck-open pressure-operated relief valve at B&W plants as well as the frequency of misdiagnosis. Statistical data are also used in modeling the timing of operator actions during the accident. A thermal code (CUT) is developed to determine the time at which the core uncovery occurs. A response surface is used to propagate the uncertainty associated with the thermal code.

  7. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980) (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. PMID:9279371

  8. An analysis of the feasibility of short read sequencing

    PubMed Central

    Whiteford, Nava; Haslam, Niall; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W.; Roach, Peter L.; Bradley, Mark; Neylon, Cameron

    2005-01-01

    Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1. PMID:16275781
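
    The dependence on read length described in the record above can be explored on a small scale by asking what fraction of reads of a given length occur exactly once in a sequence. The sketch below is a simplified stand-in and not the authors' analysis: it considers only the forward strand, ignores sequencing errors, and runs on a random toy sequence with an artificial repeat. The function name and all parameter values are hypothetical.

```python
import random
from collections import Counter


def unique_read_fraction(genome, read_len):
    """Fraction of start positions whose read occurs exactly once.

    Every substring of length `read_len` is treated as one error-free read.
    """
    n_reads = len(genome) - read_len + 1
    if read_len <= 0 or n_reads <= 0:
        raise ValueError("read_len must be between 1 and len(genome)")
    counts = Counter(genome[i:i + read_len] for i in range(n_reads))
    # A read that occurs once contributes exactly one start position.
    unique_positions = sum(1 for c in counts.values() if c == 1)
    return unique_positions / n_reads


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, illustrative only
    toy = "".join(rng.choice("ACGT") for _ in range(5000))
    toy += toy[:500]  # append a copy of the first 500 bases as a repeat
    for read_len in (4, 8, 12, 20):
        print(read_len, round(unique_read_fraction(toy, read_len), 3))
```

    On the toy sequence the fraction rises with read length and then levels off, because reads that fall entirely inside the duplicated block stay ambiguous no matter how long they are, as long as they are shorter than the block.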

  9. Sequencing, analysis, and annotation of expressed sequence tags for Camelus dromedarius.

    PubMed

    Al-Swailem, Abdulaziz M; Shehata, Maher M; Abu-Duhier, Faisel M; Al-Yamani, Essam J; Al-Busadah, Khalid A; Al-Arawi, Mohammed S; Al-Khider, Ali Y; Al-Muhaimeed, Abdullah N; Al-Qahtani, Fahad H; Manee, Manee M; Al-Shomrani, Badr M; Al-Qhtani, Saad M; Al-Harthi, Amer S; Akdemir, Kadir C; Inan, Mehmet S; Otu, Hasan H

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and approximately 40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  10. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  11. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system that comprises scripts written in Perl, C shell, and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  12. Partial N-terminal sequence analysis of human class II molecules expressing the DQw3 determinant.

    PubMed

    Obata, F; Endo, T; Yoshii, M; Otani, F; Igarashi, M; Takenouchi, T; Ikeda, H; Ogasawara, K; Kasahara, M; Wakisaka, A

    1985-09-01

    HLA-DQ molecules were isolated from DRw9-homozygous and DR4-homozygous cell lines by using a monoclonal antibody HU-18, which recognizes class II molecules carrying the conventional DQw3 determinant. The partial N-terminal sequence analysis of the DQw3 molecules revealed that they have sequences homologous to those of murine I-A molecules. Within the limits of our sequence analysis, the DQw3 molecules from the two cell lines are identical to each other in both the alpha and beta chains. The DQ alpha as well as DQ beta chains were found to have amino acid substitutions when compared to other I-A-like molecules whose sequences have been reported. These differences may contribute to the DQw supertypic specificity. The polymorphic nature of DQ molecules is in marked contrast to that of DR molecules where DR alpha chains are highly conserved while DR beta chains have easily detectable amino acid substitutions. PMID:2411700

  13. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)
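
    A minimal sketch of the first step such a program needs, generating a random amino acid string, might look like this. The function names and the uniform sampling over the 20 standard residues are assumptions; the record does not describe how the original program chose its sequences.

```python
import random

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, seed=None):
    """Return a random peptide of the given length (uniform over residues)."""
    rng = random.Random(seed)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def composition(peptide):
    """Count residues, mimicking the amino acid analysis a student would see."""
    counts = {}
    for residue in peptide:
        counts[residue] = counts.get(residue, 0) + 1
    return counts
```

    A student exercise could then reveal only `composition(peptide)` and fragment data, keeping the sequence itself hidden.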

  14. Sample Prep, Workflow Automation and Nucleic Acid Fractionation for Next Generation Sequencing

    SciTech Connect

    Roskey, Mark

    2010-06-03

    Mark Roskey of Caliper LifeSciences discusses how the company's technologies fit into the next generation sequencing workflow on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  15. Automated shielding analysis sequences for spent fuel casks

    SciTech Connect

    Tang, J.S.; Parks, C.V.; Hermann, O.W.

    1987-01-01

    Two important Shielding Analysis Sequences (SAS) have recently been developed within the SCALE computational system. These sequences significantly enhance the existing SCALE system capabilities for evaluating radiation doses exterior to spent fuel casks. These new control module sequences (SAS1 and SAS4) and their capabilities are discussed and demonstrated, together with the existing SAS2 sequence that is used to generate radiation sources for spent fuel. Particular attention is given to the new SAS4 sequence which provides an automated scheme for generating and using biasing parameters in a subsequent Monte Carlo analysis of a cask.

  16. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    PubMed

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance study. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with strong bias towards the (AC)(n) type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features. PMID:17091199
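
    The two summary statistics quoted in this abstract, GC content and (AC)(n) microsatellite occurrence, can be computed along these lines. This is a sketch with hypothetical function names and a hypothetical minimum of three repeat copies; the abstract does not state what tools or thresholds the authors used.

```python
import re


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def find_repeats(seq, unit, min_copies=3):
    """Return (start, copies) for each tandem run of `unit` with >= min_copies."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]
```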

  17. Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets.

    PubMed

    Melo, Francisco; Marti-Renom, Marc A

    2006-06-01

    Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets and their performance assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using as a reference frame the standard alphabet of 20 amino acids. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may allow increasing the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs. PMID:16506243
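
    To make the idea of a reduced alphabet concrete, the sketch below maps the 20 standard residues onto six groups and compares two sequences in the reduced space. The grouping is purely illustrative (a rough physicochemical partition chosen for this example) and is not one of the alphabets evaluated in the paper.

```python
# Hypothetical six-letter reduced alphabet: group label -> member residues.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # small / conformationally special
}
RESIDUE_TO_GROUP = {
    res: label for label, members in GROUPS.items() for res in members
}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(RESIDUE_TO_GROUP[res] for res in seq.upper())


def reduced_identity(seq1, seq2):
    """Fraction of aligned positions that match after reduction."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be pre-aligned to equal length")
    r1, r2 = reduce_sequence(seq1), reduce_sequence(seq2)
    return sum(x == y for x, y in zip(r1, r2)) / len(r1)
```

    Two sequences with no identical residues can still score 1.0 here, which is the sense in which a reduced alphabet can expose similarity between remote homologs.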

  18. Characterization of mouse cellular deoxyribonucleic acid homologous to Abelson murine leukemia virus-specific sequences.

    PubMed Central

    Dale, B; Ozanne, B

    1981-01-01

    The genome of Abelson murine leukemia virus (A-MuLV) consists of sequences derived from both BALB/c mouse deoxyribonucleic acid and the genome of Moloney murine leukemia virus. Using deoxyribonucleic acid linear intermediates as a source of retroviral deoxyribonucleic acid, we isolated a recombinant plasmid which contained 1.9 kilobases of the 3.5-kilobase mouse-derived sequences found in A-MuLV (A-MuLV-specific sequences). We used this clone, designated pSA-17, as a probe in restriction enzyme and Southern blot analyses to examine the arrangement of homologous sequences in BALB/c deoxyribonucleic acid (endogenous Abelson sequences). The endogenous Abelson sequences within the mouse genome were interrupted by noncoding regions, suggesting that a rearrangement of the cell sequences was required to produce the sequence found in the virus. Endogenous Abelson sequences were arranged similarly in mice that were susceptible to A-MuLV tumors and in mice that were resistant to A-MuLV tumors. An examination of three BALB/c plasmacytomas and a BALB/c early B-cell tumor likewise revealed no alteration in the arrangement of the endogenous Abelson sequences. Homology to pSA-17 was also observed in deoxyribonucleic acids prepared from rat, hamster, chicken, and human cells. An isolate of A-MuLV which encoded a 160,000-dalton transforming protein (P160) contained 700 more base pairs of mouse sequences than the standard A-MuLV isolate, which encoded a 120,000-dalton transforming protein (P120). PMID:9279386

  19. Studies on monotreme proteins. VII. Amino acid sequence of myoglobin from the platypus, Ornithorhynchus anatinus.

    PubMed

    Fisher, W K; Thompson, E O

    1976-03-01

    Myoglobin isolated from skeletal muscle of the platypus contains 153 amino acid residues. The complete amino acid sequence has been determined following cleavage with cyanogen bromide and further digestion of the four fragments with trypsin, chymotrypsin, pepsin and thermolysin. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed 25 differences from human myoglobin and 24 from kangaroo myoglobin. Amino acid sequences in myoglobins are more conserved than sequences in the alpha- and beta-globin chains, and platypus myoglobin shows a similar number of variations in sequence to kangaroo myoglobin when compared with myoglobin of other species. The date of divergence of the platypus from other mammals was estimated at 102 +/- 31 million years, based on the number of amino acid differences between species and allowing for mutations during the evolutionary period. This estimate differs widely from the estimate given by similar treatment of the alpha- and beta-chain sequences and a constant rate of mutation of globin chains is not supported. PMID:962722

  20. cDNA-derived amino acid sequences of myoglobins from nine species of whales and dolphins.

    PubMed

    Iwanami, Kentaro; Mita, Hajime; Yamamoto, Yasuhiko; Fujise, Yoshihiro; Yamada, Tadasu; Suzuki, Tomohiko

    2006-10-01

    We determined the myoglobin (Mb) cDNA sequences of nine cetaceans, of which six are the first reports of Mb sequences: sei whale (Balaenoptera borealis), Bryde's whale (Balaenoptera edeni), pygmy sperm whale (Kogia breviceps), Stejneger's beaked whale (Mesoplodon stejnegeri), Longman's beaked whale (Indopacetus pacificus), and melon-headed whale (Peponocephala electra), and three confirm the previously determined chemical amino acid sequences: sperm whale (Physeter macrocephalus), common minke whale (Balaenoptera acutorostrata) and pantropical spotted dolphin (Stenella attenuata). We found two types of Mb in the skeletal muscle of pantropical spotted dolphin: Mb I with the same amino acid sequence as that deposited in the protein database, and Mb II, which differs at two amino acid residues compared with Mb I. Using an alignment of the amino acid or cDNA sequences of cetacean Mb, we constructed a phylogenetic tree by the NJ method. Clustering of cetacean Mb amino acid and cDNA sequences essentially follows the classical taxonomy of cetaceans, suggesting that Mb sequence data is valid for classification of cetaceans at least to the family level. PMID:16962803

  1. The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs

    PubMed Central

    Neely, Robert K; Roberts, Richard J

    2008-01-01

    Background: Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. Results: The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. Conclusion: We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R.BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases. PMID:18479503
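
    The recognition sequence GRCGYC uses IUPAC degenerate codes (R = A or G, Y = C or T). A short sketch of how such a site can be located in a DNA string is given below; the function names are hypothetical, and only the subset of IUPAC codes needed here is included. Because GRCGYC is its own reverse complement, scanning one strand is sufficient.

```python
import re

# Subset of the IUPAC nucleotide codes, expressed as regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def site_to_pattern(site):
    """Translate a degenerate recognition site into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(seq, site):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets the scan report overlapping occurrences as well.
    lookahead = re.compile(f"(?=({site_to_pattern(site)}))")
    return [m.start() for m in lookahead.finditer(seq.upper())]
```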

  2. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    PubMed

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657
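
    The fragmentation strategy in this abstract relies on reagents that cut on the C-terminal side of specific residues: cyanogen bromide after methionine, and trypsin after arginine once the lysines are blocked. A simplified in-silico version is sketched below. The peptide in the usage note is made up, and the sketch ignores real-world effects such as missed cleavages.

```python
def cleave_after(seq, residues):
    """Split a peptide on the C-terminal side of every residue in `residues`."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        if residue in residues:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Trailing fragment that does not end in a cleavage residue.
        fragments.append(seq[start:])
    return fragments
```

    For the hypothetical peptide `GLSMDGEWRLVK`, `cleave_after(seq, "M")` models the cyanogen bromide step and `cleave_after(seq, "R")` models arginine-restricted trypsin.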

  3. Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models

    PubMed Central

    Maaskola, Jonas; Rajewsky, Nikolaus

    2014-01-01

    We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments and aside from binary contrasts also more complex data configurations can be utilized. PMID:25389269
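
    The objective the authors concentrate on, mutual information between condition (positive or negative set) and motif occurrence, can be illustrated on a simple 2x2 table of counts. The sketch below shows only that quantity; it is not the Discrover implementation, which learns hidden Markov models, and the function and argument names are hypothetical.

```python
import math


def mutual_information(pos_with, pos_without, neg_with, neg_without):
    """Mutual information (bits) between set membership and motif presence."""
    table = [[pos_with, pos_without], [neg_with, neg_without]]
    total = sum(sum(row) for row in table)
    if total == 0:
        return 0.0
    row_p = [sum(row) / total for row in table]        # P(condition)
    col_p = [sum(col) / total for col in zip(*table)]  # P(motif present/absent)
    mi = 0.0
    for i in range(2):
        for j in range(2):
            p = table[i][j] / total
            if p > 0:
                mi += p * math.log2(p / (row_p[i] * col_p[j]))
    return mi
```

    A motif present in every positive sequence and no negative sequence yields 1 bit for balanced sets, while a motif equally frequent in both sets yields 0.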

  4. Automated carboxy-terminal sequence analysis of peptides and proteins using diphenyl phosphoroisothiocyanatidate.

    PubMed Central

    Bailey, J. M.; Nikfarjam, F.; Shenoy, N. R.; Shively, J. E.

    1992-01-01

    peptides covalently attached to carboxylic acid-modified polyethylene and proteins (200 pmol to 5 nmol) noncovalently applied to Zitex (porous Teflon). The generality of our automated C-terminal sequencing methodology was examined by sequencing model peptides containing all 20 of the common amino acids. All of the amino acids tested were found to sequence in good yield except for proline, which was found not to be capable of derivatization. In spite of this limitation, the methodology should be a valuable tool for the C-terminal sequence analysis of peptides and proteins.(ABSTRACT TRUNCATED AT 400 WORDS) PMID:1304893

  5. Mass spectrometric detection of the amino acid sequence polymorphism of the hepatitis C virus antigen.

    PubMed

    Kaysheva, A L; Ivanov, Yu D; Frantsuzov, P A; Krohin, N V; Pavlova, T I; Uchaikin, V F; Konev, V A; Kovalev, O B; Ziborov, V S; Archakov, A I

    2016-03-01

    A method for detection and identification of the hepatitis C virus antigen (HCVcoreAg) in human serum with consideration for possible amino acid substitutions is proposed. The method is based on a combination of biospecific capturing and concentrating of the target protein on the surface of the chip for atomic force microscope (AFM chip) with subsequent protein identification by tandem mass spectrometric (MS/MS) analysis. Biospecific AFM-capturing of viral particles containing HCVcoreAg from serum samples was performed by use of AFM chips with monoclonal antibodies (anti-HCVcore) covalently immobilized on the surface. Biospecific complexes were registered and counted by AFM. Further MS/MS analysis allowed reliable identification of the HCVcoreAg in the complexes formed on the AFM chip surface. Analysis of MS/MS spectra, taking into account the possible polymorphisms in the amino acid sequence of the HCVcoreAg, enabled us to increase the number of identified peptides. PMID:26773170

  6. Analysis of fatty acid content and composition in microalgae.

    PubMed

    Breuer, Guido; Evers, Wendy A C; de Vree, Jeroen H; Kleinegris, Dorinde M M; Martens, Dirk E; Wijffels, René H; Lamers, Packo P

    2013-01-01

    A method to determine the content and composition of total fatty acids present in microalgae is described. Fatty acids are a major constituent of microalgal biomass. These fatty acids can be present in different acyl-lipid classes. Especially the fatty acids present in triacylglycerol (TAG) are of commercial interest, because they can be used for production of transportation fuels, bulk chemicals, nutraceuticals (ω-3 fatty acids), and food commodities. To develop commercial applications, reliable analytical methods for quantification of fatty acid content and composition are needed. Microalgae are single cells surrounded by a rigid cell wall. A fatty acid analysis method should provide sufficient cell disruption to liberate all acyl lipids and the extraction procedure used should be able to extract all acyl lipid classes. With the method presented here all fatty acids present in microalgae can be accurately and reproducibly identified and quantified using small amounts of sample (5 mg) independent of their chain length, degree of unsaturation, or the lipid class they are part of. This method does not provide information about the relative abundance of different lipid classes, but can be extended to separate lipid classes from each other. The method is based on a sequence of mechanical cell disruption, solvent based lipid extraction, transesterification of fatty acids to fatty acid methyl esters (FAMEs), and quantification and identification of FAMEs using gas chromatography (GC-FID). A TAG internal standard (tripentadecanoin) is added prior to the analytical procedure to correct for losses during extraction and incomplete transesterification. PMID:24121679
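
    The role of the internal standard can be shown with the usual ratio calculation: each FAME peak is scaled by the peak of the standard, whose added mass is known. The sketch below assumes a linear detector response and a relative response factor that defaults to 1; both are assumptions of this illustration, not values taken from the protocol.

```python
def quantify_fame(peak_area, is_peak_area, is_mass_mg, response_factor=1.0):
    """Mass (mg) of one fatty acid from its peak area relative to the standard."""
    if is_peak_area <= 0:
        raise ValueError("internal standard peak must have positive area")
    return peak_area / is_peak_area * is_mass_mg * response_factor


def content_percent(fa_masses_mg, sample_mass_mg):
    """Total fatty acid content as a percentage of sample dry weight."""
    return 100.0 * sum(fa_masses_mg) / sample_mass_mg
```

    Because analyte and standard pass through the same extraction and transesterification steps, losses affect both peaks and largely cancel in the ratio.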

  7. Analysis of Fatty Acid Content and Composition in Microalgae

    PubMed Central

    Breuer, Guido; Evers, Wendy A. C.; de Vree, Jeroen H.; Kleinegris, Dorinde M. M.; Martens, Dirk E.; Wijffels, René H.; Lamers, Packo P.

    2013-01-01

    A method to determine the content and composition of total fatty acids present in microalgae is described. Fatty acids are a major constituent of microalgal biomass. These fatty acids can be present in different acyl-lipid classes. Especially the fatty acids present in triacylglycerol (TAG) are of commercial interest, because they can be used for production of transportation fuels, bulk chemicals, nutraceuticals (ω-3 fatty acids), and food commodities. To develop commercial applications, reliable analytical methods for quantification of fatty acid content and composition are needed. Microalgae are single cells surrounded by a rigid cell wall. A fatty acid analysis method should provide sufficient cell disruption to liberate all acyl lipids and the extraction procedure used should be able to extract all acyl lipid classes. With the method presented here all fatty acids present in microalgae can be accurately and reproducibly identified and quantified using small amounts of sample (5 mg) independent of their chain length, degree of unsaturation, or the lipid class they are part of. This method does not provide information about the relative abundance of different lipid classes, but can be extended to separate lipid classes from each other. The method is based on a sequence of mechanical cell disruption, solvent based lipid extraction, transesterification of fatty acids to fatty acid methyl esters (FAMEs), and quantification and identification of FAMEs using gas chromatography (GC-FID). A TAG internal standard (tripentadecanoin) is added prior to the analytical procedure to correct for losses during extraction and incomplete transesterification. PMID:24121679

  8. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome.

    PubMed

    Pinto, Ameet J; Sharp, Jonathan O; Yoder, Michael J; Almstrand, Robert

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  9. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome

    PubMed Central

    Pinto, Ameet J.; Sharp, Jonathan O.; Yoder, Michael J.

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  10. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  11. Relationships among genera of the Saccharomycotina from multigene sequence analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...

  12. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
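
    The notion of a constrained column, one where residues mutate but a shared property is preserved, can be sketched in a few lines. The original toolkit is written in Prolog and also uses the phylogenetic tree; the Python below is a simplified illustration that looks only at the alignment, with a hypothetical hydrophobic residue set as the property.

```python
# Hypothetical property set used for illustration.
HYDROPHOBIC = set("AVLIMFWC")


def constrained_columns(alignment, property_set):
    """Indices of columns whose residues vary yet all lie in `property_set`."""
    columns = []
    for index, column in enumerate(zip(*alignment)):
        residues = set(column)
        # Require variation (a mutation occurred) and a preserved property.
        if len(residues) > 1 and residues <= property_set:
            columns.append(index)
    return columns
```

    Fully conserved columns are deliberately excluded, since the abstract describes preservation of a property despite mutation.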

  13. Phylogenetic analysis of Ostreococcus virus sequences from the Patagonian Coast.

    PubMed

    Manrique, Julieta M; Calvo, Andrea Y; Jones, Leandro R

    2012-10-01

    A phylogenetic analysis of new Ostreococcus virus (OV) sequences from the Patagonian Coast, Argentina, and homologous sequences from public databases was performed. This analysis showed that the Patagonian sequences represented a divergent viral clade and that the rest of OV sequences analyzed here were clustered into six additional phylogenetic groups. Analyses of 18S gene libraries supported a close relationship of the Patagonian Ostreococcus host with clade A sequences described elsewhere, corroborating previous studies indicating that clade A strains are ubiquitous. Besides the Patagonian OV sequences, several phylogenetic groupings were linked to particular geographic locations, suggesting a role for allopatric cladogenesis in viral diversification. However, and in agreement with previous observations, other viral lineages included sequences with diverse geographic origins. These findings, together with analyses of ancestral trait trajectories performed here, are consistent with an evolutionary dynamics in which geographical isolation has a role in OV diversification but can be followed by rapid dispersion to remote places. PMID:22674355

  14. UNIT 11.10 N-Terminal Sequence Analysis of Proteins and Peptides

    PubMed Central

    Speicher, Kaye D.; Gorman, Nicole; Speicher, David W.

    2009-01-01

    Automated N-terminal sequence analysis involves a series of chemical reactions that derivatize and remove one amino acid at a time from the N-terminal of purified peptides or intact proteins. At least several pmoles of a purified protein or 10 to 20 pmoles of a purified peptide with an unmodified N-terminal is required in order to obtain useful sequence information. In recent years the demand for N-terminal sequencing has decreased substantially as some applications for protein identification and characterization can now be more effectively performed using mass spectrometry. However, N-terminal sequencing remains the method of choice for verifying the N-terminal boundary of recombinant proteins, determining the N-terminal of protease-resistant domains, identifying proteins isolated from species where most of the genome has not yet been sequenced, and mapping modified or crosslinked sites in proteins that prove to be refractory to analysis by mass spectrometry. PMID:18429102

  15. Nucleotide and predicted amino acid sequence of a cDNA clone encoding part of human transketolase.

    PubMed

    Abedinia, M; Layfield, R; Jones, S M; Nixon, P F; Mattick, J S

    1992-03-31

    Transketolase is a key enzyme in the pentose-phosphate pathway which has been implicated in the latent human genetic disease, Wernicke-Korsakoff syndrome. Here we report the cloning and partial characterisation of the coding sequences encoding human transketolase from a human brain cDNA library. The library was screened with oligonucleotide probes based on the amino acid sequence of proteolytic fragments of the purified protein. Northern blots showed that the transketolase mRNA is approximately 2.2 kb, close to the minimum expected, of which approximately 60% was represented in the largest cDNA clone. Sequence analysis of the transketolase coding sequences reveals a number of homologies with related enzymes from other species. PMID:1567394

  16. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    NASA Technical Reports Server (NTRS)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.

  17. Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence

    PubMed Central

    2010-01-01

    Background: Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. Results: C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Conclusions: Analysis of the C. sticklandii genome and

  18. Two distinct ferredoxins from Rhodobacter capsulatus: complete amino acid sequences and molecular evolution.

    PubMed

    Saeki, K; Suetsugu, Y; Yao, Y; Horio, T; Marrs, B L; Matsubara, H

    1990-09-01

    Two distinct ferredoxins were purified from Rhodobacter capsulatus SB1003. Their complete amino acid sequences were determined by a combination of protease digestion, BrCN cleavage and Edman degradation. Ferredoxins I and II were composed of 64 and 111 amino acids, respectively, with molecular weights of 6,728 and 12,549 excluding iron and sulfur atoms. Both contained two Cys clusters in their amino acid sequences. The first cluster of ferredoxin I and the second cluster of ferredoxin II had a sequence, CxxCxxCxxxCP, in common with the ferredoxins found in Clostridia. The second cluster of ferredoxin I had a sequence, CxxCxxxxxxxxCxxxCM, with extra amino acids between the second and third Cys, which has been reported for other photosynthetic bacterial ferredoxins and putative ferredoxins (nif-gene products) from nitrogen-fixing bacteria, and with a unique occurrence of Met. The first cluster of ferredoxin II had a CxxCxxxxCxxxCP sequence, with two additional amino acids between the second and third Cys, a characteristic feature of Azotobacter-[3Fe-4S] [4Fe-4S]-ferredoxin. Ferredoxin II was also similar to Azotobacter-type ferredoxins with an extended carboxyl (C-) terminal sequence compared to the common Clostridium-type. The evolutionary relationship of the two together with a putative one recently found to be encoded in nifENXQ region in this bacterium [Moreno-Vivian et al. (1989) J. Bacteriol. 171, 2591-2598] is discussed. PMID:2277040
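
    Cysteine-cluster signatures such as CxxCxxCxxxCP, where x stands for any residue, translate directly into regular expressions. The sketch below shows one way to scan a protein sequence for them; the function names are hypothetical and the test sequences in the usage note are invented, not the R. capsulatus ferredoxins.

```python
import re


def motif_to_regex(motif):
    """Compile a motif in which lowercase 'x' means any single residue."""
    return re.compile(
        "".join("." if ch == "x" else re.escape(ch) for ch in motif)
    )


def find_motif(seq, motif):
    """Return 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq)]
```

    For example, `find_motif("AACAACAACAAACPAA", "CxxCxxCxxxCP")` locates the Clostridium-type pattern, and the same call with `CxxCxxxxCxxxCP` looks for the variant with two extra residues between the second and third Cys.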

  19. Amino Acid Sequence of Anionic Peroxidase from the Windmill Palm Tree Trachycarpus fortunei

    PubMed Central

    2015-01-01

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications. PMID:25383699

  20. Individual and simultaneous determination of uric acid and ascorbic acid by flow injection analysis.

    PubMed

    Almuaibed, A M; Townshend, A

    1992-11-01

    Flow injection methods for the individual and simultaneous determination of ascorbic acid and uric acid are proposed. A spectrophotometer and a miniamperometric detector are connected in sequence. The calibration graphs for uric acid obtained by measuring its absorbance at 293 nm and its current at +0.6 V are linear up to at least 80 and 70 µg/ml, respectively, with an rsd (n = 10) of 1% for both methods at mid-range concentrations. The calibration graph for ascorbic acid with amperometric detection is linear up to 80 mg/l, with an rsd (n = 10) of 0.8% at 30 mg/l. The simultaneous determination of uric acid and ascorbic acid is based on measurement of the absorbance of uric acid at 293 nm and amperometric determination of both analytes at +0.6 V. The average relative errors of the analysis of binary mixtures of uric acid and ascorbic acid are 2.2 and 4.2%, respectively. PMID:18965554

  1. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    NASA Astrophysics Data System (ADS)

    Osipov, V. Al.

    2016-05-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming to construct analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for studying the cluster hierarchy. The analytic power of the approach is demonstrated by deriving a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.
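
    For readers unfamiliar with the underlying object, the sketch below builds an ordinary (one-fold) de Bruijn sequence, a cyclic string in which every word of length n over the alphabet occurs exactly once. It uses the standard Lyndon-word construction and does not implement the two-fold extension or the wavelet approach of the record; the function name is illustrative.

```python
def de_bruijn(alphabet, n):
    """Return a cyclic de Bruijn sequence of order n over `alphabet`.

    Standard construction: concatenate, in lexicographic order, all Lyndon
    words whose length divides n.
    """
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def cyclic_words(sequence, n):
    """List every length-n window of the sequence, wrapping around the end."""
    wrapped = sequence + sequence[:n - 1]
    return [wrapped[i:i + n] for i in range(len(sequence))]


s = de_bruijn("ACGT", 3)
words = cyclic_words(s, 3)
print(len(s), len(set(words)))  # 64 64: each of the 4**3 DNA 3-mers appears once
```

    In assembly terms, each length-n window corresponds to one k-mer, which is why de Bruijn structures appear in the DNA assembly context the abstract mentions.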

  2. Wavelet Analysis on Symbolic Sequences and Two-Fold de Bruijn Sequences

    NASA Astrophysics Data System (ADS)

    Osipov, V. Al.

    2016-07-01

    The concept of symbolic sequences plays an important role in the study of complex systems. In this work we are interested in the ultrametric structure of the set of cyclic sequences naturally arising in the theory of dynamical systems. Aiming to construct analytic and numerical methods for the investigation of clusters, we introduce an operator language on the space of symbolic sequences and propose an approach based on wavelet analysis for studying the cluster hierarchy. The analytic power of the approach is demonstrated by deriving a formula for counting two-fold de Bruijn sequences, an extension of the notion of de Bruijn sequences. Possible advantages of the developed description are also discussed in the context of the applied problem of constructing efficient DNA sequence assembly algorithms.

  3. Sequence Analysis of the Genome of the Neodiprion sertifer Nucleopolyhedrovirus†

    PubMed Central

    Garcia-Maruniak, Alejandra; Maruniak, James E.; Zanotto, Paolo M. A.; Doumbouya, Aissa E.; Liu, Jaw-Ching; Merritt, Thomas M.; Lanoie, Jennifer S.

    2004-01-01

    The genome of the Neodiprion sertifer nucleopolyhedrovirus (NeseNPV), which infects the European pine sawfly, N. sertifer (Hymenoptera: Diprionidae), was sequenced and analyzed. The genome was 86,462 bp in size. The C+G content of 34% was lower than that of the majority of baculoviruses. A total of 90 methionine-initiated open reading frames (ORFs) with more than 50 amino acids and minimal overlapping were found. From those, 43 ORFs were homologous to other baculovirus ORFs, and 29 of these were from the 30 conserved core genes among all baculoviruses. A NeseNPV homolog to the ld130 gene, which is present in all other baculovirus genomes sequenced to date, could not be identified. Six NeseNPV ORFs were similar to non-baculovirus-related genes, one of which was a trypsin-like gene. Only one iap gene, containing a single BIR motif and a RING finger, was found in NeseNPV. Two NeseNPV ORFs (nese18 and nese19) were duplicates transcribed in opposite orientations from each other. NeseNPV did not have an AcMNPV ORF 2 homolog characterized as the baculovirus repeat ORF (bro). Six homologous regions (hrs) were located within the NeseNPV genome, each containing small palindromes embedded within direct repeats. A phylogenetic analysis was done to root the tree based upon the sequences of DNA polymerase genes of NeseNPV, 23 other baculoviruses, and other phyla. Baculovirus phylogeny was then constructed with 29 conserved genes from 24 baculovirus genomes. Culex nigripalpus nucleopolyhedrovirus (CuniNPV) was the most distantly related baculovirus, branching to the hymenopteran NeseNPV and the lepidopteran nucleopolyhedroviruses and granuloviruses. PMID:15194780
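
    Two of the quantities reported here, base composition and the count of methionine-initiated ORFs above a length cutoff, are simple to compute. The sketch below is a simplified illustration under stated assumptions: it scans only the three forward reading frames, uses the standard stop codons, and ignores the "minimal overlapping" criterion; the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def gc_content(dna):
    """Fraction of bases that are G or C."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def forward_orfs(dna, min_aa=50):
    """Return (start, end) pairs of ATG-initiated ORFs with more than min_aa residues.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Only the three forward frames are scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues encoded, stop excluded
                if n_aa > min_aa:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Toy sequence: Met + 3 Ala codons + stop, so 4 residues.
toy = "CCATGGCTGCTGCTTAAG"
print(gc_content("ATGC"))            # 0.5
print(forward_orfs(toy, min_aa=3))   # [(2, 17)]
```

    A genome-scale annotation would also scan the reverse complement and resolve overlapping candidates, which this sketch deliberately leaves out.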

  4. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    PubMed

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants. PMID:16880642

  5. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor. PMID:2708331

  6. Identification and sequence analysis of grain softness protein in selected wheat, rye and triticale.

    PubMed

    Kharrazi, M A S; Bobojonov, V

    2012-01-01

    Grain softness protein (GSP) is an important protein for overcoming milling and grain defenses in the innate immunity systems of cereals. The objective of this study was to evaluate and understand GSP sequences in selected wheat, rye and triticale. Using sequences for this gene from a sequence database, we performed clustering analysis to compare the sequences obtained from 3 germplasms with other studied sequences for GSP. The maximum difference between the Hirmand GSP genotype in wheat and the database sequences was 23% in EF109396 and EF109399. Most amino acid variation between the GSP sequences involved the same amino acids. The Nikita rye GSP gene showed 64% identity with DQ269918 and AY667063. The isoelectric point in the GSP of wheat and Lasko triticale was significantly higher than that of rye GSP. In addition, parameters such as optical density, grand average of hydrophobicity, percentage of hydrophobicity and hydrophilic amino acids, and number of alpha helices and beta sheets in GSP were similar in wheat and triticale but not in wheat and rye. PMID:22869084

  7. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications—hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics society. The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware acceleration technologies. PMID:25937944

  8. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.
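
    The lengths in this abstract are mutually consistent, which a few lines of arithmetic confirm. The same sketch also shows how potential N-linked glycosylation sites are commonly located, assuming the usual Asn-X-Ser/Thr sequon with X not proline; the abstract does not state which rule the authors applied, and the function name and test peptide are hypothetical.

```python
import re

# Figures taken from the abstract.
coding_bp = 6672
utr5_bp = 90
utr3_bp = 163
leader_aa = 28

total_aa = coding_bp // 3            # three bases per codon
mature_aa = total_aa - leader_aa     # after removal of the leader peptide
listed_bp = utr5_bp + coding_bp + utr3_bp  # sum of the three regions listed

print(total_aa, mature_aa, listed_bp)  # 2224 2196 6925


def sequon_positions(protein):
    """0-based positions of Asn in N-X-[S/T] sequons where X is not Pro."""
    return [m.start() for m in re.finditer(r"N(?=[^P][ST])", protein)]


# Hypothetical peptide: N-G-S and N-A-T qualify, N-P-T does not.
print(sequon_positions("ANGSANPTNAT"))  # [1, 8]
```

    A sequon match marks only a potential site; whether it is glycosylated has to be established experimentally.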

  9. Peptide mapping and amino acid sequencing of two catechol 1,2-dioxygenases (CD I1 and CD I2) from Acinetobacter lwoffii K24.

    PubMed

    Kim, S I; Ha, K S

    1997-10-31

    The partial amino acid sequences of two catechol 1,2-dioxygenases (CD I1 and CD I2) from Acinetobacter lwoffii K24 have been determined by analysis of peptides after cleavages with endopeptidase Lys-C, endopeptidase Glu-C, trypsin, and chemicals (cyanogen bromide and BNPS-skatole). They include 248 amino acid sequences (4 fragments) of CD I1 and 211 amino acid sequences (5 fragments) of CD I2. Two enzymes have more than 50% sequence homology with type I catechol 1,2-dioxygenases and less than 30% sequence homology with type II catechol 1,2-dioxygenases. Two enzymes have similar hydropathy profiles in the N-terminal region, suggesting that they have similar secondary structures. PMID:9387151

  10. Stratigraphic sequence analysis of the Antler foreland

    SciTech Connect

    Silberling, N.J.; Nichols, K.M.; Macke, D.L.

    1993-04-01

    Mid-Upper Devonian to Upper Mississippian strata in western Utah were deposited in the distal Antler foreland. They record lateral and vertical changes in depositional environments that define five successive stratigraphic sequences, each representing a third-order transgressive-regressive cycle. In ascending order, these sequences are informally named the Langenheim (LA) of late Frasnian to mid-Famennian age, the Gutschick (GU) of late Famennian to early Kinderhookian age, the Morris (MO) of late Kinderhookian age; the Sadlick (SA) of Osagean to early Meramecian age, and the Maughan (MA) of mid-Meramecian to Chesterian age. MO is widespread and recognized within carbonate rocks of the Fitchville Formation and Joana Limestone. SA formed in concert with and to the east and south of the Wendover foreland high; the Delle phosphatic event marks maximum marine flooding during SA deposition. The transgressive systems tract of MA includes rhythmic-bedded limestone in the upper part of the Deseret Limestone in west-central Utah and, farther west, the hypoxic limestone and black shale of the Skunk Spring Limestone Bed and part of the overlying Chainman Shale. Traced westward into Nevada, MA first oversteps SA and then MO. Lithostratigraphic correlation of these sequences still farther west into the Eureka thrust belt (ETB) could mean that the youngest strata truncated by the Roberts Mountains thrust belong to the MA and that this thrust is simply part of the post-Mississippian ETB. However, some strata in central Nevada that lithically resemble those of the MA are paleontologically dated as Early Mississippian, the age of sequences overstepped by MA not far to the east. Thus, at least some imbricates of the ETB may contain a sequence stratigraphy which reflects local tectonic control.

  11. Sequence analysis and structural features of the largest known protamine isolated from the sperm of the archaeogastropod Monodonta turbinata.

    PubMed

    Daban, M; Martinage, A; Kouach, M; Chiva, M; Subirana, J A; Sautière, P

    1995-06-01

    Protamine of the archaeogastropod mollusc Monodonta turbinata has been isolated and characterized. With a mass of 13,476 Da, it is the largest known protamine. The amino acid sequence of this protamine (106 residues) was established from data provided by automated sequence analysis and mass spectrometry of the protein and of its fragments. The primary structure of the NH2-terminal region exhibits repetitive sequence motifs "Basic-Ser" (mainly R-S), and both the central and COOH-terminal regions are composed of arginine clusters. The amino acid sequence of Monodonta turbinata protamine shows structural similarities with other protamines from invertebrates and from birds and mammals. PMID:7643417

  12. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  13. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  14. Snake venom toxins. The amino acid sequence of toxin Vi2, a homologue of pancreatic trypsin inhibitor, from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Strydom, D J

    1977-04-25

    The amino acid sequence of venom component Vi2, a protein of low toxicity from Dendroaspis polylepis polylepis venom was determined by automatic sequence analysis in combination with sequence studies on tryptic peptides. This protein, the most retarded fraction of this venom on a cation-exchange resin, is a homologue of bovine pancreatic trypsin inhibitor consisting of a single chain of 57 amino acid residues containing six half-cystine residues. The active site lysyl residue of bovine trypsin inhibitor is conserved in Vi2 although large differences are found in the rest of the molecule. PMID:857902

  15. Simultaneous analysis of biologically active aminoalkanephosphonic acids.

    PubMed

    Kudzin, Zbigniew H; Gralak, Dorota K; Andrijewski, Grzegorz; Drabowicz, Józef; Luczak, Jerzy

    2003-05-23

    A new approach for simultaneous analysis of biologically active aminoalkanephosphonic acids, namely glyphosate, phosphonoglycine, phosphonosarcosine, phosphonoalanine, phosphono-beta-alanine, phosphonohomoalanine, phosphono-gamma-homoalanine and glufosinate, is presented. This includes a preliminary 31P NMR analysis of these amino acids, their further derivatization to volatile phosphonates (phosphinates) by means of trifluoroacetic acid-trifluoroacetic anhydride-trimethyl orthoacetate reagent and subsequent analysis of derivatization products using MS and/or GC-MS (chemical ionization and/or electron impact ionization). PMID:12862383

  16. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

    PubMed Central

    Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan

    2014-01-01

    Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches to exploit both characteristics of sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability https://hive.biochemistry.gwu.edu/hive/ PMID:24918764

  17. Identification of antigen-specific B cell receptor sequences using public repertoire analysis

    PubMed Central

    Galson, Jacob D.; Rance, Richard; Parkhill, Julian; Lunter, Gerton; Pollard, Andrew J.; Kelly, Dominic F.

    2014-01-01

    High-throughput sequencing allows detailed study of the B cell receptor (BCR) repertoire post-immunization but it remains unclear to what extent the de novo identification of antigen-specific sequences from the total BCR repertoire is possible. A Hib-MenC-TT conjugate vaccine containing H. influenzae type b (Hib) and group C meningococcal (MenC) polysaccharides as well as tetanus toxoid (TT) was used to investigate the BCR repertoire of adult humans following immunization and test the hypothesis that public or convergent repertoire analysis could identify antigen specific sequences. A number of antigen-specific BCR sequences have previously been reported for Hib and TT which made a vaccine containing these 2 antigens an ideal immunological stimulus. Analysis of identical complementarity determining region (CDR)3 amino acid (AA) sequences that were shared by individuals in the post-vaccine repertoire identified a number of known Hib-specific sequences but only one previously described TT sequence. The extension of this analysis to non-identical but highly similar CDR3 AA sequences revealed a number of other TT-related sequences. The anti-Hib avidity index post-vaccination was strongly correlated with the relative frequency of Hib-specific sequences, indicating that the post-vaccination public BCR repertoire may be related to more conventional measures of immunogenicity correlating with disease protection. Analysis of public BCR repertoire provided evidence of convergent BCR evolution in individuals exposed to the same antigens. If this finding is confirmed, the public repertoire could be used for rapid and direct identification of protective antigen-specific BCR sequences from peripheral blood. PMID:25392534
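
    The two analyses described here, finding identical CDR3 amino acid sequences shared between individuals and then extending to highly similar ones, can be sketched with sets and a Hamming distance. Everything below is illustrative: the donor labels and CDR3 strings are made up, and the similarity threshold (equal length, at most one mismatch) is an assumption, since the abstract does not give the criterion used.

```python
from collections import Counter
from itertools import combinations


def public_cdr3s(repertoires, min_individuals=2):
    """CDR3 sequences found in at least `min_individuals` repertoires."""
    counts = Counter()
    for seqs in repertoires.values():
        counts.update(set(seqs))  # count each sequence once per individual
    return {seq for seq, c in counts.items() if c >= min_individuals}


def hamming(a, b):
    """Number of mismatches between equal-length strings, else None."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def similar_pairs(repertoires, max_mismatches=1):
    """Non-identical CDR3 pairs from different individuals within the threshold."""
    pairs = set()
    for (_, s1), (_, s2) in combinations(repertoires.items(), 2):
        for a in set(s1):
            for b in set(s2):
                d = hamming(a, b)
                if d is not None and 0 < d <= max_mismatches:
                    pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical post-vaccination repertoires.
repertoires = {
    "donor_A": ["ARDYGSSYFDY", "ARGGWLRFDP"],
    "donor_B": ["ARDYGSSYFDY", "ARGGWLRFDS"],
    "donor_C": ["AKDTAMVYYFDY"],
}
print(public_cdr3s(repertoires))   # {'ARDYGSSYFDY'}
print(similar_pairs(repertoires))  # {('ARGGWLRFDP', 'ARGGWLRFDS')}
```

    The all-against-all comparison is quadratic in repertoire size, so real repertoires would need indexing or clustering rather than this brute-force loop.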

  18. Similarity/Dissimilarity Analysis of Protein Sequences Based on a New Spectrum-Like Graphical Representation

    PubMed Central

    Yao, Yuhua; Yan, Shoujiang; Xu, Huimin; Han, Jianning; Nan, Xuying; He, Ping-an; Dai, Qi

    2014-01-01

    Sequence comparison is one of the foundations in bioinformatics, which can be used to study evolutionary relations among the sequences. In this study, a 2D spectrum-like graphical representation of protein sequences is presented based on the hydrophobicity scale of amino acids. The frequencies of amplitudes of 4-subsequences are adopted to characterize a spectrum-like graph, and a 17D vector is used as the descriptor of protein sequence. The χ2 value of compatibility test is performed. New similarity analysis approach is illustrated on the all protein sequences, which are encoded by the mitochondrion genome of 20 different species. Finally, comparison with the ClustalW method shows the utility of our method. PMID:25002811

  19. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  20. High Throughput Sequence Analysis for Disease Resistance in Maize

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  1. Trypsin inhibitors from ridged gourd (Luffa acutangula Linn.) seeds: purification, properties, and amino acid sequences.

    PubMed

    Haldar, U C; Saha, S K; Beavis, R C; Sinha, N K

    1996-02-01

    Two trypsin inhibitors, LA-1 and LA-2, have been isolated from ridged gourd (Luffa acutangula Linn.) seeds and purified to homogeneity by gel filtration followed by ion-exchange chromatography. The isoelectric point is at pH 4.55 for LA-1 and at pH 5.85 for LA-2. The Stokes radius of each inhibitor is 11.4 Å. The fluorescence emission spectrum of each inhibitor is similar to that of free tyrosine. The bimolecular rate constant of acrylamide quenching is 1.0 × 10^9 M^-1 sec^-1 for LA-1 and 0.8 × 10^9 M^-1 sec^-1 for LA-2, and that of K2HPO4 quenching is 1.6 × 10^11 M^-1 sec^-1 for LA-1 and 1.2 × 10^11 M^-1 sec^-1 for LA-2. Analysis of the circular dichroic spectra yields 40% alpha-helix and 60% beta-turn for LA-1 and 45% alpha-helix and 55% beta-turn for LA-2. Inhibitors LA-1 and LA-2 consist of 28 and 29 amino acid residues, respectively. They lack threonine, alanine, valine, and tryptophan. Both inhibitors strongly inhibit trypsin by forming enzyme-inhibitor complexes at a molar ratio of unity. A chemical modification study suggests the involvement of arginine of LA-1 and lysine of LA-2 in their reactive sites. The inhibitors are very similar in their amino acid sequences, and show sequence homology with other squash family inhibitors. PMID:8924202

  2. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||ni||, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa IN, where IN is an N × N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
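
    The vector-and-operator picture can be made concrete with a toy library. The sketch below is only one possible reading of the framework, under stated assumptions: sampling is modelled as drawing reads with replacement in proportion to copy number, diagonal matrices are stored as plain lists of their diagonal entries, and the library size, read depth and censorship values are invented for illustration.

```python
import random


def apply_diagonal(diag, n):
    """Multiply a diagonal matrix (given by its diagonal) with the vector n."""
    return [d * x for d, x in zip(diag, n)]


def sample_library(n, depth, rng):
    """Stochastic sampling: draw `depth` reads, weighted by copy number.

    Assumption: reads are drawn with replacement, i.e. a multinomial model.
    """
    reads = rng.choices(range(len(n)), weights=n, k=depth)
    sampled = [0] * len(n)
    for i in reads:
        sampled[i] += 1
    return sampled


# Toy library with N = 4 possible sequences (a real 7-mer library is far larger).
n = [500, 300, 150, 50]
identity = [1.0, 1.0, 1.0, 1.0]   # unbiased sequencing
cen = [1.0, 1.0, 0.1, 1.0]        # hypothetical censorship of sequence 2

rng = random.Random(0)  # fixed seed so repeated runs agree
unbiased = sample_library(apply_diagonal(identity, n), 1000, rng)
censored = sample_library(apply_diagonal(cen, n), 1000, rng)
print(unbiased)
print(censored)
```

    Comparing the two read-count vectors over many repeated draws is what would separate genuine censorship from ordinary sampling noise.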

  3. MESSA: MEta-Server for protein Sequence Analysis

    PubMed Central

    2012-01-01

    Background Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together. Results We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted. Availability MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/ PMID:23031578

  4. Amino acid sequence of the serine-repeat antigen (SERA) of Plasmodium falciparum determined from cloned cDNA.

    PubMed

    Bzik, D J; Li, W B; Horii, T; Inselburg, J

    1988-09-01

    We report the isolation of cDNA clones for a Plasmodium falciparum gene that encodes the complete amino acid sequence of a previously identified exported blood stage antigen. The Mr of this antigen protein had been determined by sodium dodecylsulphate-polyacrylamide gel electrophoresis analysis, by different workers, to be 113,000, 126,000, and 140,000. We show, by cDNA nucleotide sequence analysis, that this antigen gene encodes a 989 amino acid protein (111 kDa) that contains a potential signal peptide, but not a membrane anchor domain. In the FCR3 strain the serine content of the protein was 11%, of which 57% of the serine residues were localized within a 201 amino acid sequence that included 35 consecutive serine residues. The protein also contained three possible N-linked glycosylation sites and numerous possible O-linked glycosylation sites. The mRNA was abundant during late trophozoite-schizont parasite stages. We propose to identify this antigen, which had been called p126, by the acronym SERA, serine-repeat antigen, based on its complete structure. The usefulness of the cloned cDNA as a source of a possible malaria vaccine is considered in view of the previously demonstrated ability of the antigen to induce parasite-inhibitory antibodies and a protective immune response in Saimiri monkeys. PMID:2847041

  5. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
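
    The redundancy measure described above can be illustrated with a minimal sketch. It assumes the first-order definition R = 1 - H / log2(20), where H is the Shannon entropy of the amino acid composition; the pair-frequency term mentioned in the abstract is not included, and the function names and the uniform random model are illustrative choices, not taken from the report.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def shannon_entropy(seq):
    """First-order Shannon entropy of a sequence, in bits per residue."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def redundancy(seq, alphabet_size=len(AMINO_ACIDS)):
    """R = 1 - H / H_max, with H_max = log2(alphabet size)."""
    return 1.0 - shannon_entropy(seq) / math.log2(alphabet_size)

def random_baseline(length, trials=200, seed=0):
    """Monte Carlo mean redundancy of uniformly random chains of a given length."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        chain = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        values.append(redundancy(chain))
    return sum(values) / trials
```

    Comparing `redundancy(protein)` with `random_baseline(len(protein))` mirrors the comparison of real sequences against random sequences of the same length; short random chains show a higher apparent redundancy purely from finite sampling.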

  6. Principal component analysis of phenolic acid spectra

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phenolic acids are common plant metabolites that exhibit bioactive properties and have applications in functional food and animal feed formulations. The ultraviolet (UV) and infrared (IR) spectra of four closely related phenolic acid structures were evaluated by principal component analysis (PCA) to...

  7. Amino acid isotopic analysis in agricultural systems

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A relatively new approach to stable isotopic analysis—referred to as compound-specific isotopic analysis (CSIA)—has emerged, centering on the measurement of 15N:14N ratios in amino acids (glutamic acid and phenylalanine). CSIA has recently been used to generate trophic position estimates among anima...

  8. Bacterial community compositions in sediment polluted by perfluoroalkyl acids (PFAAs) using Illumina high-throughput sequencing.

    PubMed

    Sun, Yajun; Wang, Tieyu; Peng, Xiawei; Wang, Pei; Lu, Yonglong

    2016-06-01

    The characterization of bacterial community compositions and the change in perfluoroalkyl acids (PFAAs) along a natural river distribution system were explored in the present study. Illumina high-throughput sequencing was used to explore bacterial community diversity and structure in sediment polluted by PFAAs from the Xiaoqing River, the area with concentrated fluorochemical facilities in China. The concentration of PFAAs was in the range of 8.44-465.60 ng/g dry weight (dw) in sediment. Perfluorooctanoic acid (PFOA) was the dominant PFAA in all samples, which accounted for 94.2 % of total PFAAs. High-level PFOA could lead to an obvious increase in relative abundance of Proteobacteria, ε-Proteobacteria, Thiobacillus, and Sulfurimonas and the decrease in relative abundance of other bacteria. Redundancy analysis revealed that PFOA played an important role in the formation of bacterial community, and PFOA at higher concentration could reduce the diversity of bacterial community. When the concentration of PFOA was below 100 ng/g dw in sediment, no significant effect on microbial community structure was observed. Thiobacillus and Sulfurimonas were positively correlated with the concentration of PFOA, suggesting that both genera were resistant to PFOA contamination. PMID:26780047

  9. Boric Acid in Kjeldahl Analysis

    ERIC Educational Resources Information Center

    Cruz, Gregorio

    2013-01-01

    The use of boric acid in the Kjeldahl determination of nitrogen is a variant of the original method widely applied in many laboratories all over the world. Its use is recommended by control organizations such as ISO, IDF, and EPA because it yields reliable and accurate results. However, the chemical principles the method is based on are not…

  10. Cloning and sequence analysis of cDNA for human cathepsin D.

    PubMed Central

    Faust, P L; Kornfeld, S; Chirgwin, J M

    1985-01-01

    An 1110-base-pair cDNA clone for human cathepsin D was obtained by screening a lambda gt10 human hepatoma G2 cDNA library with a human renin exon 3 genomic fragment. Poly(A)+ RNA blot analysis with this cathepsin D clone demonstrated a message length of about 2.2 kilobases. The partial clone was used to screen a size-selected human kidney cDNA library, from which two cathepsin D recombinant plasmids with inserts of about 2200 and 2150 base pairs were obtained. The nucleotide sequences of these clones and of the lambda gt10 clone were determined. The amino acid sequence predicted from the cDNA sequence shows that human cathepsin D consists of 412 amino acids with 20 and 44 amino acids in a pre- and a prosegment, respectively. The mature protein region shows 87% amino acid identity with porcine cathepsin D but differs in having nine additional amino acids. Two of these are at the COOH terminus; the other seven are positioned between the previously determined junction for the light and heavy chains of porcine cathepsin D. A high degree of sequence homology was observed between human cathepsin D and other aspartyl proteases, suggesting a conservation of three-dimensional structure in this family of proteins. PMID:3927292

  11. Characterization of N-glycosylation and amino acid sequence features of immunoglobulins from swine.

    PubMed

    Lopez, Paul G; Girard, Lauren; Buist, Marjorie; de Oliveira, Andrey Giovanni Gomes; Bodnar, Edward; Salama, Apolline; Soulillou, Jean-Paul; Perreault, Hélène

    2016-02-01

    The primary goal of this study was to develop a method to study the N-glycosylation of IgG from swine in order to detect epitopes containing N-glycolylneuraminic acid (Neu5Gc) and/or terminal galactose residues linked in α1-3 susceptible to cause xenograft-related problems. Samples of immunoglobulin were isolated from porcine serum using protein-A affinity chromatography. The eluate was then separated on electrophoretic gel, and bands corresponding to the N-glycosylated heavy chains were cut off the gel and subjected to tryptic digestion. Peptides and glycopeptides were separated by reversed phase liquid chromatography and fractions were collected for matrix-assisted laser desorption/ionization time-of-flight mass spectrometric (MALDI-TOF-MS) analysis. Overall no α1-3 galactose was detected, as demonstrated by complete susceptibility of terminal galactose residues to β-galactosidase digestion. Neu5Gc was detected on singly sialylated structures. Two major N-glycopeptides were found, EEQFNSTYR and EAQFNSTYR as determined by tandem MS (MS/MS), as previously reported by Butler et al. (Immunogenetics, 61, 2009, 209-230), who found 11 subclasses for porcine IgG. Out of the 11, ten include the sequence corresponding to EEQFNSTYR, and only one codes for EAQFNSTYR. In this study, glycosylation patterns associated with both chains were slightly different, in that EEQFNSTYR had a higher content of galactose. The last step of this study consisted of peptide-mapping the 11 reported porcine IgG sequences. Although there was considerable overlap, at least one unique tryptic peptide was found per IgG sequence. The workflow presented in this manuscript constitutes the first study to use MALDI-TOF-MS in the investigation of porcine IgG structural features. PMID:26586247
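
    The peptide-mapping step can be sketched in silico. The example below assumes the common trypsin rule (cleave after K or R unless the next residue is P) and looks for peptides that occur in only one sequence of a set; the function names are hypothetical, and the two short sequences are invented flanks around the reported glycopeptides, not real porcine IgG sequences.

```python
import re

def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def unique_peptides(sequences):
    """Map each sequence name to the tryptic peptides found in no other sequence."""
    digests = {name: set(tryptic_peptides(seq)) for name, seq in sequences.items()}
    unique = {}
    for name, peptides in digests.items():
        others = set().union(*(d for other, d in digests.items() if other != name))
        unique[name] = peptides - others
    return unique

# Invented toy sequences; only the two glycopeptide motifs come from the abstract.
toy = {
    "subclass_a": "GSTKPREEQFNSTYRVVSK",
    "subclass_b": "GSTKPREAQFNSTYRVVSK",
}
signature = unique_peptides(toy)
```

    With real subclass sequences, a non-empty set for every entry would correspond to the finding that each IgG sequence has at least one unique tryptic peptide.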

  12. Cloning and sequence analysis of candidate human natural killer-enhancing factor genes

    SciTech Connect

    Shau, H.; Butterfield, L.H.; Chiu, R.; Kim, A.

    1994-12-31

    A cytosol factor from human red blood cells enhances natural killer (NK) activity. This factor, termed NK-enhancing factor (NKEF), is a protein of 44000 Mr consisting of two subunits of equal size linked by disulfide bonds. NKEF is expressed in the NK-sensitive erythroleukemic cell line K562. Using an antibody specific for NKEF as a probe for immunoblot screening, we isolated several clones from a λgt11 cDNA library of K562. Additional subcloning and sequencing revealed that the candidate NKEF cDNAs fell into one of two categories of closely related but non-identical genes, referred to as NKEF A and B. They are 88% identical in amino acid sequence and 71% identical in nucleotide sequence. Southern blot analysis suggests that there are two to three NKEF family members in the genome. Analysis of predicted amino acid sequences indicates that both NKEF A and B are cytosol proteins with several phosphorylation sites each, but that they have no glycosylation sites. They are significantly homologous to several other proteins from a wide variety of organisms ranging from prokaryotes to mammals, especially with regard to several well-conserved motifs within the amino acid sequences. The biological functions of these proteins in other species are mostly unknown, but some of them were reported to be induced by oxidative stress. Therefore, as well as for immunoregulation of NK activity, NKEF may be important for cells in coping with oxidative insults. 32 refs., 3 figs.

  13. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  14. Analysis of Metagenomic Sequences: From Megabases to Terabases

    SciTech Connect

    Kyrpides, Nikos

    2010-06-04

    Nikos Kyrpides of the DOE Joint Genome Institute discusses metagenomics and the challenge of dealing with terabases of data on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  15. Protein location prediction using atomic composition and global features of the amino acid sequence

    SciTech Connect

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    The subcellular location of a protein is useful information for determining its function, screening for drug candidates, designing vaccines, annotating gene products and selecting relevant proteins for further study. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity to predict the subcellular location of the protein. Support Vector Machines are designed for four modules and the prediction is made by a weighted voting system. Our system achieves accuracies of 100%, 82.47% and 88.81% in the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Treating the features as a distribution over the entire sequence brings out the underlying property distribution in greater detail and so enhances prediction accuracy.
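
    The weighted voting step can be sketched as follows. This is only an illustration of the general technique: the module names, weights and location labels are hypothetical, and the abstract does not say how the actual weights were chosen.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest summed weight.

    predictions: module name -> predicted location label
    weights:     module name -> non-negative weight (hypothetical values)
    """
    scores = defaultdict(float)
    for module, label in predictions.items():
        scores[label] += weights[module]
    # On an exact tie, the label that was scored first wins.
    return max(scores, key=scores.get)

# Hypothetical outputs from four classifier modules for one protein.
module_predictions = {
    "atomic_composition": "nucleus",
    "physicochemical": "cytoplasm",
    "amino_acid_composition": "cytoplasm",
    "sequence_similarity": "nucleus",
}
module_weights = {
    "atomic_composition": 0.2,
    "physicochemical": 0.3,
    "amino_acid_composition": 0.3,
    "sequence_similarity": 0.2,
}
location = weighted_vote(module_predictions, module_weights)
```

    One plausible design, offered here as an assumption, is to set each weight from the module's cross-validated accuracy so that more reliable modules carry more of the decision.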

  16. Sequence Comparison and Phylogeny of Nucleotide Sequence of Coat Protein and Nucleic Acid Binding Protein of a Distinct Isolate of Shallot virus X from India.

    PubMed

    Majumder, S; Baranwal, V K

    2011-06-01

    Shallot virus X (ShVX), the type species of the genus Allexivirus in the family Alfaflexiviridae, has been associated with shallot plants in India and other shallot-growing countries such as Russia, Germany, the Netherlands, and New Zealand. The coat protein (CP) and nucleic acid binding protein (NB) regions of the virus were obtained by reverse transcriptase polymerase chain reaction from scale leaves of shallot bulbs. The partial cDNA contained two open reading frames encoding proteins of molecular weights of 28.66 and 14.18 kDa belonging to the Flexi_CP super-family and the viral NB super-family, respectively. The percent identity and phylogenetic analysis of amino acid sequences of the CP and NB regions of the virus associated with shallot indicated that it was a distinct isolate of ShVX. PMID:23637504

  17. The amino-acid sequence of leghemoglobin component a from Phaseolus vulgaris (kidney bean).

    PubMed

    Lehtovaara, P; Ellfolk, N

    1975-06-01

    1. Leghemoglobin component a from Phaseolus vulgaris (kidney bean) was digested with trypsin; 15 tryptic peptides and free lysine were purified and the amino acid sequences of the peptides determined. 2. The internal order of the tryptic peptides was determined by the bridge peptides obtained from the thermolytic digest and the dilute acid hydrolyzate of kidney bean leghemoglobin a; 12 thermolytic peptides and two acid hydrolysis peptides were purified and the sequences were partially or completely determined. 3. The complete amino acid sequence of kidney bean leghemoglobin a is compared to that of leghemoglobin a from soybean (Glycine max) and to some animal globins. As regards sequence, the kidney bean globin has 79% identity with the soybean globin and 21% identity with human hemoglobin gamma-chain. Seven of the 14 amino acid residues common to most globins are found in the kidney bean globin. Trp-15 and Tyr-145 are evolutionarily conserved in this globin, which confirms the concept of a common origin of animal and plant globins. PMID:809270
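
    Identity figures such as the 79% and 21% quoted above are computed from an alignment. A minimal sketch follows, assuming the two sequences are already aligned to equal length with "-" for gaps and that gapped columns are excluded from the denominator; other conventions exist, and the abstract does not state which one was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)
```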

  18. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented in the meeting.

  19. The DNA sequence and comparative analysis of human chromosome 10.

    PubMed

    Deloukas, P; Earthrowl, M E; Grafham, D V; Rubenfield, M; French, L; Steward, C A; Sims, S K; Jones, M C; Searle, S; Scott, C; Howe, K; Hunt, S E; Andrews, T D; Gilbert, J G R; Swarbreck, D; Ashurst, J L; Taylor, A; Battles, J; Bird, C P; Ainscough, R; Almeida, J P; Ashwell, R I S; Ambrose, K D; Babbage, A K; Bagguley, C L; Bailey, J; Banerjee, R; Bates, K; Beasley, H; Bray-Allen, S; Brown, A J; Brown, J Y; Burford, D C; Burrill, W; Burton, J; Cahill, P; Camire, D; Carter, N P; Chapman, J C; Clark, S Y; Clarke, G; Clee, C M; Clegg, S; Corby, N; Coulson, A; Dhami, P; Dutta, I; Dunn, M; Faulkner, L; Frankish, A; Frankland, J A; Garner, P; Garnett, J; Gribble, S; Griffiths, C; Grocock, R; Gustafson, E; Hammond, S; Harley, J L; Hart, E; Heath, P D; Ho, T P; Hopkins, B; Horne, J; Howden, P J; Huckle, E; Hynds, C; Johnson, C; Johnson, D; Kana, A; Kay, M; Kimberley, A M; Kershaw, J K; Kokkinaki, M; Laird, G K; Lawlor, S; Lee, H M; Leongamornlert, D A; Laird, G; Lloyd, C; Lloyd, D M; Loveland, J; Lovell, J; McLaren, S; McLay, K E; McMurray, A; Mashreghi-Mohammadi, M; Matthews, L; Milne, S; Nickerson, T; Nguyen, M; Overton-Larty, E; Palmer, S A; Pearce, A V; Peck, A I; Pelan, S; Phillimore, B; Porter, K; Rice, C M; Rogosin, A; Ross, M T; Sarafidou, T; Sehra, H K; Shownkeen, R; Skuce, C D; Smith, M; Standring, L; Sycamore, N; Tester, J; Thorpe, A; Torcasso, W; Tracey, A; Tromans, A; Tsolas, J; Wall, M; Walsh, J; Wang, H; Weinstock, K; West, A P; Willey, D L; Whitehead, S L; Wilming, L; Wray, P W; Young, L; Chen, Y; Lovering, R C; Moschonas, N K; Siebert, R; Fechtel, K; Bentley, D; Durbin, R; Hubbard, T; Doucette-Stamm, L; Beck, S; Smith, D R; Rogers, J

    2004-05-27

    The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence. PMID:15164054

  20. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  1. The DNA sequence and comparative analysis of human chromosome 20.

    PubMed

    Deloukas, P; Matthews, L H; Ashurst, J; Burton, J; Gilbert, J G; Jones, M; Stavrides, G; Almeida, J P; Babbage, A K; Bagguley, C L; Bailey, J; Barlow, K F; Bates, K N; Beard, L M; Beare, D M; Beasley, O P; Bird, C P; Blakey, S E; Bridgeman, A M; Brown, A J; Buck, D; Burrill, W; Butler, A P; Carder, C; Carter, N P; Chapman, J C; Clamp, M; Clark, G; Clark, L N; Clark, S Y; Clee, C M; Clegg, S; Cobley, V E; Collier, R E; Connor, R; Corby, N R; Coulson, A; Coville, G J; Deadman, R; Dhami, P; Dunn, M; Ellington, A G; Frankland, J A; Fraser, A; French, L; Garner, P; Grafham, D V; Griffiths, C; Griffiths, M N; Gwilliam, R; Hall, R E; Hammond, S; Harley, J L; Heath, P D; Ho, S; Holden, J L; Howden, P J; Huckle, E; Hunt, A R; Hunt, S E; Jekosch, K; Johnson, C M; Johnson, D; Kay, M P; Kimberley, A M; King, A; Knights, A; Laird, G K; Lawlor, S; Lehvaslaiho, M H; Leversha, M; Lloyd, C; Lloyd, D M; Lovell, J D; Marsh, V L; Martin, S L; McConnachie, L J; McLay, K; McMurray, A A; Milne, S; Mistry, D; Moore, M J; Mullikin, J C; Nickerson, T; Oliver, K; Parker, A; Patel, R; Pearce, T A; Peck, A I; Phillimore, B J; Prathalingam, S R; Plumb, R W; Ramsay, H; Rice, C M; Ross, M T; Scott, C E; Sehra, H K; Shownkeen, R; Sims, S; Skuce, C D; Smith, M L; Soderlund, C; Steward, C A; Sulston, J E; Swann, M; Sycamore, N; Taylor, R; Tee, L; Thomas, D W; Thorpe, A; Tracey, A; Tromans, A C; Vaudin, M; Wall, M; Wallis, J M; Whitehead, S L; Whittaker, P; Willey, D L; Williams, L; Williams, S A; Wilming, L; Wray, P W; Hubbard, T; Durbin, R M; Bentley, D R; Beck, S; Rogers, J

    The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes. PMID:11780052

  2. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, actively produced the phytohormone indole-3-acetic acid and extracellular enzymes (phosphatases and glucosidases). The fungus also promoted the growth of the arid-land tree Boswellia sacra. Given these prospects, we sequenced its draft genome for the first time. The Illumina-based sequence analysis reveals an approximate genome size of 31.4 Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, a total of 32,312 coding sequences were annotated, consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. PMID:26995610

  3. A simple ligation-based method to increase the information density in sequencing reactions used to deconvolute nucleic acid selections

    PubMed Central

    Childs-Disney, Jessica L.; Disney, Matthew D.

    2008-01-01

    Herein, a method is described to increase the information density of sequencing experiments used to deconvolute nucleic acid selections. The method is facile and should be applicable to any selection experiment. A critical feature of this method is the use of biotinylated primers to amplify and encode a BamHI restriction site on both ends of a PCR product. After amplification, the PCR reaction is captured onto streptavidin resin, washed, and digested directly on the resin. Resin-based digestion affords clean product that is devoid of partially digested products and unincorporated PCR primers. The product's complementary ends are annealed and ligated together with T4 DNA ligase. Analysis of ligation products shows formation of concatemers of different length and little detectable monomer. Sequencing results produced data that routinely contained three to four copies of the library. This method allows for more efficient formulation of structure-activity relationships since multiple active sequences are identified from a single clone. PMID:18065718
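
    Reading several library members out of one concatemer read amounts to splitting the read at the regenerated BamHI site (GGATCC). The sketch below assumes a clean read with intact sites and ignores insert orientation and sequencing errors; the function name and the minimum-length filter are illustrative additions, not part of the published method.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence, regenerated on ligation

def split_concatemer(read, site=BAMHI_SITE, min_len=1):
    """Split a concatemer read into inserts at each restriction site."""
    return [fragment for fragment in read.split(site) if len(fragment) >= min_len]
```

    For example, `split_concatemer("AAAGGATCCTTTGGATCCCCC")` yields the three inserts `AAA`, `TTT` and `CCC`, matching the three to four library copies per read reported above.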

  4. DNA Sequence and Expression Variation of Hop (Humulus lupulus) Valerophenone Synthase (VPS), a Key Gene in Bitter Acid Biosynthesis

    PubMed Central

    Castro, Consuelo B.; Whittock, Lucy D.; Whittock, Simon P.; Leggett, Grey; Koutoulis, Anthony

    2008-01-01

    Background The hop plant (Humulus lupulus) is a source of many secondary metabolites, with bitter acids essential in the beer brewing industry and others having potential applications for human health. This study investigated variation in DNA sequence and gene expression of valerophenone synthase (VPS), a key gene in the bitter acid biosynthesis pathway of hop. Methods Sequence variation was studied in 12 varieties, and expression was analysed in four of the 12 varieties in a series across the development of the hop cone. Results Nine single nucleotide polymorphisms (SNPs) were detected in VPS, seven of which were synonymous. The two non-synonymous polymorphisms did not appear to be related to typical bitter acid profiles of the varieties studied. However, real-time quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of VPS expression during hop cone development showed a clear link with the bitter acid content. The highest levels of VPS expression were observed in two triploid varieties, ‘Symphony’ and ‘Ember’, which typically have high bitter acid levels. Conclusions In all hop varieties studied, VPS expression was lowest in the leaves and an increase in expression was consistently observed during the early stages of cone development. PMID:18519445
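
    The distinction between synonymous and non-synonymous SNPs can be made concrete with a short sketch. It assumes the standard genetic code, an in-frame coding sequence given as a DNA string, and a 0-based SNP position; the function name and the example sequence in the note are hypothetical and unrelated to the VPS gene.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(cds, position, alt_base):
    """Classify a single-base change at a 0-based position of an in-frame CDS."""
    codon_start = (position // 3) * 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"
```

    In the toy sequence `ATGGCTAAA`, changing position 5 to C turns GCT into GCC (both alanine), while changing position 3 to A turns GCT into ACT (threonine).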

  5. Deep sequencing and human antibody repertoire analysis.

    PubMed

    Boyd, Scott D; Crowe, James E

    2016-06-01

    In the past decade, high-throughput DNA sequencing (HTS) methods and improved approaches for isolating antigen-specific B cells and their antibody genes have been applied in many areas of human immunology. This work has greatly increased our understanding of human antibody repertoires and the specific clones responsible for protective immunity or immune-mediated pathogenesis. Although the principles underlying selection of individual B cell clones in the intact immune system are still under investigation, the combination of more powerful genetic tracking of antibody lineage development and functional testing of the encoded proteins promises to transform therapeutic antibody discovery and optimization. Here, we highlight recent advances in this fast-moving field. PMID:27065089

  6. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids. PMID:27222814

  7. A computer program for the estimation of protein and nucleic acid sequence diversity in random point mutagenesis libraries

    PubMed Central

    Volles, Michael J.; Lansbury, Peter T.

    2005-01-01

    A computer program for the generation and analysis of in silico random point mutagenesis libraries is described. The program operates by mutagenizing an input nucleic acid sequence according to mutation parameters specified by the user for each sequence position and type of point mutation. The program can mimic almost any type of random mutagenesis library, including those produced via error-prone PCR (ep-PCR), mutator Escherichia coli strains, chemical mutagenesis, and doped or random oligonucleotide synthesis. The program analyzes the generated nucleic acid sequences and/or the associated protein library to produce several estimates of library diversity (number of unique sequences, point mutations, and single point mutants) and the rate of saturation of these diversities during experimental screening or selection of clones. This information allows one to select the optimal screen size for a given mutagenesis library, necessary to efficiently obtain a certain coverage of the sequence-space. The program also reports the abundance of each specific protein mutation at each sequence position, which is useful as a measure of the level and type of mutation bias in the library. Alternatively, one can use the program to evaluate the relative merits of preexisting libraries, or to examine various hypothetical mutation schemes to determine the optimal method for creating a library that serves the screen/selection of interest. Simulated libraries of at least 10^9 sequences are accessible by the numerical algorithm with currently available personal computers; an analytical algorithm is also available which can rapidly calculate a subset of the numerical statistics in libraries of arbitrarily large size. A multi-type double-strand stochastic model of ep-PCR is developed in an appendix to demonstrate the applicability of the algorithm to amplifying mutagenesis procedures. Estimators of DNA polymerase mutation-type-specific error rates are derived using the model.
Analyses of an
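
    The kind of simulation this abstract describes can be illustrated with a minimal Python sketch. It is not the published program: the function names (`mutate`, `simulate_library`), the single uniform per-position mutation rate, and the equal-probability substitution model are all assumptions chosen for brevity, whereas the abstract describes user-specified parameters per position and per mutation type.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Return a copy of seq with each position substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Assumption: every substitution to a different base is equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_library(seq, rate, size, seed=0):
    """Generate `size` clones from `seq` and summarize library diversity."""
    rng = random.Random(seed)
    library = [mutate(seq, rate, rng) for _ in range(size)]
    unique = set(library)
    # A single point mutant differs from the parent at exactly one position.
    single_mutants = {
        s for s in unique if sum(a != b for a, b in zip(s, seq)) == 1
    }
    return {
        "clones": size,
        "unique": len(unique),
        "unique_single_mutants": len(single_mutants),
    }

if __name__ == "__main__":
    # Hypothetical 12-nt parent sequence and a 5% per-position error rate.
    print(simulate_library("ATGGCCAAGCTT", rate=0.05, size=1000))
```

    Rerunning `simulate_library` with increasing `size` gives a rough picture of how quickly the number of unique sequences saturates, which is the quantity the abstract relates to choosing a screen size.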

  8. Complete sequence and genomic analysis of murine gammaherpesvirus 68.

    PubMed Central

    Virgin, H W; Latreille, P; Wamsley, P; Hallsworth, K; Weck, K E; Dal Canto, A J; Speck, S H

    1997-01-01

    Murine gammaherpesvirus 68 (gammaHV68) infects mice, thus providing a tractable small-animal model for analysis of the acute and chronic pathogenesis of gammaherpesviruses. To facilitate molecular analysis of gammaHV68 pathogenesis, we have sequenced the gammaHV68 genome. The genome contains 118,237 bp of unique sequence flanked by multiple copies of a 1,213-bp terminal repeat. The GC content of the unique portion of the genome is 46%, while the GC content of the terminal repeat is 78%. The unique portion of the genome is estimated to encode at least 80 genes and is largely colinear with the genomes of Kaposi's sarcoma herpesvirus (KSHV; also known as human herpesvirus 8), herpesvirus saimiri (HVS), and Epstein-Barr virus (EBV). We detected 63 open reading frames (ORFs) homologous to HVS and KSHV ORFs and used the HVS/KSHV numbering system to designate these ORFs. gammaHV68 shares with HVS and KSHV ORFs homologous to a complement regulatory protein (ORF 4), a D-type cyclin (ORF 72), and a G-protein-coupled receptor with close homology to the interleukin-8 receptor (ORF 74). One ORF (K3) was identified in gammaHV68 as homologous to both ORFs K3 and K5 of KSHV and contains a domain found in a bovine herpesvirus 4 major immediate-early protein. We also detected 16 methionine-initiated ORFs predicted to encode proteins at least 100 amino acids in length that are unique to gammaHV68 (ORFs M1 to 14). ORF M1 has striking homology to poxvirus serpins, while ORF M11 encodes a potential homolog of Bcl-2-like molecules encoded by other gammaherpesviruses (gene 16 of HVS and KSHV and the BHRF1 gene of EBV). In addition, clustered at the left end of the unique region are eight sequences with significant homology to bacterial tRNAs. The unique region of the genome contains two internal repeats: a 40-bp repeat located between bp 26778 and 28191 in the genome and a 100-bp repeat located between bp 98981 and 101170. 
Analysis of the gammaHV68, HVS, EBV, and KSHV genomes demonstrated

  9. Canine preprorelaxin: nucleic acid sequence and localization within the canine placenta.

    PubMed

    Klonisch, T; Hombach-Klonisch, S; Froehlich, C; Kauffold, J; Steger, K; Steinetz, B G; Fischer, B

    1999-03-01

    Employing uteroplacental tissue at Day 35 of gestation, we determined the nucleic acid sequence of canine preprorelaxin using reverse transcription- and rapid amplification of cDNA ends-polymerase chain reaction. Canine preprorelaxin cDNA consisted of 534 base pairs encoding a protein of 177 amino acids with a signal peptide of 25 amino acids (aa), a B domain of 35 aa, a C domain of 93 aa, and an A domain of 24 aa. The putative receptor binding region in the N'-terminal part of the canine relaxin B domain GRDYVR contained two substitutions from the classical motif (E-->D and L-->Y). Canine preprorelaxin shared highest homology with porcine and equine preprorelaxin. Northern analysis revealed a 1-kilobase transcript present in total RNA of canine uteroplacental tissue but not of kidney tissue. Uteroplacental tissue from two bitches each at Days 30 and 35 of gestation were studied by in situ hybridization to localize relaxin mRNA. Immunohistochemistry for relaxin, cytokeratin, vimentin, and von Willebrand factor was performed on uteroplacental tissue at Day 30 of gestation. The basal cell layer at the core of the chorionic villi was devoid of relaxin mRNA and immunoreactive relaxin or vimentin but was immunopositive for cytokeratin and identified as cytotrophoblast cells. The cell layer surrounding the chorionic villi displayed specific hybridization signals for relaxin mRNA and immunoreactivity for relaxin and cytokeratin but not for vimentin, and was identified as syncytiotrophoblast. Those areas of the chorioallantoic tissue with most intense relaxin immunoreactivity were highly vascularized as demonstrated by immunoreactive von Willebrand factor expressed on vascular endothelium. The uterine glands and nonplacental uterine areas of the canine zonary girdle placenta were devoid of relaxin mRNA and relaxin. We conclude that the syncytiotrophoblast is the source of relaxin in the canine placenta. PMID:10026098
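
    The domain lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 534 bp figure refers to the open reading frame including one stop codon; the abstract does not state this explicitly.

```python
# Domain lengths in amino acids, as given in the abstract.
domains = {
    "signal peptide": 25,
    "B domain": 35,
    "C domain": 93,
    "A domain": 24,
}

# Total number of residues in preprorelaxin.
protein_length = sum(domains.values())

# Assumption: 3 nt per codon plus one 3-nt stop codon.
orf_length = protein_length * 3 + 3

print(protein_length, orf_length)
```

    Under that assumption the four domains add up to the reported 177 residues, and the coding region comes to the reported 534 bp.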

  10. Purification and partial amino acid sequence of the chloroplast cytochrome b-559.

    PubMed

    Widger, W R; Cramer, W A; Hermodson, M; Meyer, D; Gullifor, M

    1984-03-25

    The hydrophobic cytochrome b-559, purified from unstacked, ethanol-washed spinach thylakoid membranes, using extraction with 2% Triton X-100 in 4 M urea and three chromatographic steps in the presence of protease inhibitors, has a dominant band on sodium dodecyl sulfate-urea gels corresponding to Mr = 10,000. The yield of this preparation is 30-50% (5-10 mg) starting with 600 mg of chlorophyll. The heme content yields a calculated molecular weight of no more than 17,500/heme, and perhaps somewhat smaller after correction for impurities. The Mr = 10,000 band is stained by the tetramethylbenzidine-H2O2 heme reagent on lithium dodecyl sulfate gels run at 0 degrees C. The Mr = 10,000 protein, further separated by high performance liquid chromatography, contains a unique NH2 terminus that is not blocked, and the amino acid sequence for the first 27 residues is NH2-Ser-Gly-Ser-Thr-Gly-Glu-Arg-Ser-Phe-Ala-Asp-Ile-Ile-Thr-Ser-Ile-Arg-Tyr-Trp-Val-Ile-X-Ser-Ile-Thr-Ile-Pro. . . COOH. Approximately 55% of the amino acids are hydrophobic, based on amino acid analysis of the Mr = 10,000 peptide, which also indicated the presence of at least one histidine. Only one cytochrome b-559 component could be identified, whose yield indicated that it arises from a single b-559 protein in chloroplasts corresponding to the in situ high potential cytochrome of the chloroplast photosystem II. PMID:6706983

  11. Sequence-Specific Electrical Purification of Nucleic Acids with Nanoporous Gold Electrodes.

    PubMed

    Daggumati, Pallavi; Appelt, Sandra; Matharu, Zimple; Marco, Maria L; Seker, Erkin

    2016-06-22

    Nucleic-acid-based biosensors have enabled rapid and sensitive detection of pathogenic targets; however, these devices often require purified nucleic acids for analysis since the constituents of complex biological fluids adversely affect sensor performance. This purification step is typically performed outside the device, thereby increasing sample-to-answer time and introducing contaminants. We report a novel approach using a multifunctional matrix, nanoporous gold (np-Au), which enables both detection of specific target sequences in a complex biological sample and their subsequent purification. The np-Au electrodes modified with 26-mer DNA probes (via thiol-gold chemistry) enabled sensitive detection and capture of complementary DNA targets in the presence of complex media (fetal bovine serum) and other interfering DNA fragments in the range of 50-1500 base pairs. Upon capture, the noncomplementary DNA fragments and serum constituents of varying sizes were washed away. Finally, the surface-bound DNA-DNA hybrids were released by electrochemically cleaving the thiol-gold linkage, and the hybrids were iontophoretically eluted from the nanoporous matrix. The optical and electrophoretic characterization of the analytes before and after the detection-purification process revealed that low target DNA concentrations (80 pg/μL) can be successfully detected in complex biological fluids and subsequently released to yield pure hybrids free of polydisperse digested DNA fragments and serum biomolecules. Taken together, this multifunctional platform is expected to enable seamless integration of detection and purification of nucleic acid biomarkers of pathogens and diseases in miniaturized diagnostic devices. PMID:27244455

  12. New families in the classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B; Bairoch, A

    1993-01-01

    301 glycosyl hydrolases and related enzymes corresponding to 39 EC entries of the I.U.B. classification system have been classified into 35 families on the basis of amino-acid-sequence similarities [Henrissat (1991) Biochem. J. 280, 309-316]. Approximately half of the families were found to be monospecific (containing only one EC number), whereas the other half were found to be polyspecific (containing at least two EC numbers). A > 60% increase in sequence data for glycosyl hydrolases (181 additional enzymes or enzyme domains sequences have since become available) allowed us to update the classification not only by the addition of more members to already identified families, but also by the finding of ten new families. On the basis of a comparison of 482 sequences corresponding to 52 EC entries, 45 families, out of which 22 are polyspecific, can now be defined. This classification has been implemented in the SWISS-PROT protein sequence data bank. PMID:8352747

  13. Sequence-specific purification of nucleic acids by PNA-controlled hybrid selection.

    PubMed

    Orum, H; Nielsen, P E; Jørgensen, M; Larsson, C; Stanley, C; Koch, T

    1995-09-01

    Using an oligohistidine peptide nucleic acids (oligohistidine-PNA) chimera, we have developed a rapid hybrid selection method that allows efficient, sequence-specific purification of a target nucleic acid. The method exploits two fundamental features of PNA. First, that PNA binds with high affinity and specificity to its complementary nucleic acid. Second, that amino acids are easily attached to the PNA oligomer during synthesis. We show that a (His)6-PNA chimera exhibits strong binding to chelated Ni2+ ions without compromising its native PNA hybridization properties. We further show that these characteristics allow the (His)6-PNA/DNA complex to be purified by the well-established method of metal ion affinity chromatography using a Ni(2+)-NTA (nitrilotriacetic acid) resin. Specificity and efficiency are the touchstones of any nucleic acid purification scheme. We show that the specificity of the (His)6-PNA selection approach is such that oligonucleotides differing by only a single nucleotide can be selectively purified. We also show that large RNAs (2224 nucleotides) can be captured with high efficiency by using multiple (His)6-PNA probes. PNA can hybridize to nucleic acids in low-salt concentrations that destabilize native nucleic acid structures. We demonstrate that this property of PNA can be utilized to purify an oligonucleotide in which the target sequence forms part of an intramolecular stem/loop structure. PMID:7495562

  14. High-affinity homologous peptide nucleic acid probes for targeting a quadruplex-forming sequence from a MYC promoter element.

    PubMed

    Roy, Subhadeep; Tanious, Farial A; Wilson, W David; Ly, Danith H; Armitage, Bruce A

    2007-09-18

    Guanine-rich DNA and RNA sequences are known to fold into secondary structures known as G-quadruplexes. Recent biochemical evidence along with the discovery of an increasing number of sequences in functionally important regions of the genome capable of forming G-quadruplexes strongly indicates important biological roles for these structures. Thus, molecular probes that can selectively target quadruplex-forming sequences (QFSs) are envisioned as tools to delineate biological functions of quadruplexes as well as potential therapeutic agents. Guanine-rich peptide nucleic acids have been previously shown to hybridize to homologous DNA or RNA sequences forming PNA-DNA (or RNA) quadruplexes. For this paper we studied the hybridization of an eight-mer G-rich PNA to a quadruplex-forming sequence derived from the promoter region of the MYC proto-oncogene. UV melting analysis, fluorescence assays, and surface plasmon resonance experiments reveal that this PNA binds to the MYC QFS in a 2:1 stoichiometry and with an average binding constant Ka = (2.0 +/- 0.2) x 10(8) M(-1) or Kd = 5.0 nM. In addition, experiments carried out with short DNA targets revealed a dependence of the affinity on the sequence of bases in the loop region of the DNA. A structural model for the hybrid quadruplex is proposed, and implications for gene targeting by G-rich PNAs are discussed. PMID:17718513
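
    The two constants quoted in this abstract are related by Kd = 1/Ka. A minimal check in Python, using only the values given in the abstract; the uncertainty on Kd is a first-order propagation added here for illustration and is not reported in the source:

```python
ka = 2.0e8       # association constant in M^-1, from the abstract
ka_err = 0.2e8   # reported uncertainty on Ka, in M^-1

# Dissociation constant in M is the reciprocal of Ka.
kd = 1.0 / ka

# First-order error propagation: inversion preserves the relative error.
kd_err = kd * (ka_err / ka)

print(f"Kd = {kd * 1e9:.1f} +/- {kd_err * 1e9:.1f} nM")
```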

  15. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  16. The developmental transcriptome landscape of bovine skeletal muscle defined by Ribo-Zero ribonucleic acid sequencing.

    PubMed

    Sun, X; Li, M; Sun, Y; Cai, H; Li, R; Wei, X; Lan, X; Huang, Y; Lei, C; Chen, H

    2015-12-01

    Ribonucleic acid sequencing (RNA-Seq) libraries are normally prepared with oligo(dT) selection of poly(A)+ mRNA, but it depends on intact total RNA samples. Recent studies have described Ribo-Zero technology, a novel method that can capture both poly(A)+ and poly(A)- transcripts from intact or fragmented RNA samples. We report here the first application of Ribo-Zero RNA-Seq for the analysis of the bovine embryonic, neonatal, and adult skeletal muscle whole transcriptome at an unprecedented depth. Overall, 19,893 genes were found to be expressed, with a high correlation of expression levels between the calf and the adult. Hundreds of genes were found to be highly expressed in the embryo and decreased at least 10-fold after birth, indicating their potential roles in embryonic muscle development. In addition, we present for the first time the analysis of global transcript isoform discovery in bovine skeletal muscle and identified 36,694 transcript isoforms. Transcriptomic data were also analyzed to unravel sequence variations; 185,036 putative SNP and 12,428 putative short insertions-deletions (InDel) were detected. Specifically, many stop-gain, stop-loss, and frameshift mutations were identified that probably change the relative protein production and sequentially affect the gene function. Notably, the numbers of stage-specific transcripts, alternative splicing events, SNP, and InDel were greater in the embryo than in the calf and the adult, suggesting that gene expression is most active in the embryo. The resulting view of the transcriptome at a single-base resolution greatly enhances the comprehensive transcript catalog and uncovers the global trends in gene expression during bovine skeletal muscle development. PMID:26641174

  17. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823

  18. Bile acids: analysis in biological fluids and tissues

    PubMed Central

    Griffiths, William J.; Sjövall, Jan

    2010-01-01

    The formation of bile acids/bile alcohols is of major importance for the maintenance of cholesterol homeostasis. Besides their functions in lipid absorption, bile acids/bile alcohols are regulatory molecules for a number of metabolic processes. Their effects are structure-dependent, and numerous metabolic conversions result in a complex mixture of biologically active and inactive forms. Advanced methods are required to characterize and quantify individual bile acids in these mixtures. A combination of such analyses with analyses of the proteome will be required for a better understanding of mechanisms of action and nature of endogenous ligands. Mass spectrometry is the basic detection technique for effluents from chromatographic columns. Capillary liquid chromatography-mass spectrometry with electrospray ionization provides the highest sensitivity in metabolome analysis. Classical gas chromatography-mass spectrometry is less sensitive but offers extensive structure-dependent fragmentation increasing the specificity in analyses of isobaric isomers of unconjugated bile acids. Depending on the nature of the bile acid/bile alcohol mixture and the range of concentration of individuals, different sample preparation sequences, from simple extractions to group separations and derivatizations, are applicable. We review the methods currently available for the analysis of bile acids in biological fluids and tissues, with emphasis on the combination of liquid and gas phase chromatography with mass spectrometry. PMID:20008121

  19. Sequence and comparative genomic analysis of actin-related proteins.

    PubMed

    Muller, Jean; Oma, Yukako; Vallar, Laurent; Friederich, Evelyne; Poch, Olivier; Winsor, Barbara

    2005-12-01

    Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of approximately 700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno (http://bips.u-strasbg.fr/ARPAnno), a new web server dedicated to the annotation of ARP sequences. Analyses of sequence conservation among actins and ARPs highlight part of the actin fold and suggest interactions between ARPs and actin-binding proteins. Finally, analysis of ARP distribution across eukaryotic phyla emphasizes the central importance of nuclear ARPs, particularly the multifunctional ARP4. PMID:16195354

  20. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw. PMID:20478825
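
    The first two DSAP steps listed in this abstract, cleanup and clustering of a tab-delimited tag/count file, can be sketched as follows. This is an illustration, not DSAP's code: the function names, the placeholder adaptor sequence, and the rule that a poly-nucleotide run means six or more identical trailing bases are all assumptions.

```python
import re
from collections import Counter

# Placeholder adaptor for illustration only; NOT the real Solexa adaptor.
ADAPTOR = "TCGTATGC"

def clean(tag, adaptor=ADAPTOR, min_len=1):
    """Step (i): trim the 3' adaptor and any trailing homopolymer run."""
    idx = tag.find(adaptor)
    if idx != -1:
        tag = tag[:idx]
    # Assumption: a poly-A/T/C/G/N run is >= 6 identical trailing nucleotides.
    tag = re.sub(r"(A{6,}|T{6,}|C{6,}|G{6,}|N{6,})$", "", tag)
    return tag if len(tag) >= min_len else None

def cluster(lines):
    """Step (ii): group cleaned tags into unique sequences, summing copy numbers."""
    counts = Counter()
    for line in lines:
        tag, copies = line.rstrip("\n").split("\t")
        cleaned = clean(tag.upper())
        if cleaned:
            counts[cleaned] += int(copies)
    return counts

if __name__ == "__main__":
    demo = ["ACGTACGTTCGTATGC\t5", "ACGTACGT\t3", "GGGTTTAAAAAAAA\t2"]
    for seq, n in cluster(demo).most_common():
        print(seq, n)
```

    Steps (iii) and (iv) would then compare each unique sequence against Rfam and miRBase; they depend on those external databases and are not sketched here.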

  1. Evolution of vertebrate IgM: complete amino acid sequence of the constant region of Ambystoma mexicanum mu chain deduced from cDNA sequence.

    PubMed

    Fellah, J S; Wiles, M V; Charlemagne, J; Schwager, J

    1992-10-01

    cDNA clones coding for the constant region of the Mexican axolotl (Ambystoma mexicanum) mu heavy immunoglobulin chain were selected from total spleen RNA, using a cDNA polymerase chain reaction technique. The specific 5'-end primer was an oligonucleotide homologous to the JH segment of Xenopus laevis mu chain. One of the clones, JHA/3, corresponded to the complete constant region of the axolotl mu chain, consisting of a 1362-nucleotide sequence coding for a polypeptide of 454 amino acids followed in 3' direction by a 179-nucleotide untranslated region and a polyA+ tail. The axolotl C mu is divided into four typical domains (C mu 1-C mu 4) and can be aligned with the Xenopus C mu with an overall identity of 56% at the nucleotide level. Percent identities were particularly high between C mu 1 (59%) and C mu 4 (71%). The C-terminal 20-amino acid segment which constitutes the secretory part of the mu chain is strongly homologous to the equivalent sequences of chondrichthyans and of other tetrapods, including a conserved N-linked oligosaccharide, the penultimate cysteine and the C-terminal lysine. The four C mu domains of 13 vertebrate species ranging from chondrichthyans to mammals were aligned and compared at the amino acid level. The significant number of mu-specific residues which are conserved into each of the four C mu domains argues for a continuous line of evolution of the vertebrate mu chain. This notion was confirmed by the ability to reconstitute a consistent vertebrate evolution tree based on the phylogenic parsimony analysis of the C mu 4 sequences. PMID:1382992

  2. Amino acid sequence of a vitamin K-dependent Ca2+-binding peptide from bovine prothrombin.

    PubMed

    Howard, J B; Fausch, M D

    1975-08-10

    The amino acid sequence of a 31-residue peptide from bovine prothrombin has been determined. This peptide has been shown to contain the vitamin K-dependent modification required for Ca2+ binding (Nelsestuen, G. L., and Suttie, J. W. (1973) Proc. Natl. Acad. Sci. U. S. A. 70, 3366-3370) and the modified amino acid, gamma-carboxyglutamic acid (Nelsestuen, G. L., Zytkovicz, T., and Howard, J. B. (1974) J. Biol. Chem. 249, 6347-6350). The peptide was shown to correspond to residues 12 to 42 of prothrombin. PMID:807581

  3. Amino acid sequences around the cysteine residues of rabbit muscle triose phosphate isomerase

    PubMed Central

    Miller, Janet C.; Waley, S. G.

    1971-01-01

    1. The nature of the subunits in rabbit muscle triose phosphate isomerase has been investigated. 2. Amino acid analyses show that there are five cysteine residues and two methionine residues/subunit. 3. The amino acid sequences around the cysteine residues have been determined; these account for about 75 residues. 4. Cleavage at the methionine residues with cyanogen bromide gave three fragments. 5. These results show that the subunits correspond to polypeptide chains, containing about 230 amino acid residues. The chains in triose phosphate isomerase seem to be shorter than those of other glycolytic enzymes. PMID:5165707

  4. Complete amino acid sequence of the Mu heavy chain of a human IgM immunoglobulin.

    PubMed

    Putnam, F W; Florent, G; Paul, C; Shinoda, T; Shimizu, A

    1973-10-19

    The amino acid sequence of the mu chain of a human IgM immunoglobulin, including the location of all disulfide bridges and oligosaccharides, has been determined. The homology of the constant regions of immunoglobulin mu, gamma, alpha, and epsilon heavy chains reveals evolutionary relationships and suggests that two genes code for each heavy chain. PMID:4742735

  5. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923)

    PubMed Central

    Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-01-01

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. PMID:26941139

  6. Draft Genome Sequence of Perfluorooctane Acid-Degrading Bacterium Pseudomonas parafulva YAB-1

    PubMed Central

    Tang, Chongjian; Peng, Qingjing; Peng, Qingzhong

    2015-01-01

    Pseudomonas parafulva YAB-1, isolated from perfluorinated compound-contaminated soil, has the ability to degrade perfluorooctane acid (PFOA) compound. Here, we report the draft genome sequence and annotation of the PFOA-degrading bacterium P. parafulva YAB-1. The data provide the basis to investigate the molecular mechanism of PFOA metabolism. PMID:26337877

  7. Amino acid sequences of lysozymes newly purified from invertebrates imply wide distribution of a novel class in the lysozyme family.

    PubMed

    Ito, Y; Yoshikawa, A; Hotani, T; Fukuda, S; Sugimura, K; Imoto, T

    1999-01-01

    Lysozymes were purified from three invertebrates: a marine bivalve, a marine conch, and an earthworm. The purified lysozymes all showed a similar molecular weight of 13 kDa on SDS/PAGE. Their N-terminal sequences up to the 33rd residue determined here were apparently homologous among them; in addition, they had a homology with a partial sequence of a starfish lysozyme which had been reported before. The complete sequence of the bivalve lysozyme was determined by peptide mapping and subsequent sequence analysis. This was composed of 123 amino acids including as many as 14 cysteine residues and did not show a clear homology with the known types of lysozymes. However, the homology search of this protein on the protein or nucleic acid database revealed two homologous proteins. One of them was a gene product, CELF22 A3.6 of C. elegans, which was a functionally unknown protein. The other was an isopeptidase of a medicinal leech, named destabilase. Thus, a new type of lysozyme found in at least four species across the three classes of the invertebrates demonstrates a novel class of protein/lysozyme family in invertebrates. The bivalve lysozyme, first characterized here, showed extremely high protein stability and hen lysozyme-like enzymatic features. PMID:9914527

  8. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  9. Genome sequencing and analysis conference grant

    SciTech Connect

    Venter, J.C.

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  10. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W. (Dept. of Computer Sciences); Noordewier, M.O. (Dept. of Computer Science)

    1992-01-01

    We are primarily developing a machine learning (ML) system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information, our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, our KBANN algorithm maps inference rules about a given recognition task into a neural network. Neural network training techniques then use the training examples to refine these inference rules. We call these rules a domain theory, following the convention in the machine learning community. We have been applying this approach to several problems in DNA sequence analysis. In addition, we have been extending the capabilities of our learning system along several dimensions. We have also been investigating parallel algorithms that perform sequence alignments in the presence of frameshift errors.

  11. Applying machine learning techniques to DNA sequence analysis

    SciTech Connect

    Shavlik, J.W.

    1992-01-01

    We are developing a machine learning system that modifies existing knowledge about specific types of biological sequences. It does this by considering sample members and nonmembers of the sequence motif being learned. Using this information (which we call a "domain theory"), our learning algorithm produces a more accurate representation of the knowledge needed to categorize future sequences. Specifically, the KBANN algorithm maps inference rules, such as consensus sequences, into a neural (connectionist) network. Neural network training techniques then use the training examples to refine these inference rules. We have been applying this approach to several problems in DNA sequence analysis and have also been extending the capabilities of our learning system along several dimensions.

  12. The amino acid sequence of cytochrome c-555 from the methane-oxidizing bacterium Methylococcus capsulatus.

    PubMed Central

    Ambler, R P; Dalton, H; Meyer, T E; Bartsch, R G; Kamen, M D

    1986-01-01

    The amino acid sequence of the cytochrome c-555 from the obligate methanotroph Methylococcus capsulatus strain Bath (N.C.I.B. 11132) was determined. It is a single polypeptide chain of 96 residues, binding a haem group through the cysteine residues at positions 19 and 22, and the only methionine residue is at position 59. The sequence does not closely resemble that of any other cytochrome c that has yet been characterized. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50131 (12 pages) at the British Library Lending Division, Boston Spa, West Yorkshire LS23 7BQ, U.K., from whom copies are available on prepayment. PMID:3006666

  13. EST sequencing of Onychophora and phylogenomic analysis of Metazoa.

    PubMed

    Roeding, Falko; Hagner-Holler, Silke; Ruhberg, Hilke; Ebersberger, Ingo; von Haeseler, Arndt; Kube, Michael; Reinhardt, Richard; Burmester, Thorsten

    2007-12-01

    Onychophora (velvet worms) represent a small animal taxon considered to be related to Euarthropoda. We have obtained 1873 5' cDNA sequences (expressed sequence tags, ESTs) from the velvet worm Epiperipatus sp., which were assembled into 833 contigs. BLAST similarity searches revealed that 51.9% of the contigs had matches in the protein databases with expectation values lower than 10^-4. Most ESTs had the best hit with proteins from either Chordata or Arthropoda (approximately 40% respectively). The ESTs included sequences of 27 ribosomal proteins. The orthologous sequences from 28 other species of a broad range of phyla were obtained from the databases, including other EST projects. A concatenated amino acid alignment comprising 5021 positions was constructed, which covers 4259 positions when problematic regions were removed. Bayesian and maximum likelihood methods place Epiperipatus within the monophyletic Ecdysozoa (Onychophora, Arthropoda, Tardigrada and Nematoda), but its exact relation to the Euarthropoda remained unresolved. The "Articulata" concept was not supported. Tardigrada and Nematoda formed a well-supported monophylum, suggesting that Tardigrada are actually Cycloneuralia. In agreement with previous studies, we have demonstrated that random sequencing of cDNAs results in sequence information suitable for phylogenomic approaches to resolve metazoan relationships. PMID:17933557

  14. Allelic polymorphism in arabian camel ribonuclease and the amino acid sequence of bactrian camel ribonuclease.

    PubMed

    Welling, G W; Mulder, H; Beintema, J J

    1976-04-01

    Pancreatic ribonucleases from several species (whitetail deer, roe deer, guinea pig, and arabian camel) exhibit more than one amino acid at particular positions in their amino acid sequences. Since these enzymes were isolated from pooled pancreas, the origin of this heterogeneity is not clear. The pancreatic ribonucleases from 11 individual arabian camels (Camelus dromedarius) have been investigated with respect to the lysine-glutamine heterogeneity at position 103 (Welling et al., 1975). Six ribonucleases showed only one basic band and five showed two bands after polyacrylamide gel electrophoresis, suggesting a gene frequency of about 0.75 for the Lys gene and about 0.25 for the Gln gene. The amino acid sequence of bactrian camel (Camelus bactrianus) ribonuclease isolated from individual pancreatic tissue was determined and compared with that of arabian camel ribonuclease. The only difference was observed at position 103. In the ribonucleases from two unrelated bactrian camels, only glutamine was observed at that position. PMID:962846
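
    The gene frequencies quoted above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that the six one-band animals are Lys/Lys homozygotes and the five two-band animals are Lys/Gln heterozygotes, which the abstract implies but does not state outright, and the function name is hypothetical.

```python
def allele_frequencies(n_hom_a, n_het, n_hom_b=0):
    """Return (freq_a, freq_b) for a two-allele locus in diploid individuals."""
    total_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    if total_alleles == 0:
        raise ValueError("at least one individual is required")
    count_a = 2 * n_hom_a + n_het  # homozygotes carry two copies, heterozygotes one
    count_b = 2 * n_hom_b + n_het
    return count_a / total_alleles, count_b / total_alleles


# Assumed genotype counts: 6 Lys/Lys (one band), 5 Lys/Gln (two bands).
lys, gln = allele_frequencies(n_hom_a=6, n_het=5)
print(f"Lys: {lys:.2f}, Gln: {gln:.2f}")
```

    Under these assumptions the counts are 17 Lys and 5 Gln alleles out of 22, in line with the "about 0.75" and "about 0.25" reported.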

  15. Comprehensive analysis of sequences of a protein switch.

    PubMed

    Chen, Szu-Hua; Meller, Jaroslaw; Elber, Ron

    2016-01-01

    Switches form a special class of proteins that dramatically change their three-dimensional structures upon a small perturbation. One possible perturbation that we explore is that of a single point mutation. Building on the pioneering experimental work of Alexander et al. (Alexander et al. PNAS, 2007; 104,11963-11968) that determines switch sequences between α and α+β folds we conduct a comprehensive sequence sampling by a Markov Chain with multiple fitness criteria to identify new switches given the experimental folds. We screen for switch sequences using a combination of contact potential, secondary structure prediction, and finally molecular dynamics simulations. Statistical properties of switch sequences are discussed and illustrated to be most sensitive to mutation at the N- and C- termini of the switch protein. Based on this analysis, a particularly stable putative switch pair is identified and proposed for further experimental analysis. PMID:26073558

  16. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance will all inform small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments. PMID:27576724

  17. Food Fish Identification from DNA Extraction through Sequence Analysis

    ERIC Educational Resources Information Center

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  18. Basic Sequence Analysis Techniques for Use with Audit Trail Data

    ERIC Educational Resources Information Center

    Judd, Terry; Kennedy, Gregor

    2008-01-01

    Audit trail analysis can provide valuable insights to researchers and evaluators interested in comparing and contrasting designers' expectations of use and students' actual patterns of use of educational technology environments (ETEs). Sequence analysis techniques are particularly effective but have been neglected to some extent because of real…

  19. Use of a structural alphabet to find compatible folds for amino acid sequences

    PubMed Central

    Mahajan, Swapnil; de Brevern, Alexandre G; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; Offmann, Bernard

    2015-01-01

    The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa. PMID:25297700

  20. Use of a structural alphabet to find compatible folds for amino acid sequences.

    PubMed

    Mahajan, Swapnil; de Brevern, Alexandre G; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; Offmann, Bernard

    2015-01-01

    The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa. PMID:25297700
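
    The evaluation described above (fix a score cutoff that gives 95% specificity, then report sensitivity at that cutoff) can be sketched as follows. This is not the authors' code: the function names and the toy scores are hypothetical, and a hit is assumed to be any score strictly above the cutoff.

```python
import math


def cutoff_for_specificity(negative_scores, specificity=0.95):
    """Smallest observed score such that at least `specificity` of negatives are <= it."""
    if not negative_scores:
        raise ValueError("negative_scores must not be empty")
    ranked = sorted(negative_scores)
    # The small tolerance guards against floating-point error in the product.
    k = max(1, math.ceil(specificity * len(ranked) - 1e-9))
    return ranked[k - 1]


def sensitivity(positive_scores, cutoff):
    """Fraction of true positives scoring strictly above the cutoff."""
    if not positive_scores:
        raise ValueError("positive_scores must not be empty")
    return sum(1 for s in positive_scores if s > cutoff) / len(positive_scores)


# Toy data only: 20 negative scores and 4 positive scores.
negatives = list(range(1, 21))
positives = [10, 19, 25, 30]
cutoff = cutoff_for_specificity(negatives, 0.95)
print(cutoff, sensitivity(positives, cutoff))
```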

  1. Fad7 gene identification and fatty acids phenotypic variation in an olive collection by EcoTILLING and sequencing approaches.

    PubMed

    Sabetta, Wilma; Blanco, Antonio; Zelasco, Samanta; Lombardo, Luca; Perri, Enzo; Mangini, Giacomo; Montemurro, Cinzia

    2013-08-01

    The ω-3 fatty acid desaturases (FADs) are enzymes responsible for catalyzing the conversion of linoleic acid to α-linolenic acid localized in the plastid or in the endoplasmic reticulum. In this research we report the genotypic and phenotypic variation of Italian Olea europaea L. germplasm for the fatty acid composition. The phenotypic oil characterization was followed by the molecular analysis of the plastidial-type ω-3 FAD gene (fad7) (EC 1.14.19), whose full-length sequence has been here identified in cultivar Leccino. The gene consisted of 2635 bp with 8 exons and 5'- and 3'-UTRs of 336 and 282 bp respectively, and showed a high level of heterozygosity (1/110 bp). The natural allelic variation was investigated both by a LiCOR EcoTILLING assay and the PCR product direct sequencing. Only three haplotypes were identified among the 96 analysed cultivars, highlighting the strong degree of conservation of this gene. PMID:23685785

  2. Streamlined analysis of duplex sequencing data with Du Novo.

    PubMed

    Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried; Makova, Kateryna D; Nekrutenko, Anton

    2016-01-01

    Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show the approach performs well on simulated data and precisely reproduces previously published results and apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex. PMID:27566673

  3. Sequence analysis and genetic diversity of five new Indian isolates of cucumber mosaic virus.

    PubMed

    Kumar, S; Gautam, K K; Raj, S K

    2015-12-01

    Cucumber mosaic virus (CMV) is an important virus since it causes severe losses to many economically important crops worldwide. Five new isolates of CMV were isolated from naturally infected Hippeastrum hybridum, Dahlia pinnata, Hemerocallis fulva, Acorus calamus and Typhonium trilobatum plants, all exhibiting severe leaf mosaic symptoms. For molecular identification and sequence analyses, the complete coat protein (CP) gene of these isolates was amplified by RT-PCR. The resulting amplicons were cloned and sequenced and isolates were designated as HH (KP698590), DP (JF682239), HF (KP698589), AC (KP698588) and TT (JX570732). To study genetic diversity among these isolates, the sequence data were analysed by BLASTn, multiple alignment and phylogenetic tree construction, together with the respective sequences of other CMV isolates available in the GenBank database. The isolates under study showed 82-99% sequence diversity among them at nucleotide and amino acid levels; however they showed close relationships with CMV isolates of subgroup IB. In alignment analysis of amino acid sequences of HH and AC isolates, we have found fifteen and twelve unique substitutions, compared to HF, DP and TT isolates, suggesting the cause of high genetic diversity. PMID:26666188

  4. Analysis of Chiral Carboxylic Acids in Meteorites

    NASA Technical Reports Server (NTRS)

    Burton, A. S.; Elsila, J. E.; Hein, J. E.; Aponte, J. C.; Parker, E. T.; Glavin, D. P.; Dworkin, J. P.

    2015-01-01

    our efforts to develop highly sensitive LC-MS methods for the analysis of chiral carboxylic acids including hydroxy acids.

  5. Amino acid sequence homology between Piv, an essential protein in site-specific DNA inversion in Moraxella lacunata, and transposases of an unusual family of insertion elements.

    PubMed Central

    Lenich, A G; Glasgow, A C

    1994-01-01

    Deletion analysis of the subcloned DNA inversion region of Moraxella lacunata indicates that Piv is the only M. lacunata-encoded factor required for site-specific inversion of the tfpQ/tfpI pilin segment. The predicted amino acid sequence of Piv shows significant homology solely with the transposases/integrases of a family of insertion sequence elements, suggesting that Piv is a novel site-specific recombinase. PMID:8021196

  6. Amino acid analysis for pharmacopoeial purposes.

    PubMed

    Wahl, Oliver; Holzgrabe, Ulrike

    2016-07-01

    The impurity profile of amino acids depends strongly on the production process. Since there are many different production methods (e.g. fermentation, protein hydrolysis or chemical synthesis) universal, state of the art methods are required to determine the impurity profile of amino acids produced by all relevant competitors. At the moment TLC tests provided by the Ph. Eur. are being replaced by a very specific amino acid analysis procedure possibly missing out on currently unknown process related impurities. Production methods and possible impurities as well as separation and detection methods suitable for said impurities are subject to this review. PMID:27154660

  7. Software scripts for quality checking of high-throughput nucleic acid sequencers.

    PubMed

    Lazo, G R; Tong, J; Miller, R; Hsia, C; Rausch, C; Kang, Y; Anderson, O D

    2001-06-01

    We have developed a graphical interface to allow the researcher to view and assess the quality of sequencing results using a series of program scripts developed to process data generated by automated sequencers. The scripts are written in the Perl programming language and are executable under the cgi-bin directory of a Web server environment. The scripts direct nucleic acid sequencing trace file data output from automated sequencers to be analyzed by the phred molecular biology program and are displayed as graphical hypertext mark-up language (HTML) pages. The scripts are mainly designed to handle 96-well microtiter dish samples, but the scripts are also able to read data from 384-well microtiter dishes 96 samples at a time. The scripts may be customized for different laboratory environments and computer configurations. Web links to the sources and discussion page are provided. PMID:11414222
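
    For readers unfamiliar with phred output, the quality value it assigns to each base is conventionally defined as Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below illustrates that relationship and a simple per-read summary. It is written in Python for illustration rather than the Perl of the scripts described, and the function names and the threshold of 20 are assumptions, not taken from the record.

```python
import math


def phred_to_error_prob(q):
    """Convert a phred quality value to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a phred quality value."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def count_high_quality(quals, threshold=20):
    """Count bases whose quality meets or exceeds the (assumed) threshold."""
    return sum(1 for q in quals if q >= threshold)


# Hypothetical quality values for one short read.
read_quals = [8, 15, 22, 35, 40, 41, 19, 30]
print(count_high_quality(read_quals), phred_to_error_prob(20))
```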

  8. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene.

    PubMed Central

    Heilbronn, R; Jahn, G; Bürkle, A; Freese, U K; Fleckenstein, B; zur Hausen, H

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at Tm - 25 degrees C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein. PMID:3023689

  9. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    SciTech Connect

    Heilbronn, R.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at Tm - 25 degrees C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein.

  10. Infectious hypodermal and hematopoietic necrosis virus from Brazil: Sequencing, comparative analysis and PCR detection.

    PubMed

    Silva, Douglas C D; Nunes, Allan R D; Teixeira, Dárlio I A; Lima, João Paulo M S; Lanza, Daniel C F

    2014-08-30

    A 3739 nucleotide fragment of Infectious hypodermal and hematopoietic necrosis virus (IHHNV) from Brazil was amplified and sequenced. This fragment contains the entire coding sequences of viral proteins, the full 3' untranslated region (3'UTR) and a partial sequence of 5' untranslated region (5'UTR). The genome organization of IHHNV revealed the three typical major coding domains: a left ORF1 of 2001 bp that codes NS1, a left ORF2 of 1091 bp that codes NS2 and a right ORF3 of 990 bp that codes VP. Nucleotide and amino acid sequences of the three viral proteins were compared with putative amino acid sequences of viruses reported from different regions. Comparisons among genomes from different geographic locations reveal 31 nucleotide regions that are 100% similar, distributed throughout the genome. An analysis of the secondary structure of the UTR regions revealed regions with a high probability of forming hairpins that may be involved in mechanisms of viral replication. Additionally, a maximum likelihood analysis indicates that Brazilian IHHNV belongs to lineage III, in the infectious IHHNV group, and is clustered with IHHNV isolates from Hawaii, China, Taiwan, Vietnam and South Korea. A new nested PCR targeting conserved nucleotide regions is proposed to detect IHHNV. PMID:24867614

  11. Widespread occurrence of the tfd-II genes in soil bacteria revealed by nucleotide sequence analysis of 2,4-dichlorophenoxyacetic acid degradative plasmids pDB1 and p712.

    PubMed

    Kim, Dong-Uk; Kim, Min-Sun; Lim, Jong-Sung; Ka, Jong-Ok

    2013-05-01

    Variovorax sp. strain DB1 and Pseudomonas pickettii strain 712 are 2,4-dichlorophenoxyacetic acid (2,4-D)-degrading bacteria, which were isolated from agricultural soils in Republic of Korea and USA, respectively. Each strain harbors a 2,4-D degradative plasmid and is able to utilize 2,4-D as the sole source of carbon for its growth. The 2,4-D degradative plasmid pDB1 of strain DB1 consisted of a 65,269-bp circular molecule with a G+C content of 66.23% and had 68 ORFs. The 2,4-D degradative plasmid p712 of strain 712 was composed of a 62,798-bp circular molecule with a 62.11% G+C content and had 62 ORFs. The plasmids pDB1 and p712 share significantly homologous 2,4-D degradative genes with high similarity to the tfdR, tfdB-II, tfdC-II, tfdD-II, tfdE-II, tfdF-II, tfdK and tfdA genes of plasmid pJP4 of Alcaligenes eutrophus isolated from Australia. In a phylogenetic analysis with trfA, traL, and trbA genes, pDB1 belonged to IncP-1β with pJP4, while p712 belonged to IncP-1ε with pKJK5 and pEMT3. The results indicated that, in spite of the differences in their backbone regions, the 2,4-D catabolic genes of the two plasmids were closely related and also related to the well-known 2,4-D degradative plasmid pJP4 even though all were isolated from different geographic regions. Other similarities in the genetic organization and the presence of IS1071 suggested that these catabolic genes may be on a transposable element, leading to widespread occurrence in soil bacteria. PMID:23376020
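
    G+C content, as reported for the two plasmids above, is the percentage of bases in a sequence that are guanine or cytosine. A minimal sketch of the calculation follows. The function name and the short example sequence are hypothetical; the plasmid sequences themselves are not reproduced here.

```python
def gc_content(seq):
    """Return the G+C content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Toy sequence, not taken from pDB1 or p712.
print(f"{gc_content('GGCCAT'):.2f}%")
```

    In practice ambiguous bases such as N are usually excluded from the denominator; this sketch counts every character and leaves that refinement out.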

  12. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms. PMID:2695392

  13. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  14. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2013-01-25

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  15. Single Molecule Sequencing with a HeliScope Genetic Analysis System

    PubMed Central

    Thompson, John F.; Steinmann, Kathleen E.

    2010-01-01

    Helicos™ Single Molecule Sequencing (SMS) provides a unique view of genome biology through direct sequencing of cellular nucleic acids in an unbiased manner, providing both accurate quantitation and sequence information. Sample preparation does not require ligation or PCR amplification, avoiding the GC-content and size biases observed in other technologies. DNA is simply sheared, tailed with poly A, and hybridized to a flow cell surface containing oligo-dT for sequencing-by-synthesis of billions of molecules in parallel. This process also requires far less material than other technologies. Gene expression measurements can be done using 1st-strand cDNA-based methods (RNA-Seq) or using a novel approach that allows direct hybridization and sequencing of cellular RNA for the most direct quantitation possible. A diverse array of applications have been successfully performed including genome sequencing for accurate variant detection, ChIP-Seq using picogram quantities of DNA, copy number variation studies from both fresh tumor tissue and FFPE tissue samples, sequencing of ancient and degraded DNAs, small RNA studies leading to the identification of new classes of RNAs and the direct capture and sequencing of RNA from cell quantities as few as 250 cells. Because most next generation sequencing technologies require amplification and a specific size range of target molecules, DNAs not meeting those criteria cannot be sequenced in a reliable manner. Single-molecule sequencing does not suffer from those limitations as no amplification is necessary and degraded or modified molecules can be used directly as templates. Principles and methods for using the Helicos® Genetic Analysis System will be discussed. PMID:20890904

  16. Key for protein coding sequences identification: computer analysis of codon strategy.

    PubMed Central

    Rodier, F; Gabarro-Arpa, J; Ehrlich, R; Reiss, C

    1982-01-01

    The signal qualifying an AUG or GUG as an initiator in mRNAs processed by E. coli ribosomes is not found to be a systematic, literal homology sequence. In contrast, stability analysis reveals that initiators always occur within nucleic acid domains of low stability, for which a high A/U content is observed. Since no amino acid selection pressure can be detected at N-termini of the proteins, the A/U enrichment results from a biased usage of the code degeneracy. A computer analysis is presented which allows easy detection of the codon strategy. N-terminal codons carry rather systematically A or U in third position, which suggests a mechanism for translation initiation and helps to detect protein coding sequences in sequenced DNA. PMID:7038623
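
    The third-position bias mentioned above can be measured by splitting a coding sequence into codons and counting how many end in A or U. The sketch below shows one way to do that. It is not the authors' program: the function name, the `n_codons` parameter and the example sequence are all hypothetical.

```python
def third_position_au_fraction(cds, n_codons=None):
    """Fraction of codons whose third base is A or U.

    `cds` is read in frame from its first base; DNA input is accepted
    (T is treated as U). If `n_codons` is given, only that many leading
    (N-terminal) codons are examined.
    """
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    if n_codons is not None:
        codons = codons[:n_codons]
    if not codons:
        raise ValueError("no complete codons to analyse")
    return sum(1 for codon in codons if codon[2] in "AU") / len(codons)


# Toy in-frame sequence: AUG AAA GCU GGC
print(third_position_au_fraction("AUGAAAGCUGGC"))
```

    Comparing the value for the first few codons with the value for the whole gene gives a rough view of the N-terminal enrichment the abstract describes.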

  17. Sequence dependent N-terminal rearrangement and degradation of peptide nucleic acid (PNA) in aqueous solution

    NASA Technical Reports Server (NTRS)

    Eriksson, M.; Christensen, L.; Schmidt, J.; Haaima, G.; Orgel, L.; Nielsen, P. E.

    1998-01-01

    The stability of the PNA (peptide nucleic acid) thymine monomer {N-[2-(thymin-1-ylacetyl)]-N-(2-aminoaminoethyl)glycine} and those of various PNA oligomers (5-8-mers) have been measured at room temperature (20 degrees C) as a function of pH. The thymine monomer undergoes N-acyl transfer rearrangement with a half-life of 34 days at pH 11 as analyzed by 1H NMR; and two reactions, the N-acyl transfer and a sequential degradation, are found by HPLC analysis to occur at measurable rates for the oligomers at pH 9 or above. Dependent on the amino-terminal sequence, half-lives of 350 h to 163 days were found at pH 9. At pH 12 the half-lives ranged from 1.5 h to 21 days. The results are discussed in terms of PNA as a gene therapeutic drug as well as a possible prebiotic genetic material.
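
    To put the half-lives above in practical terms, the fraction of material remaining after a given time can be estimated from the half-life. The sketch below assumes simple first-order decay, which is the usual reading of a reported half-life but is an assumption here rather than something the abstract states. The function name is hypothetical.

```python
def fraction_remaining(elapsed, half_life):
    """Fraction of starting material left after `elapsed` time.

    Assumes first-order kinetics. `elapsed` and `half_life` must be
    expressed in the same unit (for example, days).
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (elapsed / half_life)


# Monomer at pH 11, using the reported half-life of 34 days.
for days in (7, 34, 68):
    print(days, round(fraction_remaining(days, 34), 3))
```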

  18. The amino acid sequence of ribonuclease U2 from Ustilago sphaerogena.

    PubMed Central

    Sato, S; Uchida, T

    1975-01-01

    1. RNAase (ribonuclease) U2, a purine-specific RNAase, was reduced, aminoethylated and hydrolysed with trypsin, chymotrypsin and thermolysin. On the basis of the analyses of the resulting peptides, the complete amino acid sequence of RNAase U2 was determined. 2. When the sequence was compared with the amino acid sequence of RNAase T1 (EC 3.1.4.8), the following regions were found to be similar in the two enzymes: Tyr-Pro-His-Gln-Tyr (38-42) in RNAase U2 and Tyr-Pro-His-Lys-Tyr (38-42) in RNAase T1, Glu-Phe-Pro-Leu-Val (61-65) in RNAase U2 and Glu-Trp-Pro-Ile-Leu (58-62) in RNAase T1, Asp-Arg-Val-Ile-Tyr-Gln (83-88) in RNAase U2 and Asp-Arg-Val-Phe-Asn (76-81) in RNAase T1 and Val-Thr-His-Thr-Gly-Ala (98-103) in RNAase U2 and Ile-Thr-His-Thr-Gly-Ala (90-95) in RNAase T1. All of the amino acid residues, histidine-40, glutamate-58, arginine-77 and histidine-92, which were found to play a crucial role in the biological activity of RNAase T1, were included in the regions cited here. 3. Detailed evidence for the amino acid sequence of the proteins has been deposited as Supplementary Publication SUP 50041 (33 pages) at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1975), 145, 5. PMID:1156364

  19. Human liver type pyruvate kinase: complete amino acid sequence and the expression in mammalian cells.

    PubMed Central

    Tani, K; Fujii, H; Nagata, S; Miwa, S

    1988-01-01

    Pyruvate kinase (PK) has four isozymes (L, R, M1, M2) that are encoded by two different genes. Among these isozymes, abnormalities of liver (L)-type PK are considered to be associated with hereditary nonspherocytic hemolytic anemia in humans. We isolated and determined the full-length sequence of human L-type PK cDNA. The cDNA contains 1629 base pairs encoding 543 amino acids, 68 base pairs of 5'-noncoding sequence, and 734 base pairs of 3'-noncoding sequence. The similarity between human and rat L-type PK was 86.9% at the nucleotide sequence level and 92.4% at the amino acid sequence level. The full-length L-type PK cDNA was placed under the promoter of simian virus 40 and introduced into monkey COS cells. Human L-type PK activity was detected in the extract of COS cells by the classical PK electrophoresis method. PMID:3126495
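    The region lengths quoted in the abstract above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the constant and function names are illustrative assumptions, and the abstract does not say whether a stop codon is counted within the 1629 coding base pairs.

```python
# Region lengths in base pairs, taken from the abstract above.
FIVE_PRIME_NONCODING_BP = 68
CODING_BP = 1629
THREE_PRIME_NONCODING_BP = 734


def codons_in(coding_bp: int) -> int:
    """Return the number of complete codons in a coding region."""
    if coding_bp % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return coding_bp // 3


def summed_length(*region_lengths: int) -> int:
    """Return the combined length of the listed regions."""
    return sum(region_lengths)


amino_acids = codons_in(CODING_BP)
regions_total = summed_length(
    FIVE_PRIME_NONCODING_BP, CODING_BP, THREE_PRIME_NONCODING_BP
)
print(amino_acids, regions_total)
```

    With these inputs, 1629 base pairs correspond to 543 codons, which matches the 543 amino acids reported, and the three regions sum to 2431 base pairs.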

  20. Human liver type pyruvate kinase: Complete amino acid sequence and the expression in mammalian cells

    SciTech Connect

    Tani, Kenzaburo; Nagata, Shigekazu; Fujii, Hisaichi; Miwa, Shiro

    1988-03-01

    Pyruvate kinase (PK) has four isozymes (L, R, M1, M2) that are encoded by two different genes. Among these isozymes, abnormalities of liver (L)-type PK are considered to be associated with hereditary nonspherocytic hemolytic anemia in humans. The authors isolated and determined the full-length sequence of human L-type PK cDNA. The cDNA contains 1,629 base pairs encoding 543 amino acids, 68 base pairs of 5'-noncoding sequence, and 734 base pairs of 3'-noncoding sequence. The similarity between human and rat L-type PK was 86.9% at the nucleotide sequence level and 92.4% at the amino acid sequence level. The full-length L-type PK cDNA was placed under the promoter of simian virus 40 and introduced into monkey COS cells. Human L-type PK activity was detected in the extract of COS cells by the classical PK electrophoresis method.

  1. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    PubMed Central

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely

  2. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers.

    PubMed

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely

  3. Genome Sequencing and Analysis of Catopsilia pomona nucleopolyhedrovirus: A Distinct Species in Group I Alphabaculovirus

    PubMed Central

    Wang, Jun; Zhu, Zheng; Zhang, Lei; Hou, Dianhai; Wang, Manli; Arif, Basil; Kou, Zheng; Wang, Hualin; Deng, Fei; Hu, Zhihong

    2016-01-01

    The genome sequence of Catopsilia pomona nucleopolyhedrovirus (CapoNPV) was determined by the Roche 454 sequencing system. The genome consisted of 128,058 bp and had an overall G+C content of 40%. There were 130 hypothetical open reading frames (ORFs) potentially encoding proteins of more than 50 amino acids and covering 92% of the genome. Among all the hypothetical ORFs, 37 baculovirus core genes, 23 lepidopteran baculovirus conserved genes and 10 genes conserved in Group I alphabaculoviruses were identified. In addition, the genome included regions of 8 typical baculoviral homologous repeat sequences (hrs). Phylogenetic analysis showed that CapoNPV was in a distinct branch of clade “a” in Group I alphabaculoviruses. Gene parity plot analysis and overall similarity of ORFs indicated that CapoNPV is more closely related to the Group I alphabaculoviruses than to other baculoviruses. Interestingly, CapoNPV lacks the genes encoding the fibroblast growth factor (fgf) and ac30, which are conserved in most lepidopteran and Group I baculoviruses, respectively. Sequence analysis of the F-like protein of CapoNPV showed that some amino acids were inserted into the fusion peptide region and the pre-transmembrane region of the protein. All these unique features imply that CapoNPV represents a member of a new baculovirus species. PMID:27166956
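    Two of the summary statistics quoted above, overall G+C content and the fraction of a genome covered by ORFs, are simple to compute from a sequence and a list of ORF coordinates. The sketch below is illustrative only: the function names, the toy sequence, and the toy intervals are assumptions and are not taken from the CapoNPV genome or from the authors' pipeline.

```python
from typing import Iterable, Tuple


def gc_content(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(1 for b in bases if b in "GC") / len(bases)


def orf_coverage(orfs: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Return the fraction of the genome covered by the union of ORFs.

    Each ORF is a half-open interval (start, end) in 0-based coordinates.
    Overlapping ORFs are merged so shared bases are counted only once.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(orfs):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps or touches the running interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy inputs, chosen only to make the arithmetic easy to follow.
toy_gc = gc_content("ATGCGCATTA")
toy_coverage = orf_coverage([(0, 50), (40, 92)], 100)
print(toy_gc, toy_coverage)
```

    Merging overlapping intervals before summing matters for compact viral genomes, where neighbouring ORFs may overlap; summing raw ORF lengths would overstate coverage.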

  4. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    PubMed

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the

  5. Molecular cytogenetics by polymerase catalyzed amplification or in situ labelling of specific nucleic acid sequences

    SciTech Connect

    Bolund, L.; Brandt, C.; Hindkjaer, J.; Koch, J.; Koelvraa, S.; Pedersen, S.

    1993-01-01

    The Polymerase Chain Reaction (PCR) can be performed on isolated cells or chromosomes and the product can be analyzed by DNA technology or by FISH to test metaphases. The authors report good experience analyzing aberrant chromosomes by FACS sorting, PCR with degenerate primers and painting of test metaphases with the PCR product. They also utilize polymerases for PRimed IN Situ labelling (PRINS) of specific nucleic acid sequences. In PRINS, oligonucleotides are hybridized to their target sequences and labeled nucleotides are incorporated at the site of hybridization with the oligonucleotide as primer. PRINS may eventually allow the study of individual genes, gene expression and even somatic mutations (in mRNA) in single cells.

  6. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β-lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.
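    A tandem repeat such as the four-residue unit described above can be located by counting consecutive copies of the unit in a protein sequence. The following is a minimal sketch, assuming one-letter amino acid codes (PNAN for proline-asparagine-alanine-asparagine); the function name and the toy sequence are hypothetical and do not reproduce the actual CS protein.

```python
import re


def longest_tandem_run(protein: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `protein`.

    Copies are counted as non-overlapping, consecutive occurrences.
    """
    if not unit:
        raise ValueError("repeat unit must not be empty")
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(protein)]
    return max(runs, default=0)


# Hypothetical toy sequence: 23 copies of the unit between short flanks.
toy_protein = "MK" + "PNAN" * 23 + "GS"
copies = longest_tandem_run(toy_protein, "PNAN")
print(copies)
```

    On real data the repeat region may contain variant units, so a strict exact-match count like this one would be a lower bound on the repeat length.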

  7. DNA sequence and analysis of human chromosome 18.

    PubMed

    Nusbaum, Chad; Zody, Michael C; Borowsky, Mark L; Kamal, Michael; Kodira, Chinnappa D; Taylor, Todd D; Whittaker, Charles A; Chang, Jean L; Cuomo, Christina A; Dewar, Ken; FitzGerald, Michael G; Yang, Xiaoping; Abouelleil, Amr; Allen, Nicole R; Anderson, Scott; Bloom, Toby; Bugalter, Boris; Butler, Jonathan; Cook, April; DeCaprio, David; Engels, Reinhard; Garber, Manuel; Gnirke, Andreas; Hafez, Nabil; Hall, Jennifer L; Norman, Catherine Hosage; Itoh, Takehiko; Jaffe, David B; Kuroki, Yoko; Lehoczky, Jessica; Lui, Annie; Macdonald, Pendexter; Mauceli, Evan; Mikkelsen, Tarjei S; Naylor, Jerome W; Nicol, Robert; Nguyen, Cindy; Noguchi, Hideki; O'Leary, Sinéad B; O'Neill, Keith; Piqani, Bruno; Smith, Cherylyn L; Talamas, Jessica A; Topham, Kerri; Totoki, Yasushi; Toyoda, Atsushi; Wain, Hester M; Young, Sarah K; Zeng, Qiandong; Zimmer, Andrew R; Fujiyama, Asao; Hattori, Masahira; Birren, Bruce W; Sakaki, Yoshiyuki; Lander, Eric S

    2005-09-22

    Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements. PMID:16177791

  8. Complete amino acid sequence of human plasma Zn-α2-glycoprotein and its homology to histocompatibility antigens

    SciTech Connect

    Araki, T.; Gejyo, F.; Takagaki, K.; Haupt, H.; Schwick, H.G.; Buergi, W.; Marti, T.; Schaller, J.; Rickli, E.; Brossmer, R.

    1988-02-01

    In the present study the complete amino acid sequence of human plasma Zn-α2-glycoprotein was determined. This protein, whose biological function is unknown, consists of a single polypeptide chain of 276 amino acid residues including 8 tryptophan residues and has a pyroglutamyl residue at the amino terminus. The location of the two disulfide bonds in the polypeptide chain was also established. The three glycans, whose structure was elucidated with the aid of 500 MHz 1H NMR spectroscopy, were sialylated N-biantennas. The molecular weight calculated from the polypeptide and carbohydrate structure is 38,478, which is close to the reported value of approximately 41,000 based on physicochemical measurements. The predicted secondary structure appeared to comprise 23% α-helix, 27% β-sheet, and 22% β-turns. The three N-glycans were found to be located in β-turn regions. An unexpected finding was made by computer analysis of the sequence data; this revealed that Zn-α2-glycoprotein is closely related to antigens of the major histocompatibility complex in amino acid sequence and in domain structure. There was an unusually high degree of sequence homology with the α chains of class I histocompatibility antigens. Moreover, this plasma protein was shown to be a member of the immunoglobulin gene superfamily. Zn-α2-glycoprotein appears to be a truncated secretory major histocompatibility complex-related molecule, and it may have a role in the expression of the immune response.

  9. Vibrational analysis of α-cyanohydroxycinnamic acid

    NASA Astrophysics Data System (ADS)

    Mojica, Elmer-Rico E.; Vedad, Jayson; Desamero, Ruel Z. B.

    2015-08-01

    In the present study, a comparative Raman vibrational analysis of alpha-cyano-4-hydroxycinnamic acid (4CHCA) and its derivative, alpha-cyano-3-hydroxycinnamic acid (3CHCA), was performed. The Raman spectra of 4CHCA and 3CHCA in solid form were obtained and analyzed to determine differences between the two structurally similar derivatives. For comparison, the CHCA derivatives cyanocinnamic acid (CCA) and coumaric acid (CA) were also studied. Plausible vibrational assignments were made and matched with those obtained theoretically using a density functional theory (DFT) based method employing a 6-31G basis set. The computed wavenumbers were in good agreement with the observed experimental results. This was the first reported Raman study of CCA, 3CHCA and 4CHCA.

  10. Gene Expression Analysis in the Age of Mass Sequencing: An Introduction.

    PubMed

    Pilarsky, Christian; Nanduri, Lahiri Kanth; Roy, Janine

    2016-01-01

    During the last years the technology used for gene expression analysis has changed dramatically. The old mainstay, DNA microarray, has served its due course and will soon be replaced by next-generation sequencing (NGS), the Swiss army knife of modern high-throughput nucleic acid-based analysis. Therefore preparation technologies have to adapt to suit the emerging NGS technology platform. Moreover, interpretation of the results is still time consuming and employs the use of high-end computers usually not found in molecular biology laboratories. Alternatively, cloud computing might solve this problem. Nevertheless, these new challenges have to be embraced for gene expression analysis in general. PMID:26667455

  11. COMPARISON OF PHYLOGENETIC RELATIONSHIPS BASED ON PHOSPHOLIPID FATTY ACID PROFILES AND RIBOSOMAL RNA SEQUENCE SIMILARITIES AMONG DISSIMILATORY SULFATE-REDUCING BACTERIA

    EPA Science Inventory

    Twenty-five isolates of dissimilatory sulfate-reducing bacteria were clustered based on similarity analysis of their phospholipid ester-linked fatty acids (PLFA). Of these, twenty-three showed the phylogenetic relationships based on the sequence similarity of their 16S rRNA direct...

  12. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
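    The library sizes quoted above can be put in context by comparing them with the number of possible primer sequences of each length, which is 4^k for a primer of k bases. The sketch below only does that arithmetic; the function names are illustrative assumptions, and it says nothing about how the patent selects which primers go into a library.

```python
def possible_kmers(k: int) -> int:
    """Return the number of distinct DNA sequences of length k (4 bases)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 4 ** k


def library_fraction(library_size: int, k: int) -> float:
    """Return the share of all possible k-mers contained in a library."""
    return library_size / possible_kmers(k)


# Library sizes as stated in the abstract above: octamers, nonamers, decamers.
libraries = {8: 10_000, 9: 14_200, 10: 44_000}
fractions = {k: library_fraction(size, k) for k, size in libraries.items()}
for k, frac in fractions.items():
    print(f"{k}-mers: {libraries[k]} of {possible_kmers(k)} ({frac:.1%})")
```

    Under this simple count, each quoted library holds only a small share of the full sequence space: roughly 15% of all octamers, 5% of all nonamers, and 4% of all decamers.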

  13. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  14. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  15. Nanopore sensors for nucleic acid analysis

    NASA Astrophysics Data System (ADS)

    Venkatesan, Bala Murali; Bashir, Rashid

    2011-10-01

    Nanopore analysis is an emerging technique that involves using a voltage to drive molecules through a nanoscale pore in a membrane between two electrolytes, and monitoring how the ionic current through the nanopore changes as single molecules pass through it. This approach allows charged polymers (including single-stranded DNA, double-stranded DNA and RNA) to be analysed with subnanometre resolution and without the need for labels or amplification. Recent advances suggest that nanopore-based sensors could be competitive with other third-generation DNA sequencing technologies, and may be able to rapidly and reliably sequence the human genome for under $1,000. In this article we review the use of nanopore technology in DNA sequencing, genetics and medical diagnostics.

  16. Cloning, Sequencing, and In Silico Analysis of β-Propeller Phytase Bacillus licheniformis Strain PB-13

    PubMed Central

    Sangwan, Punesh; Verma, A. K.; Agrawal, Sanjeev

    2014-01-01

    β-Propeller phytases (BPPhy) are widely distributed in nature and play a major role in phytate-phosphorus cycling. In the present study, a BPPhy gene from a Bacillus licheniformis strain was expressed in E. coli with a phytase activity of 1.15 U/mL and specific activity of 0.92 U/mg protein. The expressed enzyme represented a full-length ORF “PhyPB13” of 381 amino acid residues and differs by 3 residues from the closest similar existing BPPhy sequences. The PhyPB13 sequence was characterized in silico using various bioinformatic tools to better understand structural, functional, and evolutionary aspects of the BPPhy class by multiple sequence alignment and homology search, phylogenetic tree construction, variation in biochemical features, and distribution of motifs and superfamilies. In all sequences, conserved sites were observed toward their N-terminus and C-terminus. Cysteine was not present in the sequence. Overall, three major clusters were observed in the phylogenetic tree with variation in biophysical characteristics. A total of 10 motifs were reported, with motif “1” observed in all 44 protein sequences; this motif might be used for diversity and expression analysis of BPPhy enzymes. This study revealed important sequence features of BPPhy and paves the way for determining the catalytic mechanism and selecting phytases with desirable characteristics. PMID:24864215

  17. Complete VAX/VMS DNA/protein sequence analysis system

    SciTech Connect

    Smith, D.W.

    1987-05-01

    A complete yet flexible system of programs and database libraries for analysis of DNA, RNA and protein sequences is implemented for VAX/VMS computers. Types of analysis include 1) construction and analysis of chimeric sequences (cloning in the VAX), 2) multiple analysis of one or more single sequences, 3) search and comparison studies using sequence libraries, and 4) direct input and analysis of experimental data. Published groups of programs, including the Staden, Los Alamos, Zuker, Pearson, and PHYLIP programs, are used. GenBank and EMBL DNA libraries and PIR and Doolittle NEWAT protein libraries are available, with associated programs. The system is tutorial, with online documentation for relevant VAX software, the programs, and the databases. The complete documentation is flexibly maintained on reserve via computer printout placed in 3-ring binders. Command files are used extensively; porting of the entire system to another VAX/VMS system requires modification of a single command. Users of the system are members of a VAX group, with automatic implementation of the system upon login. The present system occupies about 140,000 blocks, and is easily expanded, or contracted, as desired. The UCSD system is used extensively for both teaching and research purposes. Use of microcomputers emulating Tektronix 4014 graphics terminals permits saving of graphics output to disk for subsequent modification to generate high quality publishable figures.

  18. Genome sequence analysis of dengue virus 1 isolated in Key West, Florida.

    PubMed

    Shin, Dongyoung; Richards, Stephanie L; Alto, Barry W; Bettinardi, David J; Smartt, Chelsea T

    2013-01-01

    Dengue virus (DENV) is transmitted to humans through the bite of mosquitoes. In November 2010, a dengue outbreak was reported in Monroe County in southern Florida (FL), including greater than 20 confirmed human cases. The virus collected from the human cases was verified as DENV serotype 1 (DENV-1) and one isolate was provided for sequence analysis. RNA was extracted from the DENV-1 isolate and was used in reverse transcription polymerase chain reaction (RT-PCR) to amplify PCR fragments to sequence. Nucleic acid primers were designed to generate overlapping PCR fragments that covered the entire genome. The DENV-1 isolate found in Key West (KW), FL was sequenced for whole genome characterization. Sequence assembly, Genbank searches, and recombination analyses were performed to verify the identity of the genome sequences and to determine percent similarity to known DENV-1 sequences. We show that the KW DENV-1 strain is 99% identical to Nicaraguan and Mexican DENV-1 strains. Phylogenetic and recombination analyses suggest that the DENV-1 isolated in KW originated from Nicaragua (NI) and the KW strain may circulate in KW. Also, recombination analysis results detected recombination events in the KW strain compared to DENV-1 strains from Puerto Rico. We evaluate the relative growth of KW strain of DENV-1 compared to other dengue viruses to determine whether the underlying genetics of the strain is associated with a replicative advantage, an important consideration since local transmission of DENV may result because domestic tourism can spread DENVs. PMID:24098658

  19. Dynamic behavior of an intrinsically unstructured linker domain is conserved in the face of negligible amino acid sequence conservation.

    PubMed

    Daughdrill, Gary W; Narayanaswami, Pranesh; Gilmore, Sara H; Belczyk, Agniezka; Brown, Celeste J

    2007-09-01

    Proteins or regions of proteins that do not form compact globular structures are classified as intrinsically unstructured proteins (IUPs). IUPs are common in nature and have essential molecular functions, but even a limited understanding of the evolution of their dynamic behavior is lacking. The primary objective of this work was to test the evolutionary conservation of dynamic behavior for a particular class of IUPs that form intrinsically unstructured linker domains (IULD) that tether flanking folded domains. This objective was accomplished by measuring the backbone flexibility of several IULD homologues using nuclear magnetic resonance (NMR) spectroscopy. The backbone flexibility of five IULDs, representing three kingdoms, was measured and analyzed. Two IULDs from animals, one IULD from fungi, and two IULDs from plants showed similar levels of backbone flexibility that were consistent with the absence of a compact globular structure. In contrast, the amino acid sequences of the IULDs from these three taxa showed no significant similarity. To investigate how the dynamic behavior of the IULDs could be conserved in the absence of detectable sequence conservation, evolutionary rate studies were performed on a set of nine mammalian IULDs. The results of this analysis showed that many sites in the IULD are evolving neutrally, suggesting that dynamic behavior can be maintained in the absence of natural selection. This work represents the first experimental test of the evolutionary conservation of dynamic behavior and demonstrates that amino acid sequence conservation is not required for the conservation of dynamic behavior and presumably molecular function. PMID:17721672

  20. Nucleotide sequence of the fadR gene, a multifunctional regulator of fatty acid metabolism in Escherichia coli.

    PubMed Central

    DiRusso, C C

    1988-01-01

    The Escherichia coli fadR gene is a multifunctional regulator of fatty acid and acetate metabolism. In the present work the nucleotide sequence of the 1.3 kb DNA fragment which encodes FadR has been determined. The coding sequence of the fadR gene is 714 nucleotides long and is preceded by a typical E. coli ribosome binding site and is followed by a sequence predicted to be sufficient for factor-independent chain termination. Primer extension experiments demonstrated that the transcription of the fadR gene initiates with an adenine nucleotide 33 nucleotides upstream from the predicted start of translation. The derived fadR peptide has a calculated molecular weight of 26,972. This is in reasonable agreement with the apparent molecular weight of 29,000 previously estimated on the basis of maxi-cell analysis of plasmid encoded proteins. There is a segment of twenty amino acids within the predicted peptide which resembles the DNA recognition and binding site of many transcriptional regulatory proteins. PMID:2843809

  1. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

    PubMed Central

    Bolotin, Alexander; Wincker, Patrick; Mauger, Stéphane; Jaillon, Olivier; Malarme, Karine; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

    2001-01-01

    Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.] PMID:11337471

  2. Cloning and sequence analysis of the Blumea balsamifera DC farnesyl diphosphate synthase gene.

    PubMed

    Pang, Y X; Guan, L L; Wu, L F; Chen, Z X; Wang, K; Xie, X L; Yu, F L; Chen, X L; Zhang, Y B; Jiang, Q

    2014-01-01

    Blumea balsamifera DC is a member of the Compositae family and is frequently used in traditional Chinese medicine. Blumea balsamifera is rich in monoterpenes, which possess a variety of pharmacological activities, such as antioxidant, antibacterial, and antiviral activities. Farnesyl diphosphate synthase (FPS) is a key enzyme in the biosynthetic pathway of terpenes, playing an important regulatory role in plant growth, such as resistance and secondary metabolism. Based on the conserved oligo amino acid residues of published FPS genes from other higher plant species, a cDNA sequence, designated BbFPS, was isolated from B. balsamifera DC using polymerase chain reaction. The clones were an average of 1.6 kb and contained an open reading frame that predicted a polypeptide of 342 amino acids with 89.07% identity to FPS from other plants. The deduced amino acid sequence was dominated by hydrophobic regions and contained 2 highly conserved DDxxD motifs that are essential for proper functioning of FPS. Phylogenetic analysis indicated that FPS grouped with other composite families. Prediction of secondary structure and subcellular localization suggested that alpha helices made up 70% of the amino acids of the sequence. PMID:25501197

  3. Sequence analysis of the aminoacylase-1 family. A new proposed signature for metalloexopeptidases.

    PubMed

    Biagini, A; Puigserver, A

    2001-03-01

    The amino acid sequence analysis of the human and porcine aminoacylases-1, the carboxypeptidase S precursor from Saccharomyces cerevisiae, the succinyl-diaminopimelate desuccinylase from Escherichia coli, Haemophilus influenzae and Corynebacterium glutamicum, the acetylornithine deacetylase from Escherichia coli and Dictyostelium discoideum and the carboxypeptidase G(2) precursor from Pseudomonas strain, using the Basic Local Alignment Search Tool (BLAST) and the Position-Specific Iterated BLAST (PSI-BLAST), allowed us to suggest that all these enzymes, which share common functional and biochemical features, belong to the same structural family. The three amino acid blocks which were found to be highly conserved, using the CLUSTAL W program, could be assigned to the catalytic active site, based on the general three-dimensional structure of the carboxypeptidase G(2) from the Pseudomonas strain precursor. Six additional proteins with the same signature have been retrieved after performing two successive PSI-BLAST iterations using the sequence of the conserved motif, namely Lactobacillus delbrueckii aminoacyl-histidine dipeptidase, Streptomyces griseus aminopeptidase, Saccharomyces cerevisiae aminopeptidase Y precursor, two Bacillus stearothermophilus N-carbamyl-L-amino acid amidohydrolases and Pseudomonas sp. hydantoin utilization protein C. The three conserved amino acid motifs corresponded to the following blocks: (i) [S, G, A]-H-x-D-x-V; (ii) G-x-x-D; and (iii) x-E-E. This new sequence signature is clearly different from that commonly reported in the literature for proteins belonging to the ArgE/DapE/CPG2/YscS family. PMID:11250542
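
    The three conserved blocks quoted above can be searched for in a protein sequence by turning each one into a regular expression. The following is only a minimal sketch: the function names and the toy sequence are hypothetical, each block is searched independently (the abstract does not state the spacing between blocks), and it is not the BLAST/PSI-BLAST or CLUSTAL W procedure the authors used.

```python
import re

# The three conserved blocks quoted in the abstract. A bracketed set lists
# the residues allowed at that position; "x" stands for any residue.
BLOCKS = {
    "i": "[S, G, A]-H-x-D-x-V",
    "ii": "G-x-x-D",
    "iii": "x-E-E",
}


def block_to_regex(block: str) -> str:
    """Convert one dash-separated block into a regular expression string."""
    parts = []
    for element in block.split("-"):
        element = element.strip()
        if element == "x":
            parts.append("[A-Z]")  # any single residue letter
        elif element.startswith("["):
            # Keep only the residue letters: drop brackets, commas, spaces.
            residues = re.sub(r"[^A-Z]", "", element)
            parts.append(f"[{residues}]")
        else:
            parts.append(element)  # a single fixed residue
    return "".join(parts)


def find_blocks(sequence: str) -> dict:
    """Return, per block, the (0-based start, matched text) of each hit.

    Note: re.finditer reports non-overlapping matches only.
    """
    hits = {}
    for name, block in BLOCKS.items():
        pattern = re.compile(block_to_regex(block))
        hits[name] = [(m.start(), m.group()) for m in pattern.finditer(sequence)]
    return hits


if __name__ == "__main__":
    toy = "MKSHADLVTTGAADPEEK"  # hypothetical sequence, not a real enzyme
    for name, found in find_blocks(toy).items():
        print(name, block_to_regex(BLOCKS[name]), found)
```

    Short blocks such as G-x-x-D and x-E-E match by chance in many unrelated proteins, so hits from a sketch like this would only be meaningful in combination and in the context of an alignment.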

  4. Self-sequencing of amino acids and origins of polyfunctional protocells.

    PubMed

    Fox, S W

    1984-01-01

    The primal role of the origins of proteins in molecular evolution is discussed. On the basis of this premise, the significance of the experimentally established self-sequencing of amino acids under simulated geological conditions is explained as due to the fact that the products are highly nonrandom and accordingly contain many kinds of information. When such thermal proteins are aggregated into laboratory protocells, an action that occurs readily, the resultant protocells also contain many kinds of information. Residue-by-residue order, enzymic activities, and lipid quality accordingly occur within each preparation of proteinoid (thermal protein). In this paper are reviewed briefly the phenomenon of self-sequencing of amino acids, its relationship to evolutionary processes, other significance of such self-ordering, and the experimental evidence for original polyfunctional protocells. PMID:6462684

  5. Self-Sequencing of Amino Acids and Origins of Polyfunctional Protocells

    NASA Astrophysics Data System (ADS)

    Fox, Sidney W.

    1984-12-01

    The primal role of the origins of proteins in molecular evolution is discussed. On the basis of this premise, the significance of the experimentally established self-sequencing of amino acids under simulated geological conditions is explained as due to the fact that the products are highly nonrandom and accordingly contain many kinds of information. When such thermal proteins are aggregated into laboratory protocells, an action that occurs readily, the resultant protocells also contain many kinds of information. Residue-by-residue order, enzymic activities, and lipid quality accordingly occur within each preparation of proteinoid (thermal protein). In this paper are reviewed briefly the phenomenon of self-sequencing of amino acids, its relationship to evolutionary processes, other significance of such self-ordering, and the experimental evidence for original polyfunctional protocells.

  6. Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain

    PubMed Central

    Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

    2013-01-01

    Decorin proteoglycan is comprised of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycan, suggests that it has a single or small number of defined sequences. On this basis, a similar approach was undertaken to sequence the much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain of porcine skin decorin. This approach resulted in information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381

  7. Molecular analysis of the complete genomic sequences of four isolates of Gooseberry vein banding associated virus.

    PubMed

    Xu, Donglin; Mock, Ray; Kinard, Gary; Li, Ruhui

    2011-08-01

    The presence of Gooseberry vein banding associated virus (GVBaV), a badnavirus in the family Caulimoviridae, is strongly correlated with gooseberry vein banding disease in Ribes spp. In this study, full-length genomic sequences of four GVBaV isolates from different hosts and geographic regions were determined to be 7649-7663 nucleotides. These isolates share identities of 96.4-97.3% for the complete genomic sequence, indicating low genetic diversity among them. The GVBaV genome contains three open reading frames (ORFs) on the plus strand that potentially encode proteins of 26, 16, and 216 kDa. The size and organization of GVBaV ORFs 1-3 are similar to those of most other badnaviruses. The putative amino acid sequence of GVBaV ORF 3 contained motifs that are conserved among badnavirus proteins including aspartic protease, reverse transcriptase, and ribonuclease H. The highly conserved putative plant tRNA(met)-binding site is also present in the 935-bp intergenic region of GVBaV. The identities of the genomic sequences of GVBaV and other badnaviruses range from 49.1% (Sugarcane bacilliform Mor virus) to 51.7% (Pelargonium vein banding virus, PVBV). Phylogenetic analysis using the amino acid sequence of the ORF 3 putative protein shows that GVBaV groups most closely to Dioscorea bacilliform virus, PVBV, and Taro bacilliform virus. These results confirm that GVBaV is a pararetrovirus of the genus Badnavirus in the family Caulimoviridae. PMID:21533750

  8. Identification of a 35-kilodalton serovar-cross-reactive flagellar protein, FlaB, from Leptospira interrogans by N-terminal sequencing, gene cloning, and sequence analysis.

    PubMed Central

    Lin, M; Surujballi, O; Nielsen, K; Nadin-Davis, S; Randall, G

    1997-01-01

    During the screening of antibodies to pathogenic leptospires, a murine monoclonal antibody (designated M138) was found to react with various serovars. An antigen of approximately 35 kDa from Leptospira interrogans serovar pomona, which reacted strongly with M138, was characterized by N-terminal amino acid sequencing and identified as a flagellin, a class B polypeptide subunit (FlaB) of the periplasmic flagella. The gene encoding the FlaB protein, flaB, was amplified from the genomic DNA of several pathogenic serovars by PCR with a single pair of oligonucleotide primers, suggesting that FlaB is highly conserved among these serovars. Cloning and sequence analysis of flaB from serovar pomona revealed that it contains an 849-bp open reading frame with a G + C content of 46.88% which encodes a 283-amino-acid protein with a calculated molecular mass of 31.297 kDa and a predicted pI of 9.065. A sequence comparison of flagellin proteins revealed that the amino acid sequence is most variable in the central portion of the serovar pomona FlaB, which is believed to contain specific sequence information and which may thus be useful in the design of DNA or synthetic peptide probes suitable for the detection of infection with pathogenic leptospires. PMID:9317049

  9. Sequence of morphological transitions in two-dimensional pattern growth from aqueous ascorbic acid solutions.

    PubMed

    Paranjpe, A S

    2002-08-12

    A sequence of morphological transitions in two-dimensional dehydration patterns of aqueous solutions of ascorbic acid is observed with humidity as a control parameter. Change in morphology occurs due to humidity induced variation in the concentration of the metastable supersaturated solution phase formed after initial solvent evaporation. As percent humidity is varied from 40 to 80, patterns change from compact circular --> radial --> density modulated radial (a new morphology) --> density modulated circular --> density modulated dendritic (a new morphology) --> dense branching. PMID:12190528

  10. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  11. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein. PMID:7461607

  12. Nucleic acid analysis using terminal-phosphate-labeled nucleotides

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-04-22

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  13. Multilocus sequence analysis of phytopathogenic species of the genus Streptomyces

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The identification and classification of species within the genus Streptomyces is difficult because there are presently 576 validly described species and this number increases every year. The value of the application of multilocus sequence analysis scheme to the systematics of Streptomyces species h...

  14. Efficient analysis of mouse genome sequences reveal many nonsense variants.

    PubMed

    Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E; Libert, Claude

    2016-05-17

    Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
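
    The in silico step described above (substitute the variant bases into the reference coding sequence, translate both, and compare the amino acid sequences) can be sketched roughly as follows. This is an illustration only: it assumes the standard genetic code, handles SNPs but not indels, and the function names and toy sequence are hypothetical rather than taken from the authors' pipeline.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snps(cds: str, snps: dict) -> str:
    """Return a copy of cds with {0-based position: new base} substitutions."""
    seq = list(cds)
    for pos, alt in snps.items():
        seq[pos] = alt
    return "".join(seq)


def gains_stop(ref_cds: str, alt_cds: str) -> bool:
    """True if the variant protein ends earlier than the reference protein.

    Valid for SNP-only comparisons; indels would need separate handling.
    """
    return len(translate(alt_cds)) < len(translate(ref_cds))


if __name__ == "__main__":
    ref = "ATGTGGCAAAAATAA"               # toy CDS: M W Q K stop
    nonsense = apply_snps(ref, {5: "A"})   # TGG -> TGA, a premature stop
    silent = apply_snps(ref, {8: "G"})     # CAA -> CAG, still glutamine
    print(translate(ref), translate(nonsense), gains_stop(ref, nonsense))
    print(translate(silent), gains_stop(ref, silent))
```

    Comparing translated lengths is a deliberately simple criterion; a fuller treatment would also report the position of the new stop codon and the fraction of the protein that is lost.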

  15. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    PubMed

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. PMID:22921500

  16. Motion sequence analysis in the presence of figural cues

    PubMed Central

    Sinha, Pawan; Vaina, Lucia M.

    2015-01-01

    The perception of 3D structure in dynamic sequences is believed to be subserved primarily through the use of motion cues. However, real-world sequences contain many figural shape cues besides the dynamic ones. We hypothesize that if figural cues are perceptually significant during sequence analysis, then inconsistencies in these cues over time would lead to percepts of non-rigidity in sequences showing physically rigid objects in motion. We develop an experimental paradigm to test this hypothesis and present results with two patients with impairments in motion perception due to focal neurological damage, as well as two control subjects. Consistent with our hypothesis, the data suggest that figural cues strongly influence the perception of structure in motion sequences, even to the extent of inducing non-rigid percepts in sequences where motion information alone would yield rigid structures. Beyond helping to probe the issue of shape perception, our experimental paradigm might also serve as a possible perceptual assessment tool in a clinical setting. PMID:26028822

  17. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
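
    For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are hypothetical; only the two genome figures in the coverage line come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() needs at least one non-zero length")


# Assembly coverage as reported: non-redundant bases / k-mer genome estimate.
coverage = 568_887_315 / 622_000_000

if __name__ == "__main__":
    toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # hypothetical contigs
    print("N50 of toy assembly:", n50(toy_lengths))
    print(f"carnation assembly coverage: {coverage:.1%}")
```

    In the toy list the total is 300, and the two longest sequences (80 + 70) reach exactly half of it, so the N50 is 70.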

  18. Analysis of singleton ORFans in fully sequenced microbial genomes.

    PubMed

    Siew, Naomi; Fischer, Daniel

    2003-11-01

    Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans. PMID:14517975

  19. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment. PMID:23485423

  20. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those...

  1. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those...

  2. DNAApp: a mobile application for sequencing data analysis

    PubMed Central

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and Mac platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab and www.bii.a-star.edu.sg/research/trd/apd.php. The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882
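
    Two of the in-built analyses named above, reverse complementation and searching for a specific sequence, are simple enough to illustrate in a few lines. The sketch below is not DNAApp's code and does not read ab1 files; it assumes the bases have already been decoded to a plain string, and the function names are hypothetical.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, query: str) -> list:
    """Return 0-based start positions of query in seq, overlaps included."""
    hits = []
    start = seq.find(query)
    while start != -1:
        hits.append(start)
        start = seq.find(query, start + 1)
    return hits


if __name__ == "__main__":
    read = "ATGCGTTAGC"  # hypothetical decoded read
    print(reverse_complement(read))
    print(find_all(read, "GC"))
    # Searching the opposite strand is a search on the reverse complement.
    print(find_all(reverse_complement(read), "GC"))
```

    Positions found on the reverse complement are relative to that strand; mapping them back to the original read would require subtracting from the read length.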

  3. Halvade: scalable sequence analysis with MapReduce

    PubMed Central

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2015-01-01

    Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading. Availability and implementation: Halvade is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR. Its source is available at http://bioinformatics.intec.ugent.be/halvade under GPL license. Contact: jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25819078

  4. Exploration of phylogenetic data using a global sequence analysis method

    PubMed Central

    Chapus, Charles; Dufraigne, Christine; Edwards, Scott; Giron, Alain; Fertil, Bernard; Deschavanne, Patrick

    2005-01-01

    Background: Molecular phylogenetic methods are based on alignments of nucleic or peptidic sequences. The tremendous increase in molecular data permits phylogenetic analyses of very long sequences and of many species, but also requires methods to help manage large datasets. Results: Here we explore the phylogenetic signal present in molecular data by genomic signatures, defined as the set of frequencies of short oligonucleotides present in DNA sequences. Although violating many of the standard assumptions of traditional phylogenetic analyses – in particular explicit statements of homology inherent in character matrices – the use of the signature does permit the analysis of very long sequences, even those that are unalignable, and is therefore most useful in cases where alignment is questionable. We compare the results obtained by traditional phylogenetic methods to those inferred by the signature method for two genes: RAG1, which is easily alignable, and 18S RNA, where alignments are often ambiguous for some regions. We also apply this method to a multigene data set of 33 genes for 9 bacteria and one archaeal species as well as to the whole genome of a set of 16 γ-proteobacteria. In addition to delivering phylogenetic results comparable to traditional methods, the comparison of signatures for the sequences involved in the bacterial example identified putative candidates for horizontal gene transfers. Conclusion: The signature method is therefore a fast tool for exploring phylogenetic data, providing not only a pretreatment for discovering new sequence relationships, but also for identifying cases of sequence evolution that could confound traditional phylogenetic analysis. PMID:16280081
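
    A genomic signature as defined above, the set of frequencies of short oligonucleotides in a DNA sequence, can be computed without any alignment. The sketch below is a rough illustration under stated assumptions: the word length k = 4, the single-strand counting, and the Euclidean distance are choices made here for the example, since the abstract does not specify the parameters or the distance the authors used.

```python
from collections import Counter
from itertools import product
from math import sqrt


def signature(seq: str, k: int = 4) -> list:
    """Return the frequencies of all 4**k oligonucleotides of length k.

    Words containing anything other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence has no valid words of length k")
    return [counts[w] / total for w in words]


def distance(sig_a: list, sig_b: list) -> float:
    """Euclidean distance between two signatures of the same word length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


if __name__ == "__main__":
    # Hypothetical toy sequences; real use would involve long sequences.
    s1 = signature("ACGTACGTACGTTTGA", k=2)
    s2 = signature("ACGTACGAACGTTTGA", k=2)
    s3 = signature("GGGGCCCCGGGGCCCC", k=2)
    print(round(distance(s1, s2), 3), round(distance(s1, s3), 3))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method, which is one way a signature comparison can stand in for an alignment when sequences are hard to align.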

  5. Sequence analysis and enzyme kinetics of the L2 serine beta-lactamase from Stenotrophomonas maltophilia.

    PubMed Central

    Walsh, T R; MacGowan, A P; Bennett, P M

    1997-01-01

    The L2 serine active-site beta-lactamase from Stenotrophomonas maltophilia has been classified as a clavulanic acid-sensitive cephalosporinase. The gene encoding this enzyme from S. maltophilia 1275 IID has been cloned on a 3.3-kb fragment into pK18 under the control of a Ptac promoter to generate recombinant plasmid pUB5840; when expressed in Escherichia coli, this gene confers resistance to cephalosporins and penicillins. Sequence analysis has revealed an open reading frame (ORF) of 909 bp with a GC content of 71.6%, comparable to that of the L1 metallo-beta-lactamase gene (68.4%) from the same bacterium. The ORF encodes an unmodified protein of 303 amino acids with a predicted molecular mass of 31.5 kDa, accommodating a putative leader peptide of 27 amino acids. Comparison of the amino acid sequence with those of other beta-lactamases showed it to be most closely related (54% identity) to the BLA-A beta-lactamase from Yersinia enterocolitica. Sequence identity is most obvious near the STXK active-site motif and the SDN loop motif common to all serine active-site penicillinases. Sequences outside the conserved regions display low homology with comparable regions of other class A penicillinases. Kinetics of the enzyme from the cloned gene demonstrated an increase in activity with cefotaxime but markedly less activity with imipenem than previously reported. Hence, the S. maltophilia L2 beta-lactamase is an inducible Ambler class A beta-lactamase which would account for the sensitivity to clavulanic acid. PMID:9210666
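
    The figures quoted above (a 909-bp ORF, 71.6% GC, 303 amino acids, a 27-residue leader) are linked by simple arithmetic, sketched below. The function names are hypothetical, and the abstract does not say whether the stop codon is counted within the 909 bp, so the codon count is only a consistency check against the stated 303 residues.

```python
def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def codon_count(orf_length_bp):
    """Number of codons in an ORF whose length is a multiple of three."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_length_bp // 3


# Values taken from the abstract above.
codons = codon_count(909)          # 303, matching the stated protein length
mature_length = codons - 27        # residues left after removing the putative leader
```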

  6. Sequence analysis of both genome segments of three Croatian infectious bursal disease field viruses.

    PubMed

    Lojkić, I; Bidin, Z; Pokrić, B

    2008-09-01

    In order to determine the mutations responsible for virulence, three Croatian field infectious bursal disease viruses (IBDV), designated Cro-Ig/02, Cro-Po/00, and Cro-Pa/98, were characterized. Coding regions of both genomic segments were sequenced, and the nucleotide and deduced amino acid sequences were compared with previously reported full-length sequenced IBDV strains. Phylogenetic analysis, based on the nucleotide and deduced amino acid sequences of polyprotein and VP1, was performed. Eight characteristic amino acid residues that were common to very virulent (vv) IBDV were detected on polyprotein: 222A, 256I, 294I, 451L, 685N, 715S, 751D, and 1005A. All eight were found in Cro-Ig/02 and Cro-Po/00. Cro-Pa/98 had all the characteristics of an attenuated strain, except for glutamine on residue 253, which is common for vv, classical virulent, and variant strains. Between less virulent and vvIBDV, three substitutions were found on VP5: 49 G --> R, 79 --> F, and 137 R --> W. In VP1, there were nine characteristic amino acid residues common to vvIBDV: 146D, 147N, 242E, 390M, 393D, 511S, 562P, 687P, and 695R. All nine residues were found in Cro-Ig/02, and eight were found in Cro-Po/00, which had isoleucine on residue 390. Based on our analyses, isolates Cro-Ig/02 and Cro-Po/00 were classified with vv IBDV strains. Cro-Pa/98 shared all characteristic amino acid residues with attenuated and classical virulent strains, so it was classified with those. PMID:18939645

  7. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  8. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acids of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been sequenced. The carboxymethylated lysozymes were digested with trypsin followed by sequencing of the tryptic peptides. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, and a relative molecular mass of 14,423 Da was calculated. This lysozyme had 6 amino acid substitutions when compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's pheasant lysozyme. The phylogenetic tree constructed by the comparison of amino acid sequences of phasianoid bird lysozymes revealed a minimum genetic distance between these pheasants and the turkey-peafowl group. PMID:1368578

  9. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L (+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered as a potential probiotic. Complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  10. Complete amino acid sequence of globin chains and biological activity of fragmented crocodile hemoglobin (Crocodylus siamensis).

    PubMed

    Srihongthong, Saowaluck; Pakdeesuwan, Anawat; Daduang, Sakda; Araki, Tomohiro; Dhiravisit, Apisak; Thammasirirak, Sompong

    2012-08-01

    Hemoglobin, α-chain, β-chain and fragmented hemoglobin of Crocodylus siamensis demonstrated both antibacterial and antioxidant activities. Antibacterial and antioxidant properties of the hemoglobin did not depend on the heme structure but could result from the compositions of amino acid residues and structures present in their primary structure. Furthermore, thirteen purified active peptides were obtained by RP-HPLC analyses, corresponding to fragments in the α-globin chain and the β-globin chain which are mostly located at the N-terminal and C-terminal parts. These active peptides operate on the bacterial cell membrane. The globin chains of Crocodylus siamensis showed amino acid sequences similar to those of Crocodylus niloticus. The novel amino acid substitutions of α-chain and β-chain are not associated with the heme binding site or the bicarbonate ion binding site, but could be important through their interactions with membranes of bacteria. PMID:22648692

  11. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids.

    PubMed

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-04-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279-284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  12. Low levels of haptoglobin and putative amino acid sequence in Taiwanese Lanyu miniature pigs.

    PubMed

    Yueh, Sunny C H; Wang, Yao Horng; Lin, Kuan Yu; Tseng, Chi Feng; Chu, Hsien Pin; Chen, Kuen Jaw; Wang, Shih Sheng; Lai, I Hsiang; Mao, Simon J T

    2008-04-01

    Porcine haptoglobin (Hp) is an acute phase protein. Its plasma level increases significantly during inflammation and infection. One of the main functions of Hp is to bind free hemoglobin (Hb) and inhibit its oxidative activity. In the present report, we studied the Hp phenotype of Taiwanese Lanyu miniature pigs (TLY minipigs; n=43) and found their Hp structure to be a homodimer (beta-alpha-alpha-beta) similar to human Hp 1-1. Interestingly, Western blot and high performance liquid chromatographic (HPLC) analysis showed that 25% of the TLY minipigs possessed low or no plasma Hp level (<0.05 mg/ml). The Hp cDNA of these TLY minipigs was then cloned, and the translated amino acid sequence was analyzed. No sequences were found to be deficient; they showed a 99.7% identity with domestic pigs (NP_999165). The mean overall Hp level of the TLY minipigs (0.21 +/- 0.25 mg/ml; n=43) determined by enzyme-linked immunosorbent assay (ELISA) was markedly lower than that of domestic pigs (0.78 +/- 0.45 mg/ml; p<0.001), while 25% of the TLY minipigs had an Hp level that was extremely low (<0.05 mg/ml). In addition, the initial recovery rate (first 40 min) in the circulation of infused fluorescein isothiocyanate (FITC)-Hb was significantly higher in the TLY minipigs with extremely low Hp levels than those with high levels. These data suggest that the low concentration of Hp-Hb complex is responsible for the higher recovery rate of Hb in the circulation. TLY minipigs have been used as an experimental model for cardiovascular diseases; whether they can be used as a model for inflammatory diseases, with Hp as a marker, remains a topic of interest. However, since the Hp level varies significantly among individual TLY minipigs, it is necessary to prescreen the Hp levels of the animals to minimize variation in the experimental baseline. The present study may provide a reference value for future use of the TLY minipig as an animal model for inflammation-associated diseases. PMID:18460833

  13. DNA sequence copy number analysis by Comparative Genomic Hybridization (CGH)

    SciTech Connect

    Pinkel, D.; Kallioniemi, A.; Kallioniemi, O.; Waldman, F.; Sudar, D.; Gray, I.; Rutovitz, D.; Piper, I.

    1993-01-01

    Comparative Genomic Hybridization (CGH) uses the kinetics of in situ hybridization to compare the copy numbers of different DNA sequences within the same genome and the copy numbers of the same sequences among different genomes. In a typical application genomic DNA from a tumor and from normal cells are differentially labeled and simultaneously hybridized to normal metaphase chromosomes, and detected with different fluorochromes. Properly registered images of each fluorochrome are obtained using a microscope equipped with multi-band filters and a CCD camera. Digital image analysis permits measurement of intensity ratio profiles along each of the target chromosomes. Studies of cells with known aberrations indicate that the intensity ratio at each position is proportional to the ratio of the copy numbers of the sequences that bind there in the tumor and normal genomes. Analytical challenges posed by the need to efficiently obtain copy number karyotypes are discussed.
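
    The abstract above states that the tumor-to-normal intensity ratio at each chromosomal position is proportional to the ratio of copy numbers there. A minimal sketch of turning two intensity profiles into a normalized ratio profile follows. The median normalization, the gain/loss thresholds of 1.25 and 0.75, and the function names are assumptions chosen for illustration, not values from the report.

```python
import statistics


def ratio_profile(tumor, normal, eps=1e-9):
    """Per-position tumor/normal intensity ratio, scaled so the median is 1.0."""
    if len(tumor) != len(normal):
        raise ValueError("profiles must have the same length")
    if not tumor:
        raise ValueError("profiles must not be empty")
    # eps guards against division by zero where the reference signal vanishes.
    ratios = [t / max(n, eps) for t, n in zip(tumor, normal)]
    median = statistics.median(ratios)
    return [r / median for r in ratios]


def classify(ratios, gain=1.25, loss=0.75):
    """Label each position as gain, loss or normal using assumed thresholds."""
    return ["gain" if r > gain else "loss" if r < loss else "normal"
            for r in ratios]
```

    Median scaling assumes that most of the profile is at the baseline copy number; a heavily rearranged genome would need a different reference.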

  14. Genome sequencing and analysis of the model grass Brachypodium distachyon.

    PubMed

    2010-02-11

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops. PMID:20148030

  15. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  16. DNA sequence and analysis of human chromosome 9.

    PubMed

    Humphray, S J; Oliver, K; Hunt, A R; Plumb, R W; Loveland, J E; Howe, K L; Andrews, T D; Searle, S; Hunt, S E; Scott, C E; Jones, M C; Ainscough, R; Almeida, J P; Ambrose, K D; Ashwell, R I S; Babbage, A K; Babbage, S; Bagguley, C L; Bailey, J; Banerjee, R; Barker, D J; Barlow, K F; Bates, K; Beasley, H; Beasley, O; Bird, C P; Bray-Allen, S; Brown, A J; Brown, J Y; Burford, D; Burrill, W; Burton, J; Carder, C; Carter, N P; Chapman, J C; Chen, Y; Clarke, G; Clark, S Y; Clee, C M; Clegg, S; Collier, R E; Corby, N; Crosier, M; Cummings, A T; Davies, J; Dhami, P; Dunn, M; Dutta, I; Dyer, L W; Earthrowl, M E; Faulkner, L; Fleming, C J; Frankish, A; Frankland, J A; French, L; Fricker, D G; Garner, P; Garnett, J; Ghori, J; Gilbert, J G R; Glison, C; Grafham, D V; Gribble, S; Griffiths, C; Griffiths-Jones, S; Grocock, R; Guy, J; Hall, R E; Hammond, S; Harley, J L; Harrison, E S I; Hart, E A; Heath, P D; Henderson, C D; Hopkins, B L; Howard, P J; Howden, P J; Huckle, E; Johnson, C; Johnson, D; Joy, A A; Kay, M; Keenan, S; Kershaw, J K; Kimberley, A M; King, A; Knights, A; Laird, G K; Langford, C; Lawlor, S; Leongamornlert, D A; Leversha, M; Lloyd, C; Lloyd, D M; Lovell, J; Martin, S; Mashreghi-Mohammadi, M; Matthews, L; McLaren, S; McLay, K E; McMurray, A; Milne, S; Nickerson, T; Nisbett, J; Nordsiek, G; Pearce, A V; Peck, A I; Porter, K M; Pandian, R; Pelan, S; Phillimore, B; Povey, S; Ramsey, Y; Rand, V; Scharfe, M; Sehra, H K; Shownkeen, R; Sims, S K; Skuce, C D; Smith, M; Steward, C A; Swarbreck, D; Sycamore, N; Tester, J; Thorpe, A; Tracey, A; Tromans, A; Thomas, D W; Wall, M; Wallis, J M; West, A P; Whitehead, S L; Willey, D L; Williams, S A; Wilming, L; Wray, P W; Young, L; Ashurst, J L; Coulson, A; Blöcker, H; Durbin, R; Sulston, J E; Hubbard, T; Jackson, M J; Bentley, D R; Beck, S; Rogers, J; Dunham, I

    2004-05-27

    Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection. PMID:15164053

  17. DNA sequence and analysis of human chromosome 9

    PubMed Central

    Humphray, S. J.; Oliver, K.; Hunt, A. R.; Plumb, R. W.; Loveland, J. E.; Howe, K. L.; Andrews, T. D.; Searle, S.; Hunt, S. E.; Scott, C. E.; Jones, M. C.; Ainscough, R.; Almeida, J. P.; Ambrose, K. D.; Ashwell, R. I. S.; Babbage, A. K.; Babbage, S.; Bagguley, C. L.; Bailey, J.; Banerjee, R.; Barker, D. J.; Barlow, K. F.; Bates, K.; Beasley, H.; Beasley, O.; Bird, C. P.; Bray-Allen, S.; Brown, A. J.; Brown, J. Y.; Burford, D.; Burrill, W.; Burton, J.; Carder, C.; Carter, N. P.; Chapman, J. C.; Chen, Y.; Clarke, G.; Clark, S. Y.; Clee, C. M.; Clegg, S.; Collier, R. E.; Corby, N.; Crosier, M.; Cummings, A. T.; Davies, J.; Dhami, P.; Dunn, M.; Dutta, I.; Dyer, L. W.; Earthrowl, M. E.; Faulkner, L.; Fleming, C. J.; Frankish, A.; Frankland, J. A.; French, L.; Fricker, D. G.; Garner, P.; Garnett, J.; Ghori, J.; Gilbert, J. G. R.; Glison, C.; Grafham, D. V.; Gribble, S.; Griffiths, C.; Griffiths-Jones, S.; Grocock, R.; Guy, J.; Hall, R. E.; Hammond, S.; Harley, J. L.; Harrison, E. S. I.; Hart, E. A.; Heath, P. D.; Henderson, C. D.; Hopkins, B. L.; Howard, P. J.; Howden, P. J.; Huckle, E.; Johnson, C.; Johnson, D.; Joy, A. A.; Kay, M.; Keenan, S.; Kershaw, J. K.; Kimberley, A. M.; King, A.; Knights, A.; Laird, G. K.; Langford, C.; Lawlor, S.; Leongamornlert, D. A.; Leversha, M.; Lloyd, C.; Lloyd, D. M.; Lovell, J.; Martin, S.; Mashreghi-Mohammadi, M.; Matthews, L.; McLaren, S.; McLay, K. E.; McMurray, A.; Milne, S.; Nickerson, T.; Nisbett, J.; Nordsiek, G.; Pearce, A. V.; Peck, A. I.; Porter, K. M.; Pandian, R.; Pelan, S.; Phillimore, B.; Povey, S.; Ramsey, Y.; Rand, V.; Scharfe, M.; Sehra, H. K.; Shownkeen, R.; Sims, S. K.; Skuce, C. D.; Smith, M.; Steward, C. A.; Swarbreck, D.; Sycamore, N.; Tester, J.; Thorpe, A.; Tracey, A.; Tromans, A.; Thomas, D. W.; Wall, M.; Wallis, J. M.; West, A. P.; Whitehead, S. L.; Willey, D. L.; Williams, S. A.; Wilming, L.; Wray, P. W.; Young, L.; Ashurst, J. L.; Coulson, A.; Blöcker, H.; Durbin, R.; Sulston, J. E.; Hubbard, T.; Jackson, M. J.; Bentley, D. R.; Beck, S.; Rogers, J.; Dunham, I.

    2009-01-01

    Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6–8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection. PMID:15164053

  18. A biostratigraphic sequence analysis in Cretaceous sediments from Eastern Venezuela

    SciTech Connect

    Paredes, I.; Carillo, M.; Fasola, A.; Luna, F.

    1993-02-01

    This paper presents the results of a high-resolution biostratigraphic study, integrated with petrophysical analyses, of the Late Cretaceous sequence in several wells from the Maturin Sub-Basin, Eastern Venezuela. The main objective of this study is to integrate the different faunal and floral assemblages with the sedimentological evolution of the basin using sequential analysis techniques. This technique was applied using mainly terrestrial and marine palynomorphs, which were relatively abundant and diverse compared with the scarce foraminifera and nannofossils. Based on the percentages of abundance and the diversity of the different groups of microfossils, it was possible to establish the maximum flooding surfaces and condensation levels, which allowed the definition of the possible candidates for the sequence boundaries. In addition, the identified bioevents made possible the definition of the chronostratigraphic datums of the sequence under study. The results obtained will help optimize the exploration and development programs of the oil fields in Eastern Venezuela.

  19. Characterization and differential expression analysis of artichoke phenylalanine ammonia-lyase-coding sequences.

    PubMed

    De Paolis, Angelo; Pignone, Domenico; Morgese, Anita; Sonnante, Gabriella

    2008-01-01

    Sequences encoding phenylalanine ammonia-lyase were isolated from artichoke by using a sequence homology strategy, by screening a genomic library and by 3'-rapid amplification of cDNA ends (RACE) technology. These analyses and Southern blots suggested that, in artichoke, phenylalanine ammonia-lyase (PAL) is encoded by a small gene family. The sequences isolated from genomic DNA possess two exons and one intron at the conserved position, as in most plant pal genes characterized to date. The 3'-RACE analysis also indicated that each member of the artichoke pal gene family was present as a pool of transcripts differing in the length of the 3'-untranslated region. The deduced amino acid sequences were highly similar to those of PAL from lettuce and sunflower. One of the artichoke pal genes was completely sequenced, and its 5' upstream region contained TATA, CAAT box and cis regulatory elements identified in other phenylpropanoid pathway genes as playing a role in UV and elicitor induction. The expression of three of the identified artichoke pal sequences was evaluated in different plant parts, in developmental stages and after wounding, using gene-specific primer/probe combinations in real-time polymerase chain reaction assays. The three putative genes were differentially expressed in the plant parts analysed and were developmentally regulated. Moreover, after leaf mechanical injury, all of them were differentially regulated. The possible involvement of the single pal genes in different physiological processes is discussed. PMID:18251868

  20. Purification, amino acid sequence and mode of action of bifidocin B produced by Bifidobacterium bifidum NCFB 1454.

    PubMed

    Yildirim, Z; Winters, D K; Johnson, M G

    1999-01-01

    Bifidocin B produced by Bifidobacterium bifidum NCFB 1454 was purified to homogeneity by a rapid and simple three step purification procedure which included freeze drying, Micro-Cel adsorption/desorption and cation exchange chromatography. The purification resulted in 18% recovery and an approximately 1900-fold increase in the specific activity and purity of bifidocin B. Treatment with bifidocin B caused sensitive cells to lose high amounts of intracellular K+ ions and u.v.-absorbing materials, and to become more permeable to ONPG. Bifidocin B adsorbed to the Gram-positive bacteria but not the Gram-negative bacteria tested. Its adsorption was pH-dependent but not time-dependent. For sensitive cells, the adsorption and lethal action of bifidocin B was very rapid. In 5 min, 95% of bifidocin B adsorbed onto sensitive cells. Several salts inhibited the binding of bifidocin B, which could be overcome by increasing the amount of bifidocin B added. Pre-treatment of sensitive cells and cell walls with detergents, organic solvents or enzymes did not cause a reduction in subsequent cellular binding of bifidocin B, but cell wall preparations treated with methanol:chloroform and hot 20% (w/v) TCA lost the ability to adsorb bifidocin B. Also, the addition of purified heterologous lipoteichoic acid to sensitive cells completely blocked the adsorption of bifidocin B. The amino acid sequence indicated that the bacteriocin contained 36 residues. N-terminal amino acid sequence analysis yielded a sequence of KYYGNGVTCGLHDCRVDRGKATCGIINNGGMWGDIG. Curing experiments with 20 micrograms ml-1 acriflavine yielded cell derivatives that no longer produced bifidocin B but retained immunity to bifidocin B. Production of bifidocin B, but not immunity to bifidocin B, was associated with a plasmid of about 8 kb in this strain. PMID:10030011

  1. Partial amino acid sequence of fructose-1,6-bisphosphatase from the blue-green algae Synechococcus leopoliensis.

    PubMed

    Marcus, F; Latshaw, S P; Steup, M; Gerbling, K P

    1989-08-01

    Purified fructose-1,6-bisphosphatase from the cyanobacterium Synechococcus leopoliensis was S-carboxymethylated and cleaved with trypsin. The resulting peptides were purified by reversed-phase high performance liquid chromatography, and the amino acid sequences of six of the purified peptides were determined by gas-phase microsequencing. The results revealed sequence homology with other fructose-1,6-bisphosphatases. The sequence data obtained provide information required for the design of oligonucleotide hybridization probes to screen existing libraries of cyanobacterial DNA. The determination of the amino acid sequence of cyanobacterial proteins may yield important information with respect to the endosymbiotic theory of evolution. PMID:2550924

  2. Evolution Analysis of Simple Sequence Repeats in Plant Genome

    PubMed Central

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those which contain repeats of combinations of 1–3 nucleotides. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and as the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T were in a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number in plant species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only had the general pattern in the evolution of the plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution. PMID:26630570
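
    The abstract above analyses SSRs with repeat units of 1–3 nucleotides and tracks how unit composition changes with repeat number. A minimal sketch of locating such repeats with a regular expression follows. The minimum of four repeats, the restriction to uppercase A/C/G/T, and the function name find_ssrs are assumptions for illustration; the abstract does not state the detection criteria the authors used.

```python
import re


def find_ssrs(seq, max_unit=3, min_repeats=4):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    The lazy quantifier tries the shortest unit first, so a run such as
    AAAAAA is reported with unit "A" rather than "AA" or "AAA".
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

    Grouping the hits by unit length and repeat count would give the kind of per-genome tallies (for example, the share of A/T among mononucleotide repeats) that the abstract compares across species.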

  3. A Primary Sequence Analysis of the ARGONAUTE Protein Family in Plants

    PubMed Central

    Rodríguez-Leal, Daniel; Castillo-Cobián, Amanda; Rodríguez-Arévalo, Isaac; Vielle-Calzada, Jean-Philippe

    2016-01-01

    Small RNA (sRNA)-mediated gene silencing represents a conserved regulatory mechanism controlling a wide diversity of developmental processes through interactions of sRNAs with proteins of the ARGONAUTE (AGO) family. On the basis of a large phylogenetic analysis that includes 206 AGO genes belonging to 23 plant species, AGO genes group into four clades corresponding to the phylogenetic distribution proposed for the ten family members of Arabidopsis thaliana. A primary analysis of the corresponding protein sequences resulted in 50 sequences of amino acids (blocks) conserved across their linear length. Protein members of the AGO4/6/8/9 and AGO1/10 clades are more conserved than members of the AGO5 and AGO2/3/7 clades. In addition to blocks containing components of the PIWI, PAZ, and DUF1785 domains, members of the AGO2/3/7 and AGO4/6/8/9 clades possess other consensus block sequences that are exclusive to members within these clades, suggesting unforeseen functional specialization revealed by their primary sequence. We also show that AGO proteins of animal and plant kingdoms share linear sequences of blocks that include motifs involved in posttranslational modifications such as those regulating AGO2 in humans and the PIWI protein AUBERGINE in Drosophila. Our results open possibilities for exploring new structural and functional aspects related to the evolution of AGO proteins within the plant kingdom, and their convergence with analogous proteins in mammals and invertebrates.

  4. Whole-genome sequence-based analysis of thyroid function

    PubMed Central

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H.; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D.; Hui, Jennie; Lim, Ee M.; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R.B.; Bell, Jordana T.; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L.; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M.; Naitza, Silvia; Walsh, John P.; Spector, Tim; Davey Smith, George; Durbin, Richard; Brent Richards, J.; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J.; Wilson, Scott G.; Turki, Saeed Al; Anderson, Carl; Anney, Richard; Antony, Dinu; Artigas, Maria Soler; Ayub, Muhammad; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebhattin; Clapham, Peter; Clement, Gail; Coates, Guy; Collier, David; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day-Williams, Aaron; Day, Ian N.M.; Down, Thomas; Du, Yuanping; Dunham, Ian; Edkins, Sarah; Ellis, Peter; Evans, David; Faroogi, Sadaf; Fatemifar, Ghazaleh; Fitzpatrick, David R.; Flicek, Paul; Flyod, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Geihs, Matthias; Geschwind, Daniel; Griffin, Heather; Grozeva, Detelina; Guo, Xueqin; Guo, Xiaosen; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey; Holmans, Peter; Howie, Bryan; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Jackson, David K.; Jamshidi, Yalda; Jing, Tian; Joyce, Chris; Kaye, Jane; Keane, Thomas; Keogh, Julia; Kemp, John; Kennedy, Karen; Kolb-Kokocinski, Anja; Lachance, Genevieve; Langford, Cordelia; Lawson, Daniel; Lee, Irene; Lek, Monkol; Liang, Jieqin; Lin, Hong; Li, Rui; Li, Yingrui; Liu, Ryan; Lönnqvist, Jouko; Lopes, Margarida; Lotchkova, Valentina; MacArthur, Daniel; Marchini, Jonathan; Maslen, John; Massimo, Mangino; Mathieson, Iain; Marenne, Gaëlle; McGuffin, Peter; McIntosh, Andrew; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Mitchison, Hannah; Moayyeri, Alireza; Morris, James; Muntoni, Francesco; Northstone, Kate; O'Donnovan, Michael; Onoufriadis, Alexandros; O'Rahilly, Stephen; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Pietilainen, Olli; Plagnol, Vincent; Quaye, Lydia; Quai, Michael A.; Raymond, Lucy; Rehnström, Karola; Richards, Brent; Ring, Susan; Ritchie, Graham R.S.; Roberts, Nicola; Savage, David B.; Scambler, Peter; Schiffels, Stephen; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shin, So-Youn; Skuse, David; Small, Kerrin; Southam, Lorraine; Spasic-Boskovic, Olivera; Clair, David St; Stalker, Jim; Stevens, Elizabeth; Pourcian, Beate St; Sun, Jianping; Suvisaari, Jaana; Tachmazidou, Ionna; Tobin, Martin D.; Valdes, Ana; Kogelenberg, Margriet Van; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T.R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Elanor; Whyte, Tamieka; Williams, Hywel; Williamson, Kathleen A.; Wilson, Crispian; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zhang, Fend; Zhang, Pingbo

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  5. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  6. Infrared thermal facial image sequence registration analysis and verification

    NASA Astrophysics Data System (ADS)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequences are registered before further analysis so that the variance caused by minor, irregular subject movements is reduced. This study proposes a registration process for infrared thermal facial image sequences that reduces the deviations caused by subjects' unconscious head movements, without affecting their comfort and with minimal harm. A fixed image for registration is produced by localizing the centroid of the eye region and applying image translation and rotation. The thermal image sequence is then registered automatically using the proposed two-stage genetic algorithm. The deviation before and after image registration is quantified by image quality indices. The results show that the proposed registration process localizes facial images accurately, which benefits the correlation analysis of psychological information related to the facial area.

  7. Congruence analysis of point clouds from unstable stereo image sequences

    NASA Astrophysics Data System (ADS)

    Jepping, C.; Bethmann, F.; Luhmann, T.

    2014-06-01

    This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements, 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research is the development and investigation of methods for detecting corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough to handle occlusions and other disturbances reliably. The second objective is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.
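
    The RANSAC-based congruence analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it estimates the 3D similarity transformation from a minimal sample of three points using orthonormal frames (a swapped-in closed form; the paper does not say how its transformation is estimated), and the function names, tolerance and iteration count are all hypothetical.

```python
import math
import random


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def frame(p0, p1, p2):
    """Right-handed orthonormal frame built from three non-collinear points."""
    e1 = sub(p1, p0)
    n1 = norm(e1)
    e1 = [x / n1 for x in e1]
    w = cross(e1, sub(p2, p0))
    nw = norm(w)
    e3 = [x / nw for x in w]
    e2 = cross(e3, e1)
    return e1, e2, e3


def similarity_from_triplet(a, b):
    """Scale s, rotation R and shift t with b_i = s * R @ a_i + t for a triplet."""
    fa, fb = frame(*a), frame(*b)
    # R maps each axis of frame a onto the matching axis of frame b.
    R = [[sum(fb[k][i] * fa[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    s = norm(sub(b[1], b[0])) / norm(sub(a[1], a[0]))
    t = [b[0][i] - s * sum(R[i][j] * a[0][j] for j in range(3)) for i in range(3)]
    return s, R, t


def transform(s, R, t, p):
    return [s * sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def ransac_stable_points(epoch_a, epoch_b, n_iter=300, tol=0.01, seed=0):
    """Indices of points that one similarity transform maps from epoch a to b."""
    rng = random.Random(seed)
    n = len(epoch_a)
    best = []
    for _ in range(n_iter):
        idx = rng.sample(range(n), 3)
        try:
            s, R, t = similarity_from_triplet([epoch_a[i] for i in idx],
                                              [epoch_b[i] for i in idx])
        except ZeroDivisionError:
            continue  # coincident or collinear sample, no frame can be built
        inliers = [i for i in range(n)
                   if norm(sub(transform(s, R, t, epoch_a[i]), epoch_b[i])) < tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```

    Points outside the returned index list are candidates for deformed areas. A production version would normally refit the transformation to all inliers by least squares before the final classification.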

  8. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. PMID:3052437
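
    The codon-bias figure quoted above (G or C at 93.1% of third positions) is a simple count over the open reading frame. A minimal sketch of that calculation follows; the function name `gc3_fraction` and the toy ORF are illustrative assumptions, not the T. thermophilus gene.

```python
def gc3_fraction(orf):
    """Fraction of codons whose third base is G or C."""
    orf = orf.upper()
    if not orf or len(orf) % 3 != 0:
        raise ValueError("ORF must be non-empty and a multiple of three bases long")
    third_bases = orf[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical six-codon ORF: ATG GCC CGG CTG GAG TAA
toy_orf = "ATGGCCCGGCTGGAGTAA"
print(f"GC3 = {gc3_fraction(toy_orf):.1%}")  # GC3 = 83.3%
```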

  9. Automated Analysis of Dynamic Ca2+ Signals in Image Sequences

    PubMed Central

    Francis, Michael; Waldrup, Josh; Qian, Xun; Taylor, Mark S.

    2014-01-01

    Intracellular Ca2+ signals are commonly studied with fluorescent Ca2+ indicator dyes and microscopy techniques. However, quantitative analysis of Ca2+ imaging data is time consuming and subject to bias. Automated signal analysis algorithms based on region of interest (ROI) detection have been implemented for one-dimensional line scan measurements, but there is no current algorithm which integrates optimized identification and analysis of ROIs in two-dimensional image sequences. Here an algorithm for rapid acquisition and analysis of ROIs in image sequences is described. It utilizes ellipses fit to noise filtered signals in order to determine optimal ROI placement, and computes Ca2+ signal parameters of amplitude, duration and spatial spread. This algorithm was implemented as a freely available plugin for ImageJ (NIH) software. Together with analysis scripts written for the open source statistical processing software R, this approach provides a high-capacity pipeline for performing quick statistical analysis of experimental output. The authors suggest that use of this analysis protocol will lead to a more complete and unbiased characterization of physiologic Ca2+ signaling. PMID:24962784

  10. Cloning, sequence analysis and crystal structure determination of a miraculin-like protein from Murraya koenigii.

    PubMed

    Gahloth, Deepankar; Selvakumar, Purushotham; Shee, Chandan; Kumar, Pravindra; Sharma, Ashwani Kumar

    2010-02-01

    Earlier, the purification of a 21.4 kDa protein with trypsin inhibitory activity from seeds of Murraya koenigii has been reported. The present study, based on the amino acid sequence deduced from both cDNA and genomic DNA, establishes it to be a miraculin-like protein and provides crystal structure at 2.9 Å resolution. The mature protein consists of 190 amino acid residues with seven cysteines arranged in three disulfide bridges. The amino acid sequence showed maximum homology and formed a distinct cluster with miraculin-like proteins, a soybean Kunitz super family member, in phylogenetic analyses. The major differences in sequence were observed at primary and secondary specificity sites in the reactive loop when compared to classical Kunitz family members. The crystal structure analysis showed that the protein is made of twelve antiparallel beta-strands, loops connecting beta-strands and two short helices. Despite similar overall fold, it showed significant differences from classical Kunitz trypsin inhibitors. PMID:19914199

  11. Terminal sequence studies of high-molecular-weight ribonucleic acid. The 3′-termini of rabbit globin messenger ribonucleic acid

    PubMed Central

    Hunt, John A.

    1973-01-01

    Haemoglobin mRNA isolated from EDTA-treated polyribosomes has an apparent molecular weight of 120000–180000 estimated by condensation with 3H-labelled isoniazid after periodate oxidation. Analysis of the ribonuclease digests of isoniazid-labelled RNA by paper electrophoresis and column chromatography enables the amount of contaminating 18S, 7S, 5S and 4S RNA to be estimated, and a corrected molecular weight of globin mRNA as the acid is 161000 or 500 nucleotides in length. This molecule contains two groups of 3′-terminal sequences in equal yield; G-Y-A6 and G-Y-A7 in the ratio 3:2, and G-N9–16-Y-A2 and G-N9–16-Y-N3 in the ratio 3:2. The significance of these sequences is discussed in relation to the poly(A) content of globin mRNA, the specificity of the sequences, and possible function in processing and biosynthesis of mRNA. PMID:4737318

  12. Cloning and sequence analysis of the ces10 gene encoding a Sphingomonas paucimobilis esterase.

    PubMed

    Videira, P A; Fialho, A M; Marques, A R; Coutinho, P M; Sá-Correia, I

    2003-06-01

    The ces10 gene of the gellan gum-producing strain Sphingomonas paucimobilis ATCC 31461 was cloned and sequenced. Multi-sequence alignment of the deduced protein indicated that Ces10 belongs to the serine hydrolase family with a potential catalytic triad comprising Ser(153) (within the G-X-S-X-G consensus sequence), His(75) and Asp(125). The mixed block results obtained following pattern search and the low identities detected in a BLAST analysis indicate that Ces10 is significantly different from other characterised bacterial esterases/lipases. Nevertheless, the Ces10 amino acid sequence showed 45% similarity with Rhodococcus sp. heroin esterase and 48% with Bacillus subtilis p-nitrobenzyl esterase. Ces10, with a predicted molecular mass of 30,641 Da, was overproduced in Escherichia coli and purified to homogeneity in a histidine-tagged form. Enzyme assays using p-nitrophenyl-esters (p-NP-esters) with different acyl chain-lengths as the substrate confirmed the anticipated esterase activity. Ces10 exhibited a marked preference for short-chain fatty acids, yielding the highest activity with p-NP-propionate (optimal pH 7.4, optimal temperature 37 degrees C). PMID:12764567
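
    The G-X-S-X-G consensus mentioned above can be located in a protein sequence with a short pattern search. The sketch below is illustrative only: the helper name `find_gxsxg` and the toy peptide are assumptions, and the sequence shown is not Ces10.

```python
import re

# Lookahead so that overlapping occurrences of G-X-S-X-G are all reported.
SERINE_HYDROLASE_MOTIF = re.compile(r"(?=(G.S.G))")


def find_gxsxg(protein):
    """Return (1-based position of the motif serine, pentapeptide) per hit."""
    return [(m.start() + 3, m.group(1))
            for m in SERINE_HYDROLASE_MOTIF.finditer(protein.upper())]


# Hypothetical peptide containing two motif instances.
print(find_gxsxg("MKTAGHSQGLLGASAGE"))  # [(7, 'GHSQG'), (14, 'GASAG')]
```

    A hit only flags a candidate nucleophile; assigning the full catalytic triad, as the abstract does for Ser(153), His(75) and Asp(125), requires alignment against characterised family members.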

  13. Now and Next-Generation Sequencing Techniques: Future of Sequence Analysis Using Cloud Computing

    PubMed Central

    Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav

    2012-01-01

    Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed “cloud computing”) has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows. PMID:23248640

  14. [Cloning, sequence analysis and expression of N-acetylglutamate kinase gene in Corynebacterium crenatum].

    PubMed

    Hao, Ning; Zhao, Zhi; Wang, Yu; Zhang, Ying-zi; Ding, Jiu-yuan

    2006-02-01

    N-Acetylglutamate kinase (EC 2.7.2.8; NAGK) genes from wild-type Corynebacterium crenatum AS 1.542 and an L-arginine-producing mutant, C. crenatum 971.1, were cloned and sequenced. Analysis of argB sequences revealed that only one ORF existed, which used ATG as the initiation codon and coded for a peptide of 317 amino acids with a calculated molecular weight of 33.6 kDa. Comparison of the gene sequences of the wild-type C. crenatum AS 1.542 and the mutant 971.1 found only one nucleotide difference in the structural gene, and this difference did not change the amino acid sequence. The ORF sequence of argB from C. crenatum AS 1.542 showed homologies of 99.89%, 76.62% and 37.94% to those from Corynebacterium glutamicum ATCC 13032, Corynebacterium efficiens YS-314 and Escherichia coli K12, respectively. The amino acid sequence deduced from the ORF displayed homologies of 100%, 78.55% and 25.25% to those from the microorganisms above, respectively. An internal promoter was found upstream of the argB gene from C. crenatum. The argB gene from C. crenatum AS 1.542 was expressed both in C. crenatum AS 1.542 and 971.1. The NAGK activity of transformed C. crenatum AS 1.542 was greatly increased by the induction of IPTG. The NAGK activity of transformed C. crenatum 971.1 was almost twice as much as that of C. crenatum 971.1 under the same induction. The amplification of the NAGK activity yielded a 25% increase of L-arginine production in C. crenatum 971.1. PMID:16579472

  15. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  16. Applications of new sequencing technologies for transcriptome analysis.

    PubMed

    Morozova, Olena; Hirst, Martin; Marra, Marco A

    2009-01-01

    Transcriptome analysis has been a key area of biological inquiry for decades. Over the years, research in the field has progressed from candidate gene-based detection of RNAs using Northern blotting to high-throughput expression profiling driven by the advent of microarrays. Next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at a single-base resolution. PMID:19715439

  17. Bile acid sulfotransferase I from rat liver sulfates bile acids and 3-hydroxy steroids: purification, N-terminal amino acid sequence, and kinetic properties.

    PubMed

    Barnes, S; Buchina, E S; King, R J; McBurnett, T; Taylor, K B

    1989-04-01

    A bile acid:3'phosphoadenosine-5'phosphosulfate:sulfotransferase (BAST I) from adult female rat liver cytosol has been purified 157-fold by a two-step isolation procedure. The N-terminal amino acid sequence of the 30,000 subunit has been determined for the first 35 residues. The Vmax of purified BAST I is 18.7 nmol/min per mg protein with N-(3-hydroxy-5 beta-cholanoyl)glycine (glycolithocholic acid) as substrate, comparable to that of the corresponding purified human BAST (Chen, L-J., and I. H. Segel, 1985. Arch. Biochem. Biophys. 241: 371-379). BAST I activity has a broad pH optimum from 5.5-7.5. Although maximum activity occurs with 5 mM MgCl2, Mg2+ is not essential for BAST I activity. The greatest sulfotransferase activity and the highest substrate affinity is observed with bile acids or steroids that have a steroid nucleus containing a 3 beta-hydroxy group and a 5-6 double bond or a trans A-B ring junction. These substrates have normal hyperbolic initial velocity curves with substrate inhibition occurring above 5 microM. Of the saturated 5 beta-bile acids, those with a single 3-hydroxy group are the most active. The addition of a second hydroxy group at the 6- or 7-position eliminates more than 99% of the activity. In contrast, 3 alpha,12 alpha-dihydroxy-5 beta-cholan-24-oic acid (deoxycholic acid) is an excellent substrate. The initial velocity curves for glycolithocholic and deoxycholic acid conjugates are sigmoidal rather than hyperbolic, suggestive of an allosteric effect. Maximum activity is observed at 80 microM for glycolithocholic acid. All substrates, bile acids and steroids, are inhibited by the 5 beta-bile acid, 3-keto-5 beta-cholanoic acid. The data suggest that BAST I is the same protein as hydrosteroid sulfotransferase 2 (Marcus, C. J., et al. 1980. Anal. Biochem. 107: 296-304). PMID:2754334

  18. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    PubMed

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing, inter alia, in silico prediction of primer specificity and GM target coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patent documentation. Finally, it can help annotate poorly described GM sequences and identify new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/. PMID:26424080
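
    The "in silico determination of PCR amplification" mentioned above amounts to finding a forward primer site and a downstream reverse primer site in a candidate sequence. The sketch below shows the idea in its simplest form and is not the database's pipeline: it assumes exact matches, searches only the strand given, and every name and sequence in it is hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, forward, reverse, max_len=1000):
    """Amplicons on the given strand bounded by exact forward/reverse primer sites."""
    template, forward = template.upper(), forward.upper()
    # The reverse primer anneals to the top strand, so it appears there
    # as its reverse complement.
    reverse_site = reverse_complement(reverse.upper())
    amplicons = []
    start = template.find(forward)
    while start != -1:
        end = template.find(reverse_site, start + len(forward))
        if end != -1:
            amplicon = template[start:end + len(reverse_site)]
            if len(amplicon) <= max_len:
                amplicons.append(amplicon)
        start = template.find(forward, start + 1)
    return amplicons


# Hypothetical template and primer pair.
print(in_silico_pcr("GGACGTACTTTTTCCGGAAGG", "ACGTAC", "TTCCGG"))
# ['ACGTACTTTTTCCGGAA']
```

    Real screening tools usually tolerate mismatches and check both strands; those steps are omitted here to keep the example short.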

  19. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms

    PubMed Central

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing, inter alia, in silico prediction of primer specificity and GM target coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patent documentation. Finally, it can help annotate poorly described GM sequences and identify new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/ PMID:26424080

  20. Sequence-defined bioactive macrocycles via an acid-catalysed cascade reaction

    NASA Astrophysics Data System (ADS)

    Porel, Mintu; Thornlow, Dana N.; Phan, Ngoc N.; Alabi, Christopher A.

    2016-06-01

    Synthetic macrocycles derived from sequence-defined oligomers are a unique structural class whose ring size, sequence and structure can be tuned via precise organization of the primary sequence. Similar to peptides and other peptidomimetics, these well-defined synthetic macromolecules become pharmacologically relevant when bioactive side chains are incorporated into their primary sequence. In this article, we report the synthesis of oligothioetheramide (oligoTEA) macrocycles via a one-pot acid-catalysed cascade reaction. The versatility of the cyclization chemistry and modularity of the assembly process was demonstrated via the synthesis of >20 diverse oligoTEA macrocycles. Structural characterization via NMR spectroscopy revealed the presence of conformational isomers, which enabled the determination of local chain dynamics within the macromolecular structure. Finally, we demonstrate the biological activity of oligoTEA macrocycles designed to mimic facially amphiphilic antimicrobial peptides. The preliminary results indicate that macrocyclic oligoTEAs with just two-to-three cationic charge centres can elicit potent antibacterial activity against Gram-positive and Gram-negative bacteria.

  1. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 µg/kg) but without effect upon mice at 15,000 µg/kg (i.p. injection). The reduced, ³H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radianthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radianthus toxins act upon a different site on the sodium channel.
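
    The residue labels in the abstract (for example G9 or W30) are one-letter codes followed by 1-based positions, so they can be checked directly against the published sequence. The short sketch below does that; the helper name `residue_matches` is an assumption introduced for illustration.

```python
# Sh-NI sequence exactly as given in the abstract.
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"


def residue_matches(seq, label):
    """True if a label such as 'G9' (letter + 1-based position) agrees with seq."""
    amino_acid, position = label[0], int(label[1:])
    return seq[position - 1] == amino_acid


conserved = ["G9", "P10", "R13", "G19", "G29", "W30"]
half_cystines = [i + 1 for i, aa in enumerate(SH_NI) if aa == "C"]

print(len(SH_NI))                                          # 48
print(half_cystines)                                       # [3, 5, 26, 33, 43, 44]
print(all(residue_matches(SH_NI, r) for r in conserved))   # True
```

    The count of six cysteines agrees with the "6 half-cystines" in the abstract, and 48 residues at a rule-of-thumb average of roughly 110 Da per residue is consistent with the stated mass of about 5000 daltons.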

  2. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.
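
    A degenerate primer, as described above, stands for a set of concrete primers that differ at the ambiguous positions. The sketch below enumerates that set under the assumption that the primer is written with standard IUPAC ambiguity codes; the abstract does not state the notation used, and the primer shown is a hypothetical example rather than one from the patent.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every unambiguous primer encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


# Hypothetical three-base example: R = A/G, Y = C/T.
print(expand_degenerate("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

    The size of the set grows multiplicatively with each ambiguous position, which is why a few degenerate positions already give the heterogeneous mixture the abstract refers to.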

  3. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  4. Mitochondrial DNA Sequence Analysis - Validation and Use for Forensic Casework.

    PubMed

    Holland, M M; Parsons, T J

    1999-06-01

    With the discovery of the polymerase chain reaction (PCR) in the mid-1980's, the last in a series of critical molecular biology techniques (to include the isolation of DNA from human and non-human biological material, and primary sequence analysis of DNA) had been developed to rapidly analyze minute quantities of mitochondrial DNA (mtDNA). This was especially true for mtDNA isolated from challenged sources, such as ancient or aged skeletal material and hair shafts. One of the beneficiaries of this work has been the forensic community. Over the last decade, a significant amount of research has been conducted to develop PCR-based sequencing assays for the mtDNA control region (CR), which have subsequently been used to further characterize the CR. As a result, the reliability of these assays has been investigated, the limitations of the procedures have been determined, and critical aspects of the analysis process have been identified, so that careful control and monitoring will provide the basis for reliable testing. With the application of these assays to forensic identification casework, mtDNA sequence analysis has been properly validated, and is a reliable procedure for the examination of biological evidence encountered in forensic criminalistic cases. PMID:26255820

  5. The complete amino acid sequence of the major Kunitz trypsin inhibitor from the seeds of Prosopsis juliflora.

    PubMed

    Negreiros, A N; Carvalho, M M; Xavier Filho, J; Blanco-Labra, A; Shewry, P R; Richardson, M

    1991-01-01

    The major inhibitor of trypsin in seeds of Prosopsis juliflora was purified by precipitation with ammonium sulphate, ion-exchange column chromatography on DEAE- and CM-Sepharose and preparative reverse phase HPLC on a Vydac C-18 column. The protein inhibited trypsin in the stoichiometric ratio of 1:1, but had only weak activity against chymotrypsin and did not inhibit human salivary or porcine pancreatic alpha-amylases. SDS-PAGE indicated that the inhibitor has a Mr of ca 20,000, and IEF-PAGE showed that the pI is 8.8. The complete amino acid sequence was determined by automatic degradation, and by DABITC/PITC microsequence analysis of peptides obtained from enzyme digestions of the reduced and S-carboxymethylated protein with trypsin, chymotrypsin, elastase, the Glu-specific protease from S. aureus and the Lys-specific protease from Lysobacter enzymogenes. The inhibitor consisted of two polypeptide chains, of 137 residues (alpha chain) and 38 residues (beta chain) linked together by a single disulphide bond. The amino acid sequence of the protein exhibited homology with a number of Kunitz proteinase inhibitors from other legume seeds, the bifunctional subtilisin/alpha-amylase inhibitors from cereals and the taste-modifying protein miraculin. PMID:1367792

  6. Functional analysis of the p.(Leu15Pro) and p.(Gly20Arg) sequence changes in the signal sequence of LDL receptor.

    PubMed

    Pavloušková, Jana; Réblová, Kamila; Tichý, Lukáš; Freiberger, Tomáš; Fajkusová, Lenka

    2016-07-01

    The low density lipoprotein receptor (LDLR) is a transmembrane protein that plays a key role in cholesterol metabolism. It contains 860 amino acids including a 21 amino acid long signal sequence, which directs the protein into the endoplasmic reticulum. Mutations in the LDLR gene lead to cholesterol accumulation in the plasma and result in familial hypercholesterolemia (FH). Knowledge of the impact of a mutation on the LDLR protein structure and function is very important for the diagnosis and management of FH. Unfortunately, for a large proportion of mutations this information is still missing. In this study, we focused on the LDLR signal sequence and carried out functional and in silico analyses of two sequence changes, p.(Gly20Arg) and p.(Leu15Pro), localized in this part of the LDLR. Our results revealed that the p.(Gly20Arg) change, previously described as disease causing, has no detrimental effect on protein expression or LDL particle binding. In silico analysis supports this observation, showing that both the wt and p.(Gly20Arg) signal sequences adopt an expected α-helix structure. In contrast, the mutation p.(Leu15Pro) is not associated with functional protein expression and exhibits a structure with a disrupted α-helical arrangement in the signal sequence, which most likely affects protein folding in the endoplasmic reticulum. PMID:27175606

  7. Detection of Nucleic Acids with Graphene Nanopores: Ab Initio Characterization of a Novel Sequencing Device

    NASA Astrophysics Data System (ADS)

    Nelson, Tammie; Zhang, Bo; Prezhdo, Oleg

    2010-03-01

    We report an ab initio study of the interaction of two nucleobases, cytosine and adenine, with a novel graphene nanopore device for detecting the base sequence of a single-stranded nucleic acid (ssDNA or RNA). The nucleobases were inserted into a pore in a graphene nanoribbon, and the electrical current and conductance spectra were calculated as functions of voltage applied across the nanoribbon. The conductance spectra and charge densities were analyzed in the presence of each nucleobase in the graphene nanopore. The results indicate that, due to significant differences in the conductance spectra, the proposed device has adequate sensitivity to discriminate between different nucleotides. Moreover, we show that the conductance spectrum of a nucleotide is not affected by its orientation inside the graphene nanopore. The proposed technique may be extremely useful for real applications in developing ultrafast, low cost DNA sequencing methods.

  8. Sequence and Bioinformatic Analysis of Family 1 Glycoside Hydrolase (GH) 1 Gene from the Oomycete Pythium myriotylum Drechsler.

    PubMed

    Nair, R Aswati; Geethu, C; Sangwan, Amit; Pillai, P Padmesh

    2015-06-01

    The oomycetous phytopathogen Pythium myriotylum secretes cellulases for growth/nutrition of the necrotroph. Cellulases are a multi-enzyme system classified into different glycoside hydrolase (GH) families. The present study deals with identification and characterization of a GH gene sequence from P. myriotylum by a PCR strategy using consensus primers. Cloning of the full-length gene sequence using a genome walker strategy resulted in identification of a 1230-bp P. myriotylum GH gene sequence, designated as PmGH1. Analysis revealed that PmGH1 encodes a predicted cytoplasmic 421 amino acid protein with an apparent molecular weight of 46.77 kDa and a theoretical pI of 8.11. Tertiary structure of the deduced amino acid sequence showed typical (α/β)8 barrel folding of family 1 GHs. Sequence characterization of PmGH1 identified the conserved active site residues, viz., Glu181 and Glu399, that function as acid-base catalyst and catalytically active nucleophile, respectively. Binding sites for N-acetyl-D-glucosamine (NAG) were revealed in the PmGH1 3D structure with Glu181 and Glu399 positioned on either side to form a catalytic pair. Phylogenetic analysis indicated a closer affiliation of PmGH1 with sequences of the GH1 family. The results presented are a first attempt at providing novel insights into the evolutionary and functional perspectives of the identified P. myriotylum GH. PMID:25877398

  9. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles; Koster, Hubert

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  10. Morphological transformation of calcite crystal growth by prismatic "acidic" polypeptide sequences.

    SciTech Connect

    Kim, I; Giocondi, J L; Orme, C A; Collino, J; Evans, J S

    2007-02-13

    Many of the interesting mechanical and materials properties of the mollusk shell are thought to stem from the prismatic calcite crystal assemblies within this composite structure. It is now evident that proteins play a major role in the formation of these assemblies. Recently, a superfamily of 7 conserved prismatic layer-specific mollusk shell proteins, Asprich, were sequenced, and the 42 AA C-terminal sequence region of this protein superfamily was found to introduce surface voids or porosities on calcite crystals in vitro. Using AFM imaging techniques, we further investigate the effect that this 42 AA domain (Fragment-2) and its constituent subdomains, DEAD-17 and Acidic-2, have on the morphology and growth kinetics of calcite dislocation hillocks. We find that Fragment-2 adsorbs on terrace surfaces and pins acute steps, accelerates then decelerates the growth of obtuse steps, forms clusters and voids on terrace surfaces, and transforms calcite hillock morphology from a rhombohedral form to a rounded one. These results mirror yet are distinct from some of the earlier findings obtained for nacreous polypeptides. The subdomains Acidic-2 and DEAD-17 were found to accelerate then decelerate obtuse steps and induce oval rather than rounded hillock morphologies. Unlike DEAD-17, Acidic-2 does form clusters on terrace surfaces and exhibits stronger obtuse velocity inhibition effects than either DEAD-17 or Fragment-2. Interestingly, a 1:1 mixture of both subdomains induces an irregular polygonal morphology to hillocks, and exhibits the highest degree of acute step pinning and obtuse step velocity inhibition. This suggests that there is some interplay between subdomains within an intra (Fragment-2) or intermolecular (1:1 mixture) context, and sequence interplay phenomena may be employed by biomineralization proteins to exert net effects on crystal growth and morphology.

  11. Analysis of single nucleic acid molecules in micro- and nano-fluidics.

    PubMed

    Friedrich, Sarah M; Zec, Helena C; Wang, Tza-Huei

    2016-03-01

    Nucleic acid analysis has enhanced our understanding of biological processes and disease progression, elucidated the association of genetic variants and disease, and led to the design and implementation of new treatment strategies. These diverse applications require analysis of a variety of characteristics of nucleic acid molecules: size or length, detection or quantification of specific sequences, mapping of the general sequence structure, full sequence identification, analysis of epigenetic modifications, and observation of interactions between nucleic acids and other biomolecules. Strategies that can detect rare or transient species, characterize population distributions, and analyze small sample volumes enable the collection of richer data from biosamples. Platforms that integrate micro- and nano-fluidic operations with high sensitivity single molecule detection facilitate manipulation and detection of individual nucleic acid molecules. In this review, we will highlight important milestones and recent advances in single molecule nucleic acid analysis in micro- and nano-fluidic platforms. We focus on assessment modalities for single nucleic acid molecules and highlight the role of micro- and nano-structures and fluidic manipulation. We will also briefly discuss future directions and the current limitations and obstacles impeding even faster progress toward these goals. PMID:26818700

  12. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    SciTech Connect

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.

  13. Genome Sequence and Analysis of the Oral Bacterium Fusobacterium nucleatum Strain ATCC 25586

    PubMed Central

    Kapatral, Vinayak; Anderson, Iain; Ivanova, Natalia; Reznik, Gary; Los, Tamara; Lykidis, Athanasios; Bhattacharyya, Anamitra; Bartman, Allen; Gardner, Warren; Grechkin, Galina; Zhu, Lihua; Vasieva, Olga; Chu, Lien; Kogan, Yakov; Chaga, Oleg; Goltsman, Eugene; Bernal, Axel; Larsen, Niels; D'Souza, Mark; Walunas, Theresa; Pusch, Gordon; Haselkorn, Robert; Fonstein, Michael; Kyrpides, Nikos; Overbeek, Ross

    2002-01-01

    We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H2S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth. PMID:11889109
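    The 27% GC content reported above is a simple base-composition statistic. The following is a minimal sketch of how such a figure can be computed, assuming the sequence is available as a plain string; the function name `gc_content` and the toy fragment are illustrative assumptions and are not taken from the ERGO suite or from the F. nucleatum genome.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    # Ambiguity codes such as N are ignored (an assumption of this sketch).
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


# Hypothetical toy fragment, used only to exercise the function.
fragment = "ATATTTAGCAATTTAAGCTA"
print(f"GC content: {gc_content(fragment):.0%}")
```

    For a whole genome the same count would normally be accumulated chunk by chunk while reading the sequence file, rather than on one in-memory string.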

  14. Genome sequence and analysis of the oral bacterium Fusobacterium nucleatum strain ATCC 25586.

    PubMed

    Kapatral, Vinayak; Anderson, Iain; Ivanova, Natalia; Reznik, Gary; Los, Tamara; Lykidis, Athanasios; Bhattacharyya, Anamitra; Bartman, Allen; Gardner, Warren; Grechkin, Galina; Zhu, Lihua; Vasieva, Olga; Chu, Lien; Kogan, Yakov; Chaga, Oleg; Goltsman, Eugene; Bernal, Axel; Larsen, Niels; D'Souza, Mark; Walunas, Theresa; Pusch, Gordon; Haselkorn, Robert; Fonstein, Michael; Kyrpides, Nikos; Overbeek, Ross

    2002-04-01

    We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to that of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H(2)S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth. PMID:11889109

  15. Environmental impact analysis for the main accidental sequences of ignitor

    SciTech Connect

    Carpignano, A.; Francabandiera, S.; Vella, R.; Zucchetti, M.

    1996-12-31

    A safety analysis study has been applied to the Ignitor machine using Probabilistic Safety Assessment. The main initiating events have been identified, and accident sequences have been studied by means of traditional methods such as Failure Mode and Effect Analysis (FMEA), Fault Trees (FT) and Event Trees (ET). The consequences of the radioactive environmental releases have been assessed in terms of Effective Dose Equivalent (EDEs) to the Most Exposed Individuals (MEI) of the chosen site, by means of a population dose code. Results point out the low environmental impact of the machine. 13 refs., 1 fig., 3 tabs.

  16. The design and analysis of transposon insertion sequencing experiments.

    PubMed

    Chao, Michael C; Abel, Sören; Davis, Brigid M; Waldor, Matthew K

    2016-02-01

    Transposon insertion sequencing (TIS) is a powerful approach that can be extensively applied to the genome-wide definition of loci that are required for bacterial growth under diverse conditions. However, experimental design choices and stochastic biological processes can heavily influence the results of TIS experiments and affect downstream statistical analysis. In this Opinion article, we discuss TIS experimental parameters and how these factors relate to the benefits and limitations of the various statistical frameworks that can be applied to the computational analysis of TIS data. PMID:26775926

  17. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  18. An editing environment for DNA sequence analysis and annotation

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.; Shah, M.B.; Olman, V.; Parang, M.; Mural, R.

    1998-12-31

    This paper presents a computer system for analyzing and annotating large-scale genomic sequences. The core of the system is a multiple-gene structure identification program, which predicts the most probable gene structures based on the given evidence, including pattern recognition, EST and protein homology information. A graphics-based user interface provides an environment which allows the user to interactively control the evidence to be used in the gene identification process. To overcome the computational bottleneck in the database similarity search used in the gene identification process, the authors have developed an effective way to partition a database into a set of sub-databases of related sequences, and reduced the search problem on a large database to a signature identification problem and a search problem on a much smaller sub-database. This reduces the number of sequences to be searched from N to O(√N) on average, and hence greatly reduces the search time, where N is the number of sequences in the original database. The system provides the user with the ability to facilitate and modify the analysis and modeling in real time.
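    The two-stage search described above (pick a sub-database by its signature, then search only inside it) can be illustrated with a small sketch. This is not the authors' algorithm: the use of k-mer sets as signatures, Jaccard similarity as the score, the choice of the first √N sequences as group representatives, and all function names are assumptions made purely for illustration.

```python
import math


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_partition(db, k=3):
    """Split db into about sqrt(N) groups, each represented by a signature."""
    if not db:
        raise ValueError("database is empty")
    n_groups = max(1, math.isqrt(len(db)))
    # Assumption: the first n_groups sequences act as group signatures.
    signatures = [kmers(s, k) for s in db[:n_groups]]
    groups = [[] for _ in range(n_groups)]
    for seq in db:
        profile = kmers(seq, k)
        best = max(range(n_groups), key=lambda g: jaccard(profile, signatures[g]))
        groups[best].append(seq)
    return signatures, groups


def search(query, signatures, groups, k=3):
    """Return (best hit, number of comparisons) using the two-stage search."""
    profile = kmers(query, k)
    # Stage 1: signature identification over about sqrt(N) signatures.
    best = max(range(len(signatures)), key=lambda g: jaccard(profile, signatures[g]))
    # Stage 2: exhaustive search inside the selected sub-database only.
    hit = max(groups[best], key=lambda s: jaccard(profile, kmers(s, k)))
    return hit, len(signatures) + len(groups[best])


# Hypothetical toy database of nine sequences in three loose families.
db = [
    "ATGGCGTACGTTAGC", "TTTTAAAACCCCGGGG", "GATTACAGATTACA",
    "ATGGCGTACGTTAGG", "TTTTAAAACCCCGGGA", "GATTACAGATTACC",
    "ATGGCGTACGATAGC", "TTTAAAAACCCCGGGG", "GATTACAGTTTACA",
]
signatures, groups = build_partition(db)
hit, comparisons = search("GATTACAGATTACC", signatures, groups)
print(hit, comparisons, "comparisons instead of", len(db))
```

    With roughly √N groups of roughly √N members each, a query costs about 2√N comparisons when the groups are balanced; a skewed partition loses that advantage, which is presumably why the abstract states the bound only on average.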

  19. Application of Subspace Clustering in DNA Sequence Analysis.

    PubMed

    Wallace, Tim; Sekmen, Ali; Wang, Xiaofei

    2015-10-01

    Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis. PMID:26162018

  20. Amino-terminal amino acid sequence of the major structural polypeptides of avian retroviruses: sequence homology between reticuloendotheliosis virus p30 and p30s of mammalian retroviruses.

    PubMed Central

    Hunter, E; Bhown, A S; Bennett, J C

    1978-01-01

    The major structural polypeptides, p30 of reticuloendotheliosis virus (REV) (strain T) and p27 of avian sarcoma virus B77, have been compared with regard to amino acid composition, NH2-terminal amino acid sequence, and immunological crossreactions. The amino acid composition of the two polypeptides is distinct, and a comparison of the first 30 NH2-terminal amino acids of REV p30 with that for the first 25 of B77 p27 yields only three homologous residues. In competition radioimmunoassays the polypeptides show no crossreactivity. A comparison of the amino acid composition and NH2-terminal amino acid sequence of REV p30 with those reported for several mammalian retrovirus p30s shows remarkable similarities. Both REV and mammalian p30s contain a large number of polar residues in their amino acid composition and show approximately 40% homology in the first 30 NH2-terminal amino acids. No crossreactivity could be observed, however, in competition radioimmunoassays between Rauscher murine leukemia virus p30 and that of REV. The observations reported here suggest a close evolutionary relationship between REV and the mammalian retroviruses. PMID:208072

  1. Purification and amino acid sequence of aminopeptidase P from pig kidney.

    PubMed

    Vergas Romero, C; Neudorfer, I; Mann, K; Schäfer, W

    1995-04-01

    Aminopeptidase P from kidney cortex was purified in high yield (recovery greater than or equal to 20%) by a series of column chromatographic steps after solubilization of the membrane-bound glycoprotein with n-butanol. A coupled enzymic assay, using Gly-Pro-Pro-NH-Nap as substrate and dipeptidyl-peptidase IV as auxiliary enzyme, was used to monitor the purification. The purification procedure yielded two forms of aminopeptidase P differing in their carbohydrate composition (glycoforms). Both enzyme preparations were homogeneous as assessed by SDS/PAGE, silver staining, and isoelectric focusing. Both forms possessed the same substrate specificity, catalysed the same reaction, and consisted of identical protein chains. The amino acid sequence determined by Edman degradation and mass spectrometry consisted of 623 amino acids. Six N-glycosylation sites, all contained in the N-terminal half of the protein, were characterized. PMID:7744038

  2. Draft Genome Sequence of Cupriavidus sp. Strain SK-3, a 4-Chlorobiphenyl- and 4-Clorobenzoic Acid-Degrading Bacterium

    PubMed Central

    Vilo, Claudia; Benedik, Michael J.; Ilori, Matthew

    2014-01-01

    We report the draft genome sequence of Cupriavidus sp. strain SK-3, which can use 4-chlorobiphenyl and 4-chlorobenzoic acid as the sole carbon source for growth. The draft genome sequence allowed the study of the polychlorinated biphenyl degradation mechanism and the recharacterization of the strain SK-3 as a Cupriavidus species. PMID:24994805

  3. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid

    PubMed Central

    Tan, Siyuan; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-01-01

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. PMID:27231363

  4. Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis subsp. lactis TOMSC161, Isolated from a Nonscalded Curd Pressed Cheese

    PubMed Central

    Velly, H.; Abraham, A.-L.; Loux, V.; Delacroix-Buchet, A.; Fonseca, F.; Bouix, M.

    2014-01-01

    Lactococcus lactis is a lactic acid bacterium used in the production of many fermented foods, such as dairy products. Here, we report the genome sequence of L. lactis subsp. lactis TOMSC161, isolated from nonscalded curd pressed cheese. This genome sequence provides information in relation to dairy environment adaptation. PMID:25377704

  5. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid.

    PubMed

    Tan, Siyuan; Meng, Yonghong; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-01-01

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. PMID:27231363

  6. Comparative sequence analysis of double stranded RNA binding protein encoding gene of parapoxviruses from Indian camels.

    PubMed

    Nagarajan, G; Swami, Shelesh Kumar; Dahiya, Shyam Singh; Sivakumar, G; Tuteja, F C; Narnaware, S D; Mehta, S C; Singh, Raghvendar; Patil, N V

    2014-03-01

    The dsRNA binding protein (RBP) encoding gene of parapoxviruses (PPVs) from Dromedary camels inhabiting different geographical regions of Rajasthan, India, was amplified by polymerase chain reaction using the primers of pseudocowpoxvirus (PCPV) from Finnish reindeer and cloned into pGEM-T for sequence analysis. Analysis of the RBP encoding gene revealed that PPV DNA from Bikaner shared 98.3% and 76.6% sequence identity at the amino acid level with Pali and Udaipur PPV DNA, respectively. Reference strains of Bovine papular stomatitis virus (BPSV) and PCPV (reindeer PCPV and human PCPV) shared 52.8% and 86.9% amino acid identity with the RBP gene of camel PPVs from Bikaner, respectively. In contrast, strains of orf virus (ORFV) from different geographical areas of the world shared 69.5-71.7% amino acid identity with the RBP gene of camel PPVs from Bikaner. These findings indicate that the camel PPVs described are more closely related to bovine PPV (PCPV) than to caprine and ovine PPV (ORFV). PMID:25685494
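    Identity percentages such as those quoted above are computed from pairwise alignments. The sketch below shows one common convention, assuming the two sequences are already aligned and that columns containing a gap are excluded from the denominator; tools differ on this choice, and the abstract does not say which convention was used. The function name and the toy peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned peptides (not real RBP sequences).
print(percent_identity("MKT-AYIAKQR", "MKTLAYLAKQR"))
```

    Counting gap columns in the denominator instead would lower the reported value, so identity figures from different studies are comparable only when the convention matches.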

  7. Analysis of the constitution of the beer yeast genome by PCR, sequencing and subtelomeric sequence hybridization.

    PubMed

    Casaregola, S; Nguyen, H V; Lapathitis, G; Kotyk, A; Gaillardin, C

    2001-07-01

    The lager brewing yeasts, Saccharomyces pastorianus (synonym Saccharomyces carlsbergensis), are allopolyploid, containing parts of two divergent genomes. Saccharomyces cerevisiae contributed to the formation of these hybrids, although the identity of the other species is still unclear. The presence of alleles specific to S. cerevisiae and S. pastorianus was tested for by PCR/RFLP in brewing yeasts of various origins and in members of the Saccharomyces sensu stricto complex. S. cerevisiae-type alleles of two genes, HIS4 and YCL008c, were identified in another brewing yeast, S. pastorianus CBS 1503 (Saccharomyces monacensis), thought to be the source of the other contributor to the lager hybrid. This is consistent with the hybridization of S. cerevisiae subtelomeric sequences X and Y' to the electrophoretic karyotype of this strain. S. pastorianus CBS 1503 (S. monacensis) is therefore probably not an ancestor of S. pastorianus, but a related hybrid. Saccharomyces bayanus, also thought to be one of the contributors to the lager yeast hybrid, is a heterogeneous taxon containing at least two subgroups, one close to the type strain, CBS 380T, the other close to CBS 395 (Saccharomyces uvarum). The partial sequences of several genes (HIS4, MET10, URA3) were shown to be identical or very similar (over 99%) in S. pastorianus CBS 1513 (S. carlsbergensis), S. bayanus CBS 380T and its close derivatives, showing that S. pastorianus and S. bayanus have a common ancestor. A distinction between two subgroups within S. bayanus was made on the basis of sequence analysis: the subgroup represented by S. bayanus CBS 395 (S. uvarum) has 6-8% sequence divergence within the genes HIS4, MET10 and MET2 from S. bayanus CBS 380T, indicating that the two S. bayanus subgroups diverged recently. The detection of specific alleles by PCR/RFLP and hybridization with S. cerevisiae subtelomeric sequences X and Y' to electrophoretic karyotypes of brewing yeasts and related species confirmed our

  8. Systematic sequencing of the Escherichia coli genome: analysis of the 0-2.4 min region.

    PubMed Central

    Yura, T; Mori, H; Nagai, H; Nagata, T; Ishihama, A; Fujita, N; Isono, K; Mizobuchi, K; Nakata, A

    1992-01-01

    A contiguous 111,402-nucleotide sequence corresponding to the 0 to 2.4 min region of the E. coli chromosome was determined as a first step to complete structural analysis of the genome. The resulting sequence was used to predict open reading frames and to search for sequence similarity against the PIR protein database. A number of novel genes were found whose predicted protein sequences showed significant homology with known proteins from various organisms, including several clusters of genes similar to those involved in fatty acid metabolism in bacteria (e.g., betT, baiF) and higher organisms, iron transport (sfuA, B, C) in Serratia marcescens, and symbiotic nitrogen fixation or electron transport (fixA, B, C, X) in Azorhizobium caulinodans. In addition, several genes and IS elements that had been mapped but not sequenced (e.g., leuA, B, C, D) were identified. We estimate that about 90 genes are represented in this region of the chromosome with little spacer. PMID:1630901

  9. Nucleic Acid Aptamers for Living Cell Analysis

    NASA Astrophysics Data System (ADS)

    Xiong, Xiangling; Lv, Yifan; Chen, Tao; Zhang, Xiaobing; Wang, Kemin; Tan, Weihong

    2014-06-01

    Cells as the building blocks of life determine the basic functions and properties of a living organism. Understanding the structure and components of a cell aids in the elucidation of its biological functions. Moreover, knowledge of the similarities and differences between diseased and healthy cells is essential to understanding pathological mechanisms, identifying diagnostic markers, and designing therapeutic molecules. However, monitoring the structures and activities of a living cell remains a challenging task in bioanalytical and life science research. To meet the requirements of this task, aptamers, as “chemical antibodies,” have become increasingly powerful tools for cellular analysis. This article reviews recent advances in the development of nucleic acid aptamers in the areas of cell membrane analysis, cell detection and isolation, real-time monitoring of cell secretion, and intracellular delivery and analysis with living cell models. Limitations of aptamers and possible solutions are also discussed.

  10. Differentiation of sheep pox and goat poxviruses by sequence analysis and PCR-RFLP of P32 gene.

    PubMed

    Hosamani, Madhusudan; Mondal, Bimalendu; Tembhurne, Prabhakar A; Bandyopadhyay, Santanu Kumar; Singh, Raj Kumar; Rasool, Thaha Jamal

    2004-08-01

    Sheep pox and goat pox are highly contagious viral diseases of small ruminants. These diseases were earlier thought to be caused by a single species of virus, as they are serologically indistinguishable. P32, one of the major immunogenic genes of Capripoxvirus, was isolated and sequenced from two Indian isolates of goat poxvirus (GPV) and a vaccine strain of sheep poxvirus (SPV). The sequences were compared with other P32 sequences of capripoxviruses available in the database. Sequence analysis revealed that sheep pox and goat poxviruses share 97.5 and 94.7% homology at the nucleotide and amino acid levels, respectively. A major difference between them is the presence of an additional aspartic acid at the 55th position of P32 of sheep poxvirus that is absent in both goat poxvirus and lumpy skin disease virus (LSDV). Further, six unique neutral nucleotide substitutions were observed at positions 77, 275, 403, 552, 867 and 964 in the sequence of goat poxvirus, which can be taken as GPV signature residues. Similar unique nucleotide signatures could also be identified in SPV and LSDV sequences. Phylogenetic analysis showed that members of the Capripoxvirus genus could be delineated into three distinct clusters of GPV, SPV and LSDV based on the P32 genomic sequence. Using this information, a PCR-RFLP method has been developed for unequivocal genomic differentiation of SPV and GPV. PMID:15215685

  11. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  12. Computer analysis of phytochrome sequences and reevaluation of the phytochrome secondary structure by Fourier transform infrared spectroscopy.

    PubMed

    Sühnel, J; Hermann, G; Dornberger, U; Fritzsche, H

    1997-07-18

    A repertoire of various methods of computer sequence analysis was applied to phytochromes in order to gain new insights into their structure and function. A statistical analysis of 23 complete phytochrome sequences revealed regions of non-random amino acid composition, which are supposed to be of particular structural or functional importance. All phytochromes other than phyD and phyE from Arabidopsis have at least one such region at the N-terminus between residues 2 and 35. A sequence similarity search of current databases indicated striking homologies between all phytochromes and a hypothetical 84.2-kDa protein from the cyanobacterium Synechocystis. Furthermore, scanning the phytochrome sequences for the occurrence of patterns defined in the PROSITE database detected the signature of the WD repeats of the beta-transducin family within the functionally important 623-779 region (sequence numbering of phyA from Avena) in a number of phytochromes. A multiple sequence alignment performed with 23 complete phytochrome sequences is made available via the IMB Jena World-Wide Web server (http://www.imb-jena.de/PHYTO.html). It can be used as a working tool for future theoretical and experimental studies. Based on the multiple alignment striking sequence differences between phytochromes A and B were detected directly at the N-terminal end, where all phytochromes B have an additional stretch of 15-42 amino acids. There is also a variety of positions with totally conserved but different amino acids in phytochromes A and B. Most of these changes are found in the sequence segment 150-200. It is, therefore, suggested that this region might be of importance in determining the photosensory specificity of the two phytochromes. The secondary structure prediction based on the multiple alignment resulted in a small but significant beta-sheet content. This finding is confirmed by a reevaluation of the secondary structure using FTIR spectroscopy. PMID:9252112

  13. Molecular cloning, sequence characteristics, and tissue expression analysis of ECE1 gene in Tibetan pig.

    PubMed

    Wang, Yan-Dong; Zhang, Jian; Li, Chuan-Hao; Xu, Hai-Peng; Chen, Wei; Zeng, Yong-Qing; Wang, Hui

    2015-10-25

    Low air pressure and low oxygen partial pressure at high altitude seriously affect the survival and development of human beings and animals. ECE1 is a recently discovered gene that is involved in anti-hypoxia, but the full-length cDNA sequence has not been obtained. For a better understanding of the structure and function of the ECE1 gene and to study its effect in Tibetan pig, the cDNA of the ECE1 gene from the muscle of Tibetan pig was cloned, sequenced and characterized. The ECE1 full-length cDNA sequence consists of a 2262 bp coding sequence (CDS) that encodes 753 amino acids with a molecular mass of 85.449 kDa, a 2 bp 5'UTR and a 1507 bp 3'UTR. In addition, the phylogenetic tree analysis revealed that the Tibetan pig ECE1 has a closer genetic relationship and evolutionary distance to the ECE1 of land mammals. Furthermore, analysis by qPCR showed that the ECE1 transcript is constitutively expressed in the 10 tissues tested: the liver, subcutaneous fat, kidney, muscle, stomach, heart, brain, spleen, pancreas, and lung. These results serve as a foundation for further insight into the Tibetan pig ECE1 gene. PMID:26115769
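
    The length figures quoted in this record can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the 2262 bp CDS includes a single stop codon and reads the quoted molecular mass as 85,449 Da; the function and variable names are made up for this example.

```python
# Figures quoted in the abstract above.
CDS_BP = 2262
UTR5_BP = 2
UTR3_BP = 1507
MASS_DA = 85_449  # assumption: the quoted mass read as 85,449 Da (about 85.4 kDa)


def residues_from_cds(cds_bp: int) -> int:
    """Encoded amino acids, assuming the CDS length includes one stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return cds_bp // 3 - 1  # total codons minus the stop codon


protein_len = residues_from_cds(CDS_BP)       # 753 residues
cdna_len = UTR5_BP + CDS_BP + UTR3_BP         # 3771 bp in total
mean_residue_mass = MASS_DA / protein_len     # average mass per residue, in Da

print(protein_len, cdna_len, round(mean_residue_mass, 1))
```

    Under these assumptions the 753-residue count matches the abstract, and the average of roughly 113 Da per residue falls in the range typical for proteins.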

  14. Bayesian Analysis and Segmentation of Multichannel Image Sequences

    NASA Astrophysics Data System (ADS)

    Chang, Michael Ming Hsin

    This thesis is concerned with the segmentation and analysis of multichannel image sequence data. In particular, we use maximum a posteriori probability (MAP) criterion and Gibbs random fields (GRF) to formulate the problems. We start by reviewing the significance of MAP estimation with GRF priors and study the feasibility of various optimization methods for implementing the MAP estimator. We proceed to investigate three areas where image data and parameter estimates are present in multichannels, multiframes, and interrelated in complicated manners. These areas of study include color image segmentation, multislice MR image segmentation, and optical flow estimation and segmentation in multiframe temporal sequences. Besides developing novel algorithms in each of these areas, we demonstrate how to exploit the potential of MAP estimation and GRFs, and we propose practical and efficient implementations. Illustrative examples and relevant experimental results are included.

  15. Nonlinear analysis of correlations in Alu repeat sequences in DNA

    NASA Astrophysics Data System (ADS)

    Xiao, Yi; Huang, Yanzhao; Li, Mingfeng; Xu, Ruizhen; Xiao, Saifeng

    2003-12-01

    We report on a nonlinear analysis of deterministic structures in Alu repeats, one of the richest repetitive DNA sequences in the human genome. Alu repeats contain the recognition sites for the restriction endonuclease AluI, which is what gives them their name. Using the nonlinear prediction method developed in chaos theory, we find that all Alu repeats have novel deterministic structures and show strong nonlinear correlations that are absent from exon and intron sequences. Furthermore, the deterministic structures of Alus of younger subfamilies show panlike shapes. As young Alus can be seen as mutation free copies from the “master genes,” it may be suggested that the deterministic structures of the older subfamilies are results of an evolution from a “panlike” structure to a more diffuse correlation pattern due to mutation.

  16. The DNA sequence and analysis of human chromosome 13

    PubMed Central

    Dunham, A.; Matthews, L. H.; Burton, J.; Ashurst, J. L.; Howe, K. L.; Ashcroft, K. J.; Beare, D. M.; Burford, D. C.; Hunt, S. E.; Griffiths-Jones, S.; Jones, M. C.; Keenan, S. J.; Oliver, K.; Scott, C. E.; Ainscough, R.; Almeida, J. P.; Ambrose, K. D.; Andrews, D. T.; Ashwell, R. I. S.; Babbage, A. K.; Bagguley, C. L.; Bailey, J.; Bannerjee, R.; Barlow, K. F.; Bates, K.; Beasley, H.; Bird, C. P.; Bray-Allen, S.; Brown, A. J.; Brown, J. Y.; Burrill, W.; Carder, C.; Carter, N. P.; Chapman, J. C.; Clamp, M. E.; Clark, S. Y.; Clarke, G.; Clee, C. M.; Clegg, S. C. M.; Cobley, V.; Collins, J. E.; Corby, N.; Coville, G. J.; Deloukas, P.; Dhami, P.; Dunham, I.; Dunn, M.; Earthrowl, M. E.; Ellington, A. G.; Faulkner, L.; Frankish, A. G.; Frankland, J.; French, L.; Garner, P.; Garnett, J.; Gilbert, J. G. R.; Gilson, C. J.; Ghori, J.; Grafham, D. V.; Gribble, S. M.; Griffiths, C.; Hall, R. E.; Hammond, S.; Harley, J. L.; Hart, E. A.; Heath, P. D.; Howden, P. J.; Huckle, E. J.; Hunt, P. J.; Hunt, A. R.; Johnson, C.; Johnson, D.; Kay, M.; Kimberley, A. M.; King, A.; Laird, G. K.; Langford, C. J.; Lawlor, S.; Leongamornlert, D. A.; Lloyd, D. M.; Lloyd, C.; Loveland, J. E.; Lovell, J.; Martin, S.; Mashreghi-Mohammadi, M.; McLaren, S. J.; McMurray, A.; Milne, S.; Moore, M. J. F.; Nickerson, T.; Palmer, S. A.; Pearce, A. V.; Peck, A. I.; Pelan, S.; Phillimore, B.; Porter, K. M.; Rice, C. M.; Searle, S.; Sehra, H. K.; Shownkeen, R.; Skuce, C. D.; Smith, M.; Steward, C. A.; Sycamore, N.; Tester, J.; Thomas, D. W.; Tracey, A.; Tromans, A.; Tubby, B.; Wall, M.; Wallis, J. M.; West, A. P.; Whitehead, S. L.; Willey, D. L.; Wilming, L.; Wray, P. W.; Wright, M. W.; Young, L.; Coulson, A.; Durbin, R.; Hubbard, T.; Sulston, J. E.; Beck, S.; Bentley, D. R.; Rogers, J.; Ross, M. T.

    2009-01-01

    Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb. PMID:15057823

  17. The DNA sequence and analysis of human chromosome 13.

    PubMed

    Dunham, A; Matthews, L H; Burton, J; Ashurst, J L; Howe, K L; Ashcroft, K J; Beare, D M; Burford, D C; Hunt, S E; Griffiths-Jones, S; Jones, M C; Keenan, S J; Oliver, K; Scott, C E; Ainscough, R; Almeida, J P; Ambrose, K D; Andrews, D T; Ashwell, R I S; Babbage, A K; Bagguley, C L; Bailey, J; Bannerjee, R; Barlow, K F; Bates, K; Beasley, H; Bird, C P; Bray-Allen, S; Brown, A J; Brown, J Y; Burrill, W; Carder, C; Carter, N P; Chapman, J C; Clamp, M E; Clark, S Y; Clarke, G; Clee, C M; Clegg, S C M; Cobley, V; Collins, J E; Corby, N; Coville, G J; Deloukas, P; Dhami, P; Dunham, I; Dunn, M; Earthrowl, M E; Ellington, A G; Faulkner, L; Frankish, A G; Frankland, J; French, L; Garner, P; Garnett, J; Gilbert, J G R; Gilson, C J; Ghori, J; Grafham, D V; Gribble, S M; Griffiths, C; Hall, R E; Hammond, S; Harley, J L; Hart, E A; Heath, P D; Howden, P J; Huckle, E J; Hunt, P J; Hunt, A R; Johnson, C; Johnson, D; Kay, M; Kimberley, A M; King, A; Laird, G K; Langford, C J; Lawlor, S; Leongamornlert, D A; Lloyd, D M; Lloyd, C; Loveland, J E; Lovell, J; Martin, S; Mashreghi-Mohammadi, M; McLaren, S J; McMurray, A; Milne, S; Moore, M J F; Nickerson, T; Palmer, S A; Pearce, A V; Peck, A I; Pelan, S; Phillimore, B; Porter, K M; Rice, C M; Searle, S; Sehra, H K; Shownkeen, R; Skuce, C D; Smith, M; Steward, C A; Sycamore, N; Tester, J; Thomas, D W; Tracey, A; Tromans, A; Tubby, B; Wall, M; Wallis, J M; West, A P; Whitehead, S L; Willey, D L; Wilming, L; Wray, P W; Wright, M W; Young, L; Coulson, A; Durbin, R; Hubbard, T; Sulston, J E; Beck, S; Bentley, D R; Rogers, J; Ross, M T

    2004-04-01

    Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb. PMID:15057823

  18. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    SciTech Connect

    Thim, L.; Bjoern, S.; Christensen, M.; Nicolaisen, E.M.; Lund-Hansen, T.; Pedersen, A.H.; Hedner, U.

    1988-10-04

    Blood coagulation factor VII is a vitamin K dependent glycoprotein which in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca²⁺ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa as purified from the culture medium of a transfected baby hamster kidney cell line have been compared to human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa.

  19. Draft Genome Sequences of Gluconobacter cerinus CECT 9110 and Gluconobacter japonicus CECT 8443, Acetic Acid Bacteria Isolated from Grape Must

    PubMed Central

    Sainz, Florencia

    2016-01-01

    We report here the draft genome sequences of Gluconobacter cerinus strain CECT 9110 and Gluconobacter japonicus CECT 8443, acetic acid bacteria isolated from grape must. Gluconobacter species are well known for their ability to oxidize sugar alcohols into the corresponding acids. Our objective was to select strains that oxidize d-glucose effectively. PMID:27365351

  20. Transcriptional analysis of the effect of exogenous decanoic acid stress on Streptomyces roseosporus

    PubMed Central

    2013-01-01

    Background: Daptomycin is an important antibiotic against infections caused by drug-resistant pathogens. Its production critically depends on the addition of decanoic acid during fermentation. Unfortunately, decanoic acid (>2.5 mM) is toxic to the daptomycin producer, Streptomyces roseosporus. Results: To understand the mechanism underlying decanoic acid tolerance or toxicity, the responses of S. roseosporus were determined by a combination of phospholipid fatty acid analysis, reactive oxygen species (ROS) measurement and RNA sequencing. Assays using fluorescent dyes indicated a sharp increase in reactive oxygen species during decanoic acid stress; fatty acid analysis revealed a marked increase in the composition of branched-chain fatty acids by approximately 10%, with a corresponding decrease in straight-chain fatty acids; functional analysis indicated that decanoic acid stress has components common to other stress responses, including perturbation of respiratory functions (nuo and cyd operons), oxidative stress, and heat shock. Interestingly, our transcriptomic analysis revealed that genes coding for components of the proteasome and related to trehalose synthesis were up-regulated in the decanoic acid-treated cells. Conclusion: These findings represent an important first step in understanding the mechanism of decanoic acid toxicity and provide a basis for engineering microbial tolerance. PMID:23432849

  1. Sequence homology and structural analysis of the clostridial neurotoxins.

    PubMed

    Lacy, D B; Stevens, R C

    1999-09-01

    The clostridial neurotoxins (CNTs), comprised of tetanus neurotoxin (TeNT) and the seven serotypes of botulinum neurotoxin (BoNT A-G), specifically bind to neuronal cells and disrupt neurotransmitter release by cleaving proteins involved in synaptic vesicle membrane fusion. In this study, multiple CNT sequences were analyzed within the context of the 1277 residue BoNT/A crystal structure to gain insight into the events of binding, pore formation, translocation, and catalysis that are required for toxicity. A comparison of the TeNT-binding domain structure to that of BoNT/A reveals striking differences in their surface properties. Further, the solvent accessibility of a key tryptophan in the C terminus of the BoNT/A-binding domain refines the location of the ganglioside-binding site. Data collected from a single frozen crystal of BoNT/A are included in this study, revealing slight differences in the binding domain orientation as well as density for a previously unobserved translocation domain loop. This loop and the conservation of charged residues with structural proximity to putative pore-forming sequences lend insight into the CNT mechanism of pore formation and translocation. The sequence analysis of the catalytic domain revealed an area near the active site likely to account for specificity differences between the CNTs. It also revealed a tertiary structure, highly conserved in primary sequence, which seems critical to catalysis but is 30 Å from the active-site zinc ion. This observation, along with an analysis of the 54 residue "belt" from the translocation domain, is discussed with respect to the mechanism of catalysis. PMID:10518945

  2. Digital PCR analysis of circulating nucleic acids.

    PubMed

    Hudecova, Irena

    2015-10-01

    Detection of plasma circulating nucleic acids (CNAs) requires the use of extremely sensitive and precise methods. The commonly used quantitative real-time polymerase chain reaction (PCR) poses certain technical limitations in relation to the precise measurement of CNAs whereas the costs of massively parallel sequencing are still relatively high. Digital PCR (dPCR) now represents an affordable and powerful single molecule counting strategy to detect minute amounts of genetic material with performance surpassing many quantitative methods. Microfluidic (chip) and emulsion (droplet)-based technologies have already been integrated into platforms offering hundreds to millions of nanoliter- or even picoliter-scale reaction partitions. The compelling observations reported in the field of cancer research, prenatal testing, transplantation medicine and virology support translation of this technology into routine use. Extremely sensitive plasma detection of rare mutations originating from tumor or placental cells among a large background of homologous sequences facilitates unraveling of the early stages of cancer or the detection of fetal mutations. Digital measurement of quantitative changes in plasma CNAs associated with cancer or graft rejection provides valuable information on the monitoring of disease burden or the recipient's immune response and subsequent therapy treatment. Furthermore, careful quantitative assessment of the viral load offers great value for effective monitoring of antiviral therapy for immunosuppressed or transplant patients. The present review describes the inherent features of dPCR that make it exceptionally robust in precise and sensitive quantification of CNAs. Moreover, I provide an insight into the types of potential clinical applications that have been developed by researchers to date. PMID:25828047

  3. RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids

    PubMed Central

    Sample, Paul J.; Gaston, Kirk W.; Alfonzo, Juan D.; Limbach, Patrick A.

    2015-01-01

    Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm. (ii) Manual sequencing with real-time spectrum labeling and cumulative intensity scoring. (iii) A hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing. PMID:25820423

  4. Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method.

    PubMed

    Cheng, Xiang; Xiao, Xuan; Wu, Zhi-cheng; Wang, Pu; Lin, Wei-zhong

    2013-01-01

    Protein folding is the process by which a protein progresses from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. Optimal feature selection was performed by combining forward feature selection with sequential backward selection. The method was demonstrated on a large dataset using the jackknife cross-validation test. The predictor was built on numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp. PMID:22933332
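
    The sliding-window idea in this record can be illustrated with a minimal sketch. This is not the Swfoldrate feature set: it assumes a single property scale (the Kyte-Doolittle hydropathy index, chosen here only as a familiar example), and the function names and toy sequence are hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index, used only as an example property scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_means(seq: str, width: int,
                         scale: Dict[str, float] = KYTE_DOOLITTLE) -> List[float]:
    """Mean property value in every window of `width` consecutive residues."""
    if width <= 0 or width > len(seq):
        raise ValueError("width must be between 1 and the sequence length")
    values = [scale[aa] for aa in seq.upper()]
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def window_features(seq: str, width: int) -> List[float]:
    """Fixed-length summary (mean, min, max) of the per-window values."""
    means = sliding_window_means(seq, width)
    return [sum(means) / len(means), min(means), max(means)]


toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # invented sequence for illustration
print(window_features(toy_sequence, width=5))
```

    Summarising the windows into a fixed-length vector is one way to give proteins of different lengths comparable inputs for a regression model such as an SVM; the actual descriptors used by the authors are described in their paper.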

  5. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides.

    PubMed

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Pérez-Pérez, Martín; Pérez-Rodríguez, Gael; Catalán-García, Sandra; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2016-06-01

    Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptides (1691 registers), which were further filtered by searching for terms such as "antiproliferative," "antitumoral," or "apoptosis" among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanisms of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application are discussed. PMID:27010507

  6. The amino acid sequences and activities of synergistic hemolysins from Staphylococcus cohnii.

    PubMed

    Mak, Pawel; Maszewska, Agnieszka; Rozalska, Malgorzata

    2008-10-01

    Staphylococcus cohnii ssp. cohnii and S. cohnii ssp. urealyticus are coagulase-negative staphylococci considered for a long time as unable to cause infections. This situation changed recently and pathogenic strains of these bacteria were isolated from hospital environments, patients and medical staff. Most of the isolated strains were resistant to many antibiotics. The present work describes isolation and characterization of several synergistic peptide hemolysins produced by these bacteria and acting as virulence factors responsible for hemolytic and cytotoxic activities. Amino acid sequences of respective hemolysins from S. cohnii ssp. cohnii (named as H1C, H2C and H3C) and S. cohnii ssp. urealyticus (H1U, H2U and H3U) were identical. Peptides H1 and H3 possessed significant amino acid homology to three synergistic hemolysins secreted by Staphylococcus lugdunensis and to a putative antibacterial peptide produced by Staphylococcus saprophyticus ssp. saprophyticus. On the other hand, hemolysin H2 had a unique sequence. All isolated peptides lysed red cells from different mammalian species and exerted a cytotoxic effect on human fibroblasts. PMID:18752624

  7. Phylogenetic analysis of sequences from diverse bacteria with homology to the Escherichia coli rho gene.

    PubMed Central

    Opperman, T; Richardson, J P

    1994-01-01

    Genes from Pseudomonas fluorescens, Chromatium vinosum, Micrococcus luteus, Deinococcus radiodurans, and Thermotoga maritima with homology to the Escherichia coli rho gene were cloned and sequenced, and their sequences were compared with other available sequences. The species for all of the compared sequences are members of five bacterial phyla, including Thermotogales, the most deeply diverged phylum. This suggests that a rho-like gene is ubiquitous in the Bacteria and was present in their common ancestor. The comparative analysis revealed that the Rho homologs are highly conserved, exhibiting a minimum identity of 50% of their amino acid residues in pairwise comparisons. The ATP-binding domain had a particularly high degree of conservation, consisting of some blocks with sequences of residues that are very similar to segments of the alpha and beta subunits of F1-ATPase and of other blocks with sequences that are unique to Rho. The RNA-binding domain is more diverged than the ATP-binding domain. However, one of its most highly conserved segments includes a RNP1-like sequence, which is known to be involved in RNA binding. Overall, the degree of similarity is lowest in the first 50 residues (the first half of the RNA-binding domain), in the putative connector region between the RNA-binding and the ATP-binding domains, and in the last 50 residues of the polypeptide. Since functionally defective mutants for E. coli Rho exist in all three of these segments, they represent important parts of Rho that have undergone adaptive evolution. PMID:8051015
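
    A figure such as the minimum pairwise identity quoted in this record is computed from aligned sequences. The sketch below is an assumption-laden illustration, not the authors' procedure: it takes sequences that are already aligned to equal length, skips any column containing a gap, and uses invented toy sequences; identity definitions differ in their choice of denominator.

```python
from itertools import combinations
from typing import Dict


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded in this sketch
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def minimum_pairwise_identity(aligned: Dict[str, str]) -> float:
    """Lowest percent identity among all pairs in an alignment."""
    return min(percent_identity(aligned[p], aligned[q])
               for p, q in combinations(aligned, 2))


# Invented toy alignment, not real Rho sequences.
toy_alignment = {"seq1": "ACDE", "seq2": "ACDF", "seq3": "AGDF"}
print(minimum_pairwise_identity(toy_alignment))
```

    Whether gapped columns count toward the denominator changes the reported value, so a stated identity threshold is only comparable across studies that use the same convention.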

  8. A comprehensive package for DNA sequence analysis in FORTRAN IV for the PDP-11.

    PubMed Central

    Arnold, J; Eckenrode, V K; Lemke, K; Phillips, G J; Schaeffer, S W

    1986-01-01

    A computer package written in Fortran-IV for the PDP-11 minicomputer is described. The package's novel features are: software for voice-entry of sequence data; a less memory intensive algorithm for optimal sequence alignment; and programs that fit statistical models to nucleic acid and protein sequences. PMID:3003673

  9. Identification of mycobacteria from animals by restriction enzyme analysis and direct DNA cycle sequencing of polymerase chain reaction-amplified 16S rRNA gene sequences.

    PubMed Central

    Hughes, M S; Skuce, R A; Beck, L A; Neill, S D

    1993-01-01

    Two methods, based on analysis of the polymerase chain reaction-amplified 16S rRNA gene by restriction enzyme analysis (REA) or direct cycle sequencing, were developed for rapid identification of mycobacteria isolated from animals and were compared to traditional phenotypic typing. BACTEC 7H12 cultures of the specimens were examined for "cording," and specific polymerase chain reaction amplification was performed to identify the presence of tubercle complex mycobacteria. Combined results of separate REAs with HhaI, MspI, MboI, and ThaI differentiated 12 of 15 mycobacterial species tested. HhaI, MspI, and ThaI restriction enzyme profiles differentiated Actinobacillus species from mycobacterial species. Mycobacterium bovis could not be differentiated from M. bovis BCG or Mycobacterium tuberculosis. Similarly, Mycobacterium avium and Mycobacterium paratuberculosis could not be distinguished from each other by REA but were differentiated by cycle sequencing. Compared with traditional typing, both methods allowed rapid and more accurate identification of acid-fast organisms recovered from 21 specimens of bovine and badger origin. Two groups of isolates were not typed definitively by either molecular method. One group of four isolates may constitute a new species phylogenetically very closely related to Mycobacterium simiae. The remaining unidentified isolates (three badger and one bovine) had identical restriction enzyme profiles and shared 100% nucleotide identity over the sequenced signature region. This nucleotide sequence most closely resembled the database sequence of Mycobacterium senegalense. PMID:7508456

  10. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon.

    PubMed Central

    Yu, J H; Eng, J; Yalow, R S

    1990-01-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report we describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. We demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in our immunoassay system is only a few percent of that of human insulin. Squirrel monkey glucagon is identical with the usual glucagon found in Old World mammals, which predicts that the glucagons of other New World monkeys would not differ from the usual Old World mammalian glucagon. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species. PMID:2263627

  11. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    SciTech Connect

    Yu, Jinghua; Eng, J.; Yalow, R.S. (City Univ. of New York, NY)

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.

  12. Hyperdimensional analysis of amino acid pair distributions in proteins.

    PubMed

    Henriksen, Svend B; Mortensen, Rasmus J; Geertz-Hansen, Henrik M; Neves-Petersen, Maria Teresa; Arnason, Omar; Söring, Jón; Petersen, Steffen B

    2011-01-01

    Our manuscript presents a novel approach to protein structure analyses. We have organized an 8-dimensional data cube with protein 3D-structural information from 8706 high-resolution non-redundant protein chains with the aim of identifying packing rules at the amino acid pair level. The cube contains information about amino acid type, solvent accessibility, spatial and sequence distance, secondary structure and sequence length. We are able to pose structural queries to the data cube using the program ProPack. The response is a 1, 2 or 3D graph. Although the response is statistical in nature, the user can obtain an instant list of all PDB structures in which such a pair is found. The user may select a particular structure, which is displayed highlighting the pair in question. The user may pose millions of different queries and will receive an answer to each one within a few seconds. In order to demonstrate the capabilities of the data cube as well as the programs, we have selected well-known structural features, disulphide bridges and salt bridges, where we illustrate how the queries are posed, and how answers are given. Motifs involving cysteines such as disulphide bridges, zinc-fingers and iron-sulfur clusters are clearly identified and differentiated. ProPack also reveals that whereas pairs of Lys residues virtually never appear in close spatial proximity, pairs of Arg are abundant and appear at close spatial distance, contrasting the belief that electrostatic repulsion would prevent this juxtaposition and that Arg-Lys is perceived as a conservative mutation. The presented programs can find and visualize novel packing preferences in protein structures, allowing the user to unravel correlations between pairs of amino acids. The new tools allow the user to view statistical information and visualize instantly the structures that underpin the statistical information, which is far from trivial with most other software tools for protein structure analysis. PMID:22174733

  13. Multiple Comparison Analysis of Two New Genomic Sequences of ILTV Strains from China with Other Strains from Different Geographic Regions

    PubMed Central

    Zhao, Yan; Kong, Congcong; Wang, Yunfeng

    2015-01-01

    To date, twenty complete genome sequences of ILTV strains have been published in GenBank, including one strain from China and nineteen strains from Australia and the United States. To investigate the genomic information on ILTVs from different geographic regions, two additional individual complete genome sequences of WG and K317 strains from China were determined. The genomes of WG and K317 strains were 153,505 and 153,639 bp in length, respectively. Alignments performed on the amino acid sequences of the twelve glycoproteins showed that 13 out of 116 mutational sites were present only among the Chinese strain WG and the Australian strains SA2 and A20. The phylogenetic tree analysis suggested that the WG strain is closely related to the Australian strain SA2. Recombination events were detected and confirmed in different subregions of the WG strain with the sequences of SA2 and K317 strains as parental. In this study, two new complete genome sequences of Chinese ILTV strains were used in comparative analysis with other complete genome sequences of ILTV strains from China, the United States, and Australia. The analysis of genome comparison, phylogenetic trees, and recombination events showed a close relationship between the Chinese strain WG and the Australian strain SA2. The information from the two new complete genome sequences from China will help to facilitate the analysis of phylogenetic relationships and the molecular differences among ILTV strains from different geographic regions. PMID:26186451

  14. Multiple Comparison Analysis of Two New Genomic Sequences of ILTV Strains from China with Other Strains from Different Geographic Regions.

    PubMed

    Zhao, Yan; Kong, Congcong; Wang, Yunfeng

    2015-01-01

    To date, twenty complete genome sequences of ILTV strains have been published in GenBank, including one strain from China and nineteen strains from Australia and the United States. To investigate the genomic information on ILTVs from different geographic regions, two additional individual complete genome sequences of WG and K317 strains from China were determined. The genomes of WG and K317 strains were 153,505 and 153,639 bp in length, respectively. Alignments performed on the amino acid sequences of the twelve glycoproteins showed that 13 out of 116 mutational sites were present only among the Chinese strain WG and the Australian strains SA2 and A20. The phylogenetic tree analysis suggested that the WG strain is closely related to the Australian strain SA2. Recombination events were detected and confirmed in different subregions of the WG strain with the sequences of SA2 and K317 strains as parental. In this study, two new complete genome sequences of Chinese ILTV strains were used in comparative analysis with other complete genome sequences of ILTV strains from China, the United States, and Australia. The analysis of genome comparison, phylogenetic trees, and recombination events showed a close relationship between the Chinese strain WG and the Australian strain SA2. The information from the two new complete genome sequences from China will help to facilitate the analysis of phylogenetic relationships and the molecular differences among ILTV strains from different geographic regions. PMID:26186451

  15. Complete genome sequence of the actinobacterium Amycolatopsis japonica MG417-CF17(T) (=DSM 44213T) producing (S,S)-N,N'-ethylenediaminedisuccinic acid.

    PubMed

    Stegmann, Evi; Albersmeier, Andreas; Spohn, Marius; Gert, Helena; Weber, Tilmann; Wohlleben, Wolfgang; Kalinowski, Jörn; Rückert, Christian

    2014-11-10

    We report the complete genome sequence of Amycolatopsis japonica MG417-CF17(T) (=DSM 44213(T)) which was identified as the producer of (S,S)-N,N'-ethylenediaminedisuccinic acid during a screening for phospholipase C inhibitors. The genome of A. japonica MG417-CF17(T) consists of two replicons: the chromosome (8,961,318 bp, 68.89% G+C content) and the plasmid pAmyja1 (92,539 bp, 68.23% G+C content), encoding a total of 8422 protein coding genes. Analysis of the sequence data revealed 30 clusters encoding the biosynthesis of secondary metabolites. PMID:25193710
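
    The replicon sizes and G+C percentages quoted above can be combined into a single genome-wide figure by weighting each replicon by its length. The sketch below is illustrative only: the helper names are hypothetical, the per-replicon values are the ones reported in the abstract, and the combined value is not a number the authors report.

```python
def gc_content(seq):
    """Percent of G and C bases in a nucleotide string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def combined_gc(replicons):
    """Length-weighted mean G+C percentage over (length_bp, gc_percent) pairs."""
    total_length = sum(length for length, _ in replicons)
    return sum(length * gc for length, gc in replicons) / total_length


# Values taken from the abstract: chromosome and plasmid pAmyja1.
replicons = [(8_961_318, 68.89), (92_539, 68.23)]
overall = combined_gc(replicons)
```

    Because the chromosome is roughly a hundred times longer than the plasmid, the weighted value stays very close to the chromosomal 68.89%.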

  16. Identification and sequence analysis of lpfABCDE, a putative fimbrial operon of Salmonella typhimurium.

    PubMed Central

    Bäumler, A J; Heffron, F

    1995-01-01

    A chromosomal region present in Salmonella typhimurium but absent from related species was identified by hybridization. A DNA probe originating from 78 min on the S. typhimurium chromosome hybridized with DNA from Salmonella enteritidis, Salmonella heidelberg, and Salmonella dublin but not with DNA from Salmonella typhi, Salmonella arizonae, Escherichia coli, and Shigella serotypes. Cloning and sequence analysis revealed that the corresponding region of the S. typhimurium chromosome encodes a fimbrial operon. Long fimbriae inserted at the poles of the bacterium were observed by electron microscopy when this fimbrial operon was introduced into a nonpiliated E. coli strain. The genes encoding these fimbriae were therefore termed lpfABCDE, for long polar fimbriae. Genetically, the lpf operon was found to be most closely related to the fim operon of S. typhimurium, both in gene order and in conservation of the deduced amino acid sequences. PMID:7721701

  17. [Determination of body fluid based on analysis of nucleic acids].

    PubMed

    Korabečná, Marie

    2015-01-01

    Recent methodological approaches of molecular genetics allow isolation of nucleic acids (DNA and RNA) from negligible forensic samples. Analysis of these molecules may be used not only for individual identification based on DNA profiling but also for the detection of origin of the body fluid which (alone or in mixture with other body fluids) forms the examined biological trace. Such an examination can contribute to the evaluation of procedural, technical and tactical value of the trace. Molecular genetic approaches discussed in the review offer new possibilities in comparison with the traditional spectrum of chemical, immunological and spectroscopic tests, especially with regard to the interpretation of mixtures of biological fluids and to the confirmatory character of the tests. Approaches based on reverse transcription of tissue-specific mRNA and their subsequent polymerase chain reaction (PCR) and fragmentation analysis are applicable to samples containing minimal amounts of biological material. Methods for body fluid discrimination based on examination of microRNA in samples have so far provided confusing results; therefore, further development in this field is needed. The examination of tissue-specific methylation of nucleotides in selected gene sequences seems to represent a promising enrichment of the methodological spectrum. The detection of DNA sequences of tissue-related bacteria has been established and it provides satisfactory results mainly in combination with the above-mentioned methodological approaches. PMID:26419517

  18. Lineage analysis by microsatellite loci deep sequencing in mice.

    PubMed

    Luo, Tao; He, Xionglei; Xing, Ke

    2016-05-01

    Lineage analysis is the identification of all the progeny of a single progenitor cell, and has become particularly useful for studying developmental processes and cancer biology. Here, we propose a novel and effective method for lineage analysis that combines sequence capture and next-generation sequencing technology. Genome-wide mononucleotide and dinucleotide microsatellite loci in eight samples from two mice were identified and used to construct phylogenetic trees based on somatic indel mutations at these loci, which were unique enough to distinguish and parse samples from different mice into different groups along the lineage tree. For example, biopsies from the liver and stomach, which originate from the endoderm, were located in the same clade, while samples in kidney, which originate from the mesoderm, were located in another clade. Yet, tissue with a common developmental origin may still contain cells of a mixed ancestry. This genome-wide approach thus provides a non-invasive lineage analysis method based on mutations that accumulate in the genomes of opaque multicellular organism somatic cells. Mol. Reprod. Dev. 83: 387-391, 2016. © 2016 Wiley Periodicals, Inc. PMID:26932355

  19. Sequence characterization and comparative analysis of three plasmids isolated from environmental Vibrio spp.

    PubMed

    Hazen, Tracy H; Wu, Dongying; Eisen, Jonathan A; Sobecky, Patricia A

    2007-12-01

    The horizontal transfer of genes by mobile genetic elements such as plasmids and phages can accelerate genome diversification of Vibrio spp., affecting their physiology, pathogenicity, and ecological character. In this study, sequence analysis of three plasmids from Vibrio spp. previously isolated from salt marsh sediment revealed the remarkable diversity of these elements. Plasmids p0908 (81.4 kb), p23023 (52.5 kb), and p09022 (31.0 kb) had a predicted 99, 64, and 32 protein-coding sequences and G+C contents of 49.2%, 44.7%, and 42.4%, respectively. A phylogenetic tree based on concatenation of the host 16S rRNA and rpoA nucleotide sequences indicated p23023 and p09022 were isolated from strains most closely related to V. mediterranei and V. campbellii, respectively, while the host of p0908 forms a clade with V. fluvialis and V. furnissii. Many predicted proteins had amino acid identities to proteins of previously characterized phages and plasmids (24 to 94%). Predicted proteins with similarity to chromosomally encoded proteins included RecA, a nucleoid-associated protein (NdpA), a type IV helicase (UvrD), and multiple hypothetical proteins. Plasmid p0908 had striking similarity to enterobacteria phage P1, sharing genetic organization and amino acid identity for 23 predicted proteins. This study provides evidence of genetic exchange between Vibrio plasmids, phages, and chromosomes among diverse Vibrio spp. PMID:17921277

  20. Functional and Immunological Relevance of Anaplasma marginale Major Surface Protein 1a Sequence and Structural Analysis

    PubMed Central

    Cabezas-Cruz, Alejandro; Passos, Lygia M. F.; Lis, Katarzyna; Kenneil, Rachel; Valdés, James J.; Ferrolho, Joana; Tonk, Miray; Pohl, Anna E.; Grubhoffer, Libor; Zweygarth, Erich; Shkap, Varda; Ribeiro, Mucio F. B.; Estrada-Peña, Agustín; Kocan, Katherine M.; de la Fuente, José

    2013-01-01

    Bovine anaplasmosis is caused by cattle infection with the tick-borne bacterium, Anaplasma marginale. The major surface protein 1a (MSP1a) has been used as a genetic marker for identifying A. marginale strains based on N-terminal tandem repeats and a 5′-UTR microsatellite located in the msp1a gene. The MSP1a tandem repeats contain immune relevant elements and functional domains that bind to bovine erythrocytes and tick cells, thus providing information about the evolution of host-pathogen and vector-pathogen interactions. Here we propose one nomenclature for A. marginale strain classification based on MSP1a. All tandem repeats among A. marginale strains were classified and the amino acid variability/frequency in each position was determined. The sequence variation at immunodominant B cell epitopes was determined and the secondary (2D) structure of the tandem repeats was modeled. A total of 224 different strains of A. marginale were classified, showing 11 genotypes based on the 5′-UTR microsatellite and 193 different tandem repeats with high amino acid variability per position. Our results showed phylogenetic correlation between MSP1a sequence, secondary structure, B-cell epitope composition and tick transmissibility of A. marginale strains. The analysis of MSP1a sequences provides relevant information about the biology of A. marginale to design vaccines with a cross-protective capacity based on MSP1a B-cell epitopes. PMID:23776456

  1. Trypanosoma cruzi: sequence analysis of the variable region of kinetoplast minicircles.

    PubMed

    Telleria, Jenny; Lafay, Bénédicte; Virreira, Myrna; Barnabé, Christian; Tibayrenc, Michel; Svoboda, Michal

    2006-12-01

    The comparisons of 170 sequences of kinetoplast DNA minicircle hypervariable region obtained from 19 stocks of Trypanosoma cruzi and 2 stocks of Trypanosoma cruzi marinkellei showed that only 56% exhibited significant homology with other sequences. These sequences could be grouped into homology classes showing no significant sequence similarity with any other homology group. The remaining 44% of sequences thus corresponded to unique sequences in our data set. In the DTU I ("Discrete Typing Units") 51% of the sequences were unique. In contrast, in the DTU IId, 87.5% of sequences were distributed into three classes. The results obtained for T. cruzi marinkellei showed that all sequences were unique, without any similarity between them and T. cruzi sequences. Analysis of palindromes in all sequence sets showed a high frequency of the EcoRI site. Analysis of repetitive sequences suggested a common ancestral origin of the kDNA. The editing mechanism that occurs in kinetoplastidae is discussed. PMID:16730709

  2. Diverse Bacterial PKS Sequences Derived From Okadaic Acid-Producing Dinoflagellates

    PubMed Central

    Perez, Roberto; Liu, Li; Lopez, Jose; An, Tianying; Rein, Kathleen S.

    2008-01-01

    Okadaic acid (OA) and the related dinophysistoxins are isolated from dinoflagellates of the genera Prorocentrum and Dinophysis. Bacteria of the Roseobacter group have been associated with okadaic acid-producing dinoflagellates and have been previously implicated in OA production. Analysis of 16S rRNA libraries reveals that Roseobacter are the most abundant bacteria associated with OA-producing dinoflagellates of the genus Prorocentrum and are not found in association with non-toxic dinoflagellates. While some polyketide synthase (PKS) genes form a highly supported Prorocentrum clade, most appear to be bacterial, but unrelated to Roseobacter or Alpha-Proteobacterial PKSs or those derived from other Alveolates Karenia brevis or Cryptosporidium parvum. PMID:18728765

  3. Sequence analysis of styrenic copolymers by tandem mass spectrometry.

    PubMed

    Yol, Aleer M; Janoski, Jonathan; Quirk, Roderic P; Wesdemiotis, Chrys

    2014-10-01

    Styrene and smaller molar amounts of either m-dimethylsilylstyrene (m-DMSS) or p-dimethylsilylstyrene (p-DMSS) were copolymerized under living anionic polymerization conditions, and the compositions, architectures, and sequences of the resulting copolymers were characterized by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and tandem mass spectrometry (MS(2)). MS analysis revealed that linear copolymer chains containing phenyl-Si(CH3)2H pendants were the major product for both DMSS comonomers. In addition, two-armed architectures with phenyl-Si(CH3)2-benzyl branches were detected as minor products. The comonomer sequence in the linear chains was established by MS(2) experiments on lithiated oligomers, based on the DMSS content of fragments generated by backbone C-C bond scissions and with the help of reference MS(2) spectra obtained from a polystyrene homopolymer and polystyrene end-capped with a p-DMSS block. The MS(2) data provided conclusive evidence that copolymerization of styrene/DMSS mixtures leads to chains with a rather random distribution of the silylated comonomer when m-DMSS is used, but to chains with tapered block structures, with the silylated units near the initiator, when p-DMSS is used. Hence, MS(2) fragmentation patterns permit not only differentiation of the sequences generated in the synthesis, but also the determination of specific comonomer locations along the polymer chain. PMID:25181590

  4. Integrated visual analysis of protein structures, sequences, and feature data

    PubMed Central

    2015-01-01

    Background: To understand the molecular mechanisms that give rise to a protein's function, biologists often need to (i) find and access all related atomic-resolution 3D structures, and (ii) map sequence-based features (e.g., domains, single-nucleotide polymorphisms, post-translational modifications) onto these structures. Results: To streamline these processes we recently developed Aquaria, a resource offering unprecedented access to protein structure information based on an all-against-all comparison of SwissProt and PDB sequences. In this work, we provide a requirements analysis for several frequently occurring tasks in molecular biology and describe how design choices in Aquaria meet these requirements. Finally, we show how the interface can be used to explore features of a protein and gain biologically meaningful insights in two case studies conducted by domain experts. Conclusions: The user interface design of Aquaria enables biologists to gain unprecedented access to molecular structures and simplifies the generation of insight. The tasks involved in mapping sequence features onto structures can be conducted more easily and quickly using Aquaria. PMID:26329268

  5. Experience using web services for biological sequence analysis.

    PubMed

    Stockinger, Heinz; Attwood, Teresa; Chohan, Shahid Nadeem; Côté, Richard; Cudré-Mauroux, Philippe; Falquet, Laurent; Fernandes, Pedro; Finn, Robert D; Hupponen, Taavi; Korpelainen, Eija; Labarga, Alberto; Laugraud, Aurelie; Lima, Tania; Pafilis, Evangelos; Pagni, Marco; Pettifer, Steve; Phan, Isabelle; Rahman, Nazim

    2008-11-01

    Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best-practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed. PMID:18621748

  6. Determining physical constraints in transcriptional initiation complexes using DNA sequence analysis

    SciTech Connect

    Shultzaberger, Ryan K.; Chiang, Derek Y.; Moses, Alan M.; Eisen, Michael B.

    2007-07-01

    Eukaryotic gene expression is often under the control of cooperatively acting transcription factors whose binding is limited by structural constraints. By determining these structural constraints, we can understand the "rules" that define functional cooperativity. Conversely, by understanding the rules of binding, we can infer structural characteristics. We have developed an information theory based method for approximating the physical limitations of cooperative interactions by comparing sequence analysis to microarray expression data. When applied to the coordinated binding of the sulfur amino acid regulatory protein Met4 by Cbf1 and Met31, we were able to create a combinatorial model that can correctly identify Met4 regulated genes.

  7. [Comparative genomics and evolutionary analysis of CRISPR loci in acetic acid bacteria].

    PubMed

    Kai, Xia; Xinle, Liang; Yudong, Li

    2015-12-01

    The clustered regularly interspaced short palindromic repeat (CRISPR) is a widespread adaptive immunity system that exists in most archaea and many bacteria against foreign DNA, such as phages, viruses and plasmids. In general, a CRISPR system consists of direct repeat, leader, spacer and CRISPR-associated sequences. Acetic acid bacteria (AAB) play an important role in industrial fermentation of vinegar and bioelectrochemistry. To investigate the polymorphism and evolution pattern of CRISPR loci in acetic acid bacteria, bioinformatic analyses were performed on 48 species from three main genera (Acetobacter, Gluconacetobacter and Gluconobacter) with whole genome sequences available from the NCBI database. The results showed that the CRISPR system existed in 32 species of the 48 strains studied. Most of the CRISPR-Cas systems in AAB belonged to the type I CRISPR-Cas system (subtypes E and C), but the type II CRISPR-Cas system, which contains the cas9 gene, was found only in the genera Acetobacter and Gluconacetobacter. The repeat sequences of some CRISPRs were highly conserved among species from different genera, and the leader sequences of some CRISPRs possessed conserved motifs, which were associated with regulated promoters. Moreover, phylogenetic analysis of cas1 genes demonstrated that they were suitable for classification of species. The conservation of cas1 genes was associated with that of repeat sequences among different strains, suggesting they were subjected to similar functional constraints. Moreover, the number of spacers was positively correlated with the number of prophages and insertion sequences, indicating that the acetic acid bacteria were continually invaded by new foreign DNA. The comparative analysis of CRISPR loci in acetic acid bacteria provided the basis for investigating the molecular mechanism of different acetic acid tolerance and genome stability in acetic acid bacteria. PMID:26704949

  8. Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

    PubMed Central

    Ré, Miguel A.; Azad, Rajeev K.

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
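
    For orientation, the standard weighted Jensen-Shannon divergence that the abstract takes as its starting point can be written as the entropy of the weighted mixture minus the weighted sum of the individual entropies. The sketch below implements only that standard Shannon form; the Tsallis, Markovian, and combined generalizations proposed in the paper are not reproduced here, and the function names and the use of single-nucleotide frequencies are assumptions made for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(distributions, weights=None):
    """Weighted JSD: H(sum_i w_i P_i) - sum_i w_i H(P_i).

    Works for any number of distributions over the same alphabet.
    Weights default to uniform and are assumed to sum to 1.
    """
    n = len(distributions)
    if weights is None:
        weights = [1.0 / n] * n
    size = len(distributions[0])
    mixture = [
        sum(w * d[i] for w, d in zip(weights, distributions))
        for i in range(size)
    ]
    weighted_entropies = sum(
        w * shannon_entropy(d) for w, d in zip(weights, distributions)
    )
    return shannon_entropy(mixture) - weighted_entropies


def base_frequencies(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in a sequence."""
    counts = [seq.count(base) for base in alphabet]
    total = sum(counts)
    return [c / total for c in counts]


# Toy comparison of two made-up DNA segments (not real genome data).
left = base_frequencies("AAAACCGT")
right = base_frequencies("GGGGCCAT")
divergence = jensen_shannon_divergence([left, right])
```

    With base-2 logarithms and two equally weighted distributions the value lies between 0 (identical compositions) and 1 (disjoint supports). In segmentation settings the weights are commonly taken proportional to segment lengths; whether the paper does so is not stated in the abstract.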

  9. Generalization of entropy based divergence measures for symbolic sequence analysis.

    PubMed

    Ré, Miguel A; Azad, Rajeev K

    2014-01-01

    Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338

  10. Using Willie's Acid-Base Box for Blood Gas Analysis

    ERIC Educational Resources Information Center

    Dietz, John R.

    2011-01-01

    In this article, the author describes a method developed by Dr. William T. Lipscomb for teaching blood gas analysis of acid-base status and provides three examples using Willie's acid-base box. Willie's acid-base box is constructed using three of the parameters of standard arterial blood gas analysis: (1) pH; (2) bicarbonate; and (3) CO[subscript…

  11. Immunoreactivity of polyclonal antibodies generated against the carboxy terminus of the predicted amino acid sequence of the Huntington disease gene

    SciTech Connect

    Alkatib, G.; Graham, R.; Pelmear-Telenius, A.

    1994-09-01

    A cDNA fragment spanning the 3′-end of the Huntington disease gene (from 8052 to 9252) was cloned into a prokaryotic expression vector containing the E. coli lac promoter and a portion of the coding sequence for β-galactosidase. The truncated β-galactosidase gene was cleaved with BamHI and fused in frame to the BamHI fragment of the Huntington disease gene 3′-end. Expression analysis of proteins made in E. coli revealed that 20-30% of the total cellular proteins was represented by the β-galactosidase-huntingtin fusion protein. The identity of the Huntington disease protein amino acid sequences was confirmed by protein sequence analysis. Affinity chromatography was used to purify large quantities of the fusion protein from bacterial cell lysates. Affinity-purified proteins were used to immunize New Zealand white rabbits for antibody production. The generated polyclonal antibodies were used to immunoprecipitate the Huntington disease gene product expressed in a neuroblastoma cell line. In this cell line the antibodies precipitated two protein bands with apparent gel migrations of 200 and 150 kd, which together correspond to the calculated molecular weight of the Huntington disease gene product (350 kd). Immunoblotting experiments revealed the presence of a large precursor protein in the range of 350-750 kd which is in agreement with the predicted molecular weight of the protein without post-translational modifications. These results indicate that the huntingtin protein is cleaved into two subunits in this neuroblastoma cell line and imply that cleavage of a large precursor protein may contribute to its biological activity. Experiments are ongoing to determine the precursor-product relationship, to examine the synthesis of the huntingtin protein in freshly isolated rat brains, and to determine the cellular and subcellular distribution of the gene product.

  12. Analysis of sequence diversity in hypervariable regions of the external glycoprotein of human immunodeficiency virus type 1.

    PubMed Central

    Simmonds, P; Balfe, P; Ludlam, C A; Bishop, J O; Brown, A J

    1990-01-01

    Nucleotide sequences in three hypervariable regions of the human immunodeficiency virus type 1 (HIV-1) env gene were obtained by sequencing provirus present in peripheral blood mononuclear cells of HIV-infected individuals. Single molecules of target sequences were isolated by limiting dilution and amplified in two stages by the polymerase chain reaction, using nested primers. The product was directly sequenced to avoid errors introduced by Taq polymerase during the amplification process. There was extensive variation between sequences from the same individual as well as between sequences from different individuals. Interpatient variability was markedly less in individuals infected from a common source. A high proportion of amino acid substitutions in the hypervariable regions altered the number and positions of potential N-linked glycosylation sites. Sequences in two hypervariable regions frequently contained short (3- to 15-bp) duplications or deletions, and by amplifying peripheral blood mononuclear cell DNA containing 10^2 or 10^3 proviral molecules and analyzing the product by high-resolution electrophoresis, the total number and abundance of distinct length variants within an individual could be estimated, providing a more comprehensive analysis of the variants present than would be obtained by sequencing alone. Sequences from many individuals showed frequent amino acid substitutions at certain key positions for neutralizing-antibody and cytotoxic T-cell recognition in the immunodominant loop. The rates of synonymous and nonsynonymous nucleotide substitution in the region of this and flanking regions indicate that strong positive selection for amino acid change is operating in the generation of antigenic diversity. PMID:2243378

  13. Mutations analysis of C1 inhibitor coding sequence gene among Portuguese patients with hereditary angioedema.

    PubMed

    Martinho, A; Mendes, J; Simões, O; Nunes, R; Gomes, J; Dias Castro, E; Leiria-Pinto, P; Ferreira, M B; Pereira, C; Castel-Branco, M G; Pais, L

    2013-04-01

    Mutations that modify the amino acid sequence of C1-INH (except Val458Met) are associated with HAE. More than 200 different mutations scattered across the entire C1-INH gene have been reported. The main objective of this study was to report the mutational findings in a HAE cohort of 138 Portuguese patients followed in specialized consultation all over the country. DNA was extracted from peripheral blood with QiaSymphony BioRobot (QIAGEN Portugal). The sequence reactions were performed by using a DNA sequencing kit (Big Dye terminator cycle sequencing v1.1/v3.1 from Applied Biosystems) and sequencing products were immediately submitted to direct sequencing on an Applied Biosystem 3130 DNA Analyser. DNA sequences were analyzed at four different stages. Raw data and sequence alignments of all 8 exons and intron-exon boundaries were performed for each patient individually with SeqScape software and using SERPING1 gene NG_009625 of 24,300 bp (12-March-2011) as reference sequence. Sequence comparisons among patients and controls were performed with software CodonCode Aligner v.3.7 from CodonCode Corp and with Geneious 4.5 from Biomatters Lda. A total of 94 point mutations were observed among patients, and 67% of them were located on exon 8. In addition, we noticed one previously undescribed stop codon at position c.1459 C>T in three different patients. Translation termination was also found on exons 3 and 7, as a result of mutations at positions c.481A>T and c.1174C>T. In this population, the prevalence of the missense mutation p.Arg444Cys was 39 out of 42. Mutational analysis revealed 22 different pathogenic mutations, of which 64% were not described in the HAE database. Although identification of disease-causing mutations is not necessary to establish HAE diagnosis, studies on gene expression and characterization of rearrangements in the SERPING1 gene are suggested in order to get new insights on function and genetic tests of C1 inhibitor. PMID:23123409

  14. Isolation and sequence analysis of peptides from the skin secretion of the Middle East tree frog Hyla savignyi.

    PubMed

    Langsdorf, Markus; Ghassempour, Alireza; Römpp, Andreas; Spengler, Bernhard

    2010-12-01

    Novel peptides were identified in the skin secretion of the tree frog Hyla savignyi. Skin secretions were collected by mild electrical stimulation. Peptides were separated by reversed-phase high-performance liquid chromatography. Mass spectra were acquired by electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), and fragment ion spectra were obtained after collision-induced dissociation and electron capture dissociation. Peptides were analyzed by manual de novo sequencing and composition-based sequencing (CBS). Sequence analyses of three so far undescribed, structurally unrelated peptides are presented in this paper, having the sequences DDSEEEEVE-OH, P*EEVEEERJK-OH, and GJJDPJTGJVGGJJ-NH2. The glutamate-rich sequences are assumed to be acidic spacer peptides of the prepropeptide. One of these peptides contains the modified amino acid hydroxyproline, as identified and localized by high-accuracy FTICR-MS. Combination of CBS and of experience-based manual sequence analysis as complementary and database-independent sequencing strategies resulted in peptide identification with high reliability. PMID:20835817

  15. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    PubMed Central

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
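
    The idea of processing reads while they are still arriving can be illustrated with a small generator-based sketch. This is not the elastream package and does not use its interface; the FASTQ parser, the per-read analysis (GC fraction) and the in-memory stream standing in for a network transfer are all assumptions chosen to keep the example self-contained.

```python
import io


def stream_fastq(handle):
    """Yield (name, sequence, quality) as soon as each 4-line record is read."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of the incoming stream
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def gc_fraction(seq):
    """Example per-read analysis; each read is handled independently."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def process_stream(handle, analyse=gc_fraction):
    """Apply the analysis to every read without waiting for the full file."""
    for name, seq, _qual in stream_fastq(handle):
        yield name, analyse(seq)


# An in-memory buffer stands in for data arriving over the network.
incoming = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
results = list(process_stream(incoming))
print(results)
```

    Because each read is analysed independently, the same loop works whether the handle is a local file, a pipe or a socket wrapper, which is the class of tasks the abstract says the scheme targets.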

  16. DNA sequence-based analysis of the Pseudomonas species.

    PubMed

    Mulet, Magdalena; Lalucat, Jorge; García-Valdés, Elena

    2010-06-01

    Partial sequences of four core 'housekeeping' genes (16S rRNA, gyrB, rpoB and rpoD) of the type strains of 107 Pseudomonas species were analysed in order to obtain a comprehensive view regarding the phylogenetic relationships within the Pseudomonas genus. Gene trees allowed the discrimination of two lineages or intrageneric groups (IG), called IG P. aeruginosa and IG P. fluorescens. The first, IG P. aeruginosa, was divided into three main groups, represented by the species P. aeruginosa, P. stutzeri and P. oleovorans. The second IG was divided into six groups, represented by the species P. fluorescens, P. syringae, P. lutea, P. putida, P. anguilliseptica and P. straminea. The P. fluorescens group was the most complex and included nine subgroups, represented by the species P. fluorescens, P. gessardi, P. fragi, P. mandelii, P. jesseni, P. koreensis, P. corrugata, P. chlororaphis and P. asplenii. Pseudomonas rhizospherae was affiliated with the P. fluorescens IG in the phylogenetic analysis but was independent of any group. Some species were located on phylogenetic branches that were distant from defined clusters, such as those represented by the P. oryzihabitans group and the type strains P. pachastrellae, P. pertucinogena and P. luteola. Additionally, 17 strains of P. aeruginosa, 'P. entomophila', P. fluorescens, P. putida, P. syringae and P. stutzeri, for which genome sequences have been determined, have been included to compare the results obtained in the analysis of four housekeeping genes with those obtained from whole genome analyses. PMID:20192968

  17. Cloning, nucleotide sequence, and transcriptional analysis of the Pediococcus acidilactici L-(+)-lactate dehydrogenase gene.

    PubMed Central

    Garmyn, D; Ferain, T; Bernard, N; Hols, P; Delcour, J

    1995-01-01

    Recombinant plasmids containing the Pediococcus acidilactici L-(+)-lactate dehydrogenase gene (ldhL) were isolated by complementing for growth under anaerobiosis of an Escherichia coli lactate dehydrogenase-pyruvate formate lyase double mutant. The nucleotide sequence of the ldhL gene predicted a protein of 323 amino acids showing significant similarity with other bacterial L-(+)-lactate dehydrogenases and especially with that of Lactobacillus plantarum. The ldhL transcription start points in P. acidilactici were defined by primer extension, and the promoter sequence was identified as TCAAT-(17 bp)-TATAAT. This sequence is closely related to the consensus sequence of vegetative promoters from gram-positive bacteria as well as from E. coli. Northern analysis of P. acidilactici RNA showed a 1.1-kb ldhL transcript whose abundance is growth rate regulated. These data, together with the presence of a putative rho-independent transcriptional terminator, suggest that ldhL is expressed as a monocistronic transcript in P. acidilactici. PMID:7887607
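
    The promoter pattern reported above, TCAAT-(17 bp)-TATAAT, can be searched for in other sequences with a regular expression. The sketch below is only an exact-match scan written for illustration; the function name is hypothetical, and real promoter prediction would normally allow mismatches and variable spacing.

```python
import re

# Two fixed elements separated by exactly 17 unspecified bases,
# as written in the abstract: TCAAT-(17 bp)-TATAAT.
PROMOTER = re.compile(r"TCAAT[ACGT]{17}TATAAT")


def find_promoter_like_sites(dna):
    """Return 0-based start positions of exact matches to the pattern."""
    return [match.start() for match in PROMOTER.finditer(dna.upper())]


example = "GG" + "TCAAT" + "A" * 17 + "TATAAT" + "CC"
print(find_promoter_like_sites(example))
```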

  18. Data Analysis for Sequencing by Hybridization (SBH) Experiments

    SciTech Connect

    Salbego, David

    1995-11-28

    SCORES is user-friendly software designed to analyze data from SBH (Sequencing By Hybridization) experiments. In these ANL experiments DNA samples are spotted on a nylon membrane and hybridized with radioactively labeled oligonucleotide probes. An image analysis program (DOTS) calculates a raw value for each DNA dot from images generated by the Molecular Dynamics Phosphorimager. SCORES reads in the DOTS output for each hybridization done for a particular filter. The data for each probe is normalized against a mass probe and scaled properly. These values from 100 or more probes are then used to compute the distance (i.e., degree of similarity) between any two clones on the filter. These calculated distances define clusters of similar clones (cDNA) or contigs (genomic DNA). Histograms of the data at each stage of analysis are used to establish thresholds for further steps. SCORES generates various statistical tables to evaluate the quality of spotting, hybridization of filters, and of individual dots.
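
    The normalize-then-compare pipeline described above might look roughly like the following. The abstract does not say how SCORES scales the data or which distance it uses, so dividing by the mass-probe signal and taking a Euclidean distance are assumptions, as are all names in the sketch.

```python
import math


def normalise(raw, mass):
    """Divide each clone's raw probe signal by its mass-probe signal.

    Assumption: simple division; the actual SCORES scaling is not specified.
    """
    return {clone: value / mass[clone] for clone, value in raw.items()}


def clone_distance(profile_a, profile_b):
    """Euclidean distance between two clones' normalised probe profiles.

    Assumption: the metric used by SCORES is not stated in the abstract.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical dot intensities for one probe and for the mass probe.
probe_signal = {"clone1": 10.0, "clone2": 30.0}
mass_signal = {"clone1": 5.0, "clone2": 10.0}
print(normalise(probe_signal, mass_signal))
print(clone_distance([0.0, 0.0], [3.0, 4.0]))
```

    In practice each clone's profile would hold one normalised value per probe (100 or more, according to the abstract), and clones whose pairwise distance falls below a chosen threshold would be grouped together.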

  19. Systematic sequencing of the Escherichia coli genome: analysis of the 2.4-4.1 min (110,917-193,643 bp) region.

    PubMed Central

    Fujita, N; Mori, H; Yura, T; Ishihama, A

    1994-01-01

    The complete sequence analysis of the E. coli genome was initiated as a collaborative study in Japan. Following the initial analysis of the 0-2.4 min region (Yura, T. et al. (1992) Nucleic Acids Res. 20, 3305-3308), a contiguous sequence of 82,727 bp corresponding to the 2.4-4.1 min region (110,917-193,643 bp as counted from 0 min) was determined. The resulting sequence was found to contain at least 33 known genes and 24 putative genes predicted from protein sequence homology. PMID:8202364

  20. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  1. Complete cDNA and deduced amino acid sequence of the chaperonin containing T-complex polypeptide 1 (CCT) delta subunit from Aedes triseriatus mosquitoes.

    PubMed

    Blitvich, B J; Rayms-Keller, A; Blair, C D; Beaty, B J

    2001-01-01

    The chaperonin containing t-complex polypeptide 1 (CCT) assists in the ATP-dependent folding and assembly of newly translated actin and tubulin in the eukaryotic cytosol. CCT is composed of eight different subunits, each encoded by an independent gene. In this report, we used RT-PCR amplification and 5'- and 3'-rapid amplification of cDNA ends (RACE) to determine the complete cDNA sequence of the CCT delta subunit from Aedes triseriatus mosquitoes. The CCT delta cDNA is 1936 nucleotides in length and encodes a putative 533 amino acid protein with a calculated molecular mass of 57,179 daltons and pI of 7.15. Hydrophobic residues comprise 39.8% of the amino acid sequence and putative motifs for ATP-binding and ATPase-activity are present. The amino acid sequence displays strong sequence similarity to Drosophila melanogaster (92%), human (85%), puffer fish (84%) and mouse (84%) counterparts. CCT delta mRNA was detected in both biosynthetically active (embryonating) and dormant (diapausing) Ae. triseriatus embryos by RT-PCR analysis. PMID:11762197

  2. Evolutionary connections of biological kingdoms based on protein and nucleic acid sequence evidence

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.

    1983-01-01

    Prokaryotic and eukaryotic evolutionary trees are developed from protein and nucleic-acid sequences by the methods of numerical taxonomy. Trees are presented for bacterial ferredoxins, 5S ribosomal RNA, c-type cytochromes, cytochromes c2 and c', and 5.8S ribosomal RNA; the implications for early evolution are discussed; and a composite tree showing the branching of the anaerobes, aerobes, archaebacteria, and eukaryotes is shown. Single lines are found for all oxygen-evolving photosynthetic forms and for the salt-loving and high-temperature forms of archaebacteria. It is argued that the eukaryote mitochondria, chloroplasts, and cytoplasmic host material are descended from free-living prokaryotes that formed symbiotic associations, with more than one symbiotic event involved in the evolution of each organelle.

  3. Reverse transcriptase domain sequences from tree peony (Paeonia suffruticosa) long terminal repeat retrotransposons: sequence characterization and phylogenetic analysis

    PubMed Central

    Guo, Da-Long; Hou, Xiao-Gai; Jia, Tian

    2014-01-01

    Tree peony is an important horticultural plant worldwide of great ornamental and medicinal value. Long terminal repeat retrotransposons (LTR-retrotransposons) are the major components of most plant genomes and can substantially impact the genome in many ways. It is therefore crucial to understand their sequence characteristics, genetic distribution and transcriptional activity; however, no information about them is available in tree peony. In this study, Ty1-copia-like reverse transcriptase sequences were amplified from tree peony genomic DNA by polymerase chain reaction (PCR) with degenerate oligonucleotide primers corresponding to highly conserved domains of the Ty1-copia-like retrotransposons. PCR fragments of roughly 270 bp were isolated and cloned, and 33 sequences were obtained. According to alignment and phylogenetic analysis, all sequences were divided into six families. The observed difference in the degree of nucleotide sequence similarity indicates a high level of sequence heterogeneity among these clones. Most of these sequences have a frame shift, a stop codon, or both. Dot-blot analysis revealed distribution of these sequences in all the studied tree peony species. However, different hybridization signals were detected among them, which is in agreement with previous systematics studies. Reverse transcriptase PCR (RT-PCR) indicated that Ty1-copia retrotransposons in tree peony were transcriptionally inactive. The results provide basic genetic and evolutionary information on the tree peony genome, and will provide valuable information for the further utilization of retrotransposons in tree peony. PMID:26019529

  4. Complete genome sequence of probiotic Bacillus coagulans HM-08: A potential lactic acid producer.

    PubMed

    Yao, Guoqiang; Gao, Pengfei; Zhang, Wenyi

    2016-06-20

    Bacillus coagulans HM-08 is a commercialized probiotic strain in China. Its genome contains a 3.62Mb circular chromosome with an average GC content of 46.3%. In silico analysis revealed the presence of one xyl operon as well as several other genes that are correlated to xylose utilization. The genetic information provided here may help to expand its future biotechnology potential in lactic acid production. PMID:27130497

  5. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

    PubMed

    Ferrada, Evandro

    2014-12-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967

  6. The Amino Acid Alphabet and the Architecture of the Protein Sequence-Structure Map. I. Binary Alphabets

    PubMed Central

    Ferrada, Evandro

    2014-01-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967

  7. Microfluidic platform for isolating nucleic acid targets using sequence specific hybridization

    PubMed Central

    Wang, Jingjing; Morabito, Kenneth; Tang, Jay X.; Tripathi, Anubhav

    2013-01-01

    The separation of target nucleic acid sequences from biological samples has emerged as a significant process in today's diagnostics and detection strategies. In addition to the possible clinical applications, the fundamental understanding of target and sequence specific hybridization on surface modified magnetic beads is of high value. In this paper, we describe a novel microfluidic platform that utilizes a mobile magnetic field in static microfluidic channels, where single stranded DNA (ssDNA) molecules are isolated via nucleic acid hybridization. We first established efficient isolation of biotinylated capture probe (BP) using streptavidin-coated magnetic beads. Subsequently, we investigated the hybridization of target ssDNA with BP bound to beads and explained these hybridization kinetics using a dual-species kinetic model. The number of hybridized target ssDNA molecules was determined to be about 6.5 times less than that of BP on the bead surface, due to steric hindrance effects. The hybridization of target ssDNA with non-complementary BP bound to bead was also examined, and non-specific hybridization was found to be insignificant. Finally, we demonstrated highly efficient capture and isolation of target ssDNA in the presence of non-target ssDNA, where as low as 1% target ssDNA can be detected from mixture. The microfluidic method described in this paper is significantly relevant and is broadly applicable, especially towards point-of-care biological diagnostic platforms that require binding and separation of known target biomolecules, such as RNA, ssDNA, or protein. PMID:24404041

  8. Sequence analysis, expression, and binding activity of recombinant major outer sheath protein (Msp) of Treponema denticola.

    PubMed Central

    Fenno, J C; Müller, K H; McBride, B C

    1996-01-01

    The gene encoding the major outer sheath protein (Msp) of the oral spirochete Treponema denticola ATCC 35405 was cloned, sequenced, and expressed in Escherichia coli. Preliminary sequence analysis showed that the 5' end of the msp gene was not present on the 5.5-kb cloned fragment described in a recent study (M. Haapasalo, K. H. Müller, V. J. Uitto, W. K. Leung, and B. C. McBride, Infect. Immun. 60:2058-2065,1992). The 5' end of msp was obtained by PCR amplification from a T. denticola genomic library, and an open reading frame of 1,629 bp was identified as the coding region for Msp by combining overlapping sequences. The deduced peptide consisted of 543 amino acids and had a molecular mass of 58,233 Da. The peptide had a typical prokaryotic signal sequence with a potential cleavage site for signal peptidase 1. Northern (RNA) blot analysis showing the msp transcript to be approximately 1.7 kb was consistent with the identification of a promoter consensus sequence located optimally upstream of msp and a transcription termination signal found downstream of the stop codon. The entire msp sequence was amplified from T. denticola genomic DNA and cloned in E. coli by using a tightly regulated T7 RNA polymerase vector system. Expression of Msp was toxic to E. coli when the entire msp gene was present. High levels of Msp were produced as inclusion bodies when the putative signal peptide sequence was deleted and replaced by a vector-encoded T7 peptide sequence. Recombinant Msp purified to homogeneity from a clone containing the full-length msp gene adhered to immobilized laminin and fibronectin but not to bovine serum albumin. Attachment of recombinant Msp was decreased in the presence of soluble substrate. Attachment of T. denticola to immobilized laminin and fibronectin was increased by pretreatment of the substrate with recombinant Msp. These studies lend further support to the hypothesis that Msp mediates the extracellular matrix binding activity of T. denticola. 
PMID

  9. Molecular cloning, sequence analysis and tissue-specific expression of Akirin2 gene in Tianfu goat.

    PubMed

    Ma, Jisi; Xu, Gangyi; Wan, Lu; Wang, Nianlu

    2015-01-01

    The Akirin2 gene encodes a nuclear factor and is considered a potential functional candidate gene for meat quality. To better understand the structure and function of the Akirin2 gene, the cDNA of the Tianfu goat Akirin2 gene was cloned. Sequence analysis showed that the Tianfu goat Akirin2 cDNA full coding sequence (CDS) contains 579 bp that encode 192 amino acids. A phylogenetic tree of the Akirin2 protein sequence from the Tianfu goat and other species revealed that the Tianfu goat Akirin2 was closely related to cattle and sheep Akirin2. RT-qPCR analysis showed that Akirin2 was expressed in the myocardium, liver, spleen, lung, kidney, leg muscle, abdominal muscle and the longissimus dorsi muscle. In particular, high expression levels of Akirin2 were detected in the spleen, lung, and kidney whereas lower expression levels were seen in the liver, myocardium, leg muscle, abdominal muscle and longissimus dorsi muscle. Temporal mRNA expression showed that Akirin2 expression levels in the longissimus dorsi muscle first increased then decreased from day 1 to month 12. Western blotting results showed that the Akirin2 protein was only detected in the lung and three skeletal muscle tissues. PMID:25239665

  10. Radar image sequence analysis of inhomogeneous water surfaces

    NASA Astrophysics Data System (ADS)

    Seemann, Joerg; Senet, Christian M.; Dankert, Heiko; Hatten, Helge; Ziemer, Friedwart

    1999-10-01

    The radar backscatter from the ocean surface, called sea clutter, is modulated by the surface wave field. A method was developed to estimate the near-surface current, the water depth and calibrated surface wave spectra from nautical radar image sequences. The algorithm is based on the three-dimensional Fast Fourier Transformation (FFT) of the spatio-temporal sea clutter pattern in the wavenumber-frequency domain. The dispersion relation is used to define a filter to separate the spectral signal of the imaged waves from the background noise component caused by speckle noise. The signal-to-noise ratio (SNR) contains information about the significant wave height. The method has been proved to be reliable for the analysis of homogeneous water surfaces in offshore installations. Radar images are inhomogeneous because of the dependency of the image transfer function (ITF) on the azimuth angle between the wave propagation and the antenna viewing direction. The inhomogeneity of radar imaging is analyzed using image sequences of a homogeneous deep-water surface sampled by a ship-borne radar. Changing water depths in shallow-water regions induce horizontal gradients of the tidal current. Wave refraction occurs due to the spatial variability of the current and water depth. These areas cannot be investigated with the standard method. A new method, based on local wavenumber estimation with the multiple-signal classification (MUSIC) algorithm, is outlined. The MUSIC algorithm provides superior wavenumber resolution on local spatial scales. First results, retrieved from a radar image sequence taken from an installation at a coastal site, are presented.
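
    The dispersion-relation filter mentioned above can be illustrated with the linear dispersion relation for surface gravity waves, including a Doppler shift from a uniform current. This is a minimal sketch of the filtering idea only, not the authors' implementation: the pass-band tolerance, the function names and the assumption of a uniform depth and current are all illustrative choices.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def dispersion_omega(kx, ky, depth, ux=0.0, uy=0.0):
    """Angular frequency (rad/s) from linear wave theory.

    omega = sqrt(g * k * tanh(k * d)) + k . U, with wavenumber components
    kx, ky in rad/m, water depth in m and current components in m/s.
    """
    k = math.hypot(kx, ky)
    return math.sqrt(G * k * math.tanh(k * depth)) + kx * ux + ky * uy


def passes_filter(kx, ky, omega, depth, ux=0.0, uy=0.0, tolerance=0.05):
    """Keep a spectral bin only if it lies close to the dispersion shell.

    The tolerance (rad/s) is a hypothetical pass-band half-width.
    """
    expected = dispersion_omega(kx, ky, depth, ux, uy)
    return abs(omega - expected) <= tolerance


# Deep water, no current: a 0.1 rad/m wave should sit near 0.99 rad/s.
print(dispersion_omega(0.1, 0.0, 1000.0))
print(passes_filter(0.1, 0.0, 0.99, 1000.0))
```

    Applied to every bin of a three-dimensional spectrum, such a mask keeps energy near the dispersion shell as wave signal and treats the remainder as background noise; fitting depth and current so that the shell best matches the observed energy is the usual route to estimating them.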

  11. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm

    PubMed Central

    Öhrmalm, Christina; Jobs, Magnus; Eriksson, Ronnie; Golbob, Sultan; Elfaitouri, Amal; Benachenhou, Farid; Strømme, Maria; Blomberg, Jonas

    2010-01-01

    One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, it had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes. PMID:20864443
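
    The distinction drawn above between contiguous matching stretches and the total number of matching nucleotides can be made concrete with a small sketch. This is not the NucZip algorithm; it is only a hypothetical helper that assumes two pre-aligned, equal-length sequences written on the same strand, so that identical letters stand for a matching position.

```python
def matching_stretches(probe, target):
    """Lengths of the contiguous matching runs between two aligned sequences.

    Assumption: probe and target have equal length and are written in the
    same orientation, so identical letters indicate a match.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned and of equal length")
    runs, current = [], 0
    for p, t in zip(probe.upper(), target.upper()):
        if p == t:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Both targets match the probe at 9 of 10 positions, but the mismatch
# position changes the longest uninterrupted stretch.
central = matching_stretches("ACGTACGTAC", "ACGTTCGTAC")   # runs of 4 and 5
terminal = matching_stretches("ACGTACGTAC", "ACGTACGTAG")  # one run of 9
```

    Comparing `max(central)` with `max(terminal)` shows how the same mismatch count can leave very different contiguous stretches.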

  12. The amino acid sequence of a cereal Bowman-Birk type trypsin inhibitor from seeds of Jobs' tears (Coix lachryma-jobi L.).

    PubMed

    Ary, M B; Shewry, P R; Richardson, M

    1988-02-29

    The major trypsin inhibitor from seeds of Jobs' tears (Coix lachryma-jobi) was purified by heat treatment, fractional precipitation with (NH4)2SO4, ion-exchange chromatography on DEAE-Sepharose, gel-filtration on Sephadex G-75 and preparative reverse-phase HPLC. The complete amino acid sequence was determined by analysis of peptides derived from the reduced and S-carboxymethylated protein by digestion with trypsin, chymotrypsin and the S. aureus V8 protease. The polypeptide contained 64 amino acids with a high content of cysteine. The sequence exhibited strong homology with a number of Bowman-Birk inhibitors from legume seeds and similar proteins recently isolated from wheat and rice. PMID:3162215

  13. Sequence analysis demonstrates that Onion yellow dwarf virus isolates from China contain a P3 region much larger than other potyviruses.

    PubMed

    Chen, J; Adams, M J; Zheng, H-Y; Chen, J-P

    2003-06-01

    The complete sequence of an isolate of Onion yellow dwarf virus (OYDV) from Yuhang, Zhejiang province, China, was determined. It was 10538 nts in length and was predicted to encode a polyprotein 3403 amino acids (aa) long with a calculated M(r) of 385.1 kDa. The predicted P3 protein (530 aa) was larger than that of any of the potyviruses sequenced to date (344-378 aa). The additional sequence occurs at the N-terminus of the protein, does not represent a duplication from elsewhere in the OYDV genome and could not be matched to any other sequences in the databases. Similar sequences were found in 4 other Chinese OYDV isolates. Phylogenetic analysis of the amino acid sequences of the polyprotein showed that OYDV is distantly related to Pea seed-borne mosaic virus and the potyviruses of grasses and cereals. PMID:12756621

  14. A Δ-9 Fatty Acid Desaturase Gene in the Microalga Myrmecia incisa Reisigl: Cloning and Functional Analysis.

    PubMed

    Xue, Wen-Bin; Liu, Fan; Sun, Zheng; Zhou, Zhi-Gang

    2016-01-01

    The green alga Myrmecia incisa is one of the richest natural sources of arachidonic acid (ArA). To better understand the regulation of ArA biosynthesis in M. incisa, a novel gene putatively encoding the Δ9 fatty acid desaturase (FAD) was cloned and characterized for the first time. Rapid-amplification of cDNA ends (RACE) was employed to yield a full length cDNA designated as MiΔ9FAD, which is 2442 bp long in sequence. Comparing cDNA open reading frame (ORF) sequence to genomic sequence indicated that there are 8 introns interrupting the coding region. The deduced MiΔ9FAD protein is composed of 432 amino acids. It is soluble and localized in the chloroplast, as evidenced by the absence of transmembrane domains as well as the presence of a 61-amino acid chloroplast transit peptide. Multiple sequence alignment of amino acids revealed two conserved histidine-rich motifs, typical for Δ9 acyl-acyl carrier protein (ACP) desaturases. To determine the function of MiΔ9FAD, the gene was heterologously expressed in a Saccharomyces cerevisiae mutant strain with impaired desaturase activity. Results of GC-MS analysis indicated that MiΔ9FAD was able to restore the synthesis of monounsaturated fatty acids, generating palmitoleic acid and oleic acid through the addition of a double bond in the Δ9 position of palmitic acid and stearic acid, respectively. PMID:27438826

  15. A Δ-9 Fatty Acid Desaturase Gene in the Microalga Myrmecia incisa Reisigl: Cloning and Functional Analysis

    PubMed Central

    Xue, Wen-Bin; Liu, Fan; Sun, Zheng; Zhou, Zhi-Gang

    2016-01-01

    The green alga Myrmecia incisa is one of the richest natural sources of arachidonic acid (ArA). To better understand the regulation of ArA biosynthesis in M. incisa, a novel gene putatively encoding the Δ9 fatty acid desaturase (FAD) was cloned and characterized for the first time. Rapid-amplification of cDNA ends (RACE) was employed to yield a full length cDNA designated as MiΔ9FAD, which is 2442 bp long in sequence. Comparing cDNA open reading frame (ORF) sequence to genomic sequence indicated that there are 8 introns interrupting the coding region. The deduced MiΔ9FAD protein is composed of 432 amino acids. It is soluble and localized in the chloroplast, as evidenced by the absence of transmembrane domains as well as the presence of a 61-amino acid chloroplast transit peptide. Multiple sequence alignment of amino acids revealed two conserved histidine-rich motifs, typical for Δ9 acyl-acyl carrier protein (ACP) desaturases. To determine the function of MiΔ9FAD, the gene was heterologously expressed in a Saccharomyces cerevisiae mutant strain with impaired desaturase activity. Results of GC-MS analysis indicated that MiΔ9FAD was able to restore the synthesis of monounsaturated fatty acids, generating palmitoleic acid and oleic acid through the addition of a double bond in the Δ9 position of palmitic acid and stearic acid, respectively. PMID:27438826

  16. Lactic acid production from potato peel waste by anaerobic sequencing batch fermentation using undefined mixed culture.

    PubMed

    Liang, Shaobo; McDonald, Armando G; Coats, Erik R

    2015-11-01

    Lactic acid (LA) is a necessary industrial feedstock for producing the bioplastic polylactic acid (PLA), and it is currently produced by pure culture fermentation of food carbohydrates. This work presents an alternative route that produces LA from potato peel waste (PPW) by anaerobic fermentation in a sequencing batch reactor (SBR) inoculated with an undefined mixed culture from a municipal wastewater treatment plant. A statistical design of experiments approach was employed using a set of 0.8 L SBRs with gelatinized PPW at solids contents of 30 to 50 g L(-1) and solids retention times (SRT) of 2-4 days to optimize yield and productivity. A maximum LA production yield of 0.25 g g(-1) PPW and a highest productivity of 125 mg g(-1) d(-1) were achieved. In a scale-up SBR trial at the 3 L scale using neat gelatinized PPW (80 g L(-1) solids content), the highest LA yield of 0.14 g g(-1) PPW and a productivity of 138 mg g(-1) d(-1) were achieved with a 1 d SRT. PMID:25708409

  17. Ambient temperature detection of PCR amplicons with a novel sequence-specific nucleic acid lateral flow biosensor.

    PubMed

    Ang, Geik Yong; Yu, Choo Yee; Yean, Chan Yean

    2012-01-01

    In the field of diagnostics, molecular amplification targeting unique genetic signature sequences has been widely used for rapid identification of infectious agents, which significantly aids physicians in determining the choice of treatment as well as providing important epidemiological data for surveillance and disease control assessment. We report the development of a rapid nucleic acid lateral flow biosensor (NALFB) in a dry-reagent strip format for the sequence-specific detection of single-stranded polymerase chain reaction (PCR) amplicons at ambient temperature (22-25°C). The NALFB was developed in combination with a linear-after-the-exponential PCR assay and the applicability of this biosensor was demonstrated through detection of the cholera toxin gene from diarrheal-causing toxigenic Vibrio cholerae. Amplification using the advanced asymmetric PCR boosts the production of fluorescein-labeled single-stranded amplicons, allowing capture probes immobilized on the NALFB to hybridize specifically with complementary targets in situ on the strip. Subsequent visual formation of red lines is achieved through the binding of conjugated gold nanoparticles to the fluorescein label of the captured amplicons. The visual detection limit observed with synthetic target DNA was 0.3 ng and 1 pg with pure genomic DNA. Evaluation of the NALFB with 164 strains of V. cholerae and non-V. cholerae bacteria recorded 100% for both sensitivity and specificity. The whole procedure of the low-cost NALFB, which is performed at ambient temperature, eliminates the need for preheated buffers or additional equipment, greatly simplifying the protocol for sequence-specific PCR amplicon analysis. PMID:22705404

  18. Insights from the GC content analysis of 76 genome survey sequences (GSS) from Elaeis oleiferaψ

    PubMed Central

    Bhore, Subhash J; Kassim, Amelia; Shah, Farida H

    2010-01-01

    South American oil-palm (Elaeis oleifera) is not cultivated on a large scale in tropical countries like Malaysia because of the low yield of palm oil derived from its fruit mesocarp. However, its fruit mesocarp oil contains about 68.6% oleic acid (C18:1), which is more than double that of the commercially cultivated oil-palm, E. guineensis Jacq. Tenera (a hybrid of Dura (♀) x Pisifera (♂)). It is also known that E. oleifera is a good source of tocotrienols and carotenoids. Therefore, it is of interest to know the genome sequence of E. oleifera. The objective of this study is to generate genome survey sequences (GSS) to gain insight into the GC content of the E. oleifera genome. The nuclear genomic DNA isolated from young leaf tissues was digested with EcoRI and NdeI/DraI restriction enzymes, and three genomic DNA libraries were constructed using Lambda ZAP-II, pGEM®-T Easy, and pDONR 222™ as cloning vectors. The 76 GSSs generated were analyzed using bioinformatics tools. The analysis indicates that the adenine, cytosine, guanine and thymine contents of the generated GSSs are 30%, 20%, 20%, and 30%, respectively. In conclusion, based on the precise GC content analysis of the 76 randomly isolated GSSs using bioinformatics tools, we hypothesize that the GC content of the E. oleifera genome is 40%. This hypothesized 40% GC content is expected to be close to the GC content obtained from whole genome analysis. ψThe nucleotide sequence data reported in this paper have been submitted to the dbGSS division of the international DNA database (GenBank/DDBJ/EMBL) under accession numbers DX575945-DX575972 and EI798032-EI798079. Abbreviations: gDNA - Nuclear genomic DNA, GSSs - Genome survey sequences, SAOP - South American oil-palm. PMID:21364775
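
    The step from base composition to GC content described above is simple arithmetic: GC content is the sum of the guanine and cytosine percentages. A minimal sketch follows; the function names and the toy input sequence are illustrative assumptions, not the study's data or tools.

```python
from collections import Counter


def base_composition(sequences):
    """Percentage of A, C, G and T pooled over a collection of sequences.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G or T bases found")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(sequences):
    """GC content in percent: the G percentage plus the C percentage."""
    comp = base_composition(sequences)
    return comp["G"] + comp["C"]


# Toy sequence with the composition reported in the abstract
# (30% A, 20% C, 20% G, 30% T); it is not one of the 76 GSSs.
toy = ["AAATTTCCGG"]
toy_gc = gc_content(toy)
```

    With the reported composition, 20% cytosine plus 20% guanine gives the 40% GC content stated in the abstract.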

  19. Genomic Sequencing and Analysis of Sucra jujuba Nucleopolyhedrovirus