complete sequence-specific assignments: Topics by Science.gov

Sample records for complete sequence-specific assignments

Sequence-specific ¹H NMR resonance assignments of Bacillus subtilis HPr: Use of spectra obtained from mutants to resolve spectral overlap

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wittekind, M.; Klevit, R.E.; Reizer, J.

1990-08-07

On the basis of an analysis of two-dimensional ¹H NMR spectra, the complete sequence-specific ¹H NMR assignments are presented for the phosphocarrier protein HPr from the Gram-positive bacterium Bacillus subtilis. During the assignment procedure, extensive use was made of spectra obtained from point mutants of HPr in order to resolve spectral overlap and to provide verification of assignments. Regions of regular secondary structure were identified by characteristic patterns of sequential backbone proton NOEs and slowly exchanging amide protons. B. subtilis HPr contains four β-strands that form a single antiparallel β-sheet and two well-defined α-helices. There are two stretches of extended backbone structure, one of which contains the active site His15. The overall fold of the protein is very similar to that of Escherichia coli HPr determined by NMR studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Fogh, R.H.; Mabbutt, B.C.; Kem, W.R.

Sequence-specific assignments are reported for the 500-MHz ¹H nuclear magnetic resonance (NMR) spectrum of the 48-residue polypeptide neurotoxin I from the sea anemone Stichodactyla helianthus (Sh I). Spin systems were first identified by using two-dimensional relayed or multiple quantum filtered correlation spectroscopy, double quantum spectroscopy, and spin lock experiments. Specific resonance assignments were then obtained from nuclear Overhauser enhancement (NOE) connectivities between protons from residues adjacent in the amino acid sequence. Of a total of 265 potentially observable resonances, 248 (i.e., 94%) were assigned, arising from 39 completely and 9 partially assigned amino acid spin systems. The secondary structure of Sh I was defined on the basis of the pattern of sequential NOE connectivities, NOEs between protons on separate strands of the polypeptide backbone, and backbone amide exchange rates. Sh I contains a four-stranded antiparallel β-sheet encompassing residues 1-5, 16-24, 30-33, and 40-46, with a β-bulge at residues 17 and 18 and a reverse turn, probably a type II β-turn, involving residues 27-30. No evidence of α-helical structure was found.
Sequential ¹H NMR assignments and secondary structure of hen egg white lysozyme in solution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Redfield, C.; Dobson, C.M.

Assignments of ¹H NMR resonances of 121 of the 129 residues of hen egg white lysozyme have been obtained by sequence-specific methods. Spin systems were identified with phase-sensitive two-dimensional (2-D) correlated spectroscopy and single and double relayed coherence transfer spectroscopy. For key types of amino acid residues, particularly alanine, threonine, valine, and glycine, complete spin systems were identified. For other residues a less complete definition of the spin system was found to be adequate for the purpose of sequential assignment. Sequence-specific assignments were achieved by phase-sensitive 2-D nuclear Overhauser enhancement spectroscopy (NOESY). Exploitation of the wide range of hydrogen exchange rates found in lysozyme was a useful approach to overcoming the problem of spectral overlap. The sequential assignment was built up from 21 peptide segments ranging in length from 2 to 13 residues. The NOESY spectra were also used to provide information about the secondary structure of the protein in solution. Three helical regions and two regions of β-sheet were identified from the NOESY data; these regions are identical with those found in the X-ray structure of hen lysozyme. Slowly exchanging amides are generally correlated with hydrogen bonding identified in the X-ray structure; a number of exceptions to this general trend were, however, found. The results presented in this paper indicate that highly detailed information can be obtained from 2-D NMR spectra of a protein that is significantly larger than those studied previously.
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.

PubMed

Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt

2008-07-01

MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
Rapid NMR Assignments of Proteins by Using Optimized Combinatorial Selective Unlabeling.

PubMed

Dubey, Abhinav; Kadumuri, Rajashekar Varma; Jaipuria, Garima; Vadrevu, Ramakrishna; Atreya, Hanudatta S

2016-02-15

A new approach for rapid resonance assignments in proteins based on amino acid selective unlabeling is presented. The method involves choosing a set of multiple amino acid types for selective unlabeling and identifying specific tripeptides surrounding the labeled residues from specific 2D NMR spectra in a combinatorial manner. The methodology directly yields sequence-specific assignments, without requiring a contiguous stretch of amino acid residues to be linked, and is applicable to deuterated proteins. We show that a 2D [¹⁵N,¹H] HSQC spectrum with two 2D spectra can result in ∼50% assignments. The methodology was applied to two proteins: an intrinsically disordered protein (12 kDa) and the 29 kDa (268 residue) α-subunit of Escherichia coli tryptophan synthase, which presents a challenging case with spectral overlaps and missing peaks. The method can augment existing approaches and will be useful for applications such as identifying active-site residues involved in ligand binding, phosphorylation, or protein-protein interactions, even prior to complete resonance assignments. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Genomic organization, complete sequence, and chromosomal location of the gene for human eotaxin (SCYA11), an eosinophil-specific CC chemokine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garcia-Zepeda, E.A.; Sarafi, M.N.; Luster, A.D.

1997-05-01

Eotaxin is a CC chemokine that is a specific chemoattractant for eosinophils and is implicated in the pathogenesis of eosinophilic inflammatory diseases, such as asthma. We describe the genomic organization, complete sequence, including 1354 bp 5′ of the RNA initiation site, and chromosomal localization of the human eotaxin gene. Fluorescence in situ hybridization analysis localized eotaxin to human chromosome 17, in the region q21.1-q21.2, and the human gene name SCYA11 was assigned. We also present the 5′ flanking sequence of the mouse eotaxin gene and have identified several regulatory elements that are conserved between the murine and the human promoters. In particular, the presence of elements such as NF-κB, interferon-γ response element, and glucocorticoid response element may explain the observed regulation of the eotaxin gene by cytokines and glucocorticoids. 17 refs., 4 figs., 1 tab.
Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

PubMed

Xu, Qifang; Dunbrack, Roland L

2012-11-01

Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
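The greedy step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each candidate hit is a tuple of (family, start, end, E-value) and that a hit is accepted when its overlap with every previously accepted hit stays at or below a chosen number of residues. The `Hit` type, the `greedy_assign` function, the `max_overlap` value, and the example hits are all hypothetical and are not taken from PDBfam; the published procedure also involves several alignment stages that are not modeled here.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    family: str    # Pfam family name (illustrative values only)
    start: int     # first aligned residue, 1-based inclusive
    end: int       # last aligned residue, inclusive
    evalue: float  # alignment E-value; smaller is stronger


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_assign(hits: List[Hit], max_overlap: int = 10) -> List[Hit]:
    """Accept hits from strongest to weakest E-value, tolerating small overlaps."""
    accepted: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        # max_overlap is an assumed tolerance, not a value from the abstract.
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


example = [
    Hit("FamA", 1, 120, 1e-40),
    Hit("FamB", 115, 230, 1e-25),  # overlaps FamA by 6 residues: kept
    Hit("FamC", 60, 200, 1e-10),   # overlaps FamA by 61 residues: rejected
]
assigned = greedy_assign(example)
print([h.family for h in assigned])
```

Sorting by E-value first means the strongest hit always claims its region, and weaker hits are only added where they do not substantially collide with it.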
CRISPR adaptive immune systems of Archaea

PubMed Central

Vestergaard, Gisle; Garrett, Roger A; Shah, Shiraz A

2014-01-01

CRISPR adaptive immune systems were analyzed for all available completed genomes of archaea, which included representatives of each of the main archaeal phyla. Initially, all proteins encoded within, and proximal to, CRISPR-cas loci were clustered and analyzed using a profile–profile approach. Then cas genes were assigned to gene cassettes and to functional modules for adaptation and interference. CRISPR systems were then classified primarily on the basis of their concatenated Cas protein sequences and gene synteny of the interference modules. With few exceptions, they could be assigned to the universal Type I or Type III systems. For Type I, subtypes I-A, I-B, and I-D dominate but the data support the division of subtype I-B into two subtypes, designated I-B and I-G. About 70% of the Type III systems fall into the universal subtypes III-A and III-B but the remainder, some of which are phyla-specific, diverge significantly in Cas protein sequences, and/or gene synteny, and they are classified separately. Furthermore, a few CRISPR systems that could not be assigned to Type I or Type III are categorized as variant systems. Criteria are presented for assigning newly sequenced archaeal CRISPR systems to the different subtypes. Several accessory proteins were identified that show a specific gene linkage, especially to Type III interference modules, and these may be cofunctional with the CRISPR systems. Evidence is presented for extensive exchange having occurred between adaptation and interference modules of different archaeal CRISPR systems, indicating the wide compatibility of the functionally diverse interference complexes with the relatively conserved adaptation modules. PMID:24531374
Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.

PubMed

Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene

2011-01-01

To address the monumental challenge of assigning function to millions of sequenced proteins, we completed the first-of-a-kind all-versus-all sequence alignments using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.
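Single-linkage clustering, as named in this abstract, places two proteins in the same group whenever a chain of sufficiently similar pairs connects them. The sketch below shows the idea on a single machine with a union-find structure; it is not the authors' Hadoop implementation. The abstract does not define how LNBS is computed, so the pair scores and the `threshold` value are placeholder numbers, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def single_linkage(
    proteins: Iterable[str],
    scored_pairs: Iterable[Tuple[str, str, float]],
    threshold: float,
) -> List[List[str]]:
    """Group proteins connected by any chain of pairs scoring >= threshold."""
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = defaultdict(list)
    for p in parent:
        clusters[find(p)].append(p)
    return sorted(sorted(members) for members in clusters.values())


# Placeholder similarity scores; real input would come from BLAST output.
pairs = [("P1", "P2", 1.8), ("P2", "P3", 1.6), ("P3", "P4", 0.4)]
groups = single_linkage(["P1", "P2", "P3", "P4"], pairs, threshold=1.5)
print(groups)
```

P1 and P3 end up together even though they were never compared directly, which is the defining property of single linkage and also the reason it can chain loosely related proteins into one group.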
Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB

PubMed Central

Dunbrack, Roland L.

2012-01-01

Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
Sequencing RNA by a combination of exonuclease digestion and uridine specific chemical cleavage using MALDI-TOF.

PubMed Central

Tolson, D A; Nicholson, N H

1998-01-01

The determination of DNA sequences by partial exonuclease digestion followed by Matrix-Assisted Laser Desorption Time of Flight Mass Spectrometry (MALDI-TOF) is a well-established method. When the same procedure is applied to RNA, difficulties arise due to the small (1 Da) mass difference between the nucleotides U and C, which makes unambiguous assignment difficult using a MALDI-TOF instrument. Here we report our experiences with sequence-specific endonucleases and chemical methods followed by MALDI-TOF to resolve these sequence ambiguities. We have found chemical methods superior to endonucleases both in terms of correct specificity and extent of sequence coverage. This methodology can be used in combination with exonuclease digestion to rapidly assign RNA sequences. PMID:9421498
Complete sequence of HLA-B27 cDNA identified through the characterization of structural markers unique to the HLA-A, -B, and -C allelic series

DOE Office of Scientific and Technical Information (OSTI.GOV)

Szoets, H.; Reithmueller, G.; Weiss, E.

1986-03-01

Antigen HLA-B27 is a high-risk genetic factor with respect to a group of rheumatoid disorders, especially ankylosing spondylitis. A cDNA library was constructed from an autozygous B-cell line expressing HLA-B27, HLA-Cw1, and the previously cloned HLA-A2 antigen. Clones detected with an HLA probe were isolated and sorted into homology groups by differential hybridization and restriction maps. Nucleotide sequencing allowed the unambiguous assignment of cDNAs to HLA-A, -B, and -C loci. The HLA-B27 mRNA has the structural features and the codon variability typical of an HLA class I transcript but it specifies two uncommon amino acid replacements: a cysteine in position 67 and a serine in position 131. The latter substitution may have functional consequences, because it occurs in a conserved region and at a position invariably occupied by a species-specific arginine in humans and lysine in mice. The availability of the complete sequence of HLA-B27 and of the partial sequence of HLA-Cw1 allows the recognition of locus-specific sequence markers, particularly, but not exclusively, in the transmembrane and cytoplasmic domains.
Selective excitation enables assignment of proton resonances and ¹H-¹H distance measurement in ultrafast magic angle spinning solid state NMR spectroscopy.

PubMed

Zhang, Rongchun; Ramamoorthy, Ayyalusamy

2015-07-21

Remarkable developments in ultrafast magic angle spinning (MAS) solid-state NMR spectroscopy enabled proton-based high-resolution multidimensional experiments on solids. To fully utilize the benefits rendered by proton-based ultrafast MAS experiments, assignment of ¹H resonances becomes absolutely necessary. Herein, we propose an approach to identify different proton peaks by using dipolar-coupled heteronuclei such as ¹³C or ¹⁵N. In this method, after the initial preparation of proton magnetization and cross-polarization to ¹³C nuclei, transverse magnetization of desired ¹³C nuclei is selectively prepared by using the DANTE (Delays Alternating with Nutations for Tailored Excitation) sequence, and it is then transferred to bonded protons with a short-contact-time cross polarization. Our experimental results demonstrate that protons bonded to specific ¹³C atoms can be identified and overlapping proton peaks can also be assigned. In contrast to the regular 2D HETCOR experiment, only a few 1D experiments are required for the complete assignment of peaks in the proton spectrum. Furthermore, the finite-pulse radio frequency driven recoupling sequence could be incorporated right after the selection of specific proton signals to monitor the intensity buildup for other proton signals. This enables the extraction of ¹H-¹H distances between different pairs of protons. Therefore, we believe that the proposed method will greatly aid in fast assignment of peaks in proton spectra and will be useful in the development of proton-based multi-dimensional solid-state NMR experiments to study atomic-level resolution structure and dynamics of solids.
Two-dimensional ¹H NMR studies on HPr protein from Staphylococcus aureus: Complete sequential assignments and secondary structure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kalbitzer, H.R.; Neidig, K.P.; Hengstenberg, W.

1991-11-19

Complete sequence-specific assignments of the ¹H NMR spectrum of HPr protein from Staphylococcus aureus were obtained by two-dimensional NMR methods. Important secondary structure elements that can be derived from the observed nuclear Overhauser effects are a large antiparallel β-pleated sheet consisting of four strands, A, B, C, D, a segment S_AB consisting of an extended region around the active-center histidine (His-15) and an α-helix, a half-turn between strands B and C, a segment S_CD which shows no typical secondary structure, and the α-helical, C-terminal segment S_term. These general structural features are similar to those found earlier in HPr proteins from different microorganisms such as Escherichia coli, Bacillus subtilis, and Streptococcus faecalis.
Assessing the utility of eDNA as a tool to survey reef-fish communities in the Red Sea

NASA Astrophysics Data System (ADS)

DiBattista, Joseph D.; Coker, Darren J.; Sinclair-Taylor, Tane H.; Stat, Michael; Berumen, Michael L.; Bunce, Michael

2017-12-01

Relatively small volumes of water may contain sufficient environmental DNA (eDNA) to detect target aquatic organisms via genetic sequencing. We therefore assessed the utility of eDNA to document the diversity of coral reef fishes in the central Red Sea. DNA from seawater samples was extracted, amplified using fish-specific 16S mitochondrial DNA primers, and sequenced using a metabarcoding workflow. DNA sequences were assigned to taxa using available genetic repositories or custom genetic databases generated from reference fishes. Our approach revealed a diversity of conspicuous, cryptobenthic, and commercially relevant reef fish at the genus level, with select genera in the family Labridae over-represented. Our approach, however, failed to capture a significant fraction of the fish fauna known to inhabit the Red Sea, which we attribute to limited spatial sampling, amplification stochasticity, and an apparent lack of sequencing depth. Given an increase in fish species descriptions, completeness of taxonomic checklists, and improvement in species-level assignment with custom genetic databases as shown here, we suggest that the Red Sea region may be ideal for further testing of the eDNA approach.
Comparative Genomics of Completely Sequenced Lactobacillus helveticus Genomes Provides Insights into Strain-Specific Genes and Resolves Metagenomics Data Down to the Strain Level.

PubMed

Schmid, Michael; Muri, Jonathan; Melidis, Damianos; Varadarajan, Adithi R; Somerville, Vincent; Wicki, Adrian; Moser, Aline; Bourqui, Marc; Wenzel, Claudia; Eugster-Meier, Elisabeth; Frey, Juerg E; Irmler, Stefan; Ahrens, Christian H

2018-01-01

Although complete genome sequences hold particular value for an accurate description of core genomes, the identification of strain-specific genes, and as the optimal basis for functional genomics studies, they are still largely underrepresented in public repositories. Based on an assessment of the genome assembly complexity for all lactobacilli, we used Pacific Biosciences' long read technology to sequence and de novo assemble the genomes of three Lactobacillus helveticus starter strains, raising the number of completely sequenced strains to 12. The first comparative genomics study for L. helveticus, to our knowledge, identified a core genome of 988 genes and sets of unique, strain-specific genes ranging from about 30 to more than 200 genes. Importantly, the comparison of MiSeq- and PacBio-based assemblies uncovered that not only accessory but also core genes can be missed in incomplete genome assemblies based on short reads. Analysis of the three genomes revealed that a large number of pseudogenes were enriched for functional Gene Ontology categories such as amino acid transmembrane transport and carbohydrate metabolism, which is in line with a reductive genome evolution in the rich natural habitat of L. helveticus. Notably, the functional Clusters of Orthologous Groups of proteins categories "cell wall/membrane biogenesis" and "defense mechanisms" were found to be enriched among the strain-specific genes. A genome mining effort uncovered examples where an experimentally observed phenotype could be linked to the underlying genotype, such as for cell envelope proteinase PrtH3 of strain FAM8627. Another possible link identified for peptidoglycan hydrolases will require further experiments. Of note, strain FAM22155 did not harbor a CRISPR/Cas system; its loss was also observed in other L. helveticus strains and lactobacillus species, thus questioning the value of the CRISPR/Cas system for diagnostic purposes.
Importantly, the complete genome sequences proved to be very useful for the analysis of natural whey starter cultures with metagenomics, as a larger percentage of the sequenced reads of these complex mixtures could be unambiguously assigned down to the strain level.
Comparative Genomics of Completely Sequenced Lactobacillus helveticus Genomes Provides Insights into Strain-Specific Genes and Resolves Metagenomics Data Down to the Strain Level

PubMed Central

Schmid, Michael; Muri, Jonathan; Melidis, Damianos; Varadarajan, Adithi R.; Somerville, Vincent; Wicki, Adrian; Moser, Aline; Bourqui, Marc; Wenzel, Claudia; Eugster-Meier, Elisabeth; Frey, Juerg E.; Irmler, Stefan; Ahrens, Christian H.

2018-01-01

Although complete genome sequences hold particular value for an accurate description of core genomes, the identification of strain-specific genes, and as the optimal basis for functional genomics studies, they are still largely underrepresented in public repositories. Based on an assessment of the genome assembly complexity for all lactobacilli, we used Pacific Biosciences' long read technology to sequence and de novo assemble the genomes of three Lactobacillus helveticus starter strains, raising the number of completely sequenced strains to 12. The first comparative genomics study for L. helveticus—to our knowledge—identified a core genome of 988 genes and sets of unique, strain-specific genes ranging from about 30 to more than 200 genes. Importantly, the comparison of MiSeq- and PacBio-based assemblies uncovered that not only accessory but also core genes can be missed in incomplete genome assemblies based on short reads. Analysis of the three genomes revealed that a large number of pseudogenes were enriched for functional Gene Ontology categories such as amino acid transmembrane transport and carbohydrate metabolism, which is in line with a reductive genome evolution in the rich natural habitat of L. helveticus. Notably, the functional Clusters of Orthologous Groups of proteins categories “cell wall/membrane biogenesis” and “defense mechanisms” were found to be enriched among the strain-specific genes. A genome mining effort uncovered examples where an experimentally observed phenotype could be linked to the underlying genotype, such as for cell envelope proteinase PrtH3 of strain FAM8627. Another possible link identified for peptidoglycan hydrolases will require further experiments. Of note, strain FAM22155 did not harbor a CRISPR/Cas system; its loss was also observed in other L. helveticus strains and lactobacillus species, thus questioning the value of the CRISPR/Cas system for diagnostic purposes. 
Importantly, the complete genome sequences proved to be very useful for the analysis of natural whey starter cultures with metagenomics, as a larger percentage of the sequenced reads of these complex mixtures could be unambiguously assigned down to the strain level. PMID:29441050
DNA barcode data accurately assign higher spider taxa

PubMed Central

Coddington, Jonathan A.; Agnarsson, Ingi; Cheng, Ren-Chung; Čandek, Klemen; Driskell, Amy; Frick, Holger; Gregorič, Matjaž; Kostanjšek, Rok; Kropf, Christian; Kweskin, Matthew; Lokovšek, Tjaša; Pipan, Miha; Vidergar, Nina

2016-01-01

The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families—taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and got the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75–100%). Accurate assignment of higher taxa (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. 
However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades. PMID:27547527
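The heuristic thresholds reported in this abstract (genus at PIdent >95, family at PIdent ≥ 91) can be written as a small decision rule. This is a sketch only: the function name and the "unassigned" label for hits below both thresholds are assumptions, and the thresholds were derived for spiders, so they may not transfer to other groups.

```python
def assignable_rank(pident: float) -> str:
    """Return the most specific higher-taxon rank the reported thresholds support."""
    if pident > 95:       # genus threshold from the abstract (strictly greater)
        return "genus"
    if pident >= 91:      # family threshold from the abstract (inclusive)
        return "family"
    return "unassigned"   # assumed label; the abstract does not name this case


for value in (97.2, 95.0, 91.0, 88.5):
    print(value, assignable_rank(value))
```

Note the asymmetry in the boundaries: a hit at exactly 95% identity supports only a family-level call, while a hit at exactly 91% still qualifies for family.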
Resonance assignment of disordered protein with repetitive and overlapping sequence using combinatorial approach reveals initial structural propensities and local restrictions in the denatured state.

PubMed

Malik, Nikita; Kumar, Ashutosh

2016-09-01

NMR resonance assignment of intrinsically disordered proteins poses a challenge because of the limited dispersion of amide proton chemical shifts. This becomes even more complex with the increase in the size of the system. Residue specific selective labeling/unlabeling experiments have been used to resolve the overlap, but require multiple sample preparations. Here, we demonstrate an assignment strategy requiring only a single sample of uniformly labeled ¹³C,¹⁵N-protein. We have used a combinatorial approach, involving 3D-HNN, CC(CO)NH and 2D-MUSIC, which allowed us to assign a denatured centromeric protein Cse4 of 229 residues. Further, we show that even the less sensitive experiments, when used in an efficient manner can lead to the complete assignment of a complex system without the use of specialized probes in a relatively short time frame. The assignment of the amino acids discloses the presence of local structural propensities even in the denatured state accompanied by restricted motion in certain regions that provides insights into the early folding events of the protein.
The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification.

PubMed

Aiewsakun, Pakorn; Simmonds, Peter

2018-02-20

The International Committee on Taxonomy of Viruses (ICTV) classifies viruses into families, genera and species and provides a regulated system for their nomenclature that is universally used in virus descriptions. Virus taxonomic assignments have traditionally been based upon virus phenotypic properties such as host range, virion morphology and replication mechanisms, particularly at family level. However, gene sequence comparisons provide a clearer guide to their evolutionary relationships and provide the only information that may guide the incorporation of viruses detected in environmental (metagenomic) studies that lack any phenotypic data. The current study sought to determine whether the existing virus taxonomy could be reproduced by examination of genetic relationships through the extraction of protein-coding gene signatures and genome organisational features. We found large-scale consistency between genetic relationships and taxonomic assignments for viruses of all genome configurations and genome sizes. The analysis pipeline that we have called 'Genome Relationships Applied to Virus Taxonomy' (GRAViTy) was highly effective at reproducing the current assignments of viruses at family level as well as inter-family groupings into orders. Its ability to correctly differentiate assigned viruses from unassigned viruses, and classify them into the correct taxonomic group, was evaluated by threefold cross-validation technique. This predicted family membership of eukaryotic viruses with close to 100% accuracy and specificity potentially enabling the algorithm to predict assignments for the vast corpus of metagenomic sequences consistently with ICTV taxonomy rules. In an evaluation run of GRAViTy, over one half (460/921) of (near)-complete genome sequences from several large published metagenomic eukaryotic virus datasets were assigned to 127 novel family-level groupings. 
If corroborated by other analysis methods, these would potentially more than double the number of eukaryotic virus families in the ICTV taxonomy. A rapid and objective means to explore metagenomic viral diversity and make informed recommendations for their assignments at each taxonomic layer is essential. GRAViTy provides one means to make rule-based assignments at family and order levels in a manner that preserves the integrity and underlying organisational principles of the current ICTV taxonomy framework. Such methods are increasingly required as the vast virosphere is explored.

Comparative NMR analysis of the decadeoxynucleotide d-(GCATTAATGC)2 and an analogue containing 2-aminoadenine.

PubMed Central

Chazin, W J; Rance, M; Chollet, A; Leupin, W

1991-01-01

The decadeoxynucleotide duplex d-(GCATTAATGC)2 has been prepared with all adenine bases replaced by 2-NH2-adenine. This modified duplex has been characterized by nuclear magnetic resonance (NMR) spectroscopy. Complete sequence-specific 1H resonance assignments have been obtained by using a variety of 2D NMR methods. Multiple quantum-filtered and multiple quantum experiments have been used to completely assign all sugar ring protons, including 5'H and 5''H resonances. The assignments form the basis for a detailed comparative analysis of the 1H NMR parameters of the modified and parent duplex. The structural features of both decamer duplexes in solution are characteristic of the B-DNA family. The spin-spin coupling constants in the sugar rings and the relative spatial proximities of protons in the bases and sugars (as determined from the comparison of corresponding nuclear Overhauser effects) are virtually identical in the parent and modified duplexes. Thus, substitution by this adenine analogue in oligonucleotides appears not to disturb the global or local conformation of the DNA duplex. PMID:1945828
Genome organization of epidemic Acinetobacter baumannii strains.

PubMed

Di Nocera, Pier Paolo; Rocco, Francesco; Giannouli, Maria; Triassi, Maria; Zarrilli, Raffaele

2011-10-10

Acinetobacter baumannii is an opportunistic pathogen responsible for hospital-acquired infections. A. baumannii epidemics described world-wide were caused by a few genotypic clusters of strains. The occurrence of epidemics caused by multi-drug resistant strains assigned to novel genotypes has been reported over the last few years. In the present study, we compared whole genome sequences of three A. baumannii strains assigned to genotypes ST2, ST25 and ST78, representative of the most frequent genotypes responsible for epidemics in several Mediterranean hospitals, and four complete genome sequences of A. baumannii strains assigned to genotypes ST1, ST2 and ST77. Comparative genome analysis showed extensive synteny and identified 3068 coding regions which are conserved, at the same chromosomal position, in all A. baumannii genomes. Genome alignments also identified 63 DNA regions, ranging in size from 4 to 126 kb, all defined as genomic islands, which were present in some genomes, but were either missing or replaced by non-homologous DNA sequences in others. Some islands are involved in resistance to drugs and metals, others carry genes encoding surface proteins or enzymes involved in specific metabolic pathways, and others correspond to prophage-like elements. Accessory DNA regions encode 12 to 19% of the potential gene products of the analyzed strains. The analysis of a collection of epidemic A. baumannii strains showed that some islands were restricted to specific genotypes. The definition of the genome components of A. baumannii provides a scaffold to rapidly evaluate the genomic organization of novel clinical A. baumannii isolates. Changes in island profiling will be useful in genomic epidemiology of the A. baumannii population.
Selective excitation enables assignment of proton resonances and {sup 1}H-{sup 1}H distance measurement in ultrafast magic angle spinning solid state NMR spectroscopy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Rongchun; Ramamoorthy, Ayyalusamy, E-mail: ramamoor@umich.edu

2015-07-21

Remarkable developments in ultrafast magic angle spinning (MAS) solid-state NMR spectroscopy enabled proton-based high-resolution multidimensional experiments on solids. To fully utilize the benefits rendered by proton-based ultrafast MAS experiments, assignment of {sup 1}H resonances becomes absolutely necessary. Herein, we propose an approach to identify different proton peaks by using dipolar-coupled heteronuclei such as {sup 13}C or {sup 15}N. In this method, after the initial preparation of proton magnetization and cross-polarization to {sup 13}C nuclei, transverse magnetization of desired {sup 13}C nuclei is selectively prepared by using DANTE (Delays Alternating with Nutations for Tailored Excitation) sequence and then, it is transferred to bonded protons with a short-contact-time cross polarization. Our experimental results demonstrate that protons bonded to specific {sup 13}C atoms can be identified and overlapping proton peaks can also be assigned. In contrast to the regular 2D HETCOR experiment, only a few 1D experiments are required for the complete assignment of peaks in the proton spectrum. Furthermore, the finite-pulse radio frequency driven recoupling sequence could be incorporated right after the selection of specific proton signals to monitor the intensity buildup for other proton signals. This enables the extraction of {sup 1}H-{sup 1}H distances between different pairs of protons. Therefore, we believe that the proposed method will greatly aid in fast assignment of peaks in proton spectra and will be useful in the development of proton-based multi-dimensional solid-state NMR experiments to study atomic-level resolution structure and dynamics of solids.
Structure-Specific Ribonucleases for MS-Based Elucidation of Higher-Order RNA Structure

NASA Astrophysics Data System (ADS)

Scalabrin, Matteo; Siu, Yik; Asare-Okai, Papa Nii; Fabris, Daniele

2014-07-01

Supported by high-throughput sequencing technologies, structure-specific nucleases are experiencing a renaissance as biochemical probes for genome-wide mapping of nucleic acid structure. This report explores the benefits and pitfalls of the application of Mung bean (Mb) and V1 nuclease, which attack specifically single- and double-stranded regions of nucleic acids, as possible structural probes to be employed in combination with MS detection. Both enzymes were found capable of operating in ammonium-based solutions that are preferred for high-resolution analysis by direct infusion electrospray ionization (ESI). Sequence analysis by tandem mass spectrometry (MS/MS) was performed to confirm mapping assignments and to resolve possible ambiguities arising from the concomitant formation of isobaric products with identical base composition and different sequences. The observed products grouped together into ladder-type series that facilitated their assignment to unique regions of the substrate, but revealed also a certain level of uncertainty in identifying the boundaries between paired and unpaired regions. Various experimental factors that are known to stabilize nucleic acid structure, such as higher ionic strength, presence of Mg(II), etc., increased the accuracy of cleavage information, but did not completely eliminate deviations from expected results. These observations suggest extreme caution in interpreting the results afforded by these types of reagents. Regardless of the analytical platform of choice, the results highlighted the need to repeat probing experiments under the most diverse possible conditions to recognize potential artifacts and to increase the level of confidence in the observed structural information.
Characterization of sour cherry isolates of plum pox virus from the Volga Basin in Russia reveals a new cherry strain of the virus.

PubMed

Glasa, Miroslav; Prikhodko, Yuri; Predajňa, Lukáš; Nagyová, Alžbeta; Shneyder, Yuri; Zhivaeva, Tatiana; Subr, Zdeno; Cambra, Mariano; Candresse, Thierry

2013-09-01

Plum pox virus (PPV) is the causal agent of sharka, the most detrimental virus disease of stone fruit trees worldwide. PPV isolates have been assigned into seven distinct strains, of which PPV-C regroups the genetically distinct isolates detected in several European countries on cherry hosts. Here, three complete and several partial genomic sequences of PPV isolates from sour cherry trees in the Volga River basin of Russia have been determined. The comparison of complete genome sequences has shown that the nucleotide identity values with other PPV isolates reached only 77.5 to 83.5%. Phylogenetic analyses clearly assigned the RU-17sc, RU-18sc, and RU-30sc isolates from cherry to a distinct cluster, most closely related to PPV-C and, to a lesser extent, PPV-W. Based on their natural infection of sour cherry trees and genomic characterization, the PPV isolates reported here represent a new strain of PPV, for which the name PPV-CR (Cherry Russia) is proposed. The unique amino acids conserved among PPV-CR and PPV-C cherry-infecting isolates (75 in total) are mostly distributed within the central part of P1, NIa, and the N terminus of the coat protein (CP), making them potential candidates for genetic determinants of the ability to infect cherry species or of adaptation to these hosts. The variability observed within 14 PPV-CR isolates analyzed in this study (0 to 2.6% nucleotide divergence in partial CP sequences) and the identification of these isolates in different localities and cultivation conditions suggest the efficient establishment and competitiveness of the PPV-CR in the environment. A specific primer pair has been developed, allowing the specific reverse-transcription polymerase chain reaction detection of PPV-CR isolates.
HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold.

PubMed

Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; Ten Have, Arjen

2018-01-01

Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER.
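
The idea of a cut-off that gives 100% precision and recall on the training set can be sketched in a few lines of Python. This is a simplified illustration, not code from the HMMERCTTER repository: the function name, the dictionary of per-sequence scores, and the rule of using the lowest member score as the cut-off are all assumptions. Such a cut-off exists only when every cluster member scores strictly higher than every non-member against the cluster's profile.

```python
from typing import Dict, Optional, Set


def perfect_cutoff(scores: Dict[str, float], members: Set[str]) -> Optional[float]:
    """Return an inclusion cut-off with 100% precision and recall, or None.

    scores  -- hypothetical profile scores for all training sequences
    members -- names of the sequences that belong to the cluster
    """
    # Every member must have been scored, otherwise recall cannot be 100%.
    if not members or not members.issubset(scores):
        return None
    member_scores = [s for name, s in scores.items() if name in members]
    other_scores = [s for name, s in scores.items() if name not in members]
    lowest_member = min(member_scores)
    # A non-member scoring at or above the lowest member breaks precision.
    if other_scores and max(other_scores) >= lowest_member:
        return None
    # Accepting everything scoring >= lowest_member recovers exactly the members.
    return lowest_member
```

When the function returns None, the members and non-members overlap in score, so no single threshold separates them and the cluster would have to be redefined.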
HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold

PubMed Central

Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel

2018-01-01

Background: Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results: HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions: HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER. PMID:29579071
Reverse Genetics and High Throughput Sequencing Methodologies for Plant Functional Genomics

PubMed Central

Ben-Amar, Anis; Daldoul, Samia; Reustle, Götz M.; Krczal, Gabriele; Mliki, Ahmed

2016-01-01

In the post-genomic era, increasingly sophisticated genetic tools are being developed with the long-term goal of understanding how the coordinated activity of genes gives rise to a complex organism. With the advent of next-generation sequencing combined with effective computational approaches, a wide variety of plant species have been fully sequenced, giving a wealth of sequence data on the structure and organization of plant genomes. Since thousands of gene sequences are already known, recently developed functional genomics approaches provide powerful tools to analyze plant gene functions through various gene manipulation technologies. Integration of different omics platforms along with gene annotation and computational analysis may provide a complete view at the systems biology level. Reverse genetics methodologies have been extensively deployed for assigning biological function to a specific gene or gene product. We provide here an updated overview of these high-throughput strategies, highlighting recent advances in the knowledge of functional genomics in plants. PMID:28217003
Genomic Sequence of the WHO International Standard for Hepatitis A Virus RNA.

PubMed

Jenkins, Adrian; Minhas, Rehan; Morris, Clare; Berry, Neil

2018-05-10

The World Health Organization (WHO) international standard for hepatitis A virus (HAV) RNA nucleic acid assays was characterized by complete genome sequencing. The entire coding sequence and noncoding regions were assigned HAV genotype IB. This information will aid the design, development, and evaluation of HAV RNA amplification assays. Copyright © 2018 Jenkins et al.
Amino acid selective unlabeling for sequence specific resonance assignments in proteins

PubMed Central

Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha

2010-01-01

Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective ‘unlabeling’ or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly 13C/15N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named, {12COi–15Ni+1}-filtered HSQC, which aids in linking the 1HN/15N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i − 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to 2H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of 14N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies. PMID:21153044
PCOGR: phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms.

PubMed

Meereis, Florian; Kaufmann, Michael

2004-10-15

The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available. The tool described here fills exactly this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG-database. The user can freely define two groups of organisms by selecting, for each of the (current) 66 organisms, whether it belongs to groupA, belongs to the reference groupB, or is ignored by the algorithm. Then, for all COGs a specificity index is calculated with respect to the specificity to groupA, i.e., high-scoring COGs contain proteins from most groupA organisms while proteins from most organisms assigned to groupB are absent. In addition to ranking all COGs according to the user-defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. This software allows detecting COGs specific to a predefined group of organisms. All COGs are ranked in the order of their specificity and a graphical visualization allows recognizing (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA- and groupB-organisms. The software also allows detecting putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternate enzymes that originated by convergent evolution.
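
The abstract does not give the formula for the specificity index, so the following Python sketch uses a hypothetical stand-in: the fraction of groupA organisms represented in a COG minus the fraction of groupB organisms represented. It only illustrates the ranking idea (high scores for COGs present in most groupA organisms and absent from most groupB organisms) and should not be read as the PCOGR algorithm.

```python
from typing import Dict, List, Set


def specificity_index(cog_organisms: Set[str], group_a: Set[str], group_b: Set[str]) -> float:
    """Hypothetical index in [-1, 1]; 1 means present in all of A and none of B."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one organism")
    frac_a = len(cog_organisms & group_a) / len(group_a)
    frac_b = len(cog_organisms & group_b) / len(group_b)
    return frac_a - frac_b


def rank_cogs(cogs: Dict[str, Set[str]], group_a: Set[str], group_b: Set[str]) -> List[str]:
    """Return COG identifiers sorted from most to least groupA-specific."""
    return sorted(
        cogs,
        key=lambda cog: specificity_index(cogs[cog], group_a, group_b),
        reverse=True,
    )
```

Organisms that belong to neither group are ignored automatically, because only the intersections with groupA and groupB enter the calculation.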
Putative and unique gene sequence utilization for the design of species specific probes as modeled by Lactobacillus plantarum

USDA-ARS's Scientific Manuscript database

The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...
Sequence-specific 1H-NMR assignments for the aromatic region of several biologically active, monomeric insulins including native human insulin.

PubMed

Roy, M; Lee, R W; Kaarsholm, N C; Thøgersen, H; Brange, J; Dunn, M F

1990-06-12

The aromatic region of the 1H-FT-NMR spectrum of the biologically fully-potent, monomeric human insulin mutant, B9 Ser→Asp, B27 Thr→Glu has been investigated in D2O. At 1 to 5 mM concentrations, this mutant insulin is monomeric above pH 7.5. Coupling and amino acid classification of all aromatic signals is established via a combination of homonuclear one- and two-dimensional methods, including COSY, multiple quantum filters, selective spin decoupling and pH titrations. By comparisons with other insulin mutants and with chemically modified native insulins, all resonances in the aromatic region are given sequence-specific assignments without any reliance on the various crystal structures reported for insulin. These comparisons also give the sequence-specific assignments of most of the aromatic resonances of the mutant insulins B16 Tyr→Glu, B27 Thr→Glu and B25 Phe→Asp and the chemically modified species des-(B23-B30) insulin and monoiodo-Tyr A14 insulin. Chemical dispersion of the assigned resonances, ring current perturbations and comparisons at high pH have made possible the assignment of the aromatic resonances of human insulin, and these studies indicate that the major structural features of the human insulin monomer (including those critical to biological function) are also present in the monomeric mutant.
Unveiling the complete genome sequence of clerodendrum chlorotic spot virus, a putative dichorhavirus infecting ornamental plants.

PubMed

Ramos-González, Pedro Luis; Chabi-Jesus, Camila; Banguela-Castillo, Alexander; Tassi, Aline Daniele; Rodrigues, Mariane da Costa; Kitajima, Elliot Watanabe; Harakava, Ricardo; Freitas-Astúa, Juliana

2018-06-04

The genus Dichorhavirus includes plant-infecting rhabdoviruses with bisegmented genomes that are horizontally transmitted by false spider mites of the genus Brevipalpus. The complete genome sequences of three isolates of the putative dichorhavirus clerodendrum chlorotic spot virus were determined using next-generation sequencing (Illumina) and traditional RT-PCR. Their genome organization, sequence similarity and phylogenetic relationship to other viruses, and transmissibility by Brevipalpus yothersi mites support the assignment of these viruses to a new species of dichorhavirus, as suggested previously. New data are discussed stressing the reliability of the current rules for species demarcation and taxonomic status criteria within the genus Dichorhavirus.
Re-Envisioning the Introductory Physics Sequence at Georgia Gwinnett College (GGC)

NASA Astrophysics Data System (ADS)

Thompson, Scott J.; Sales, Kenneth B.

2013-03-01

GGC is a new, 4-year, open-access institution located northeast of Atlanta. As an open-access college, many of the students who take the introductory physics sequence do not have a strong mathematical background. A large percentage of the students have significant work or family obligations in addition to being full-time students. To better serve these students, the first semester of the trig-based introductory physics sequence was modified in a manner that focuses and structures the material to be completed by the students both outside and inside of class such that the time spent outside of class can be reduced. Specifically, focused notes were provided to the students with an online assignment prior to class in place of reading from a textbook. Class time was then focused on a deeper understanding of the concepts to be covered instead of an initial (or secondary) introduction to the material. Data was collected for specific exam questions and compared with the results from previous classes taught by the same instructors. An overview of the results and observations of the instructors using this method will be discussed.
Phylogenetic relationships of Paradiclybothrium pacificum and Diclybothrium armatum (Monogenoidea: Diclybothriidae) inferred from 18S rDNA sequence data.

PubMed

Rozhkovan, Konstantin V; Shedko, Marina B

2015-10-01

The Diclybothriidae (Monogenoidea: Oligonchoinea) includes specific parasites of fishes assigned to the ancient order Acipenseriformes. Phylogeny of the Diclybothriidae is still unclear despite several systematic studies based on morphological characters. Together with the closely related Hexabothriidae represented by parasites of sharks and ray-fishes, the position of Diclybothriidae in different taxonomical systems has been a matter of discussion. Here, we present the first molecular data on Diclybothriidae. The SSU rRNA gene was used to investigate the phylogenetic position of Paradiclybothrium pacificum and Diclybothrium armatum among the other Oligonchoinea. Complete nucleotide sequences of P. pacificum and D. armatum demonstrated high identity (98.53%) with no intraspecific sequence variability. Specimens of D. armatum were obtained from different hosts (Acipenser schrenckii and Huso dauricus); however, variation by host was not detected. The sequence divergence and phylogenetic tree data show that Diclybothriidae and Hexabothriidae are more closely related to each other than to other representatives of Oligonchoinea. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
The SUPERFAMILY database in 2004: additions and improvements.

PubMed

Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian

2004-01-01

The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
NMR conformational properties of an Anthrax Lethal Factor domain studied by multiple amino acid-selective labeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vourtsis, Dionysios J.; Chasapis, Christos T.; Pairas, George

2014-07-18

Highlights: • A polypeptide, N-ALF{sub 233}, was overexpressed in E. coli and successfully isolated. • We produced {sup 2}H/{sup 15}N/{sup 13}C labeled protein samples. • Amino acid selective approaches were applied. • We acquired several heteronuclear NMR spectra, to complete the backbone assignment. • Prediction of the secondary structure was performed. - Abstract: NMR-based structural biology urgently needs cost- and time-effective methods to assist both in the process of acquiring high-resolution NMR spectra and their subsequent analysis. Especially for bigger proteins (>20 kDa) selective labeling is a frequently used means of sequence-specific assignment. In this work we present the successful overexpression of a polypeptide of 233 residues, corresponding to the structured part of the N-terminal domain of Anthrax Lethal Factor, using Escherichia coli expression system. The polypeptide was subsequently isolated in pure, soluble form and analyzed structurally by solution NMR spectroscopy. Due to the non-satisfying quality and resolution of the spectra of this 27 kDa protein, an almost complete backbone assignment became feasible only by the combination of uniform and novel amino acid-selective labeling schemes. Moreover, amino acid-type selective triple-resonance NMR experiments proved to be very helpful.
Metabarcoding of marine nematodes – evaluation of reference datasets used in tree-based taxonomy assignment approach

PubMed Central

2016-01-01

Background: Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. New information: In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919
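
The decision rule behind tree-based assignment (an OTU is named only when it sits in a well-supported clade whose reference sequences all agree) can be sketched in Python as follows. The function name, the flat list of reference labels, and the default support cut-off of 70 are illustrative assumptions; the abstract does not state a numerical threshold, and a real workflow would extract the clade and its bootstrap value from a tree file.

```python
from typing import List, Optional


def assign_otu(clade_reference_taxa: List[str], bootstrap: float,
               min_support: float = 70.0) -> Optional[str]:
    """Name an OTU after its clade, or return None to leave it unassigned.

    clade_reference_taxa -- taxon labels of the reference sequences that
                            share the smallest clade containing the OTU
    bootstrap            -- support value of that clade (percent)
    min_support          -- hypothetical cut-off for "highly supported"
    """
    taxa = set(clade_reference_taxa)
    # Require a supported clade whose references all carry one taxon label.
    if bootstrap >= min_support and len(taxa) == 1:
        return next(iter(taxa))
    return None
```

An erroneous reference sequence placed in the wrong clade makes `taxa` contain two labels, so the OTU stays unassigned; this is one way to see why removing such sequences beforehand is expected to reduce the number of unassigned OTUs.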
Metabarcoding of marine nematodes - evaluation of reference datasets used in tree-based taxonomy assignment approach.

PubMed

Holovachov, Oleksandr

2016-01-01

Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods were proposed to accomplish this task. This publication evaluates the quality of reference datasets, alongside with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on relative placements of OTUs and reference sequences on the cladogram and support that these placements receive. In tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires high quality reference dataset to be used. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have detrimental effect on the resolution of cladograms used in tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives highest resolution for the particular reference dataset. Completing the above mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.
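The tree-based rule described in this abstract (an OTU is identified only when it sits in a well-supported clade whose reference sequences all belong to one taxon) can be illustrated with a minimal sketch. The tree encoding (a node is a `(support, children)` tuple, a leaf is a string), the function names, the sample labels, and the support cutoff of 70 are all assumptions made for illustration; none of them comes from the publication.

```python
def leaves(node):
    """Return all leaf names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]


def assign(tree, otu, ref_taxon, min_support=70):
    """Walk from the root towards the OTU and return the taxon of a clade that
    (a) contains the OTU, (b) has support >= min_support (hypothetical cutoff),
    and (c) contains reference sequences of exactly one taxon. Otherwise None."""
    if otu not in leaves(tree):
        return None
    node, best = tree, None
    while not isinstance(node, str):
        support, children = node
        taxa = {ref_taxon[m] for m in leaves(node) if m in ref_taxon}
        if support >= min_support and len(taxa) == 1:
            best = next(iter(taxa))
        # descend into the child subtree that holds the OTU
        node = next(c for c in children if otu in leaves(c))
    return best


# Hypothetical toy cladogram: one well-supported clade, one poorly supported.
tree = (100, [
    (95, ["ref_A1", "otu_1", "ref_A2"]),
    (40, ["ref_B1", "otu_2", "ref_C1"]),
])
ref_taxon = {"ref_A1": "Genus A", "ref_A2": "Genus A",
             "ref_B1": "Genus B", "ref_C1": "Genus C"}
```

In this toy example `otu_1` falls in a clade with support 95 and references from a single genus, so it is assigned, while `otu_2` stays unassigned because its clade is weakly supported and mixes two reference genera.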

The gene space in wheat: the complete γ-gliadin gene family from the wheat cultivar Chinese Spring.

PubMed

Anderson, Olin D; Huo, Naxin; Gu, Yong Q

2013-06-01

The complete set of unique γ-gliadin genes is described for the wheat cultivar Chinese Spring using a combination of expressed sequence tag (EST) and Roche 454 DNA sequences. Assemblies of Chinese Spring ESTs yielded 11 different γ-gliadin gene sequences. Two of the sequences encode identical polypeptides and are assumed to be the result of a recent gene duplication. One gene has a 3' coding mutation that changes the reading frame in the final eight codons. A second assembly of Chinese Spring γ-gliadin sequences was generated using Roche 454 total genomic DNA sequences. The 454 assembly confirmed the same 11 active genes as the EST assembly plus two pseudogenes not represented by ESTs. These 13 γ-gliadin sequences represent the complete unique set of γ-gliadin genes for cv Chinese Spring, although not ruled out are additional genes that are exact duplications of these 13 genes. A comparison with the ESTs of two other hexaploid cultivars (Butte 86 and Recital) finds that the most active genes are present in all three cultivars, with exceptions likely due to too few ESTs for detection in Butte 86 and Recital. A comparison of the numbers of ESTs per gene indicates differential levels of expression within the γ-gliadin gene family. Genome assignments were made for 6 of the 13 Chinese Spring γ-gliadin genes, i.e., one assignment from a match to two γ-gliadin genes found within a tetraploid wheat A genome BAC and four genes that match four distinct γ-gliadin sequences assembled from Roche 454 sequences from Aegilops tauschii, the hexaploid wheat D-genome ancestor.
The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

PubMed

Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

2007-02-14

The invention of the Genome Sequencer 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequencer 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products.
The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
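Tracing each read back to its source through the 5' tag amounts to a lookup on the first bases of the read. The sketch below illustrates this for dinucleotide tags; the function name, the tag-to-specimen table, and the reads are hypothetical, and it does not model the sequencing anomalies the abstract mentions.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_source, tag_len=2):
    """Group reads by their 5' tag and strip the tag.
    Reads whose tag is not in the table are collected under 'unassigned'."""
    by_source = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        by_source[tag_to_source.get(tag, "unassigned")].append(insert)
    return dict(by_source)


# Hypothetical dinucleotide tags and reads, for illustration only.
tags = {"AC": "specimen_1", "TG": "specimen_2"}
reads = ["ACGGATTC", "TGGGATTA", "ACGGATTT", "CCGGATTC"]
assigned = demultiplex(reads, tags)
```

Counting the reads per tag in such a table is also the natural starting point for checking the tag-dependent representation bias reported in the abstract.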
The Impact of Computer-Based Assignments on Student Motivation to Complete Homework Assignments for Sixth-Grade Students

ERIC Educational Resources Information Center

Cyr, Mary Ann

2013-01-01

The purpose of this qualitative study was to examine the engagement of 11 middle school-aged students from a southeast Michigan public school, who were given laptop computers with twenty-four-hour-a-day Internet access in order to complete homework assignments. Specifically, this study examined the perceptions of sixth-grade students regarding the…
Bridge over troubled proline: assignment of intrinsically disordered proteins using (HCA)CON(CAN)H and (HCA)N(CA)CO(N)H experiments concomitantly with HNCO and i(HCA)CO(CA)NH.

PubMed

Hellman, Maarit; Piirainen, Henni; Jaakola, Veli-Pekka; Permi, Perttu

2014-01-01

NMR spectroscopy is by far the most versatile and information rich technique to study intrinsically disordered proteins (IDPs). While NMR is able to offer residue level information on structure and dynamics, assignment of chemical shift resonances in IDPs is not a straightforward process. Consequently, numerous pulse sequences and assignment protocols have been developed during past several years, targeted especially for the assignment of IDPs, including experiments that employ H(N), H(α) or (13)C detection combined with two to six indirectly detected dimensions. Here we propose two new HN-detection based pulse sequences, (HCA)CON(CAN)H and (HCA)N(CA)CO(N)H, that provide correlations with (1)H(N)(i - 1), (13)C'(i - 1) and (15)N(i), and (1)H(N)(i + 1), (13)C'(i) and (15)N(i) frequencies, respectively. Most importantly, they offer sequential links across the proline bridges and enable filling the single proline gaps during the assignment. We show that the novel experiments can efficiently complement the information available from existing HNCO and intraresidual i(HCA)CO(CA)NH pulse sequences and their concomitant usage enabled >95 % assignment of backbone resonances in cytoplasmic tail of adenosine receptor A2A in comparison to 73 % complete assignment using the HNCO/i(HCA)CO(CA)NH data alone.
The Relationship between Successful Completion and Sequential Movement in Self-Paced Distance Courses

ERIC Educational Resources Information Center

Lim, Janine M.

2016-01-01

A course design question for self-paced courses includes whether or not technological measures should be used in course design to force students to follow the sequence intended by the course author. This study examined learner behavior to understand whether the sequence of student assignment submissions in a self-paced distance course is related…
Noncontiguous Finished Genome Sequence of Staphylococcus aureus KLT6, a Staphylococcal Enterotoxin B-Positive Strain Involved in a Food Poisoning Outbreak in Switzerland

PubMed Central

Tobes, Raquel; Manrique, Marina; Brozynska, Marta; Stephan, Roger; Pareja, Eduardo

2013-01-01

We present the first complete genome sequence of a Staphylococcus aureus strain assigned to clonal complex 12. The strain was isolated in a food poisoning outbreak due to contaminated potato salad in Switzerland in 2009, and it produces staphylococcal enterotoxin B. PMID:23704175
Defining precision: The precision medicine initiative trials NCI-MPACT and NCI-MATCH.

PubMed

Coyne, Geraldine O'Sullivan; Takebe, Naoko; Chen, Alice P

"Precision" trials, using rationally incorporated biomarker targets and molecularly selective anticancer agents, have become of great interest to both patients and their physicians. In the endeavor to test the cornerstone premise of precision oncotherapy, that is, determining if modulating a specific molecular aberration in a patient's tumor with a correspondingly specific therapeutic agent improves clinical outcomes, the design of clinical trials with embedded genomic characterization platforms which guide therapy are an increasing challenge. The National Cancer Institute Precision Medicine Initiative is an unprecedented large interdisciplinary collaborative effort to conceptualize and test the feasibility of trials incorporating sequencing platforms and large-scale bioinformatics processing that are not currently uniformly available to patients. National Cancer Institute-Molecular Profiling-based Assignment of Cancer Therapy and National Cancer Institute-Molecular Analysis for Therapy Choice are 2 genomic to phenotypic trials under this National Cancer Institute initiative, where treatment is selected according to predetermined genetic alterations detected using next-generation sequencing technology across a broad range of tumor types. In this article, we discuss the objectives and trial designs that have enabled the public-private partnerships required to complete the scale of both trials, as well as interim trial updates and strategic considerations that have driven data analysis and targeted therapy assignment, with the intent of elucidating further the benefits of this treatment approach for patients. Copyright © 2017. Published by Elsevier Inc.
Trial to assess the utility of genetic sequencing to improve patient outcomes

Cancer.gov

A pilot trial to assess whether assigning treatment based on specific gene mutations can provide benefit to patients with metastatic solid tumors is being launched this month by the NCI. The Molecular Profiling based Assignment of Cancer Therapeutics, or
Characterization of the temperate phage vB_RleM_PPF1 and its site-specific integration into the Rhizobium leguminosarum F1 genome.

PubMed

Halmillawewa, Anupama P; Restrepo-Córdoba, Marcela; Perry, Benjamin J; Yost, Christopher K; Hynes, Michael F

2016-02-01

Bacteriophages may play an important role in regulating population size and diversity of the root nodule symbiont Rhizobium leguminosarum, as well as participating in horizontal gene transfer. Although phages that infect this species have been isolated in the past, our knowledge of their molecular biology, and especially of genome composition, is extremely limited, and this lack of information impacts on the ability to assess phage population dynamics and limits potential agricultural applications of rhizobiophages. To help address this deficit in available sequence and biological information, the complete genome sequence of the Myoviridae temperate phage PPF1 that infects R. leguminosarum biovar viciae strain F1 was determined. The genome is 54,506 bp in length with an average G+C content of 61.9 %. The genome contains 94 putative open reading frames (ORFs) and 74.5 % of these predicted ORFs share homology at the protein level with previously reported sequences in the database. However, putative functions could only be assigned to 25.5 % (24 ORFs) of the predicted genes. PPF1 was capable of efficiently lysogenizing its rhizobial host R. leguminosarum F1. The site-specific recombination system of the phage targets an integration site that lies within a putative tRNA-Pro (CGG) gene in R. leguminosarum F1. Upon integration, the phage is capable of restoring the disrupted tRNA gene, owing to the 50 bp homologous sequence (att core region) it shares with its rhizobial host genome. Phage PPF1 is the first temperate phage infecting members of the genus Rhizobium for which a complete genome sequence, as well as other biological data such as the integration site, is available.
Genome Sequences of Populus tremula Chloroplast and Mitochondrion: Implications for Holistic Poplar Breeding

PubMed Central

Mader, Malte; Le Paslier, Marie-Christine; Bounon, Rémi; Berard, Aurélie; Vettori, Cristina; Schroeder, Hilke; Leplé, Jean-Charles; Fladung, Matthias

2016-01-01

Complete Populus genome sequences are available for the nucleus (P. trichocarpa; section Tacamahaca) and for chloroplasts (seven species), but not for mitochondria. Here, we provide the complete genome sequences of the chloroplast and the mitochondrion for the clones P. tremula W52 and P. tremula x P. alba 717-1B4 (section Populus). The organization of the chloroplast genomes of both Populus clones is described. A phylogenetic tree constructed from all available complete chloroplast DNA sequences of Populus was not congruent with the assignment of the related species to different Populus sections. In total, 3,024 variable nucleotide positions were identified among all compared Populus chloroplast DNA sequences. The 5-prime part of the LSC from trnH to atpA showed the highest frequency of variations. The variable positions included 163 positions with SNPs allowing for differentiating the two clones with P. tremula chloroplast genomes (W52, 717-1B4) from the other seven Populus individuals. These potential P. tremula-specific SNPs were displayed as a whole-plastome barcode on the P. tremula W52 chloroplast DNA sequence. Three of these SNPs and one InDel in the trnH-psbA linker were successfully validated by Sanger sequencing in an extended set of Populus individuals. The complete mitochondrial genome sequence of P. tremula is the first in the family of Salicaceae. The mitochondrial genomes of the two clones are 783,442 bp (W52) and 783,513 bp (717-1B4) in size, structurally very similar and organized as single circles. DNA sequence regions with high similarity to the W52 chloroplast sequence account for about 2% of the W52 mitochondrial genome. The mean SNP frequency was found to be nearly six fold higher in the chloroplast than in the mitochondrial genome when comparing 717-1B4 with W52. The availability of the genomic information of all three DNA-containing cell organelles will allow a holistic approach in poplar molecular breeding in the future. PMID:26800039
Proposals for the classification of human rhinovirus species A, B and C into genotypically assigned types

PubMed Central

McIntyre, Chloe L.; Knowles, Nick J.

2013-01-01

Human rhinoviruses (HRVs) frequently cause mild upper respiratory tract infections and more severe disease manifestations such as bronchiolitis and asthma exacerbations. HRV is classified into three species within the genus Enterovirus of the family Picornaviridae. HRV species A and B contain 75 and 25 serotypes identified by cross-neutralization assays, although the use of such assays for routine HRV typing is hampered by the large number of serotypes, replacement of virus isolation by molecular methods in HRV diagnosis and the poor or absent replication of HRV species C in cell culture. To address these problems, we propose an alternative, genotypic classification of HRV based on genetic relatedness, analogous to that used for enteroviruses. Nucleotide distances between 384 complete VP1 sequences of currently assigned HRV (sero)types identified divergence thresholds of 13, 12 and 13 % for species A, B and C, respectively, that divided inter- and intra-type comparisons. These were paralleled by 10, 9.5 and 10 % thresholds in the larger dataset of >3800 VP4 region sequences. Assignments based on VP1 sequences led to minor revisions of existing type designations (such as the reclassification of serotype pairs, e.g. A8/A95 and A29/A44, as single serotypes) and the designation of new HRV types A101–106, B101–103 and C34–C51. A protocol for assignment and numbering of new HRV types using VP1 sequences and the restriction of VP4 sequence comparisons to type identification and provisional type assignments is proposed. Genotypic assignment and identification of HRV types will be of considerable value in the future investigation of type-associated differences in disease outcomes, transmission and epidemiology. PMID:23677786
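The VP1 divergence thresholds quoted above lend themselves to a simple pairwise test. The sketch below assumes an uncorrected p-distance over aligned sequences and treats a distance below the threshold as an intra-type comparison; the abstract does not state which distance measure was used or how values exactly at the threshold are handled, so both choices, and the function names, are assumptions.

```python
# VP1 divergence thresholds from the abstract, as fractions of sites.
VP1_THRESHOLD = {"A": 0.13, "B": 0.12, "C": 0.13}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    skipping any column where either sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def same_type(seq_a, seq_b, species):
    """True if the pairwise distance falls below the species threshold
    (assumed here to indicate an intra-type comparison)."""
    return p_distance(seq_a, seq_b) < VP1_THRESHOLD[species]
```

For example, two aligned 10-nt toy sequences that differ at one site have a p-distance of 0.1 and would be grouped under the 13 % threshold for species A, whereas two differences (0.2) would separate them.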
Application of a fast sorting algorithm to the assignment of mass spectrometric cross-linking data.

PubMed

Petrotchenko, Evgeniy V; Borchers, Christoph H

2014-09-01

Cross-linking combined with MS involves enzymatic digestion of cross-linked proteins and identifying cross-linked peptides. Assignment of cross-linked peptide masses requires a search of all possible binary combinations of peptides from the cross-linked proteins' sequences, which becomes impractical with increasing complexity of the protein system and/or if digestion enzyme specificity is relaxed. Here, we describe the application of a fast sorting algorithm to search large sequence databases for cross-linked peptide assignments based on mass. This same algorithm has been used previously for assigning disulfide-bridged peptides (Choi et al., ), but has not previously been applied to cross-linking studies. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
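The abstract does not spell out the algorithm, so the following is only a generic sketch of how sorting avoids testing every binary combination: sort the peptide masses once, then for each peptide use binary search to find partners whose mass completes the observed cross-link mass. The function name, the linker mass, and the tolerance are hypothetical, and this is not claimed to be the authors' implementation.

```python
from bisect import bisect_left, bisect_right


def find_crosslinked_pairs(peptide_masses, observed_mass, linker_mass, tol):
    """Return index pairs (i, j) with i <= j such that
    mass[i] + mass[j] + linker_mass is within tol of observed_mass.
    Indices refer to positions in the input list."""
    order = sorted(range(len(peptide_masses)), key=peptide_masses.__getitem__)
    masses = [peptide_masses[k] for k in order]
    target = observed_mass - linker_mass
    hits = []
    for i, m in enumerate(masses):
        # partners are searched only at sorted positions >= i to avoid duplicates
        lo = max(i, bisect_left(masses, target - m - tol))
        hi = bisect_right(masses, target - m + tol)
        for j in range(lo, hi):
            hits.append(tuple(sorted((order[i], order[j]))))
    return hits


# Hypothetical peptide masses (Da) and linker mass, for illustration only.
peptides = [500.0, 800.5, 1200.25, 300.0]
pairs = find_crosslinked_pairs(peptides, observed_mass=2138.75,
                               linker_mass=138.0, tol=0.01)
```

After the one-time sort, each peptide costs a logarithmic lookup plus the number of matches, rather than a scan over all other peptides.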
Unravelling Glucan Recognition Systems by Glycome Microarrays Using the Designer Approach and Mass Spectrometry*

PubMed Central

Palma, Angelina S.; Liu, Yan; Zhang, Hongtao; Zhang, Yibing; McCleary, Barry V.; Yu, Guangli; Huang, Qilin; Guidolin, Leticia S.; Ciocchini, Andres E.; Torosantucci, Antonella; Wang, Denong; Carvalho, Ana Luísa; Fontes, Carlos M. G. A.; Mulloy, Barbara; Childs, Robert A.; Feizi, Ten; Chai, Wengang

2015-01-01

Glucans are polymers of d-glucose with differing linkages in linear or branched sequences. They are constituents of microbial and plant cell-walls and involved in important bio-recognition processes, including immunomodulation, anticancer activities, pathogen virulence, and plant cell-wall biodegradation. Translational possibilities for these activities in medicine and biotechnology are considerable. High-throughput micro-methods are needed to screen proteins for recognition of specific glucan sequences as a lead to structure–function studies and their exploitation. We describe construction of a “glucome” microarray, the first sequence-defined glycome-scale microarray, using a “designer” approach from targeted ligand-bearing glucans in conjunction with a novel high-sensitivity mass spectrometric sequencing method, as a screening tool to assign glucan recognition motifs. The glucome microarray comprises 153 oligosaccharide probes with high purity, representing major sequences in glucans. Negative-ion electrospray tandem mass spectrometry with collision-induced dissociation was used for complete linkage analysis of gluco-oligosaccharides in linear “homo” and “hetero” and branched sequences. The system is validated using antibodies and carbohydrate-binding modules known to target α- or β-glucans in different biological contexts, extending knowledge on their specificities, and applied to reveal new information on glucan recognition by two signaling molecules of the immune system against pathogens: Dectin-1 and DC-SIGN. The sequencing of the glucan oligosaccharides by the MS method and their interrogation on the microarrays provides detailed information on linkage, sequence and chain length requirements of glucan-recognizing proteins, and are a sensitive means of revealing unsuspected sequences in the polysaccharides. PMID:25670804
MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome

PubMed Central

Schoof, Heiko; Zaccaria, Paolo; Gundlach, Heidrun; Lemcke, Kai; Rudd, Stephen; Kolesov, Grigory; Arnold, Roland; Mewes, H. W.; Mayer, Klaus F. X.

2002-01-01

Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants. PMID:11752263
Scheduling with genetic algorithms

NASA Technical Reports Server (NTRS)

Fennel, Theron R.; Underbrink, A. J., Jr.; Williams, George P. W., Jr.

1994-01-01

In many domains, scheduling a sequence of jobs is an important function contributing to the overall efficiency of the operation. At Boeing, we develop schedules for many different domains, including assembly of military and commercial aircraft, weapons systems, and space vehicles. Boeing is under contract to develop scheduling systems for the Space Station Payload Planning System (PPS) and Payload Operations and Integration Center (POIC). These applications require that we respect certain sequencing restrictions among the jobs to be scheduled while at the same time assigning resources to the jobs. We call this general problem scheduling and resource allocation. Genetic algorithms (GAs) offer a search method that uses a population of solutions and benefits from intrinsic parallelism to search the problem space rapidly, producing near-optimal solutions. Good intermediate solutions are probabilistically recombined to produce better offspring (based upon some application specific measure of solution fitness, e.g., minimum flowtime, or schedule completeness). Also, at any point in the search, any intermediate solution can be accepted as a final solution; allowing the search to proceed longer usually produces a better solution while terminating the search at virtually any time may yield an acceptable solution. Many processes are constrained by restrictions of sequence among the individual jobs. For a specific job, other jobs must be completed beforehand. While there are obviously many other constraints on processes, it is these on which we focussed for this research: how to allocate crews to jobs while satisfying job precedence requirements and personnel, and tooling and fixture (or, more generally, resource) requirements.
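A minimal sketch of the idea, not the Boeing system: each chromosome is a priority ordering of jobs, a decoder turns it into a schedule that respects precedence and a fixed number of identical crews, and the GA recombines good orderings. The job names, durations, makespan fitness, order crossover, swap mutation, and all parameter values are assumptions chosen for illustration, and the precedence graph is assumed to be acyclic.

```python
import random


def decode(order, duration, preds, n_crews):
    """Turn a priority order into a feasible schedule {job: (start, finish)}."""
    crew_free, times, pending = [0] * n_crews, {}, list(order)
    while pending:
        # first job in priority order whose predecessors are all scheduled
        job = next(j for j in pending if all(p in times for p in preds.get(j, ())))
        pending.remove(job)
        crew = min(range(n_crews), key=crew_free.__getitem__)
        ready = max((times[p][1] for p in preds.get(job, ())), default=0)
        start = max(crew_free[crew], ready)
        times[job] = (start, start + duration[job])
        crew_free[crew] = start + duration[job]
    return times


def makespan(order, duration, preds, n_crews):
    return max(finish for _, finish in decode(order, duration, preds, n_crews).values())


def order_crossover(a, b, rng):
    """Keep a slice of parent a in place; fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]


def evolve(duration, preds, n_crews, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    jobs = list(duration)

    def fitness(order):
        return makespan(order, duration, preds, n_crews)

    pop = [rng.sample(jobs, len(jobs)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            child = order_crossover(*rng.sample(parents, 2), rng)
            if rng.random() < 0.2:              # occasional swap mutation
                x, y = rng.sample(range(len(child)), 2)
                child[x], child[y] = child[y], child[x]
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)


# Hypothetical assembly jobs: durations and precedence requirements.
duration = {"frame": 3, "wiring": 2, "panels": 2, "paint": 1, "inspect": 1}
preds = {"wiring": ["frame"], "panels": ["frame"],
         "paint": ["panels"], "inspect": ["wiring", "paint"]}
best = evolve(duration, preds, n_crews=2)
schedule = decode(best, duration, preds, n_crews=2)
```

Because the decoder enforces precedence for every ordering, any member of the population is a valid schedule, which matches the abstract's point that the search can be stopped at virtually any time and still yield an acceptable solution.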
Complete nucleotide sequence and annotation of the temperate corynephage ϕ16 genome.

PubMed

Lobanova, Juliya S; Gak, Evgueni R; Andreeva, Irina G; Rybak, Konstantin V; Krylov, Alexander A; Mashko, Sergey V

2017-08-01

The complete genome of ϕ16, a temperate corynephage from Corynebacterium glutamicum ATCC 21792, was sequenced and annotated (GenBank: KY250482). The electron microscopy study of ϕ16 virion confirmed that it belongs to the family Siphoviridae. The ϕ16 genome consists of a linear double-stranded DNA molecule of 58,200 bp (G+C = 52.2%) with protruding cohesive 3'-ends of 14 nt. Four major structural proteins were separated by SDS-PAGE and identified by peptide mass fingerprinting technique. Using bioinformatics analysis, 101 putative ORFs and 5 tRNA genes were predicted. Only 27 putative gene products could be assigned to known biological functions. The ϕ16 genome was divided into functional modules. Seven putative promoters and eight putative unidirectional intrinsic terminators were predicted. One site of putative «-1» programmed ribosomal frameshifting was proposed in the phage tail assembly genome region. C. glutamicum genetic tools could be broadened by exploiting the known integrase gene (gp33) and the newly identified excisionase gene (gp47), participating in site-specific recombination between ϕ16-attP/attB.
An end-to-end workflow for engineering of biological networks from high-level specifications.

PubMed

Beal, Jacob; Weiss, Ron; Densmore, Douglas; Adler, Aaron; Appleton, Evan; Babb, Jonathan; Bhatia, Swapnil; Davidsohn, Noah; Haddock, Traci; Loyall, Joseph; Schantz, Richard; Vasilev, Viktor; Yaman, Fusun

2012-08-17

We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior. The part assignment stage assigns DNA parts to the AGRN, drawing the parts from a database for the target cellular platform, to create a DNA sequence implementing the AGRN. Finally, the assembly stage computes an optimized assembly plan to create the DNA sequence from available part samples, yielding a protocol for producing a sample of engineered plasmids with robotics assistance. Our workflow is the first to automate the production of biological networks from a high-level program specification. Furthermore, the workflow's modular design allows the same program to be realized on different cellular platforms simply by swapping workflow configurations. We validated our workflow by specifying a small-molecule sensor-reporter program and verifying the resulting plasmids in both HEK 293 mammalian cells and in E. coli bacterial cells.
Expanded microbial genome coverage and improved protein family annotation in the COG database

PubMed Central

Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.

2015-01-01

Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the COGs is expected to become an important tool for microbial genomics. PMID:25428365
Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software.

PubMed

Nakano, Shogo; Asano, Yasuhisa

2015-02-03

Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
Protein evolution analysis of S-hydroxynitrile lyase by complete sequence design utilizing the INTMSAlign software

NASA Astrophysics Data System (ADS)

Nakano, Shogo; Asano, Yasuhisa

2015-02-01

Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.

Molecular characterization of pea enation mosaic virus and bean leafroll virus from the Pacific Northwest, USA.

PubMed

Vemulapati, B; Druffel, K L; Eigenbrode, S D; Karasev, A; Pappu, H R

2010-10-01

The family Luteoviridae consists of eight viruses assigned to three different genera, Luteovirus, Polerovirus and Enamovirus. The complete genomic sequences of pea enation mosaic virus (genus Enamovirus) and bean leafroll virus (genus Luteovirus) from the Pacific Northwest, USA, were determined. Annotation, sequence comparisons, and phylogenetic analysis of selected genes together with those of known polero- and enamoviruses were conducted.
When species matches are unavailable are DNA barcodes correctly assigned to higher taxa? An assessment using sphingid moths

PubMed Central

2011-01-01

Background: When a specimen belongs to a species not yet represented in DNA barcode reference libraries, there is disagreement over the effectiveness of using sequence comparisons to assign the query accurately to a higher taxon. Library completeness and the assignment criteria used have been proposed as critical factors affecting the accuracy of such assignments but have not been thoroughly investigated. We explored the accuracy of assignments to genus, tribe and subfamily in the Sphingidae, using the almost complete global DNA barcode reference library (1095 species) available for this family. Costa Rican sphingids (118 species), a well-documented, diverse subset of the family, with each of the tribes and subfamilies represented, were used as queries. We simulated libraries with different levels of completeness (10-100% of the available species), and recorded assignments (positive or ambiguous) and their accuracy (true or false) under six criteria. Results: A liberal tree-based criterion assigned 83% of queries accurately to genus, 74% to tribe and 90% to subfamily, compared to a strict tree-based criterion, which assigned 75% of queries accurately to genus, 66% to tribe and 84% to subfamily, with a library containing 100% of available species (but excluding the species of the query). The greater number of true positives delivered by more relaxed criteria was offset by the occurrence of more false positives. This effect was most sharply observed with libraries of the lowest completeness where, for example at the genus level, 32% of assignments were false positives with the liberal criterion versus < 1% when using the strict criterion. We observed little difference (< 8% using the liberal criterion), however, in the overall accuracy of the assignments between the lowest and highest levels of library completeness at the tribe and subfamily level.
Conclusions: Our results suggest that when using a strict tree-based criterion for higher taxon assignment with DNA barcodes, the likelihood of assigning a query a genus name incorrectly is very low; if a genus name is provided, it has a high likelihood of being accurate; and if no genus match is available, the query can nevertheless be assigned to a subfamily with high accuracy regardless of library completeness. DNA barcoding often correctly assigned sphingid moths to higher taxa when species matches were unavailable, suggesting that barcode reference libraries can be useful for higher taxon assignments long before they achieve complete species coverage. PMID:21806794
Flying Cassini with Virtual Operations Teams

NASA Technical Reports Server (NTRS)

Dodd, Suzanne; Gustavson, Robert

1998-01-01

The Cassini Program's challenge is to fly a large, complex mission with a reduced operations budget. A consequence of the reduced budget is elimination of the large, centrally located group traditionally used for uplink operations. Instead, responsibility for completing parts of the uplink function is distributed throughout the Program. A critical strategy employed to handle this challenge is the use of Virtual Uplink Operations Teams. A Virtual Team is comprised of a group of people with the necessary mix of engineering and science expertise who come together for the purpose of building a specific uplink product. These people are drawn from throughout the Cassini Program and participate across a large geographical area (from Germany to the West coast of the USA), covering ten time zones. The participants will often split their time between participating in the Virtual Team and accomplishing their core responsibilities, requiring significant planning and time management. When the particular uplink product task is complete, the Virtual Team disbands and the members turn back to their home organization element for future work assignments. This time-sharing of employees is used on Cassini to build mission planning products, via the Mission Planning Virtual Team, and sequencing products and monitoring of the sequence execution, via the Sequence Virtual Team. This challenging, multitasking approach allows efficient use of personnel in a resource constrained environment.
Assessing the Impact of Sequencing Practicums for Welding in Agricultural Mechanics

ERIC Educational Resources Information Center

Rose, Malcolm; Pate, Michael L.; Lawver, Rebecca G.; Warnick, Brian K.; Dai, Xin

2015-01-01

This study examined the impact of sequencing practicums for welding on students' ability to perform a 1F (flat position-fillet lap joint) weld on low-carbon steel. Participants were randomly assigned a specific practice sequence of welding using gas metal arc welding (GMAW) and shielded metal arc welding (SMAW). A total of 71 participants…
Undergraduates improve upon published crystal structure in class assignment.

PubMed

Horowitz, Scott; Koldewey, Philipp; Bardwell, James C

2014-01-01

Recently, 57 undergraduate students at the University of Michigan were assigned the task of solving a crystal structure, given only the electron density map of a 1.3 Å crystal structure from the electron density server, and the position of the N-terminal amino acid. To test their knowledge of amino acid chemistry, the students were not given the protein sequence. With minimal direction from the instructor on how the students should complete the assignment, the students fared remarkably well in this task, with over half the class able to reconstruct the original sequence with over 77% sequence identity, and with structures whose median ranked in the 91st percentile of all structures of comparable resolution in terms of structure quality. Fourteen percent of the students' structures produced Molprobity steric clash validation scores even better than that of the original structure, suggesting that multiple students achieved an improvement in the overall structure quality compared to the published structure. Students were able to delineate limiting case chemical environments, such as charged interactions or complete solvent exposure, but were less able to distinguish finer details of hydrogen bonding or hydrophobicity. Our results prompt several questions: why were students able to perform so well in their structural validation scores? How were some students able to outperform the 88% sequence identity mark that would constitute a perfect score, given the level of degenerate density or surface residues with poor density? And how can the methodology used by the best students inform the practices of professional X-ray crystallographers? Copyright © 2014 Wiley Periodicals, Inc.
Genomics of Three New Bacteriophages Useful in the Biocontrol of Salmonella

PubMed Central

Bardina, Carlota; Colom, Joan; Spricigo, Denis A.; Otero, Jennifer; Sánchez-Osuna, Miquel; Cortés, Pilar; Llagostera, Montserrat

2016-01-01

Non-typhoid Salmonella is the principal pathogen related to food-borne diseases throughout the world. Widespread antibiotic resistance has adversely affected human health and has encouraged the search for alternative antimicrobial agents. Advances in bacteriophage therapy highlight the use of bacteriophages in controlling a broad spectrum of food-borne pathogens. One requirement for the use of bacteriophages as antibacterials is the characterization of their genomes. In this work, complete genome sequencing and molecular analyses were carried out for three new virulent Salmonella-specific bacteriophages (UAB_Phi20, UAB_Phi78, and UAB_Phi87) able to infect a broad range of Salmonella strains. Sequence analysis of the genomes of the UAB_Phi20, UAB_Phi78, and UAB_Phi87 bacteriophages revealed no known virulence-associated or antibiotic resistance genes and no potential immunoreactive food allergens. The UAB_Phi20 genome comprised 41,809 base pairs with 80 open reading frames (ORFs), 24 of them with assigned function. The genome sequence showed high similarity of UAB_Phi20 to Salmonella bacteriophage P22 and other members of the P22likeviruses genus of the Podoviridae family, including ST64T and ST104. The DNA of UAB_Phi78 contained 44,110 bp, including direct terminal repeats (DTR) of 179 bp; 58 putative ORFs were predicted, of which 20 were assigned a function. This bacteriophage was assigned to the SP6likeviruses genus of the Podoviridae family based on its high similarity not only with SP6 but also with the K1-5, K1E, and K1F bacteriophages, all of which infect Escherichia coli. The UAB_Phi87 genome sequence consisted of 87,669 bp with terminal direct repeats of 608 bp; although 148 ORFs were identified, putative functions could be assigned to only 29 of them. Sequence comparisons revealed the mosaic structure of UAB_Phi87 and its high similarity with bacteriophages Felix O1 and wV8 of E. coli with respect to genetic content and functional organization.
Phylogenetic analysis of the large terminase subunits confirms the packaging strategies of the three bacteriophages and their grouping into the different phage genera. All these studies are necessary for the development and the use of an efficient cocktail with commercial applications in bacteriophage therapy against Salmonella. PMID:27148229
Improved Annotation of 3′ Untranslated Regions and Complex Loci by Combination of Strand-Specific Direct RNA Sequencing, RNA-Seq and ESTs

PubMed Central

Song, Junfang; Duc, Céline; Storey, Kate G.; McLean, W. H. Irwin; Brown, Sara J.; Simpson, Gordon G.; Barton, Geoffrey J.

2014-01-01

The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3′ untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3′ polyadenylation sites to within ±2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3′ UTR re-annotation (including extension of one 3′ UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data. PMID:24722185
Full-length genome sequences of five hepatitis C virus isolates representing subtypes 3g, 3h, 3i and 3k, and a unique genotype 3 variant.

PubMed

Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G

2013-03-01

We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned. Our findings should add insights to HCV evolutionary studies and clinical applications.
Implementation of Objective PASC-Derived Taxon Demarcation Criteria for Official Classification of Filoviruses.

PubMed

Bào, Yīmíng; Amarasinghe, Gaya K; Basler, Christopher F; Bavari, Sina; Bukreyev, Alexander; Chandran, Kartik; Dolnik, Olga; Dye, John M; Ebihara, Hideki; Formenty, Pierre; Hewson, Roger; Kobinger, Gary P; Leroy, Eric M; Mühlberger, Elke; Netesov, Sergey V; Patterson, Jean L; Paweska, Janusz T; Smither, Sophie J; Takada, Ayato; Towner, Jonathan S; Volchkov, Viktor E; Wahl-Jensen, Victoria; Kuhn, Jens H

2017-05-11

The mononegaviral family Filoviridae has eight members assigned to three genera and seven species. Until now, genus and species demarcation were based on arbitrarily chosen filovirus genome sequence divergence values (≈50% for genera, ≈30% for species) and arbitrarily chosen phenotypic virus or virion characteristics. Here we report filovirus genome sequence-based taxon demarcation criteria using the publicly accessible PAirwise Sequence Comparison (PASC) tool of the US National Center for Biotechnology Information (Bethesda, MD, USA). Comparison of all available filovirus genomes in GenBank using PASC revealed optimal demarcation at the 55-58% sequence diversity threshold range for genera and at the 23-36% sequence diversity threshold range for species. Because these thresholds do not change the current official filovirus classification, these values are now implemented as filovirus taxon demarcation criteria that may solely be used for filovirus classification in case additional data are absent. A near-complete, coding-complete, or complete filovirus genome sequence will now be required to allow official classification of any novel "filovirus." Classification of filoviruses into existing taxa or determining the need for novel taxa is now straightforward and could even become automated using a presented algorithm/flowchart rooted in RefSeq (type) sequences.
Chromosomal Organization and Sequence Diversity of Genes Encoding Lachrymatory Factor Synthase in Allium cepa L.

PubMed Central

Masamura, Noriya; McCallum, John; Khrustaleva, Ludmila; Kenel, Fernand; Pither-Joyce, Meegham; Shono, Jinji; Suzuki, Go; Mukai, Yasuhiko; Yamauchi, Naoki; Shigyo, Masayoshi

2012-01-01

Lachrymatory factor synthase (LFS) catalyzes the formation of lachrymatory factor, one of the most distinctive traits of bulb onion (Allium cepa L.). Therefore, we used LFS as a model for a functional gene in a huge genome, and we examined the chromosomal organization of LFS in A. cepa by multiple approaches. The first-level analysis completed the chromosomal assignment of LFS gene to chromosome 5 of A. cepa via the use of a complete set of A. fistulosum–shallot (A. cepa L. Aggregatum group) monosomic addition lines. Subsequent use of an F2 mapping population from the interspecific cross A. cepa × A. roylei confirmed the assignment of an LFS locus to this chromosome. Sequence comparison of two BAC clones bearing LFS genes, LFS amplicons from diverse germplasm, and expressed sequences from a doubled haploid line revealed variation consistent with duplicated LFS genes. Furthermore, the BAC-FISH study using the two BAC clones as a probe showed that LFS genes are localized in the proximal region of the long arm of the chromosome. These results suggested that LFS in A. cepa is transcribed from at least two loci and that they are localized on chromosome 5. PMID:22690373
HLA genotyping by next-generation sequencing of complementary DNA.

PubMed

Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

2017-11-28

Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. 
Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
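The ranking step described above (candidate sequences built from the variable bases, ordered by decreasing read count) can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: it only counts the base combinations actually observed in reads, and the function name `rank_candidates`, the toy reads, and the chosen variable positions are hypothetical.

```python
from collections import Counter


def rank_candidates(reads, variable_positions):
    """Count reads per base combination at the variable positions.

    Returns (combination, read_count) pairs, most frequent first.
    Reads too short to cover every variable position are skipped.
    """
    if not variable_positions:
        return []
    last = max(variable_positions)
    counts = Counter(
        "".join(read[i] for i in variable_positions)
        for read in reads
        if len(read) > last
    )
    return counts.most_common()


# Hypothetical reads from one target region; positions 1 and 3 vary.
toy_reads = ["ACGT", "ACGT", "ATGA", "ATGA", "ATGA", "ACGA"]
ranked = rank_candidates(toy_reads, [1, 3])
print(ranked[:2])  # the two best-supported combinations
```

In this toy case the two top-ranked combinations would play the role of the two haplotypes, while the low-count combination stands in for the kind of artificial haplotype that the additional processing steps are meant to remove.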
WNV Typer: a server for genotyping of West Nile viruses using an alignment-free method based on a return time distribution.

PubMed

Kolekar, Pandurang; Hake, Nilesh; Kale, Mohan; Kulkarni-Kale, Urmila

2014-03-01

West Nile virus (WNV), genus Flavivirus, family Flaviviridae, is a major cause of viral encephalitis with broad host range and global spread. The virus has undergone a series of evolutionary changes with emergence of various genotypic lineages that are known to differ in the type and severity of the diseases caused. Currently, genotyping is carried out using molecular phylogeny of complete coding sequences, and a genotype is assigned based on proximity to reference genotypes in tree topology. Efficient epidemiological surveillance of WNVs demands development of objective criteria for typing. An alignment-free approach based on return time distribution (RTD) of k-mers has been validated for genotyping of WNVs. The RTDs of complete genome sequences at k=7 were found to be optimum for classification of the known lineages of WNVs as well as for genotyping. It provides a time- and computationally efficient alternative for genome-based annotation of WNV lineages. The development of a WNV Typer server based on RTD is described (http://bioinfo.net.in/wnv/homepage.html). Both the method and the server have 100% sensitivity and specificity. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
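As a rough illustration of the return-time idea, the sketch below records, for each k-mer, the distances between the start positions of its successive occurrences and summarises them by mean and standard deviation. Defining return time as a difference of start positions and summarising it with these two statistics are assumptions made for the example; the function name `return_time_features` and the toy sequence are hypothetical, and the abstract itself applies the approach to complete genomes at k=7.

```python
from collections import defaultdict
from statistics import mean, pstdev


def return_time_features(sequence, k):
    """Map each repeated k-mer to (mean, population SD) of its return times.

    A return time is taken here as the distance between the start
    positions of two successive occurrences (an assumption of this sketch).
    """
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    features = {}
    for kmer, pos in positions.items():
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        if gaps:  # k-mers seen only once have no return time
            features[kmer] = (mean(gaps), pstdev(gaps))
    return features


# Hypothetical toy sequence; real input would be a complete genome.
print(return_time_features("ACGACGTTACG", 3))
```

Feature vectors of this kind could then be compared between a query genome and reference genotypes with a distance measure of choice, which is outside the scope of this sketch.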
Critical thinking and reflection exercises in a biochemistry course to improve prospective health professions students' attitudes toward physician-pharmacist collaboration.

PubMed

Van Winkle, Lon J; Cornell, Susan; Fjortoft, Nancy; Bjork, Bryan C; Chandar, Nalini; Green, Jacalyn M; La Salle, Sophie; Viselli, Susan M; Burdick, Paulette; Lynch, Sean M

2013-10-14

To determine the impact of performing critical-thinking and reflection assignments within interdisciplinary learning teams in a biochemistry course on pharmacy students' and prospective health professions students' collaboration scores. Pharmacy students and prospective medical, dental, and other health professions students enrolled in a sequence of 2 required biochemistry courses. They were randomly assigned to interdisciplinary learning teams in which they were required to complete case assignments, thinking and reflection exercises, and a team service-learning project. Students were asked to complete the Scale of Attitudes Toward Physician-Pharmacist Collaboration prior to the first course, following the first course, and following the second course. The physician-pharmacist collaboration scores of prospective health professions students increased significantly (p<0.001). Having prospective health professions students work in teams with pharmacy students to think and reflect in and outside the classroom improves their attitudes toward physician-pharmacist collaboration.
Expanded microbial genome coverage and improved protein family annotation in the COG database.

PubMed

Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

2015-01-01

Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the COGs is expected to become an important tool for microbial genomics. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
The Monte Carlo Quiz: Encouraging Punctual Completion and Deep Processing of Assigned Readings

ERIC Educational Resources Information Center

Fernald, Peter S.

2004-01-01

The Monte Carlo Quiz (MCQ), a single-item quiz, is so named because chance, with the roll of a die, determines (a) whether the quiz is administered; (b) the specific article, chapter, or section of the assigned reading that the quiz covers; and (c) the particular question that makes up the quiz. The MCQ encourages both punctual completion and deep…
Haplogroup-specific deviation from the stepwise mutation model at the microsatellite loci DYS388 and DYS392.

PubMed

Nebel, A; Filon, D; Hohoff, C; Faerman, M; Brinkmann, B; Oppenheim, A

2001-01-01

Deviation from the stepwise mutation model (SMM) at specific human microsatellite loci has implications for population genetic and forensic investigations. In the present study, data on six Y chromosome-specific microsatellites were pooled for 455 paternally unrelated males from six Middle Eastern populations. All chromosomes were assigned to three haplogroups defined by six binary polymorphisms. Two of the microsatellite loci tested, DYS388 and DYS392, displayed marked haplogroup-specific differences in their allele variability. A bimodal distribution of short and long alleles was observed for DYS388 in haplogroup 1 and for DYS392 in haplogroups 1 and 2. Further investigation showed that the short/long alleles segregated almost completely between genealogically distinct haplogroups defined by additional binary markers. Thus, these two loci have a discriminatory power similar to a binary polymorphism. DYS388 was characterised by an extremely low mutation rate in haplogroups 2 and 3, as was DYS392 in haplogroup 3. Sequence analysis of the repeat regions at the two loci revealed no irregularities, indicating that the triplet expansion in these loci is not controlled by sequence variation at the repeat level. A high frequency of long DYS388 alleles has, so far, been found only in populations originating in the Middle East, suggesting that this microsatellite is useful as a region-specific marker.
Barcode Identifiers as a Practical Tool for Reliable Species Assignment of Medically Important Black Yeast Species

PubMed Central

Heinrichs, Guido; de Hoog, G. Sybren

2012-01-01

Herpotrichiellaceous black yeasts and relatives comprise severe pathogens flanked by nonpathogenic environmental siblings. Reliable identification by conventional methods is notoriously difficult. Molecular identification is hampered by the sequence variability in the internal transcribed spacer (ITS) domain caused by difficult-to-sequence homopolymeric regions and by poor taxonomic attribution of sequences deposited in GenBank. Here, we present a potential solution using short barcode identifiers (27 to 50 bp) based on ITS2 ribosomal DNA (rDNA), which allows unambiguous definition of species-specific fragments. Starting from proven sequences of ex-type and authentic strains, we were able to describe 103 identifiers. Multiple BLAST searches of these proposed barcode identifiers in GenBank revealed uniqueness for 100 taxonomic entities, whereas the three remaining identifiers each matched with two entities, but the species of these identifiers could easily be discriminated by differences in the remaining ITS regions. Using the proposed barcode identifiers, a 4.1-fold increase of 100% matches in GenBank was achieved in comparison to the classical approach using the complete ITS sequences. The proposed barcode identifiers will be made accessible for the diagnostic laboratory in a permanently updated online database, thereby providing a highly practical, reliable, and cost-effective tool for identification of clinically important black yeasts and relatives. PMID:22785187
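Identification with short barcode identifiers amounts to checking which species-specific fragment occurs in a query sequence. The sketch below shows that lookup as a plain substring search; the function name `identify_by_barcode`, the species labels, and the 8 bp toy fragments are hypothetical (the identifiers in the study are 27 to 50 bp ITS2 fragments held in the authors' database).

```python
def identify_by_barcode(its_sequence, identifiers):
    """Return the sorted species names whose barcode occurs in the sequence.

    An empty list means no identifier matched; more than one name means
    the remaining ITS regions are needed to discriminate the candidates.
    """
    query = its_sequence.upper()
    return sorted(
        species
        for species, barcode in identifiers.items()
        if barcode.upper() in query
    )


# Hypothetical identifiers, far shorter than real ones.
toy_identifiers = {
    "species_A": "ACGTTGCA",
    "species_B": "GGATCCAA",
}
print(identify_by_barcode("ttacgttgcatt", toy_identifiers))
```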
Species Identification of Archaeological Skin Objects from Danish Bogs: Comparison between Mass Spectrometry-Based Peptide Sequencing and Microscopy-Based Methods

PubMed Central

Brandt, Luise Ørsted; Schmidt, Anne Lisbeth; Mannering, Ulla; Sarret, Mathilde; Kelstrup, Christian D.; Olsen, Jesper V.; Cappellini, Enrico

2014-01-01

Denmark has an extraordinarily large and well-preserved collection of archaeological skin garments found in peat bogs, dated to approximately 920 BC – AD 775. These objects provide not only the possibility to study prehistoric skin costume and technologies, but also to investigate the animal species used for the production of skin garments. Until recently, species identification of archaeological skin was primarily performed by light and scanning electron microscopy or the analysis of ancient DNA. However, the efficacy of these methods can be limited due to the harsh, mostly acidic environment of peat bogs leading to morphological and molecular degradation within the samples. We compared species assignment results of twelve archaeological skin samples from Danish bogs using Mass Spectrometry (MS)-based peptide sequencing, against results obtained using light and scanning electron microscopy. While it was difficult to obtain reliable results using microscopy, MS enabled the identification of several species-diagnostic peptides, mostly from collagen and keratins, allowing confident species discrimination even among taxonomically close organisms, such as sheep and goat. Unlike previous MS-based methods, mostly relying on peptide fingerprinting, the shotgun sequencing approach we describe aims to identify the complete extracted ancient proteome, without preselected specific targets. As an example, we report the identification, in one of the samples, of two peptides uniquely assigned to bovine foetal haemoglobin, indicating the production of skin from a calf slaughtered within the first months of its life. We conclude that MS-based peptide sequencing is a reliable method for species identification of samples from bogs. The mass spectrometry proteomics data were deposited in the ProteomeXchange Consortium with the dataset identifier PXD001029. PMID:25260035
RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

PubMed

Scheuch, Matthias; Höper, Dirk; Beer, Martin

2015-03-03

Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
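The cascading strategy (each analysis is applied only to reads left unassigned by the more stringent step before it) can be sketched generically as below. This is not RIEMS code: the function `classify_cascade`, the two toy classifiers, and the reference sequences are hypothetical stand-ins for the external software applications the pipeline actually chains together.

```python
def classify_cascade(reads, stages):
    """Run classifiers in order; each stage sees only still-unassigned reads.

    `stages` is a list of (stage_name, classifier) pairs, where a classifier
    returns a taxon name or None. Returns (assignments, unassigned_reads).
    """
    assignments = {}
    remaining = list(reads)
    for name, classifier in stages:
        unassigned = []
        for read in remaining:
            taxon = classifier(read)
            if taxon is None:
                unassigned.append(read)
            else:
                assignments[read] = (taxon, name)
        remaining = unassigned
    return assignments, remaining


# Hypothetical reference and two stages of decreasing stringency.
reference = {"ACGTACGT": "virus_X", "TTGGCCAA": "bacterium_Y"}


def exact_match(read):
    return reference.get(read)


def prefix_match(read, length=4):
    for seq, taxon in reference.items():
        if seq[:length] == read[:length]:
            return taxon
    return None


stages = [("exact", exact_match), ("prefix", prefix_match)]
assigned, leftover = classify_cascade(["ACGTACGT", "TTGGAAAA", "CCCCCCCC"], stages)
print(assigned)
print(leftover)
```

Recording the stage name alongside each taxon keeps track of how stringent the supporting evidence was, which is useful when summarising results by taxonomy.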
EFICAz2: enzyme function inference by a combined approach enhanced by machine learning.

PubMed

Arakaki, Adrian K; Huang, Ying; Skolnick, Jeffrey

2009-04-13

We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision performance, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment. We have developed two new EFICAz components, analogs of the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz2, exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall.
A comparative analysis of enzyme function annotation of the human proteome by EFICAz2 and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz2 generates considerably more unique assignments than KEGG. Performance benchmarks and the comparison with KEGG demonstrate that EFICAz2 is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz2 web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html.

Genome Structure of the Legume, Lotus japonicus

PubMed Central

Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

2008-01-01

The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435
New FeFe-hydrogenase genes identified in a metagenomic fosmid library from a municipal wastewater treatment plant as revealed by high-throughput sequencing.

PubMed

Tomazetto, Geizecler; Wibberg, Daniel; Schlüter, Andreas; Oliveira, Valéria M

2015-01-01

A fosmid metagenomic library was constructed with total community DNA obtained from a municipal wastewater treatment plant (MWWTP), with the aim of identifying new FeFe-hydrogenase genes encoding the enzymes most important for hydrogen metabolism. The dataset generated by pyrosequencing of a fosmid library was mined to identify environmental gene tags (EGTs) assigned to FeFe-hydrogenase. The majority of EGTs representing FeFe-hydrogenase genes were affiliated with the class Clostridia, suggesting that this group is the main hydrogen producer in the MWWTP analyzed. Based on assembled sequences, three FeFe-hydrogenase genes were predicted based on detection of the L2 motif (MPCxxKxxE) in the encoded gene product, confirming true FeFe-hydrogenase sequences. These sequences were used to design specific primers to detect fosmids encoding FeFe-hydrogenase genes predicted from the dataset. Three identified fosmids were completely sequenced. The cloned genomic fragments within these fosmids are closely related to members of the Spirochaetaceae, Bacteroidales and Firmicutes, and their FeFe-hydrogenase sequences are characterized by the structure type M3, which is common to clostridial enzymes. FeFe-hydrogenase sequences found in this study represent hitherto undetected sequences, indicating the high genetic diversity regarding these enzymes in MWWTP. Results suggest that MWWTP have to be considered as reservoirs for new FeFe-hydrogenase genes.
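Detecting the L2 motif (MPCxxKxxE, with x standing for any residue) in a predicted gene product amounts to a simple pattern search. A minimal sketch, assuming protein sequences are available as plain one-letter strings; the function name `find_l2_motif` and the two example sequences are made up for illustration.

```python
import re

# L2 motif from the abstract: M-P-C-x-x-K-x-x-E, where x is any residue.
# The lookahead form reports overlapping matches as well.
L2_MOTIF = re.compile(r"(?=MPC..K..E)")


def find_l2_motif(protein: str) -> list:
    """Return the 0-based start position of every L2 motif occurrence."""
    return [match.start() for match in L2_MOTIF.finditer(protein.upper())]


# Made-up sequences: orf_1 carries the motif, orf_2 has C replaced by S.
candidates = {
    "orf_1": "MKTAIMPCTAKQLEGG",
    "orf_2": "MKTAIMPSTAKQLEGG",
}
hits = {name: find_l2_motif(seq) for name, seq in candidates.items()}
```

A motif hit alone is only a screening criterion; the abstract pairs it with gene prediction on assembled sequences.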
Nonencapsulated or nontypeable Haemophilus influenzae are more likely than their encapsulated or serotypeable counterparts to have mutations in their fucose operon.

PubMed

Shuel, Michelle L; Karlowsky, Kathleen E; Law, Dennis K S; Tsang, Raymond S W

2011-12-01

Population biology of Haemophilus influenzae can be studied by multilocus sequence typing (MLST), and isolates are assigned sequence types (STs) based on nucleotide sequence variations in seven housekeeping genes, including fucK. However, the ST cannot be assigned if one of the housekeeping genes is absent or cannot be detected by the current protocol. Occasionally, strains of H. influenzae have been reported to lack the fucK gene. In this study, we examined the prevalence of this mutation among our collection of H. influenzae isolates. Of the 704 isolates studied, including 282 encapsulated and 422 nonencapsulated isolates, nine were not typeable by MLST owing to failure to detect the fucK gene. All nine fucK-negative isolates were nonencapsulated and belonged to various biotypes. DNA sequencing of the fucose operon region confirmed complete deletion of genes in the operon in seven of the nine isolates, while in the remaining two isolates, some of the genes were found intact or in parts. The significance of these findings is discussed.
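The reason a missing fucK gene blocks typing can be made concrete: an ST is looked up from the full seven-locus allele profile, so one undetected locus leaves no profile to look up. A minimal sketch under stated assumptions: only fucK is named in the abstract, the other six locus names are those of the commonly used H. influenzae scheme, and the profile table and ST numbers are invented.

```python
# Seven housekeeping loci; only fucK is named in the abstract, the rest are
# assumed from the commonly used H. influenzae MLST scheme.
LOCI = ("adk", "atpG", "frdB", "fucK", "mdh", "pgi", "recA")

# Hypothetical profile table: allele numbers per locus -> sequence type.
ST_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): 9001,
    (2, 1, 3, 1, 1, 4, 1): 9002,
}


def assign_st(alleles: dict):
    """Return the ST for an allele profile, or None if it cannot be assigned."""
    # A locus that was not detected (absent or None) makes the profile unusable.
    if any(alleles.get(locus) is None for locus in LOCI):
        return None
    profile = tuple(alleles[locus] for locus in LOCI)
    return ST_TABLE.get(profile)  # None also covers profiles not in the table


complete = dict(zip(LOCI, (2, 1, 3, 1, 1, 4, 1)))
fuck_negative = {**complete, "fucK": None}

st_complete = assign_st(complete)
st_missing = assign_st(fuck_negative)
```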
A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy.

PubMed

Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng

2017-05-10

Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
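The general idea of letting each database hit contribute to the taxonomic call in proportion to its similarity to the query can be sketched as below. This is not the BLCA algorithm: the exponential weighting, the `sharpness` parameter, and the example hits are assumptions chosen for illustration, and the lowest-common-ancestor and bootstrap steps are left out.

```python
import math
from collections import defaultdict


def taxon_support(hits: list, sharpness: float = 0.5) -> dict:
    """Similarity-weighted vote over database hits for one query.

    hits: list of (taxon, percent_identity) pairs.
    Returns per-taxon support values that sum to 1.
    """
    best = max(identity for _, identity in hits)
    support = defaultdict(float)
    for taxon, identity in hits:
        # Assumed weighting: weight decays exponentially with the identity
        # gap to the best hit, so closer hits count for more.
        support[taxon] += math.exp(-sharpness * (best - identity))
    total = sum(support.values())
    return {taxon: weight / total for taxon, weight in support.items()}


# Made-up hits for a single query sequence.
hits = [
    ("Escherichia coli", 99.0),
    ("Escherichia coli", 98.0),
    ("Shigella flexneri", 97.0),
]
support = taxon_support(hits)
best_taxon = max(support, key=support.get)
```

Normalizing the weights makes the supports comparable across queries with different numbers of hits.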
UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

PubMed

Meinicke, Peter

2009-09-02

Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence-specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speedup in the range of four orders of magnitude. For an average-size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time, the UFO web server makes it possible to get a quick overview of the functional inventory of newly sequenced organisms. The genome-scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.
ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

PubMed

Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

2012-09-08

The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
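The sequence statistics named above (codon frequencies and GC content) are straightforward to compute from a coding sequence. A minimal sketch, assuming an in-frame DNA string; the function names and the example sequence are made up, and nothing here reflects ANCAC's own code or database schema.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (trailing bases ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up coding sequence: ATG GCC GCC AAA TAA
cds = "ATGGCCGCCAAATAA"
counts = codon_counts(cds)
gc = gc_content(cds)
```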
ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

PubMed Central

2012-01-01

Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836
A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank

PubMed Central

2013-01-01

Background: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated. Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative. Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate. PMID:24359548
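The final redundancy-reduction step (grouping sequences into sets of ≥95% identity and keeping one representative per set) can be sketched with a greedy pass. This is an illustration under simplifying assumptions, not the published pipeline: identity is computed position by position without alignment, the first sequence seen in each set serves as its representative, and the sequences are invented.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions, relative to the longer sequence.

    Simplification: no alignment is performed, so this only makes sense
    for sequences that are already in register.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: dict, threshold: float = 0.95) -> list:
    """Group sequences whose identity to a set's representative >= threshold.

    Returns a list of (representative_id, member_ids) tuples.
    """
    clusters = []
    for seq_id, seq in seqs.items():
        for rep_id, members in clusters:
            if identity(seq, seqs[rep_id]) >= threshold:
                members.append(seq_id)
                break
        else:
            # No existing set is close enough: this sequence starts a new one.
            clusters.append((seq_id, [seq_id]))
    return clusters


# Made-up 20-base sequences: b differs from a at 1 position, c at 2.
seqs = {
    "a": "ACGTACGTACGTACGTACGT",
    "b": "ACGTACGTACGTACGTACGA",
    "c": "TCGTACGTACGTACGTACGA",
}
clusters = greedy_cluster(seqs)
```

Greedy clustering depends on input order, so sorting sequences (for example, longest first) before clustering is a common design choice.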
Critical Thinking and Reflection Exercises in a Biochemistry Course to Improve Prospective Health Professions Students’ Attitudes Toward Physician-Pharmacist Collaboration

PubMed Central

Cornell, Susan; Fjortoft, Nancy; Bjork, Bryan C.; Chandar, Nalini; Green, Jacalyn M.; La Salle, Sophie; Viselli, Susan M.; Burdick, Paulette; Lynch, Sean M.

2013-01-01

Objective. To determine the impact of performing critical-thinking and reflection assignments within interdisciplinary learning teams in a biochemistry course on pharmacy students’ and prospective health professions students’ collaboration scores. Design. Pharmacy students and prospective medical, dental, and other health professions students enrolled in a sequence of 2 required biochemistry courses. They were randomly assigned to interdisciplinary learning teams in which they were required to complete case assignments, thinking and reflection exercises, and a team service-learning project. Assessment. Students were asked to complete the Scale of Attitudes Toward Physician-Pharmacist Collaboration prior to the first course, following the first course, and following the second course. The physician-pharmacist collaboration scores of prospective health professions students increased significantly (p<0.001). Conclusions. Having prospective health professions students work in teams with pharmacy students to think and reflect in and outside the classroom improves their attitudes toward physician-pharmacist collaboration. PMID:24159210
Evolutionary origins of the emergent ST796 clone of vancomycin resistant Enterococcus faecium

PubMed Central

Buultjens, Andrew H.; Lam, Margaret M.C.; Ballard, Susan; Monk, Ian R.; Mahony, Andrew A.; Grabsch, Elizabeth A.; Grayson, M. Lindsay; Pang, Stanley; Coombs, Geoffrey W.; Robinson, J. Owen; Seemann, Torsten; Howden, Benjamin P.

2017-01-01

From early 2012, a novel clone of vancomycin-resistant Enterococcus faecium (assigned the multilocus sequence type ST796) was simultaneously isolated from geographically separate hospitals in south-eastern Australia and New Zealand. Here we describe the complete genome sequence of Ef_aus0233, a representative ST796 E. faecium isolate. We used PacBio single-molecule real-time sequencing to establish a high-quality, fully assembled genome comprising a circular chromosome of 2,888,087 bp and five plasmids. Comparison of Ef_aus0233 to other E. faecium genomes shows that Ef_aus0233 is a member of the epidemic hospital-adapted lineage and has evolved from an ST555-like ancestral progenitor by the accumulation or modification of five mosaic plasmids and five putative prophages, acquisition of two cryptic genomic islands, accrued chromosomal single nucleotide polymorphisms and an 80 kb region of recombination, also gaining Tn1549 and Tn916, transposons conferring resistance to vancomycin and tetracycline respectively. The genomic dissection of this new clone presented here underscores the propensity of the hospital E. faecium lineage to change, presumably in response to the specific conditions of hospital and healthcare environments. PMID:28149688
Using the TIGR gene index databases for biological discovery.

PubMed

Lee, Yuandan; Quackenbush, John

2003-11-01

The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
The evolutionary history of Saccharomyces species inferred from completed mitochondrial genomes and revision in the ‘yeast mitochondrial genetic code’

PubMed Central

Szabóová, Dana; Bielik, Peter; Poláková, Silvia; Šoltys, Katarína; Jatzová, Katarína; Szemes, Tomáš

2017-01-01

Yeasts of the genus Saccharomyces are widely used to test ecological and evolutionary hypotheses. A large number of nuclear genomic DNA sequences are available, but mitochondrial genomic data are insufficient. We completed mitochondrial DNA (mtDNA) sequencing from Illumina MiSeq reads for all Saccharomyces species. All are circularly mapped molecules decreasing in size with phylogenetic distance from Saccharomyces cerevisiae but with similar gene content, including regulatory and selfish elements like origins of replication, introns, free-standing open reading frames or GC clusters. Their most profound feature is species-specific alteration in gene order. The genetic code differs slightly from the well-established yeast mitochondrial code, as GUG is rarely used as the translation start and CGA and CGC code for arginine. The multilocus phylogeny, inferred from mtDNA, does not correlate with the trees derived from nuclear genes. mtDNA data demonstrate that Saccharomyces cariocanus should be assigned as a separate species and Saccharomyces bayanus CBS 380T should not be considered a distinct species because its mtDNA is nearly identical to Saccharomyces uvarum mtDNA. Apparently, comparison of mtDNAs should not be neglected in genomic studies, as it is an important tool to understand the origin and evolutionary history of some yeast species. PMID:28992063
Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles.

PubMed

Gadala-Maria, Daniel; Yaari, Gur; Uduman, Mohamed; Kleinstein, Steven H

2015-02-24

Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
Men and Women in Ships: Preconceptions of the Crews.

ERIC Educational Resources Information Center

Greebler, Carol S.; And Others

Preintegration attitudes and expectations of 1,936 men and 346 women assigned to six Navy ships were measured before the women reported aboard, through the administration of gender-specific versions of the "Navy in Transition" questionnaire. An additional 483 men assigned to a ship not scheduled for integration completed the…
Routine HLA-B genotyping with PCR-sequence-specific oligonucleotides detects a B*52 variant (B*5206).

PubMed

Hoelsch, K; Lenggeler, I; Pfannes, W; Knabe, H; Klein, H-G; Woelpl, A

2005-05-01

A new human leukocyte antigen (HLA)-B allele was found during routine typing of samples for a German unrelated bone marrow donor registry, the "Aktion Knochenmarkspende Bayern". After initial interpretation of the data from two independent low-resolution sequence-specific oligonucleotide typing tests, a B*51 variant was suggested. Further analysis via sequence-based typing identified the sequence as a new B*52 allele. This new allele, officially assigned as B*5206, differs from HLA-B*520102 by one nucleotide exchange in exon 2. The mutation is located at nucleotide position 274, at which a cytosine is substituted by a thymine, leading to an amino acid change at protein position 67 from serine (TCC) to phenylalanine (TTC).
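The codon change reported above can be checked against the standard genetic code. A minimal sketch: the two-entry table holds only the codons named in the abstract, and the helper `substitute` is hypothetical. The abstract's position numbering (nucleotide 274, residue 67) is not modeled here.

```python
# Two entries of the standard genetic code, as named in the abstract.
CODON_TABLE = {"TCC": "Ser", "TTC": "Phe"}


def substitute(codon: str, offset: int, base: str) -> str:
    """Return the codon with the base at the 0-based offset replaced."""
    return codon[:offset] + base + codon[offset + 1:]


reference_codon = "TCC"
# Cytosine to thymine at the middle position of the codon.
variant_codon = substitute(reference_codon, 1, "T")

change = (CODON_TABLE[reference_codon], CODON_TABLE[variant_codon])
```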
The End Justifies the Means, but Only in the Middle

ERIC Educational Resources Information Center

Toure-Tillery, Maferima; Fishbach, Ayelet

2012-01-01

Achieving goals often requires the completion of sequential actions, such as finishing a series of assignments to pass a class. In the course of pursuing such goals, people can decide how closely to follow their personal standards for each action. We propose that actions at the beginning and end of a sequence appear more diagnostic of the…
Complete structure of the cell surface polysaccharide of Streptococcus oralis C104: A 600-MHz NMR study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abeygunawardana, C.; Bush, C.A.; Cisar, J.O.

1991-09-03

Specific lectin-carbohydrate interactions between certain oral streptococci and actinomyces contribute to the microbial colonization of teeth. The receptor molecules of Streptococcus oralis, 34, ATCC 10557, and Streptococcus mitis J22 for the galactose and N-acetylgalactosamine reactive fimbrial lectins of Actinomyces viscosus and Actinomyces naeslundii are antigenically distinct polysaccharides, each formed by a different phosphodiester-linked oligosaccharide repeating unit. Receptor polysaccharide was isolated from S. oralis C104 cells and was shown to contain galactose, N-acetylgalactosamine, ribitol, and phosphate with molar ratios of 4:1:1:1. The ¹H NMR spectrum of the polysaccharide shows that it contains a repeating structure. The individual sugars in the repeating unit were identified by ¹H coupling constants observed in E-COSY and DQF-COSY spectra. NMR methods included complete resonance assignments (¹H and ¹³C) by various homonuclear and heteronuclear correlation experiments that utilize scalar couplings. Sequence and linkage assignments were obtained from the heteronuclear multiple-bond correlation (HMBC) spectrum. This analysis shows that the receptor polysaccharide of S. oralis C104 is a ribitol teichoic acid polymer composed of a linear hexasaccharide repeating unit containing two residues each of galactopyranose and galactofuranose and a residue each of GalNAc and ribitol joined end to end by phosphodiester linkages.
Method for assigning sites to projected generic nuclear power plants

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holter, G.M.; Purcell, W.L.; Shutz, M.E.

1986-07-01

Pacific Northwest Laboratory developed a method for forecasting potential locations and startup sequences of nuclear power plants that will be required in the future but have not yet been specifically identified by electric utilities. Use of the method results in numerical ratings for potential nuclear power plant sites located in each of the 10 federal energy regions. The rating for each potential site is obtained from numerical factors assigned to each of 5 primary siting characteristics: (1) cooling water availability, (2) site land area, (3) power transmission land area, (4) proximity to metropolitan areas, and (5) utility plans for the site. The sequence of plant startups in each federal energy region is obtained by use of the numerical ratings and the forecasts of generic nuclear power plant startups obtained from the EIA Middle Case electricity forecast. Sites are assigned to generic plants in chronological order according to startup date.
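The rating-and-assignment procedure can be sketched in a few lines. The abstract does not say how the five factors are combined or how ratings map to startup order, so this sketch assumes a plain sum and gives the highest-rated site to the earliest startup; the factor values, site names, and plant identifiers are invented.

```python
# The five siting characteristics listed in the abstract.
FACTORS = (
    "cooling_water",
    "site_area",
    "transmission_area",
    "metro_proximity",
    "utility_plans",
)


def site_rating(factors: dict) -> float:
    """Combine the five numerical factors (assumed here to be a plain sum)."""
    return sum(factors[name] for name in FACTORS)


def assign_sites(sites: dict, plants: list) -> dict:
    """Give the best-rated sites to plants in chronological startup order.

    sites: {site_name: {factor_name: value}}
    plants: [(plant_id, startup_year)]
    """
    ranked_sites = sorted(sites, key=lambda s: site_rating(sites[s]), reverse=True)
    ordered_plants = sorted(plants, key=lambda p: p[1])
    return {plant_id: site for (plant_id, _), site in zip(ordered_plants, ranked_sites)}


# Invented example data for one region.
sites = {
    "site_x": dict(zip(FACTORS, (3, 2, 2, 1, 3))),  # rating 11
    "site_y": dict(zip(FACTORS, (1, 1, 2, 2, 1))),  # rating 7
    "site_z": dict(zip(FACTORS, (2, 3, 1, 2, 1))),  # rating 9
}
plants = [("generic_2", 2005), ("generic_1", 2001)]
assignment = assign_sites(sites, plants)
```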
Meta-cognitive student reflections

NASA Astrophysics Data System (ADS)

Barquist, Britt; Stewart, Jim

2009-05-01

We have recently concluded a project testing the effectiveness of a weekly assignment designed to encourage awareness and improvement of meta-cognitive skills. The project is based on the idea that successful problem solvers implement a meta-cognitive process in which they identify the specific concept they are struggling with, and then identify what they understand, what they don't understand, and what they need to know in order to resolve their problem. The assignment required the students to write an email assessing the level of completion of a weekly workbook assignment and to examine in detail their experiences regarding a specific topic they struggled with. The assignment guidelines were designed to coach them through this meta-cognitive process. We responded to most emails with advice for next week's assignment. Our data follow 12 students through a quarter consisting of 11 email assignments which were scored using a rubric based on the assignment guidelines. We found no correlation between rubric scores and final grades. We do have anecdotal evidence that the assignment was beneficial.
MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

PubMed Central

Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas

2017-01-01

An increasing number of species and gene identification studies rely on next-generation sequence analysis of either single-isolate or metagenomics samples. Several methods are available to perform taxonomic annotations, and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next-generation sequence data and perform reference-based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species- and strain-level resolution. An in vitro bacterial mock community sample comprising 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. A sensitivity and precision of 75% were obtained for strain-level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets. PMID:28467460

MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads.

PubMed

Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas

2017-01-01

An increasing number of species and gene identification studies rely on next-generation sequence analysis of either single-isolate or metagenomics samples. Several methods are available to perform taxonomic annotations, and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next-generation sequence data and perform reference-based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species- and strain-level resolution. An in vitro bacterial mock community sample comprising 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. A sensitivity and precision of 75% were obtained for strain-level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets.
Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

PubMed

Pietrowski, D; Förster, M

2000-01-01

The complete cDNA sequence of a ribonuclease k6 gene of Bos taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acid exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).
The effects of bedtime writing on difficulty falling asleep: A polysomnographic study comparing to-do lists and completed activity lists.

PubMed

Scullin, Michael K; Krueger, Madison L; Ballard, Hannah K; Pruett, Natalya; Bliwise, Donald L

2018-01-01

Bedtime worry, including worrying about incomplete future tasks, is a significant contributor to difficulty falling asleep. Previous research showed that writing about one's worries can help individuals fall asleep. We investigated whether the temporal focus of bedtime writing (writing a to-do list versus journaling about completed activities) affected sleep onset latency. Fifty-seven healthy young adults (18-30) completed a writing assignment for 5 min prior to overnight polysomnography recording in a controlled sleep laboratory. They were randomly assigned to write about tasks that they needed to remember to complete the next few days (to-do list) or about tasks they had completed the previous few days (completed list). Participants in the to-do list condition fell asleep significantly faster than those in the completed-list condition. The more specifically participants wrote their to-do list, the faster they subsequently fell asleep, whereas the opposite trend was observed when participants wrote about completed activities. Therefore, to facilitate falling asleep, individuals may derive benefit from writing a very specific to-do list for 5 min at bedtime rather than journaling about completed activities. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
FOAM (Functional Ontology Assignments for Metagenomes): A Hidden Markov Model (HMM) database with environmental focus

DOE PAGES

Prestat, Emmanuel; David, Maude M.; Hultman, Jenni; ...

2014-09-26

A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/.
Quality scores for 32,000 genomes

DOE PAGES

Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran; ...

2014-12-08

More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
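The link between an anticodon beginning with ‘A’ and a codon ending with ‘U’ comes from antiparallel base pairing: read 5'→3', the codon is the reverse complement of the anticodon. A minimal Python sketch of that relationship follows; the function name is hypothetical, and strict Watson-Crick pairing is assumed, so wobble pairing and base modifications are ignored.

```python
# Watson-Crick RNA complements; wobble pairs are deliberately not modelled.
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon):
    """Return the codon (5'->3') that pairs with an anticodon (5'->3')."""
    # Pairing is antiparallel, so reverse the anticodon and complement each base.
    return "".join(_COMPLEMENT[base] for base in reversed(anticodon.upper()))


# The anticodon ACG starts with 'A'; its codon is the arginine codon CGU.
print(codon_for_anticodon("ACG"))
```

Any anticodon whose first base is A maps to a codon whose third base is U under this strict pairing rule, which is the correspondence the abstract relies on.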
Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

PubMed

He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

2015-01-01

The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
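The instability described above can be reproduced with a toy greedy clusterer. The Python sketch below is an illustration under stated assumptions, not any of the methods benchmarked in the study: the sequences, the 0.8 identity threshold, and the function names are hypothetical, sequences are assumed to have equal length, and the input order stands in for an abundance ranking.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.8):
    """Greedy centroid clustering.

    seqs: list of (name, sequence) in processing order.
    Returns {centroid_name: [member names]}.
    """
    centroids = []  # (name, sequence) of each OTU seed, in creation order
    otus = {}
    for name, seq in seqs:
        for c_name, c_seq in centroids:
            if identity(seq, c_seq) >= threshold:
                otus[c_name].append(name)  # join the first centroid that is close enough
                break
        else:
            centroids.append((name, seq))  # no match: this sequence seeds a new OTU
            otus[name] = [name]
    return otus


A = "AAAAAAAAAA"
B = "AAAAAAAACC"  # 80% identical to A and to C
C = "AAAAAACCCC"  # 60% identical to A

print(greedy_otus([("A", A), ("B", B)]))            # B clusters with A
print(greedy_otus([("C", C), ("A", A), ("B", B)]))  # B now clusters with C
```

Adding sequence C changes which OTU sequence B belongs to, even though A and B themselves are unchanged. A closed-reference approach avoids this because the centroids are fixed in advance rather than derived from the data being clustered.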
The proteome: structure, function and evolution

PubMed Central

Fleming, Keiran; Kelley, Lawrence A; Islam, Suhail A; MacCallum, Robert M; Muller, Arne; Pazos, Florencio; Sternberg, Michael J.E

2006-01-01

This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family. PMID:16524832
StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees.

PubMed

Roosaare, Märt; Vaher, Mihkel; Kaplinski, Lauris; Möls, Märt; Andreson, Reidar; Lepamets, Maarja; Kõressaar, Triinu; Naaber, Paul; Kõljalg, Siiri; Remm, Maido

2017-01-01

Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees. A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k-mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker can predict the clades of E. coli with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain. StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
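The idea of an observed fraction of node-specific k-mers can be illustrated with a short Python sketch. This is not StrainSeeker's code and does not reproduce its statistical test against the expected fraction; the function names, the k-mer list, the reads, and k = 4 are all hypothetical values chosen for readability.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def observed_fraction(node_kmers, reads, k):
    """Fraction of a node's specific k-mers that occur in at least one read."""
    if not node_kmers:
        return 0.0
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(node_kmers & read_kmers) / len(node_kmers)


# Hypothetical k-mers specific to one guide-tree node, and two toy reads.
node_specific = {"ACGT", "CGTA", "GGAT", "TTTT"}
reads = ["ACGTAC", "GGGATC"]
print(observed_fraction(node_specific, reads, k=4))
```

Here three of the four node-specific k-mers are seen in the reads, giving 0.75. A real tool would compare such a value with what sequencing depth alone predicts before declaring the node present.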
Microbial species delineation using whole genome sequences

PubMed Central

Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita

2015-01-01

Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420
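The abstract does not state formulas for AF or gANI. The Python sketch below therefore assumes one plausible pair of definitions for illustration: AF as the share of a genome's total gene length that falls in genes with a match in the other genome, and gANI as the alignment-length-weighted percent identity over those matched genes. The function name, the tuple layout, and the numbers are hypothetical.

```python
def af_and_gani(hits, total_gene_length):
    """Compute (AF, gANI) for one genome against another.

    hits: list of (gene_length, aligned_length, percent_identity),
          one tuple per gene with a match in the other genome (assumed input).
    total_gene_length: summed length of all genes in the query genome.
    """
    matched_length = sum(gene_len for gene_len, _, _ in hits)
    if matched_length == 0 or total_gene_length == 0:
        return 0.0, 0.0
    af = matched_length / total_gene_length
    gani = sum(aligned * pid for _, aligned, pid in hits) / matched_length
    return af, gani


# Two matched genes out of 3000 bp of genes in total (illustrative values).
hits = [(1000, 1000, 98.0), (500, 500, 96.0)]
af, gani = af_and_gani(hits, total_gene_length=3000)
print(af, round(gani, 2))
```

Reporting the two numbers together matters because a high identity over a small matched fraction says little about overall relatedness, which is the motivation the abstract gives for combining them.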
Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.

PubMed

Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José

2015-05-01

Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.
DNA Microarray Profiling of a Diverse Collection of Nosocomial Methicillin-Resistant Staphylococcus aureus Isolates Assigns the Majority to the Correct Sequence Type and Staphylococcal Cassette Chromosome mec (SCCmec) Type and Results in the Subsequent Identification and Characterization of Novel SCCmec-SCCM1 Composite Islands

PubMed Central

Brennan, Orla M.; Deasy, Emily C.; Rossney, Angela S.; Kinnevey, Peter M.; Ehricht, Ralf; Monecke, Stefan; Coleman, David C.

2012-01-01

One hundred seventy-five isolates representative of methicillin-resistant Staphylococcus aureus (MRSA) clones that predominated in Irish hospitals between 1971 and 2004 and that previously underwent multilocus sequence typing (MLST) and staphylococcal cassette chromosome mec (SCCmec) typing were characterized by spa typing (175 isolates) and DNA microarray profiling (107 isolates). The isolates belonged to 26 sequence type (ST)-SCCmec types and subtypes and 35 spa types. The array assigned all isolates to the correct MLST clonal complex (CC), and 94% (100/107) were assigned an ST, with 98% (98/100) correlating with MLST. The array assigned all isolates to the correct SCCmec type, but subtyping of only some SCCmec elements was possible. Additional SCCmec/SCC genes or DNA sequence variation not detected by SCCmec typing was detected by array profiling, including the SCC-fusidic acid resistance determinant Q6GD50/fusC. Novel SCCmec/SCC composite islands (CIs) were detected among CC8 isolates and comprised SCCmec IIA-IIE, IVE, IVF, or IVg and a ccrAB4-SCC element with 99% DNA sequence identity to SCCM1 from ST8/t024-MRSA, SCCmec VIII, and SCC-CI in Staphylococcus epidermidis. The array showed that the majority of isolates harbored one or more superantigen (94%; 100/107) and immune evasion cluster (91%; 97/107) genes. Apart from fusidic acid and trimethoprim resistance, the correlation between isolate antimicrobial resistance phenotype and the presence of specific resistance genes was ≥97%. Array profiling allowed high-throughput, accurate assignment of MRSA to CCs/STs and SCCmec types and provided further evidence of the diversity of SCCmec/SCC. In most cases, array profiling can accurately predict the resistance phenotype of an isolate. PMID:22869569
Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

PubMed

Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

2013-12-01

Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user friendly tree building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes if the gene is typically chromosomal or transferable through plasmids as well as listing the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.
Genomic characterization and taxonomic position of a rhabdovirus from a hybrid snakehead.

PubMed

Zeng, Weiwei; Wang, Qing; Wang, Yingying; Liu, Cun; Liang, Hongru; Fang, Xiang; Wu, Shuqin

2014-09-01

A new rhabdovirus, tentatively designated as hybrid snakehead rhabdovirus C1207 (HSHRV-C1207), was first isolated from a moribund hybrid snakehead (Channa maculata×Channa argus) in China. We present the complete genome sequence of HSHRV-C1207 and a comprehensive sequence comparison between HSHRV-C1207 and other rhabdoviruses. Sequence alignment and phylogenetic analysis revealed that HSHRV-C1207 shared the highest degree of homology with Monopterus albus rhabdovirus and Siniperca chuatsi rhabdovirus. All three viruses clustered into a single group that was distinct from the recognized genera in the family Rhabdoviridae. Our analysis suggests that HSHRV-C1207, as well as MARV and SCRV, should be assigned to a new rhabdovirus genus.
Characterisation of the genomes of four putative vesiculoviruses: tench rhabdovirus, grass carp rhabdovirus, perch rhabdovirus and eel rhabdovirus European X.

PubMed

Stone, David M; Kerr, Rose C; Hughes, Margaret; Radford, Alan D; Darby, Alistair C

2013-11-01

The complete coding sequences were determined for four putative vesiculoviruses isolated from fish. Sequence alignment and phylogenetic analysis based on the predicted amino acid sequences of the five main proteins assigned tench rhabdovirus and grass carp rhabdovirus together with spring viraemia of carp and pike fry rhabdovirus to a lineage that was distinct from the mammalian vesiculoviruses. Perch rhabdovirus, eel virus European X, lake trout rhabdovirus 903/87 and sea trout virus were placed in a second lineage that was also distinct from the recognised genera in the family Rhabdoviridae. Establishment of two new rhabdovirus genera, "Perhabdovirus" and "Sprivivirus", is discussed.
1H, 13C, and 15N backbone assignment and secondary structure of the receptor-binding domain of vascular endothelial growth factor.

PubMed Central

Fairbrother, W. J.; Champe, M. A.; Christinger, H. W.; Keyt, B. A.; Starovasnik, M. A.

1997-01-01

Nearly complete sequence-specific 1H, 13C, and 15N resonance assignments are reported for the backbone atoms of the receptor-binding domain of vascular endothelial growth factor (VEGF), a 23-kDa homodimeric protein that is a major regulator of both normal and pathological angiogenesis. The assignment strategy relied on the use of seven 3D triple-resonance experiments [HN(CO)CA, HNCA, HNCO, (HCA)CONH, HN(COCA)HA, HN(CA)HA, and CBCA-(CO)NH] and a 3D 15N-TOCSY-HSQC experiment recorded on a 0.5 mM (12 mg/mL) sample at 500 MHz, pH 7.0, 45 degrees C. Under these conditions, 15N relaxation data show that the protein has a rotational correlation time of 15.0 ns. Despite this unusually long correlation time, assignments were obtained for 94 of the 99 residues; 8 residues lack amide 1H and 15N assignments, presumably due to rapid exchange of the amide 1H with solvent under the experimental conditions used. The secondary structure of the protein was deduced from the chemical shift indices of the 1H alpha, 13C alpha, 13C beta, and 13CO nuclei, and from analysis of backbone NOEs observed in a 3D 15N-NOESY-HSQC spectrum. Two helices and a significant amount of beta-sheet structure were identified, in general agreement with the secondary structure found in a recently determined crystal structure of a similar VEGF construct [Muller YA et al., 1997, Proc Natl Acad Sci USA 94:7192-7197]. PMID:9336848
Impact of sequencing depth on the characterization of the microbiome and resistome.

PubMed

Zaheer, Rahat; Noyes, Noelle; Ortega Polo, Rodrigo; Cook, Shaun R; Marinier, Eric; Van Domselaar, Gary; Belk, Keith E; Morley, Paul S; McAllister, Tim A

2018-04-12

Developments in high-throughput next generation sequencing (NGS) technology have rapidly advanced the understanding of overall microbial ecology as well as occurrence and diversity of specific genes within diverse environments. In the present study, we compared the ability of varying sequencing depths to generate meaningful information about the taxonomic structure and prevalence of antimicrobial resistance genes (ARGs) in the bovine fecal microbial community. Metagenomic sequencing was conducted on eight composite fecal samples originating from four beef cattle feedlots. Metagenomic DNA was sequenced to various depths, D1, D0.5 and D0.25, with average sample read counts of 117, 59 and 26 million, respectively. A comparative analysis of the relative abundance of reads aligning to different phyla and antimicrobial classes indicated that the relative proportions of read assignments remained fairly constant regardless of depth. However, the number of reads being assigned to ARGs as well as to microbial taxa increased significantly with increasing depth. We found a depth of D0.5 was suitable to describe the microbiome and resistome of cattle fecal samples. This study helps define a balance between cost and required sequencing depth to acquire meaningful results.
Genomewide Function Conservation and Phylogeny in the Herpesviridae

PubMed Central

Albà, M. Mar; Das, Rhiju; Orengo, Christine A.; Kellam, Paul

2001-01-01

The Herpesviridae are a large group of well-characterized double-stranded DNA viruses for which many complete genome sequences have been determined. We have extracted protein sequences from all predicted open reading frames of 19 herpesvirus genomes. Sequence comparison and protein sequence clustering methods have been used to construct herpesvirus protein homologous families. This resulted in 1692 proteins being clustered into 243 multiprotein families and 196 singleton proteins. Predicted functions were assigned to each homologous family based on genome annotation and published data and each family classified into seven broad functional groups. Phylogenetic profiles were constructed for each herpesvirus from the homologous protein families and used to determine conserved functions and genomewide phylogenetic trees. These trees agreed with molecular-sequence-derived trees and allowed greater insight into the phylogeny of ungulate and murine gammaherpesviruses. PMID:11156614
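A phylogenetic profile of this kind can be represented as the set of protein families present in each genome. The Python sketch below shows one way to derive conserved families and a pairwise distance from such profiles; the genome names, family labels, and the choice of Jaccard distance are assumptions for illustration and are not taken from the study.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of protein-family identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


# Hypothetical presence profiles: genome -> families found in that genome.
profiles = {
    "virusA": {"F1", "F2", "F3"},
    "virusB": {"F1", "F2", "F4"},
    "virusC": {"F1", "F5"},
}

# Families present in every genome are candidates for conserved functions.
core_families = set.intersection(*profiles.values())
print(core_families)

# Pairwise distances could feed a distance-based tree-building method.
print(jaccard_distance(profiles["virusA"], profiles["virusB"]))
print(jaccard_distance(profiles["virusA"], profiles["virusC"]))
```

In this toy example only F1 is shared by all three genomes, and virusA is closer to virusB (distance 0.5) than to virusC (distance 0.75).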
Determination of the Gene Sequence of Poliovirus with Pactamycin

PubMed Central

Summers, D. F.; Maizel, J. V.

1971-01-01

By examination of the virus-specific polypeptides formed after the addition of pactamycin, an inhibitor of protein chain initiation, to infected cells, it has been possible to tentatively locate the virus coat proteins at the amino terminus of the large, virus-specific protein precursor, and, therefore, to assign the coat protein cistron to the 5′ end of the RNA. PMID:4330946
Improvement of the Threespine Stickleback Genome Using a Hi-C-Based Proximity-Guided Assembly.

PubMed

Peichel, Catherine L; Sullivan, Shawn T; Liachko, Ivan; White, Michael A

2017-09-01

Scaffolding genomes into complete chromosome assemblies remains challenging even with the rapidly increasing sequence coverage generated by current next-generation sequence technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C based proximity-guided assembly (PGA) to perform a de novo genome assembly from relatively short contigs. Using Hi-C based PGA, we generated complete chromosome assemblies from a distribution of short contigs (20-100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups (LGs), with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the 2 assemblies or errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to LGs. Together, our results highlight the potential of the Hi-C based PGA method to be used in combination with short read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain high molecular weight DNA required for other scaffolding methods. © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Workflow-Based Software Development Environment

NASA Technical Reports Server (NTRS)

Izygon, Michel E.

2013-01-01

The Software Developer's Assistant (SDA) helps software teams more efficiently and accurately conduct or execute software processes associated with NASA mission-critical software. SDA is a process enactment platform that guides software teams through project-specific standards, processes, and procedures. Software projects are decomposed into all of their required process steps or tasks, and each task is assigned to project personnel. SDA orchestrates the performance of work required to complete all process tasks in the correct sequence. The software then notifies team members when they may begin work on their assigned tasks and provides the tools, instructions, reference materials, and supportive artifacts that allow users to compliantly perform the work. A combination of technology components captures and enacts any software process used to support the software lifecycle. It creates an adaptive workflow environment that can be modified as needed. SDA achieves software process automation through a Business Process Management (BPM) approach to managing the software lifecycle for mission-critical projects. It contains five main parts: TieFlow (workflow engine), Business Rules (rules to alter process flow), Common Repository (storage for project artifacts, versions, history, schedules, etc.), SOA (interface to allow internal, GFE, or COTS tools integration), and the Web Portal Interface (collaborative web environment
Multimodal RNA-seq using single-strand, double-strand, and CircLigase-based capture yields a refined and extended description of the C. elegans transcriptome.

PubMed

Lamm, Ayelet T; Stadler, Michael R; Zhang, Huibin; Gent, Jonathan I; Fire, Andrew Z

2011-02-01

We have used a combination of three high-throughput RNA capture and sequencing methods to refine and augment the transcriptome map of a well-studied genetic model, Caenorhabditis elegans. The three methods include a standard (non-directional) library preparation protocol relying on cDNA priming and foldback that has been used in several previous studies for transcriptome characterization in this species, and two directional protocols, one involving direct capture of single-stranded RNA fragments and one involving circular-template PCR (CircLigase). We find that each RNA-seq approach shows specific limitations and biases, with the application of multiple methods providing a more complete map than was obtained from any single method. Of particular note in the analysis were substantial advantages of CircLigase-based and ssRNA-based capture for defining sequences and structures of the precise 5' ends (which were lost using the double-strand cDNA capture method). Of the three methods, ssRNA capture was most effective in defining sequences to the poly(A) junction. Using data sets from a spectrum of C. elegans strains and stages and the UCSC Genome Browser, we provide a series of tools, which facilitate rapid visualization and assignment of gene structures.
Complete Mitochondrial Genomes of New Zealand’s First Dogs

PubMed Central

Greig, Karen; Boocock, James; Prost, Stefan; Horsburgh, K. Ann; Jacomb, Chris; Walter, Richard; Matisoo-Smith, Elizabeth

2015-01-01

Dogs accompanied people in their migrations across the Pacific Ocean and ultimately reached New Zealand, which is the southern-most point of their oceanic distribution, around the beginning of the fourteenth century AD. Previous ancient DNA analyses of mitochondrial control region sequences indicated the New Zealand dog population included two lineages. We sequenced complete mitochondrial genomes of fourteen dogs from the colonisation era archaeological site of Wairau Bar and found five closely-related haplotypes. The limited number of mitochondrial lineages present at Wairau Bar suggests that the founding population may have comprised only a few dogs; or that the arriving dogs were closely related. For populations such as that at Wairau Bar, which stemmed from relatively recent migration events, control region sequences have insufficient power to address questions about population structure and founding events. Sequencing mitogenomes provided the opportunity to observe sufficient diversity to discriminate between individuals that would otherwise be assigned the same haplotype and to clarify their relationships with each other. Our results also support the proposition that at least one dispersal of dogs into the Pacific was via a south-western route through Indonesia. PMID:26444283
Vibrio cholerae typing phage N4: genome sequence and its relatedness to T7 viral supergroup.

PubMed

Das, Mayukh; Nandy, R K; Bhowmick, Tushar Suvra; Yamasaki, S; Ghosh, A; Nair, G B; Sarkar, B L

2012-01-01

In countries where cholera is endemic, Vibrio cholerae O1 bacteriophages have been detected in sewage water. These have been used to serve not only as strain markers, but also for the typing of V. cholerae strains. Vibriophage N4 (ATCC 51352-B1) occupies a unique position in the new phage-typing scheme and can infect a larger number of V. cholerae O1 biotype El Tor strains. Here we characterized the complete genome sequence of this typing vibriophage. The complete DNA sequence of the N4 genome was determined by using a shotgun sequencing approach. The complete genome sequence revealed that phage N4 comprises one circular, double-stranded chromosome of 38,497 bp with an overall GC content of 42.8%. A total of 47 open reading frames were identified and functions could be assigned to 30 of them. Further, a close relationship with another vibriophage, VP4, and the enterobacteriophage T7 could be established. DNA-DNA hybridization among V. cholerae O1 and O139 phages revealed homology among O1 vibriophages at the genomic level. This study indicates two evolutionarily distinct branches in the possible phylogenetic origin of O1 and O139 vibriophages and provides a collection of information on viral gene products of typing vibriophages. Copyright © 2011 S. Karger AG, Basel.
Mixed pyruvate labeling enables backbone resonance assignment of large proteins using a single experiment.

PubMed

Robson, Scott A; Takeuchi, Koh; Boeszoermenyi, Andras; Coote, Paul W; Dubey, Abhinav; Hyberts, Sven; Wagner, Gerhard; Arthanari, Haribabu

2018-01-24

Backbone resonance assignment is a critical first step in the investigation of proteins by NMR. This is traditionally achieved with a standard set of experiments, most of which are not optimal for large proteins. Of these, HNCA is the most sensitive experiment that provides sequential correlations. However, this experiment suffers from chemical shift degeneracy problems during the assignment procedure. We present a strategy that increases the effective resolution of HNCA and enables near-complete resonance assignment using this single HNCA experiment. We utilize a combination of 2-¹³C and 3-¹³C pyruvate as the carbon source for isotope labeling, which suppresses the one-bond (¹Jαβ) coupling, providing enhanced resolution for the Cα resonance and amino acid-specific peak shapes that arise from the residual coupling. Using this approach, we can obtain near-complete (>85%) backbone resonance assignment of a 42 kDa protein using a single HNCA experiment.
STS-107 Crew Surgeon

NASA Technical Reports Server (NTRS)

Johnston, Smith

2005-01-01

NASA Crew Surgeons (CSs) provide medical support to crewmembers assigned to a space flight. Upon this mission assignment, CSs develop close working and personal relationships with crewmembers, their families and close friends. This discussion covers the role of the NASA CS from the start of a mission assignment through its completion. Specific emphasis is placed on events associated with the Columbia accident, including premission planning, initial family medical support, interface with the astronaut Casualty Assistance Control Officers (CACOs), the AFIP relationship and ongoing care for the families.
Escherichia coli K-12: a cooperatively developed annotation snapshot—2005

PubMed Central

Riley, Monica; Abe, Takashi; Arnaud, Martha B.; Berlyn, Mary K.B.; Blattner, Frederick R.; Chaudhuri, Roy R.; Glasner, Jeremy D.; Horiuchi, Takashi; Keseler, Ingrid M.; Kosuge, Takehide; Mori, Hirotada; Perna, Nicole T.; Plunkett, Guy; Rudd, Kenneth E.; Serres, Margrethe H.; Thomas, Gavin H.; Thomson, Nicholas R.; Wishart, David; Wanner, Barry L.

2006-01-01

The goal of this group project has been to coordinate and bring up-to-date information on all genes of Escherichia coli K-12. Annotation of the genome of an organism entails identification of genes, the boundaries of genes in terms of precise start and end sites, and description of the gene products. Known and predicted functions were assigned to each gene product on the basis of experimental evidence or sequence analysis. Since both kinds of evidence are constantly expanding, no annotation is complete at any moment in time. This is a snapshot analysis based on the most recent genome sequences of two E. coli K-12 bacteria. An accurate and up-to-date description of E. coli K-12 genes is of particular importance to the scientific community because experimentally determined properties of its gene products provide fundamental information for annotation of innumerable genes of other organisms. Availability of the complete genome sequence of two K-12 strains allows comparison of their genotypes and mutant status of alleles. PMID:16397293
Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5).

PubMed

Aspeborg, Henrik; Coutinho, Pedro M; Wang, Yang; Brumer, Harry; Henrissat, Bernard

2012-09-20

The large Glycoside Hydrolase family 5 (GH5) groups together a wide range of enzymes acting on β-linked oligo- and polysaccharides, and glycoconjugates from a large spectrum of organisms. The long and complex evolution of this family of enzymes and its broad sequence diversity limits functional prediction. With the objective of improving the differentiation of enzyme specificities in a knowledge-based context, and to obtain new evolutionary insights, we present here a new, robust subfamily classification of family GH5. About 80% of the current sequences were assigned into 51 subfamilies in a global analysis of all publicly available GH5 sequences and associated biochemical data. Examination of subfamilies with catalytically-active members revealed that one third are monospecific (containing a single enzyme activity), although new functions may be discovered with biochemical characterization in the future. Furthermore, twenty subfamilies presently have no characterization whatsoever and many others have only limited structural and biochemical data. Mapping of functional knowledge onto the GH5 phylogenetic tree revealed that the sequence space of this historical and industrially important family is far from well dispersed, highlighting targets in need of further study. The analysis also uncovered a number of GH5 proteins which have lost their catalytic machinery, indicating evolution towards novel functions. Overall, the subfamily division of GH5 provides an actively curated resource for large-scale protein sequence annotation for glycogenomics; the subfamily assignments are openly accessible via the Carbohydrate-Active Enzyme database at http://www.cazy.org/GH5.html.
Previously unknown and highly divergent ssDNA viruses populate the oceans.

PubMed

Labonté, Jessica M; Suttle, Curtis A

2013-11-01

Single-stranded DNA (ssDNA) viruses are economically important pathogens of plants and animals, and are widespread in oceans; yet, the diversity and evolutionary relationships among marine ssDNA viruses remain largely unknown. Here we present the results from a metagenomic study of composite samples from temperate (Saanich Inlet, 11 samples; Strait of Georgia, 85 samples) and subtropical (46 samples, Gulf of Mexico) seawater. Most sequences (84%) had no evident similarity to sequenced viruses. In total, 608 putative complete genomes of ssDNA viruses were assembled, almost doubling the number of ssDNA viral genomes in databases. These comprised 129 genetically distinct groups, each represented by at least one complete genome that had no recognizable similarity to each other or to other virus sequences. Given that the seven recognized families of ssDNA viruses have considerable sequence homology within them, this suggests that many of these genetic groups may represent new viral families. Moreover, nearly 70% of the sequences were similar to one of these genomes, indicating that most of the sequences could be assigned to a genetically distinct group. Most sequences fell within 11 well-defined gene groups, each sharing a common gene. Some of these encoded putative replication and coat proteins that had similarity to sequences from viruses infecting eukaryotes, suggesting that these were likely from viruses infecting eukaryotic phytoplankton and zooplankton.
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses

USDA-ARS's Scientific Manuscript database

Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...
Problems of classification in the family Paramyxoviridae.

PubMed

Rima, Bert; Collins, Peter; Easton, Andrew; Fouchier, Ron; Kurath, Gael; Lamb, Robert A; Lee, Benhur; Maisner, Andrea; Rota, Paul; Wang, Lin-Fa

2018-05-01

A number of unassigned viruses in the family Paramyxoviridae need to be classified either as a new genus or placed into one of the seven genera currently recognized in this family. Furthermore, numerous new paramyxoviruses continue to be discovered. However, attempts at classification have highlighted the difficulties that arise from applying historic criteria or criteria based on sequence alone to the classification of the viruses in this family. While the recent taxonomic change that elevated the previous subfamily Pneumovirinae into a separate family Pneumoviridae is readily justified on the basis of RNA-dependent RNA polymerase (RdRp or L protein) sequence motifs, using RdRp sequence comparisons for assignment to lower level taxa raises problems that would require an overhaul of the current criteria for assignment into genera in the family Paramyxoviridae. Arbitrary cut-off points to delineate genera and species would have to be set if classification was based on the amino acid sequence of the RdRp alone or on pairwise analysis of sequence complementarity (PASC) of all open reading frames (ORFs). While these cut-offs cannot be made consistent with the current classification in this family, resorting to genus-level demarcation criteria with additional input from the biological context may afford a way forward. Such criteria would reflect the increasingly dynamic nature of virus taxonomy even if it would require a complete revision of the current classification.
Molecular Test to Assign Individuals within the Cacopsylla pruni Complex

PubMed Central

Peccoud, Jean; Labonne, Gérard; Sauvion, Nicolas

2013-01-01

Crop protection requires the accurate identification of disease vectors, a task that can be made difficult when these vectors encompass cryptic species. Here we developed a rapid molecular diagnostic test to identify individuals of Cacopsylla pruni (Scopoli, 1763) (Hemiptera: Psyllidae), the main vector of the European stone fruit yellows phytoplasma. This psyllid encompasses two highly divergent genetic groups that are morphologically similar and that are characterized by genotyping several microsatellite markers, a costly and time-consuming protocol. With the aim of developing species-specific PCR primers, we sequenced the Internal Transcribed Spacer 2 (ITS2) on a collection of C. pruni samples from France and other European countries. ITS2 sequences showed that the two genetic groups represent two highly divergent clades. This enabled us to develop specific primers for the assignment of individuals to either genetic group in a single PCR, based on ITS2 amplicon size. All previously assigned individuals yielded bands of expected sizes, and the PCR proved efficient on a larger sample of 799 individuals. Because none appeared heterozygous at the ITS2 locus (i.e., none produced two bands), we inferred that the genetic groups of C. pruni, whose distribution is partly sympatric, constitute biological species that have not exchanged genes for an extended period of time. Other psyllid species (Cacopsylla, Psylla, Triozidae and Aphalaridae) failed to yield any amplicon. These primers are therefore unlikely to produce false positives and allow rapid assignment of C. pruni individuals to either cryptic species. PMID:23977301
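The size-based assignment described in this abstract can be sketched roughly as follows. This is a minimal illustration only: the abstract does not report the actual amplicon sizes, so the values in `EXPECTED_SIZES`, the tolerance `TOLERANCE_BP`, and the function name `assign_group` are all hypothetical.

```python
# Hypothetical expected ITS2 amplicon sizes (bp) for the two genetic groups.
# The abstract does not give the real sizes; these are placeholders.
EXPECTED_SIZES = {"group_A": 300, "group_B": 450}
TOLERANCE_BP = 20  # assumed tolerance when reading band sizes from a gel


def assign_group(band_sizes):
    """Assign one individual to a genetic group from its observed band sizes."""
    matched = set()
    for size in band_sizes:
        for group, expected in EXPECTED_SIZES.items():
            if abs(size - expected) <= TOLERANCE_BP:
                matched.add(group)
    if not matched:
        return "unassigned"    # no band of an expected size (e.g. another species)
    if len(matched) > 1:
        return "heterozygous"  # both group-specific bands present
    return matched.pop()
```

Under these assumptions, an individual yielding a single band near one expected size is assigned to that group, while two group-specific bands would flag the heterozygous case that the study reports never observing.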
Structural features of the rice chromosome 4 centromere.

PubMed

Zhang, Yu; Huang, Yuchen; Zhang, Lei; Li, Ying; Lu, Tingting; Lu, Yiqi; Feng, Qi; Zhao, Qiang; Cheng, Zhukuan; Xue, Yongbiao; Wing, Rod A; Han, Bin

2004-01-01

A complete sequence of a chromosome centromere is necessary for fully understanding centromere function. We reported the sequence structures of the first complete rice chromosome centromere through sequencing a large insert bacterial artificial chromosome clone-based contig, which covered the rice chromosome 4 centromere. Complete sequencing of the 124-kb rice chromosome 4 centromere revealed that it consisted of 18 tracts of 379 tandemly arrayed repeats known as CentO and a total of 19 centromeric retroelements (CRs) but no unique sequences were detected. Four tracts, composed of 65 CentO repeats, were located in the opposite orientation, and 18 CentO tracts were flanked by 19 retroelements. The CRs were classified into four types, and the type I retroelements appeared to be more specific to rice centromeres. The preferential insert of the CRs among CentO repeats indicated that the centromere-specific retroelements may contribute to centromere expansion during evolution. The presence of three intact retrotransposons in the centromere suggests that they may be responsible for functional centromere initiation through a transcription-mediated mechanism.
GenBank

PubMed Central

Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

2007-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage. PMID:17202161
Delay test generation for synchronous sequential circuits

NASA Astrophysics Data System (ADS)

Devadas, Srinivas

1989-05-01

We address the problem of generating tests for delay faults in non-scan synchronous sequential circuits. Delay test generation for sequential circuits is a considerably more difficult problem than delay testing of combinational circuits and has received much less attention. In this paper, we present a method for generating test sequences to detect delay faults in sequential circuits using the stuck-at fault sequential test generator STALLION. The method is complete in that it will generate a delay test sequence for a targeted fault given sufficient CPU time, if such a sequence exists. We term faults for which no delay test sequence exists, under our test methodology, sequentially delay redundant. We describe means of eliminating sequential delay redundancies in logic circuits. We present a partial-scan methodology for enhancing the testability of difficult-to-test or untestable sequential circuits, wherein a small number of flip-flops are selected and made controllable/observable. The selection process guarantees the elimination of all sequential delay redundancies. We show that an intimate relationship exists between state assignment and delay testability of a sequential machine. We describe a state assignment algorithm for the synthesis of sequential machines with maximal delay fault testability. Preliminary experimental results using the test generation, partial-scan and synthesis algorithms are presented.
Systematic Errors in Peptide and Protein Identification and Quantification by Modified Peptides

PubMed Central

Bogdanow, Boris; Zauber, Henrik; Selbach, Matthias

2016-01-01

The principle of shotgun proteomics is to use peptide mass spectra in order to identify corresponding sequences in a protein database. The quality of peptide and protein identification and quantification critically depends on the sensitivity and specificity of this assignment process. Many peptides in proteomic samples carry biochemical modifications, and a large fraction of unassigned spectra arise from modified peptides. Spectra derived from modified peptides can erroneously be assigned to wrong amino acid sequences. However, the impact of this problem on proteomic data has not yet been investigated systematically. Here we use combinations of different database searches to show that modified peptides can be responsible for 20–50% of false positive identifications in deep proteomic data sets. These false positive hits are particularly problematic as they have significantly higher scores and higher intensities than other false positive matches. Furthermore, these wrong peptide assignments lead to hundreds of false protein identifications and systematic biases in protein quantification. We devise a “cleaned search” strategy to address this problem and show that this considerably improves the sensitivity and specificity of proteomic data. In summary, we show that modified peptides cause systematic errors in peptide and protein identification and quantification and should therefore be considered to further improve the quality of proteomic data annotation. PMID:27215553
ArrayInitiative - a tool that simplifies creating custom Affymetrix CDFs

PubMed Central

2011-01-01

Background Probes on a microarray represent a frozen view of a genome and are quickly outdated when new sequencing studies extend our knowledge, resulting in significant measurement error when analyzing any microarray experiment. There are several bioinformatics approaches to improve probe assignments, but without in-house programming expertise, standardizing these custom array specifications as a usable file (e.g. as Affymetrix CDFs) is difficult, owing mostly to the complexity of the specification file format. However, without correctly standardized files there is a significant barrier for testing competing analysis approaches since this file is one of the required inputs for many commonly used algorithms. The need to test combinations of probe assignments and analysis algorithms led us to develop ArrayInitiative, a tool for creating and managing custom array specifications. Results ArrayInitiative is a standalone, cross-platform, rich client desktop application for creating correctly formatted, custom versions of manufacturer-provided (default) array specifications, requiring only minimal knowledge of the array specification rules and file formats. Users can import default array specifications, import probe sequences for a default array specification, design and import a custom array specification, export any array specification to multiple output formats, export the probe sequences for any array specification and browse high-level information about the microarray, such as version and number of probes. The initial release of ArrayInitiative supports the Affymetrix 3' IVT expression arrays we currently analyze, but as an open source application, we hope that others will contribute modules for other platforms. Conclusions ArrayInitiative allows researchers to create new array specifications, in a standard format, based upon their own requirements. This makes it easier to test competing design and analysis strategies that depend on probe definitions. Since the custom array specifications are easily exported to the manufacturer's standard format, researchers can analyze these customized microarray experiments using established software tools, such as those available in Bioconductor. PMID:21548938
Chirality- and sequence-selective successive self-sorting via specific homo- and complementary-duplex formations

PubMed Central

Makiguchi, Wataru; Tanabe, Junki; Yamada, Hidekazu; Iida, Hiroki; Taura, Daisuke; Ousaka, Naoki; Yashima, Eiji

2015-01-01

Self-recognition and self-discrimination within complex mixtures are of fundamental importance in biological systems, which entirely rely on the preprogrammed monomer sequences and homochirality of biological macromolecules. Here we report artificial chirality- and sequence-selective successive self-sorting of chiral dimeric strands bearing carboxylic acid or amidine groups joined by chiral amide linkers with different sequences through homo- and complementary-duplex formations. Carboxylic acid dimers linked by racemic-1,2-cyclohexane bis-amides with different amide sequences (NHCO or CONH) self-associate in a mixture to form homoduplexes in a completely sequence-selective way, the structures of which differ from each other depending on the linker amide sequences. The further addition of an enantiopure amide-linked amidine dimer to a mixture of the racemic carboxylic acid dimers resulted in the formation of a single optically pure complementary duplex with 100% diastereoselectivity and complete sequence specificity stabilized by the amidinium–carboxylate salt bridges, leading to the perfect chirality- and sequence-selective duplex formation. PMID:26051291
Microbial species delineation using whole genome sequences.

PubMed

Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita

2015-08-18

Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
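The pairwise test and clustering step described in this abstract can be sketched as below. This is a simplified illustration, not the MiSI implementation: the function names, the greedy merge order, and the cut-off values passed in the usage example are assumptions, since the abstract does not state the thresholds or the algorithmic details.

```python
from itertools import combinations


def related(af, gani, af_min, gani_min):
    """A genome pair counts as related only if both AF and gANI pass their cut-offs."""
    return af >= af_min and gani >= gani_min


def complete_linkage_clusters(genomes, pair_scores, af_min, gani_min):
    """Greedy complete-linkage grouping.

    pair_scores maps frozenset({g1, g2}) -> (AF, gANI).
    Two clusters merge only if every cross-cluster pair is related.
    """
    clusters = [{g} for g in genomes]

    def pair_ok(a, b):
        score = pair_scores.get(frozenset((a, b)))
        return score is not None and related(*score, af_min, gani_min)

    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if all(pair_ok(a, b) for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart the scan with the updated cluster list
    return clusters


# Usage with hypothetical scores and illustrative cut-offs (AF 0.6, gANI 96.5).
scores = {
    frozenset(("a", "b")): (0.9, 98.0),
    frozenset(("b", "c")): (0.9, 98.0),
    frozenset(("a", "c")): (0.3, 90.0),
}
groups = complete_linkage_clusters(["a", "b", "c"], scores, 0.6, 96.5)
```

Because complete linkage requires every pair within a cluster to pass, genome `c` stays separate here even though it is related to `b`. The greedy merge order means a different input order could group `b` with `c` instead, which is one way anomalies relative to an existing taxonomy can surface.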
An Investigation of the Partial-Assignment Completion Effect on Students' Assignment Choice Behavior

ERIC Educational Resources Information Center

Hawthorn-Embree, Meredith L.; Skinner, Christopher H.; Parkhurst, John; Conley, Elisha

2011-01-01

This study was designed to investigate the partial assignment completion effect. Seventh-grade students were given a math assignment. After working for 5 min, they were interrupted and their partially completed assignments were collected. About 20 min later, students were given their partially completed assignment and a new, control assignment…

Storage and utilization of HLA genomic data--new approaches to HLA typing.

PubMed

Helmberg, W

2000-01-01

Currently available DNA-based HLA typing assays can provide detailed information about sequence motifs of a tested sample. It is still a common practice, however, for information acquired by high-resolution sequence specific oligonucleotide probe (SSOP) typing or sequence specific priming (SSP) to be presented in a low-resolution serological format. Unfortunately, this representation can lead to significant loss of useful data in many cases. An alternative to assigning allele equivalents to such DNA typing results is simply to store the observed typing pattern and utilize the information with the help of Virtual DNA Analysis (VDA). Interpretation of the stored typing patterns can then be updated based on newly defined alleles, assuming the sequence motifs detected by the typing reagents are known. Rather than updating reagent specificities in individual laboratories, such updates should be performed in a central, publicly available sequence database. By referring to this database, HLA genomic data can then be stored and transferred between laboratories without loss of information. The 13th International Histocompatibility Workshop offers an ideal opportunity to begin building this common database for the entire human MHC.
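The idea of storing the raw typing pattern and re-interpreting it as new alleles are defined can be illustrated with a short sketch. Everything here is hypothetical: the probe names, the allele names, the motif table and the function `interpret` are placeholders chosen for illustration and do not reflect the actual VDA software or real HLA alleles.

```python
from itertools import combinations_with_replacement


def interpret(pattern, allele_motifs):
    """Return allele pairs consistent with a stored probe reaction pattern.

    pattern: dict probe -> True (positive) or False (negative)
    allele_motifs: dict allele -> set of probes that allele is expected to react with
    """
    positives = {probe for probe, hit in pattern.items() if hit}
    tested = set(pattern)
    consistent = []
    for a, b in combinations_with_replacement(sorted(allele_motifs), 2):
        # A diploid sample reacts with every tested probe matched by either allele.
        expected = (allele_motifs[a] | allele_motifs[b]) & tested
        if expected == positives:
            consistent.append((a, b))
    return consistent


# The stored pattern never changes; only the allele database is updated.
stored_pattern = {"p1": True, "p2": True, "p3": True, "p4": False}
database_v1 = {"allele_1": {"p1", "p2"}, "allele_2": {"p3"}}
database_v2 = dict(database_v1, allele_3={"p1", "p2", "p3"})

before = interpret(stored_pattern, database_v1)
after = interpret(stored_pattern, database_v2)
```

In this toy case the same stored pattern has one interpretation against the first database and several once a newly defined allele is added. A low-resolution serological label recorded in place of the pattern would not have allowed that update.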
PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications.

PubMed

Pasquier, C; Promponas, V J; Hamodrakas, S J

2001-08-15

A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes (transmembrane, fibrous, globular, and mixed) from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web server running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLASS.
The complete sequence of mitochondrial genome of polled yak (Bos grunniens).

PubMed

Chu, Min; Wu, Xiaoyun; Liang, Chunnian; Pei, Jie; Ding, Xuezhi; Guo, Xian; Bao, Pengjia; Yan, Ping

2016-05-01

The hornless trait is known as polled. Although the POLL locus could be assigned to a 1.36-Mb interval in the centromeric region of BTA1 (Georges et al., 1993; Drögemüller et al., 2005), and Liu et al. (2014) reported that a 147-kb segment containing three protein-coding genes was the most likely location of the POLL mutation in domestic yaks, the underlying genetic basis for the polled trait is still unknown. In this work, the complete mitochondrial genome sequence of polled yak was determined for the first time. The mitogenome is 16,324 bp long, with a base composition of 33.72% A, 27.25% T, 25.83% C, and 13.20% G. It contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and 1 non-coding region (D-loop region). The gene order of the polled yak mitogenome is identical to that observed in most other vertebrates. The complete mitogenome sequence information of polled yak will provide useful data for further studies on protection of genetic resources and phylogenetic relationships within Bos grunniens.
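Base composition figures such as those reported above are simple per-base percentages of the sequence. A minimal sketch of the calculation, assuming a plain nucleotide string as input (the function name `base_composition` is illustrative and the short test sequence is not yak data):

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, T, C and G among unambiguous bases."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}


# Toy example; a real mitogenome would be read from a sequence file.
composition = base_composition("AATTCCGG")

# Consistency check on the reported values: the four percentages should sum to 100.
reported_total = round(33.72 + 27.25 + 25.83 + 13.20, 2)
```

The reported percentages pass this check, which is a quick way to catch transcription errors in composition tables.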
Classifying short genomic fragments from novel lineages using composition and homology

PubMed Central

2011-01-01

Background The assignment of taxonomic attributions to DNA fragments recovered directly from the environment is a vital step in metagenomic data analysis. Assignments can be made using rank-specific classifiers, which assign reads to taxonomic labels from a predetermined level such as named species or strain, or rank-flexible classifiers, which choose an appropriate taxonomic rank for each sequence in a data set. The choice of rank typically depends on the optimal model for a given sequence and on the breadth of taxonomic groups seen in a set of close-to-optimal models. Homology-based (e.g., LCA) and composition-based (e.g., PhyloPythia, TACOA) rank-flexible classifiers have been proposed, but there is at present no hybrid approach that utilizes both homology and composition. Results We first develop a hybrid, rank-specific classifier based on BLAST and Naïve Bayes (NB) that has comparable accuracy and a faster running time than the current best approach, PhymmBL. By substituting LCA for BLAST or allowing the inclusion of suboptimal NB models, we obtain a rank-flexible classifier. This hybrid classifier outperforms established rank-flexible approaches on simulated metagenomic fragments of length 200 bp to 1000 bp and is able to assign taxonomic attributions to a subset of sequences with few misclassifications. We then demonstrate the performance of different classifiers on an enhanced biological phosphorus removal metagenome, illustrating the advantages of rank-flexible classifiers when representative genomes are absent from the set of reference genomes. Application to a glacier ice metagenome demonstrates that similar taxonomic profiles are obtained across a set of classifiers which are increasingly conservative in their classification. Conclusions Our NB-based classification scheme is faster than the current best composition-based algorithm, Phymm, while providing equally accurate predictions. The rank-flexible variant of NB, which we term ε-NB, is complementary to LCA and can be combined with it to yield conservative prediction sets of very high confidence. The simple parameterization of LCA and ε-NB allows for tuning of the balance between more predictions and increased precision, allowing the user to account for the sensitivity of downstream analyses to misclassified or unclassified sequences. PMID:21827705
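The rank-flexible behaviour of an LCA-style classifier can be illustrated with a small sketch: given the lineages of the close-to-optimal hits for one read, the read is assigned to the deepest taxon they all share. This is a generic lowest-common-ancestor illustration under simplifying assumptions (lineages given as root-to-leaf lists with intermediate ranks omitted), not the authors' classifier; the function name is hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all lineages.

    lineages: list of lists, each ordered from the broadest rank to the narrowest.
    An empty result means the hits share no taxon at all.
    """
    if not lineages:
        return []
    shared = []
    for taxa_at_rank in zip(*lineages):  # stops at the shortest lineage
        if len(set(taxa_at_rank)) != 1:
            break
        shared.append(taxa_at_rank[0])
    return shared


# Two hits that agree down to class but differ at genus.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"],
]
assignment = lowest_common_ancestor(hits)
```

Here the read would be assigned at class level, not forced to a genus. Widening or narrowing the set of hits considered close to optimal is one way to trade more predictions against higher precision, as the abstract describes.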
Authentication of an endangered herb Changium smyrnioides from different producing areas based on rDNA ITS sequences and allele-specific PCR.

PubMed

Sun, Xiaoqin; Wei, Yanglian; Qin, Minjian; Guo, Qiaosheng; Guo, Jianlin; Zhou, Yifeng; Hang, Yueyu

2012-03-01

The rDNA ITS region of 18 samples of Changium smyrnioides from 7 areas and of 2 samples of Chuanminshen violaceum were sequenced and analyzed. The amplified ITS region of the samples, including a partial sequence of ITS1 and complete sequences of 5.8S and ITS2, had a total length of 555 bp. After complete alignment, there were 49 variable sites, of which 45 were informative, when gaps were treated as missing data. Samples of C. smyrnioides from different locations could be identified exactly based on the variable sites. The maximum parsimony (MP) and neighbor joining (NJ) trees constructed from the ITS sequences based on Kimura's two-parameter model showed that the genetic distances of the C. smyrnioides samples from different locations were not always related to their geographical distances. A specific primer set for allele-specific PCR authentication of C. violaceum from Jurong of Jiangsu was designed based on the SNP in the ITS sequence alignment. C. violaceum from the major genuine producing area in Jurong of Jiangsu could be identified exactly and quickly by allele-specific PCR.
Enrichment of individual KIR2DL4 sequences from genomic DNA using long-template PCR and allele-specific hybridization to magnetic bead-bound oligonucleotide probes.

PubMed

Roberts, C H; Turino, C; Madrigal, J A; Marsh, S G E

2007-06-01

DNA enrichment by allele-specific hybridization (DEASH) was used as a means to isolate individual alleles of the killer cell immunoglobulin-like receptor (KIR2DL4) gene from heterozygous genomic DNA. Using long-template polymerase chain reaction (LT-PCR), the complete KIR2DL4 gene was amplified from a cell line that had previously been characterized for its KIR gene content by PCR using sequence-specific primers (PCR-SSP). The whole gene amplicons were sequenced and we identified two heterozygous positions in accordance with the predictions of the PCR-SSP. The amplicons were then hybridized to allele-specific, biotinylated oligonucleotide probes and through binding to streptavidin-coated beads, the targeted alleles were enriched. A second PCR amplified only the exonic regions of the enriched allele, and these were then sequenced in full. We show DEASH to be capable of enriching single alleles from a heterozygous PCR product, and through sequencing the enriched DNA, we are able to produce complete coding sequences of the KIR2DL4 alleles in accordance with the typing predicted by PCR-SSP.
Complete virilization in congenital adrenal hyperplasia: clinical course, medical management and disease-related complications.

PubMed

Woelfle, J; Hoepffner, W; Sippell, W G; Brämswig, J H; Heidemann, P; Deiss, D; Bökenkamp, A; Roth, C; Irle, U; Wollmann, H A; Zachmann, M; Kubini, K; Albers, N

2002-02-01

In girls with congenital adrenal hyperplasia (CAH), genital ambiguity usually leads to a rapid neonatal diagnosis. Rarely, CAH causes complete virilization and male sex assignment with a delayed diagnosis. After being confronted with very specific problems in two such patients, we collected data on patients with CAH and complete virilization in a nationwide study to delineate specific problems of these rare patients in order to improve their management. Through the German Working Group of Paediatric Endocrinology (Arbeitsgemeinschaft Pädiatrische Endokrinologie, APE), questionnaires were sent to all members caring for patients with CAH and complete virilization in their endocrine clinics. Data from 16 patients from 10 paediatric endocrine centres were assessed by questionnaire. The following problems have been encountered. (1) Sex assignment/gender identity: initially all patients had a male sex assignment. Six patients were diagnosed during the first month of life. Five were reassigned to female sex immediately, one at the age of 19 months. Except in one girl demonstrating some tomboyish behaviour, gender role behaviour in these patients did not differ from unaffected girls. Ten patients were diagnosed late at 3.4-7 years of age. In seven patients with a late diagnosis, male sex assignment was maintained; one of them expressed some concerns about living as a male. In three patients late sex reversal was performed; gender identity is very poor in one, and a new sex assignment is currently under consideration. (2) Surgery: irrespective of the sex assigned, all patients had between one and three surgical procedures, including clitoris reduction and (repeated) vaginoplasties in patients with female sex assignment. Hysterectomy and ovariectomy were performed in patients with male sex assignment. 
(3) Short stature: patients with a late diagnosis of CAH had extremely advanced bone ages of +6.3 to +9.5 years, leading to severely reduced final height of 137 to 150 cm in adult patients. Patients tended to follow height percentiles of genetic females. One pubertal patient was suicidal due to short stature. (4) Central precocious puberty (CPP): prolonged exposure to adrenal androgens led to CPP in one patient. He was treated with GnRH analogues until gonadectomy. Patients with CAH and complete virilization have a high risk of being diagnosed late. There are major problems and uncertainties for the patients' families and the treating physicians concerning gender assignment. Gender identity is disturbed in some patients. In addition, multiple surgical procedures are necessary and short stature as well as central precocious puberty might be important to avoid late sequelae. While some surgical interventions are probably unavoidable, most of these issues could be resolved with an early diagnosis. Thus, especially for these patients, a neonatal screening programme for CAH would be of paramount importance.
Complete amino acid sequence of ananain and a comparison with stem bromelain and other plant cysteine proteases.

PubMed Central

Lee, K L; Albee, K L; Bernasconi, R J; Edmunds, T

1997-01-01

The amino acid sequences of ananain (EC 3.4.22.31) and stem bromelain (EC 3.4.22.32), two cysteine proteases from pineapple stem, are similar, yet ananain and stem bromelain possess distinct specificities towards synthetic peptide substrates and different reactivities towards the cysteine protease inhibitors E-64 and chicken egg white cystatin. We present here the complete amino acid sequence of ananain and compare it with the reported sequences of pineapple stem bromelain, papain and chymopapain from papaya, and actinidin from kiwifruit. Ananain comprises 216 residues with a theoretical mass of 23464 Da. This primary structure includes a sequence insert between residues 170 and 174 not present in stem bromelain or papain and a hydrophobic series of amino acids adjacent to His-157. It is possible that these sequence differences contribute to the different substrate and inhibitor specificities exhibited by ananain and stem bromelain. PMID:9355753
Multilocus Sequence Typing of Cronobacter Strains Isolated from Retail Foods and Environmental Samples.

PubMed

Killer, Jiří; Skřivanová, Eva; Hochel, Igor; Marounek, Milan

2015-06-01

Cronobacter spp. are bacterial pathogens that affect children and immunocompromised adults. In this study, we used multilocus sequence typing (MLST) to determine sequence types (STs) in 11 Cronobacter spp. strains isolated from retail foods, 29 strains from dust samples obtained from vacuum cleaners, and 4 clinical isolates. Using biochemical tests, species-specific polymerase chain reaction, and MLST analysis, 36 strains were identified as Cronobacter sakazakii, and 6 were identified as Cronobacter malonaticus. In addition, one strain that originated from retail food and one from a dust sample from a vacuum cleaner were identified on the basis of MLST analysis as Cronobacter dublinensis and Cronobacter turicensis, respectively. Cronobacter spp. strains isolated from the retail foods were assigned to eight different MLST sequence types, seven of which were newly identified. The strains isolated from the dust samples were assigned to 7 known STs and 14 unknown STs. Three clinical isolates and one household dust isolate were assigned to ST4, which is the predominant ST associated with neonatal meningitis. One clinical isolate was classified based on MLST analysis as Cronobacter malonaticus and belonged to an as-yet-unknown ST. Three strains isolated from the household dust samples were assigned to ST1, which is another clinically significant ST. It can be concluded that Cronobacter spp. strains of different origin are genetically quite variable. The recovery of C. sakazakii strains belonging to ST1 and ST4 from the dust samples suggests the possibility that contamination could occur during food preparation. All of the novel STs and alleles for C. sakazakii, C. malonaticus, C. dublinensis, and C. turicensis determined in this study were deposited in the Cronobacter MLST database available online (http://pubmlst.org/cronobacter/).
Optimizing high performance computing workflow for protein functional annotation.

PubMed

Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

2014-09-10

Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data.
Optimizing high performance computing workflow for protein functional annotation

PubMed Central

Stanberry, Larissa; Rekepalli, Bhanu; Liu, Yuan; Giblock, Paul; Higdon, Roger; Montague, Elizabeth; Broomall, William; Kolker, Natali; Kolker, Eugene

2014-01-01

Functional annotation of newly sequenced genomes is one of the major challenges in modern biology. With modern sequencing technologies, the protein sequence universe is rapidly expanding. Newly sequenced bacterial genomes alone contain over 7.5 million proteins. The rate of data generation has far surpassed that of protein annotation. The volume of protein data makes manual curation infeasible, whereas a high compute cost limits the utility of existing automated approaches. In this work, we present an improved and optimized automated workflow to enable large-scale protein annotation. The workflow uses high performance computing architectures and a low complexity classification algorithm to assign proteins into existing clusters of orthologous groups of proteins. On the basis of the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), the algorithm ensures at least 80% specificity and sensitivity of the resulting classifications. The workflow utilizes highly scalable parallel applications for classification and sequence alignment. Using Extreme Science and Engineering Discovery Environment supercomputers, the workflow processed 1,200,000 newly sequenced bacterial proteins. With the rapid expansion of the protein sequence universe, the proposed workflow will enable scientists to annotate big genome data. PMID:25313296
Assignment Choice, Effort, and Assignment Completion: Does Work Ethic Predict Those Who Choose Higher-Effort Assignments?

ERIC Educational Resources Information Center

Parkhurst, John T.; Fleisher, Matthew S.; Skinner, Christopher H.; Woehr, David J.; Hawthorn-Embree, Meredith L.

2011-01-01

After completing the Multidimensional Work-Ethic Profile (MWEP), 98 college students were given a 20-problem math computation assignment and instructed to stop working on the assignment after completing 10 problems. Next, they were allowed to choose to finish either the partially completed assignment that had 10 problems remaining or a new…
Quantized phase coding and connected region labeling for absolute phase retrieval.

PubMed

Chen, Xiangcheng; Wang, Yuwei; Wang, Yajun; Ma, Mengchao; Zeng, Chunnian

2016-12-12

This paper proposes an absolute phase retrieval method for complex object measurement based on quantized phase coding and connected region labeling. A specific code sequence is embedded into the quantized phase of three coded fringes. Connected regions of different codes are labeled and assigned 3-digit codes that combine the code of the current period with those of its neighbors. A wrapped phase spanning more than 36 periods can be restored with reference to the code sequence. Experimental results verify the capability of the proposed method to measure multiple isolated objects.
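The decoding step described above can be illustrated with a one-dimensional Python sketch. It is a simplified reading of the abstract, not the published algorithm: it works on a single image row, assumes the per-pixel quantized codes have already been extracted, assumes every (previous, current, next) window occurs only once in the embedded code sequence, and uses an invented code sequence. The function names are hypothetical.

```python
import math


def label_runs(codes):
    """Group consecutive equal codes into regions of (code, start, stop)."""
    regions = []
    start = 0
    for i in range(1, len(codes) + 1):
        if i == len(codes) or codes[i] != codes[start]:
            regions.append((codes[start], start, i))
            start = i
    return regions


def period_orders(codes, code_sequence):
    """Give each pixel its fringe period index, or None if undecidable."""
    # Build the lookup from 3-digit codes to period index.
    lookup = {}
    for k in range(1, len(code_sequence) - 1):
        key = tuple(code_sequence[k - 1:k + 2])
        if key in lookup:
            raise ValueError("3-digit codes must be unique in the sequence")
        lookup[key] = k

    regions = label_runs(codes)
    orders = [None] * len(codes)
    # Only regions with a neighbor on both sides can form a 3-digit code.
    for r in range(1, len(regions) - 1):
        key = (regions[r - 1][0], regions[r][0], regions[r + 1][0])
        k = lookup.get(key)
        _, start, stop = regions[r]
        for i in range(start, stop):
            orders[i] = k
    return orders


def unwrap(wrapped, orders):
    """Absolute phase = wrapped phase + 2*pi*period index."""
    return [None if k is None else w + 2 * math.pi * k
            for w, k in zip(wrapped, orders)]


# Hypothetical embedded code sequence, one code per fringe period.
CODE_SEQUENCE = [0, 1, 2, 0, 2, 1, 0]
# Codes observed on an isolated object that covers only part of the pattern.
OBSERVED = [2, 2, 0, 0, 0, 2, 2, 1, 1]

print(period_orders(OBSERVED, CODE_SEQUENCE))
```

Because each 3-digit code is unique, the period index of a region is recovered from its immediate neighbors alone, which is why an isolated object that shows only a fragment of the pattern can still be assigned absolute periods. In this sketch the two outermost regions stay undecided because they lack a neighbor on one side.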
Initial description of primate-specific cystine-knot Prometheus genes and differential gene expansions of D-dopachrome tautomerase genes

PubMed Central

Premzl, Marko

2015-01-01

Using the eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences), and D-dopachrome tautomerases and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. For example, the present study provides the first description of primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks of future experiments of two eutherian gene data sets were proposed. PMID:25941635
Phylogenetic analysis of Austrian canine distemper virus strains from clinical samples from dogs and wild carnivores.

PubMed

Benetka, V; Leschnik, M; Affenzeller, N; Möstl, K

2011-04-09

Austrian field cases of canine distemper (14 dogs, one badger [Meles meles] and one stone marten [Martes foina]) from 2002 to 2007 were investigated and the case histories were summarised briefly. Phylogenetic analysis of fusion (F) and haemagglutinin (H) gene sequences revealed different canine distemper virus (CDV) lineages circulating in Austria. The majority of CDV strains detected from 2002 to 2004 were well embedded in the European lineage. One Austrian canine sample detected in 2003, with a high similarity to Hungarian sequences from 2005 to 2006, could be assigned to the Arctic group (phocine distemper virus type 2-like). The two canine sequences from 2007 formed a clearly distinct group flanked by sequences detected previously in China and the USA on an intermediate position between the European wildlife and the Asia-1 cluster. The Austrian wildlife strains (2006 and 2007) could be assigned to the European wildlife group and were most closely related to, yet clearly different from, the 2007 canine samples. To elucidate the epidemiological role of Austrian wildlife in the transmission of the disease to dogs and vice versa, H protein residues related to receptor and host specificity (residues 530 and 549) were analysed. All samples showed the amino acids expected for their host of origin, with the exception of a canine sequence from 2007, which had an intermediate position between wildlife and canine viral strains. In the period investigated, canine strains circulating in Austria could be assigned to four different lineages reflecting both a high diversity and probably different origins of virus introduction to Austria in different years.
Phylogeny of anaerobic fungi (phylum Neocallimastigomycota), with contributions from yak in China.

PubMed

Wang, Xuewei; Liu, Xingzhong; Groenewald, Johannes Z

2017-01-01

The phylum Neocallimastigomycota contains eight genera (about 20 species) of strictly anaerobic fungi. The evolutionary relationships of these genera are uncertain due to insufficient sequence data to infer their phylogenies. Based on morphology and molecular phylogeny, thirteen isolates obtained from yak faeces and rumen digesta in China were assigned to Neocallimastix frontalis (nine isolates), Orpinomyces joyonii (two isolates) and Caecomyces sp. (two isolates), respectively. The phylogenetic relationships of the eight genera were evaluated using complete ITS and partial LSU sequences, compared to the ITS1 region which has been widely used in this phylum in the past. Five monophyletic lineages corresponding to six of the eight genera were statistically supported. Isolates of Caecomyces and Cyllamyces were present in a single lineage and could not be separated properly. Members of Neocallimastigomycota with uniflagellate zoospores represented by Piromyces were polyphyletic. The Piromyces-like genus Oontomyces was consistently closely related to the traditional Anaeromyces, and separated the latter genus into two clades. The phylogenetic position of the Piromyces-like genus Buwchfawromyces remained unresolved. Orpinomyces and Neocallimastix, sharing polyflagellate zoospores, were supported as sister genera in the LSU phylogeny. Apparently ITS, specifically ITS1 alone, is not a good marker to resolve the generic affinities of the studied fungi. The LSU sequences are easier to align and appear to work well to resolve generic relationships. This study provides a comparative phylogenetic revision of Neocallimastigomycota isolates known from culture and sequence data.
Comparative phylobiomic analysis of the bacterial community of water kefir by 16S rRNA gene amplicon sequencing and ARDRA analysis.

PubMed

Gulitz, A; Stadie, J; Ehrmann, M A; Ludwig, W; Vogel, R F

2013-04-01

The aim of this study was to analyse the bacterial microbiota of water kefir using culture-independent methods. We compared four water kefirs of different origins using 16S rDNA amplicon sequencing and ARDRA. The microbiota consisted of different proportions of the genera Lactobacillus (Lact.), Leuconostoc (Leuc.), Acetobacter (Acet.) and Gluconobacter. Surprisingly, varying but consistently high numbers of sequences representing members of the genus Bifidobacterium (Bif.) were found in all kefirs. Whereas part of the bifidobacterial sequences could be assigned to Bifidobacterium psychraerophilum, a majority of sequences identical to each other could not be assigned to any known species. A nearly full-length sequence of the latter exhibited a beyond-species similarity (96.4%) with the sequence from the closest relative species Bif. psychraerophilum. A Bifidobacterium-specific ARDRA analysis reflected the abundance of the novel Bifidobacterium species by revealing its unique MboI restriction profile. Attempts to isolate the bifidobacteria were successful for Bif. psychraerophilum only. The complexity of the water kefir microbiota has been underestimated in previous studies. The occurrence of bifidobacteria as part of the consortium is novel. These data give new insights into the understanding of the complexity of food fermentations and underline the need for approaches detecting noncultivable organisms. © 2013 The Society for Applied Microbiology.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

2008-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
GenBank

PubMed Central

Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

2008-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov. PMID:18073190
Replicating and Extending Research on the Partial Assignment Completion Effect: Is Sunk Cost Related to Partial Assignment Completion Strength?

ERIC Educational Resources Information Center

Hawthorn-Embree, Meredith L.; Taylor, Emily P.; Skinner, Christopher H.; Parkhurst, John; Nalls, Meagan L.

2014-01-01

After students acquire a skill, mastery often requires them to choose to engage in assigned academic activities (e.g., independent seatwork, and homework). Although students may be more likely to choose to work on partially completed assignments than on new assignments, the partial assignment completion (PAC) effect may not be very powerful. The…

Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing

PubMed Central

Jaenicke, Sebastian; Ander, Christina; Bekel, Thomas; Bisdorf, Regina; Dröge, Marcus; Gartemann, Karl-Heinz; Jünemann, Sebastian; Kaiser, Olaf; Krause, Lutz; Tille, Felix; Zakrzewski, Martha; Pühler, Alfred

2011-01-01

Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. 
Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus bourgensis. While rarefaction analyses still indicate incomplete coverage, examination of the GS FLX Titanium dataset resulted in the identification of additional genera and functional elements, providing a far more complete coverage of the community involved in anaerobic fermentative pathways leading to methane formation. PMID:21297863
MICCA: a complete and accurate software for taxonomic profiling of metagenomic data.

PubMed

Albanese, Davide; Fontana, Paolo; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio

2015-05-19

The introduction of high throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results, reaching a good compromise between modularity and usability. Moreover, we introduce a de novo clustering algorithm specifically designed for the inference of OTUs. Tests on real and synthetic datasets show that, thanks to the optimized read filtering process and the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than those of currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human-associated microbial communities. MICCA is an open source project.
Phylogenomic detection and functional prediction of genes potentially important for plant meiosis.

PubMed

Zhang, Luoyan; Kong, Hongzhi; Ma, Hong; Yang, Ji

2018-02-15

Meiosis is a specialized type of cell division necessary for sexual reproduction in eukaryotes. A better understanding of the cytological procedures of meiosis has been achieved by comprehensive cytogenetic studies in plants, while the genetic mechanisms regulating meiotic progression remain incompletely understood. The increasing accumulation of complete genome sequences and large-scale gene expression datasets has provided a powerful resource for phylogenomic inference and unsupervised identification of genes involved in plant meiosis. By integrating sequence homology and expression data, 164, 131, 124 and 162 genes potentially important for meiosis were identified in the genomes of Arabidopsis thaliana, Oryza sativa, Selaginella moellendorffii and Pogonatum aloides, respectively. The predicted genes were assigned to 45 meiotic GO terms, and their functions were related to different processes occurring during meiosis in various organisms. Most of the predicted meiotic genes underwent lineage-specific duplication events during plant evolution, with about 30% of the predicted genes retaining only a single copy in higher plant genomes. The results of this study provided clues to design experiments for better functional characterization of meiotic genes in plants, promoting the phylogenomic approach to the evolutionary dynamics of the plant meiotic machineries. Copyright © 2017 Elsevier B.V. All rights reserved.
MICCA: a complete and accurate software for taxonomic profiling of metagenomic data

PubMed Central

Albanese, Davide; Fontana, Paolo; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio

2015-01-01

The introduction of high throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results, reaching a good compromise between modularity and usability. Moreover, we introduce a de novo clustering algorithm specifically designed for the inference of OTUs. Tests on real and synthetic datasets show that, thanks to the optimized read filtering process and the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than those of currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human-associated microbial communities. MICCA is an open source project. PMID:25988396
Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

PubMed Central

Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad

2014-01-01

Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the BLAST-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates (incorrect taxonomic assignments with a high bootstrap support) were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
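A word-based naïve Bayesian classifier with bootstrap support, as described in this abstract, can be sketched as follows. This is a toy illustration rather than the software used in the study: the word length, the smoothing formula (written here in the form commonly described for the Wang et al. classifier), the choice to resample one eighth of the query words in each bootstrap trial, and the training sequences and taxon names are all assumptions or invented data.

```python
import math
import random
from collections import Counter

K = 4  # toy word length (assumption); real classifiers use longer words


def words(seq, k=K):
    """Return the set of overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesTaxonClassifier:
    def __init__(self, training):
        """training: list of (taxon, sequence) pairs."""
        self.n_total = len(training)
        self.taxon_size = Counter()
        self.word_total = Counter()     # sequences containing each word
        self.word_in_taxon = Counter()  # same count, per (taxon, word)
        for taxon, seq in training:
            self.taxon_size[taxon] += 1
            for w in words(seq):
                self.word_total[w] += 1
                self.word_in_taxon[(taxon, w)] += 1

    def log_score(self, taxon, query_words):
        """Sum of smoothed log word probabilities for one taxon."""
        size = self.taxon_size[taxon]
        total = 0.0
        for w in query_words:
            prior = (self.word_total[w] + 0.5) / (self.n_total + 1)
            total += math.log((self.word_in_taxon[(taxon, w)] + prior) / (size + 1))
        return total

    def best_taxon(self, query_words):
        return max(sorted(self.taxon_size),
                   key=lambda t: self.log_score(t, query_words))

    def classify(self, seq, trials=100, seed=0):
        """Return (taxon, bootstrap support as a fraction of trials)."""
        rng = random.Random(seed)
        all_words = sorted(words(seq))
        best = self.best_taxon(all_words)
        subset = max(1, len(all_words) // 8)
        hits = sum(
            self.best_taxon([rng.choice(all_words) for _ in range(subset)]) == best
            for _ in range(trials)
        )
        return best, hits / trials


# Invented training data; taxon names are placeholders.
TRAINING = [
    ("TaxonA", "ACGTACGTACGTACGTAAAC"),
    ("TaxonA", "ACGTACGTACGAACGTAAAC"),
    ("TaxonB", "TTGGCCTTGGCCTTGGCCTT"),
    ("TaxonB", "TTGGCCTTGGCATTGGCCTT"),
]

classifier = NaiveBayesTaxonClassifier(TRAINING)
print(classifier.classify("ACGTACGTACGTACGTAAAC"))
```

In practice the same scoring would be repeated at each taxonomic rank, and an assignment would be reported only down to the rank where the bootstrap support still exceeds a chosen cut-off, which is what makes the output rank-flexible.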
Towards Automated Structure-Based NMR Resonance Assignment

NASA Astrophysics Data System (ADS)

Jang, Richard; Gao, Xin; Li, Ming

We propose a general framework for solving the structure-based NMR backbone resonance assignment problem. The core is a novel 0-1 integer programming model that can start from a complete or partial assignment, generate multiple assignments, and model not only the assignment of spins to residues, but also pairwise dependencies consisting of pairs of spins to pairs of residues. It is still a challenge for automated resonance assignment systems to perform the assignment directly from spectra without any manual intervention. To test the feasibility of this for structure-based assignment, we integrated our system with our automated peak picking and sequence-based resonance assignment system to obtain an assignment for the protein TM1112 with 91% recall and 99% precision without manual intervention. Since using a known structure has the potential to allow one to use only 15N-labeled NMR data and avoid the added expense of using 13C-labeled data, we work towards the goal of automated structure-based assignment using only such labeled data. Our system reduced the assignment error of Xiong-Pandurangan-Bailey-Kellogg's contact replacement (CR) method, which to our knowledge is the most error-tolerant method for this problem, fivefold on average. By using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for Ubiquitin, where the type prediction accuracy is 83%, we achieved 91% assignment accuracy, compared to the 59% accuracy that was obtained without correcting for typing errors.
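The kind of objective such a 0-1 model optimizes, with one term for assigning each spin system to a residue and one term for mapping pairs of spin systems onto pairs of residues, can be shown on a tiny instance. The Python sketch below is not the authors' formulation or solver: it replaces integer programming with exhaustive search over permutations, which is only feasible for a handful of residues, and all scores, links and contacts are invented for illustration.

```python
from itertools import permutations


def best_assignment(node_score, linked_spins, contacts, pair_reward):
    """Exhaustively maximize node scores plus pairwise rewards.

    node_score[s][r] : score for assigning spin system s to residue r
    linked_spins     : pairs (s, t) of spin systems linked by an observation
    contacts         : set of residue pairs that are close in the structure
    pair_reward      : bonus when a linked spin pair lands on a contact pair

    A permutation `perm` encodes the 0-1 variables: x[s][r] = 1 exactly
    when perm[s] == r, so each spin gets one residue and vice versa.
    """
    n = len(node_score)
    best, best_value = None, float("-inf")
    for perm in permutations(range(n)):
        value = sum(node_score[s][perm[s]] for s in range(n))
        for s, t in linked_spins:
            pair = (perm[s], perm[t])
            if pair in contacts or pair[::-1] in contacts:
                value += pair_reward
        if value > best_value:
            best, best_value = perm, value
    return best, best_value


# Hypothetical 3-spin, 3-residue instance.
NODE_SCORE = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
LINKED_SPINS = [(0, 2)]   # spins 0 and 2 share an NOE-like link
CONTACTS = {(1, 2)}       # residues 1 and 2 are close in the structure

print(best_assignment(NODE_SCORE, LINKED_SPINS, CONTACTS, pair_reward=0.5))
print(best_assignment(NODE_SCORE, LINKED_SPINS, CONTACTS, pair_reward=0.0))
```

With the pairwise reward switched off, the individually best-scoring assignment wins; switching it on flips spins 0 and 1 so that the linked spin pair maps onto the contacting residue pair. That is the role the pairwise dependencies play in the model described above.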
Phylogenetic position of parabasalid symbionts from the termite Calotermes flavicollis based on small subunit rRNA sequences.

PubMed

Gerbod, D; Edgcomb, V P; Noël, C; Delgado-Viscogliosi, P; Viscogliosi, E

2000-09-01

Small subunit rDNA genes were amplified by polymerase chain reaction using specific primers from mixed-population DNA obtained from the whole hindgut of the termite Calotermes flavicollis. Comparative sequence analysis of the clones revealed two kinds of sequences that were both from parabasalid symbionts. In a molecular tree inferred by distance, parsimony and likelihood methods, and including 27 parabasalid sequences retrieved from the databases, the sequences of group II (clones Cf5 and Cf6) were closely related to the Devescovinidae/Calonymphidae species and thus were assigned to the Devescovinidae Foaina. The sequence of group I (clone Cf1) emerged within the Trichomonadinae and strongly clustered with Tetratrichomonas gallinarum. On the basis of morphological data, the Monocercomonadidae Hexamastix termitis might be the most likely origin of this sequence.
Determination of the complete genomic sequence and analysis of the gene products of the virus of Spring Viremia of Carp, a fish rhabdovirus.

PubMed

Hoffmann, Bernd; Schütze, Heike; Mettenleiter, Thomas C

2002-03-20

The complete genome of spring viremia of carp virus (SVCV) was cloned and the sequence of 11019 nucleotides was determined. It contains five open reading frames (ORFs) encoding the nucleoprotein N, phosphoprotein P, matrix protein M, glycoprotein G, and the viral RNA-dependent RNA polymerase L. Genes are organised in the order typical for rhabdoviruses: 3'-N-P-M-G-L-5'. The short leader and trailer regions of SVCV exhibit inverse complementarity and are similar to the respective 3' and 5' ends of the genome of vesicular stomatitis virus. To verify the predicted open reading frames, proteins were expressed in bacteria and analysed with a polyclonal anti-SVCV serum. Furthermore, monospecific antisera against the distinct viral proteins were generated. Comparisons of the genome and proteins confirm the assignment of SVCV to the genus Vesiculovirus.
RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences

PubMed Central

Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

2009-01-01

Background The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences; ERVs, constitutes at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. Methods RetroTector© (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe, (RetroTector online; ROL) which does not require specific installation procedures is provided, via the World Wide Web. Results ROL was implemented under the Batchelor web interface (A Lövgren et al). It allows both GenBank accession number, file and FASTA cut-and-paste admission of sequences (5 to 10 000 kilobases). Up to ten submissions can be done simultaneously, allowing batch analysis of <= 100 Megabases. Jobs are shown in an IP-number specific list. Results are text files, and can be viewed with the program, RetroTectorViewer.jar (at the same site), which has the full graphical capabilities of the basic ReTe program. A detailed analysis of any retroviral sequences found in the submitted sequence is graphically presented, exportable in standard formats. With the current server, a complete analysis of a 1 Megabase sequence is complete in 10 minutes. It is possible to mask nonretroviral repetitive sequences in the submitted sequence, using host genome specific "brooms", which increase specificity. 
Discussion Proviral sequences can be hard to recognize, especially if the integration occurred many million years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult, requiring manual work. ROL is a way of simplifying these tasks. Conclusion ROL provides 1. annotation and presentation of known retroviral sequences, 2. detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission. PMID:19534753
RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences.

PubMed

Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

2009-06-16

The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences; ERVs, constitutes at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe, (RetroTector online; ROL) which does not require specific installation procedures is provided, via the World Wide Web. ROL http://www.fysiologi.neuro.uu.se/jbgs/ was implemented under the Batchelor web interface (A Lövgren et al). It allows both GenBank accession number, file and FASTA cut-and-paste admission of sequences (5 to 10,000 kilobases). Up to ten submissions can be done simultaneously, allowing batch analysis of
An Efficient Approach for the Development of Locus Specific Primers in Bread Wheat (Triticum aestivum L.) and Its Application to Re-Sequencing of Genes Involved in Frost Tolerance

PubMed Central

Babben, Steve; Perovic, Dragan; Koch, Michael; Ordon, Frank

2015-01-01

Recent declines in costs accelerated sequencing of many species with large genomes, including hexaploid wheat (Triticum aestivum L.). Although the draft sequence of bread wheat is known, it is still one of the major challenges to develop locus-specific primers suitable for use in marker-assisted selection procedures, due to the high homology of the three genomes. In this study we describe an efficient approach for the development of locus-specific primers comprising four steps, i.e. (i) identification of genomic and coding sequences (CDS) of candidate genes, (ii) intron- and exon-structure reconstruction, (iii) identification of wheat A, B and D sub-genome sequences and primer development based on sequence differences between the three sub-genomes, and (iv) testing of primers for functionality, correct size and localisation. This approach was applied to single, low and high copy genes involved in frost tolerance in wheat. In summary, for 27 of these genes for which sequences were derived from Triticum aestivum, Triticum monococcum and Hordeum vulgare, a set of 119 primer pairs was developed and, after testing on Nulli-tetrasomic (NT) lines, a set of 65 primer pairs (54.6%), corresponding to 19 candidate genes, turned out to be specific. Out of these, a set of 35 fragments was selected for validation via Sanger's amplicon re-sequencing. All fragments, with the exception of one, could be assigned to the original reference sequence. The approach presented here showed a much higher specificity in primer development in comparison to techniques used so far in bread wheat and can be applied to other polyploid species with a known draft sequence. PMID:26565976
Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.

PubMed

Papudeshi, Bhavya; Haggerty, J Matthew; Doane, Michael; Morris, Megan M; Walsh, Kevin; Beattie, Douglas T; Pande, Dnyanada; Zaeri, Parisa; Silva, Genivaldo G Z; Thompson, Fabiano; Edwards, Robert A; Dinsdale, Elizabeth A

2017-11-28

Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of binning tools using genome completeness and taxonomic identification. We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers. 
Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but show low similarity to genomes within databases. In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, the SPAdes assembler and MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct taxa and low taxonomic diversity result in the highest quality metagenome-assembled genomes.
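The N50 statistic quoted for contig length can be computed as in the following sketch, which uses the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size). The function name and example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one contig")


# Hypothetical contig lengths in base pairs.
print(n50([2000, 3000, 4000, 5000, 6000]))
```

Because N50 is weighted by length, a handful of long contigs can raise it even when most contigs are short, which is why the abstract reports it alongside the contig count and the fraction of sequences assembled.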
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

PubMed Central

2009-01-01

Background Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

PubMed

Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

2009-10-12

Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.
Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12.

PubMed

Hayashi, T; Makino, K; Ohnishi, M; Kurokawa, K; Ishii, K; Yokoyama, K; Han, C G; Ohtsubo, E; Nakayama, K; Murata, T; Tanaka, M; Tobe, T; Iida, T; Takami, H; Honda, T; Sasakawa, C; Ogasawara, N; Yasunaga, T; Kuhara, S; Shiba, T; Hattori, M; Shinagawa, H

2001-02-28

Escherichia coli O157:H7 is a major food-borne infectious pathogen that causes diarrhea, hemorrhagic colitis, and hemolytic uremic syndrome. Here we report the complete chromosome sequence of an O157:H7 strain isolated from the Sakai outbreak, and the results of genomic comparison with a benign laboratory strain, K-12 MG1655. The chromosome is 5.5 Mb in size, 859 kb larger than that of K-12. We identified a 4.1-Mb sequence highly conserved between the two strains, which may represent the fundamental backbone of the E. coli chromosome. The remaining 1.4-Mb sequence comprises O157:H7-specific sequences, most of which are horizontally transferred foreign DNAs. The predominant role of bacteriophages in the emergence of O157:H7 is evident from the presence of 24 prophages and prophage-like elements that occupy more than half of the O157:H7-specific sequences. The O157:H7 chromosome encodes 1632 proteins and 20 tRNAs that are not present in K-12. Among these, at least 131 proteins are assumed to have virulence-related functions. Genome-wide codon usage analysis suggested that the O157:H7-specific tRNAs are involved in the efficient expression of the strain-specific genes. The complete set of genes specific to O157:H7 presented here sheds new light on the pathogenicity and the physiology of O157:H7, and will open a way to fully understand the molecular mechanisms underlying O157:H7 infection.
Quantum mechanical approaches to in silico enzyme characterization and drug design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nilmeier, J P; Fattebert, J L; Jacobson, M P

2012-01-17

The astonishing, exponentially increasing rate of genome sequencing has led to one of the most significant challenges for the biological and computational sciences in the 21st century: assigning the likely functions of the encoded proteins. Enzymes represent a particular challenge, and a critical one, because the universe of enzymes is likely to contain many novel functions that may be useful for synthetic biology, or as drug targets. Current approaches to protein annotation are largely based on bioinformatics. At the simplest level, this annotation involves transferring the annotations of characterized enzymes to related sequences. In practice, however, there is no simple, sequence-based criterion for transferring annotations, and bioinformatics alone cannot propose new enzymatic functions. Structure-based computational methods have the potential to address these limitations, by identifying potential substrates of enzymes, as we and others have shown. One successful approach has used in silico 'docking' methods, more commonly applied in structure-based drug design, to identify possible metabolite substrates. A major limitation of this approach is that it only considers substrate binding, and does not directly assess the potential of the enzyme to catalyze a particular reaction using a particular substrate. That is, substrate binding affinity is necessary but not sufficient to assign function. A reaction profile is ultimately what is needed for a more complete quantitative description of function. To address this rather fundamental limitation, they propose to use quantum mechanical methods to explicitly compute transition state barriers that govern the rates of catalysis. Although quantum mechanical, and mixed quantum/classical (QM/MM), methods have been used extensively to investigate enzymatic reactions, the focus has been primarily on elucidating complex reaction mechanisms. 
Here, the key catalytic steps are known, and they use these methods to quantify substrate specificity. That is, we bring the power of quantum mechanics to bear on the problem of annotating enzyme function, which is a novel approach. Although it has been clear to us at the Jacobson group for some time that enzyme specificity may be encoded in transition states, rather than simply substrate recognition, the main limitation has always been computational expense. Using a hierarchy of different methods, they can reduce the list of plausible substrates of an enzyme to a small number in most cases, but even identifying the transition states for a dozen plausible substrates requires significant computational effort, beyond what is practical using standard QM/MM methods. For this project, they have chosen two enzyme superfamilies which they have used as 'model systems' for functional assignment. The enolase superfamily is a large group of {alpha}-{beta} barrel enzymes with highly diverse substrates and chemical transformations. Despite decades of work, over a third of the superfamily remains unassigned, which means that the remaining cases are by definition difficult to assign. They have focused on acid sugar dehydratases, and have considerable expertise on the matter. They are also interested in the isoprenoid synthase superfamily, which is of central interest to the synthetic biology community, because these enzymes are used by nature to create complex rare natural products of medicinal value. The most notable example of this is artemisinin, an antimalarial compound that is found in trace amounts in the wormwood root. From the standpoint of enzyme function assignment, these enzymes are intriguing because they use a small number of chemically simple substrates to generate, potentially, tens of thousands of different products. 
Hence, substrate binding specificity is only a small part of the challenge; the key is determining how the enzyme directs the carbocation chemistry to specific products. These more complex modeling approaches clearly require quantum mechanical methods.
Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

PubMed

Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

2015-12-01

Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences.
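The bundling of germline V genes that cannot be told apart at a given read length can be sketched as follows. This is a simplified illustration, not the published tool: it assumes pre-aligned germline sequences of equal length, reads that cover only the 3' end of the V gene, and a plain mismatch threshold (`max_mismatches`) in place of the likelihood calculation described in the abstract. All names and sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def indistinguishable_groups(germlines, read_length, max_mismatches=0):
    """Group germline genes whose last `read_length` bases differ by at most
    `max_mismatches` positions (single-linkage grouping).

    germlines: dict gene name -> aligned sequence
    """
    names = sorted(germlines)
    suffix = {n: germlines[n][-read_length:] for n in names}
    parent = {n: n for n in names}  # simple union-find forest

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if hamming(suffix[a], suffix[b]) <= max_mismatches:
            parent[find(b)] = find(a)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical toy germline sequences (real V genes are about 300 nt long).
germlines = {
    "V1": "AAAACCCCGGGG",
    "V2": "AAATCCCCGGGG",
    "V3": "TTTTCCCCGGTG",
}
for length in (4, 12):
    print(length, indistinguishable_groups(germlines, length))
```

Shorter reads and a higher mismatch tolerance (standing in for sequencing error or somatic hypermutation) both enlarge the groups, which mirrors the dependence on read length and mutational burden described above.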
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites

PubMed Central

Chen, Yue; Sanchez, Ana M.; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N.; Busch, Michael P.; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results for the majority of the viruses (83%-97.9%) were correctly determined for the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses and sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. 
Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs. PMID:27314585
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites.

PubMed

Hora, Bhavna; Keating, Sheila M; Chen, Yue; Sanchez, Ana M; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N; Busch, Michael P; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results for the majority of the viruses (83%-97.9%) were correctly determined for the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses and sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. 
Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs.
Novel insect-specific flavivirus isolated from northern Europe

PubMed Central

Huhtamo, Eili; Moureau, Gregory; Cook, Shelley; Julkunen, Ora; Putkuri, Niina; Kurkela, Satu; Uzcátegui, Nathalie Y.; Harbach, Ralph E.; Gould, Ernest A.; Vapalahti, Olli; de Lamballerie, Xavier

2012-01-01

Mosquitoes collected in Finland were screened for flaviviral RNA leading to the discovery and isolation of a novel flavivirus designated Hanko virus (HANKV). Virus characterization, including phylogenetic analysis of the complete coding sequence, confirmed HANKV as a member of the “insect-specific” flavivirus (ISF) group. HANKV is the first member of this group isolated from northern Europe, and therefore the first northern European ISF for which the complete coding sequence has been determined. HANKV was not transcribed as DNA in mosquito cell culture, which appears atypical for an ISF. HANKV shared highest sequence homology with the partial NS5 sequence available for the recently discovered Spanish Ochlerotatus flavivirus (SOcFV). Retrospective analysis of mitochondrial sequences from the virus-positive mosquito pool suggested an Ochlerotatus mosquito species as the most likely host for HANKV. HANKV and SOcFV may therefore represent a novel group of Ochlerotatus-hosted insect-specific flaviviruses in Europe and further afield. PMID:22999256

Autofluorescence microscopy for paired-matched morphological and molecular identification of individual chigger mites (Acari: Trombiculidae), the vectors of scrub typhus.

PubMed

Kumlert, Rawadee; Chaisiri, Kittipong; Anantatat, Tippawan; Stekolnikov, Alexandr A; Morand, Serge; Prasartvit, Anchana; Makepeace, Benjamin L; Sungvornyothin, Sungsit; Paris, Daniel H

2018-01-01

Conventional gold standard characterization of chigger mites involves chemical preparation procedures (i.e. specimen clearing) for visualization of morphological features, which however contributes to destruction of the arthropod host DNA and any endosymbiont or pathogen DNA harbored within the specimen. In this study, a novel work flow based on autofluorescence microscopy was developed to enable identification of trombiculid mites to the species level on the basis of morphological traits without any special preparation, while preserving the mite DNA for subsequent genotyping. A panel of 16 specifically selected fluorescence microscopy images of mite features from available identification keys served for complete chigger morphological identification to the species level, and was paired with corresponding genotype data. We evaluated and validated this method for paired chigger morphological and genotypic ID using the mitochondrial cytochrome c oxidase subunit I gene (coi) in 113 chigger specimens representing 12 species and 7 genera (Leptotrombidium, Ascoschoengastia, Gahrliepia, Walchia, Blankaartia, Schoengastia and Schoutedenichia) from the Lao People's Democratic Republic (Lao PDR) to the species level (complete characterization), and 153 chiggers from 5 genera (Leptotrombidium, Ascoschoengastia, Helenicula, Schoengastiella and Walchia) from Thailand, Cambodia and Lao PDR to the genus level. A phylogenetic tree constructed from 77 coi gene sequences (approximately 640 bp length, n = 52 new coi sequences and n = 25 downloaded from GenBank), demonstrated clear grouping of assigned morphotypes at the genus levels, although evidence of both genetic polymorphism and morphological plasticity was found. With this new methodology, we provided the largest collection of characterized coi gene sequences for trombiculid mites to date, and almost doubled the number of available characterized coi gene sequences with a single study. 
The ability to provide paired phenotypic-genotypic data is of central importance for future characterization of mites and dissecting the molecular epidemiology of mites transmitting diseases like scrub typhus.
Autofluorescence microscopy for paired-matched morphological and molecular identification of individual chigger mites (Acari: Trombiculidae), the vectors of scrub typhus

PubMed Central

Chaisiri, Kittipong; Anantatat, Tippawan; Stekolnikov, Alexandr A.; Morand, Serge; Prasartvit, Anchana; Makepeace, Benjamin L.; Sungvornyothin, Sungsit; Paris, Daniel H.

2018-01-01

Background: Conventional gold standard characterization of chigger mites involves chemical preparation procedures (i.e. specimen clearing) for visualization of morphological features, which however contributes to destruction of the arthropod host DNA and any endosymbiont or pathogen DNA harbored within the specimen. Methodology/Principal findings: In this study, a novel work flow based on autofluorescence microscopy was developed to enable identification of trombiculid mites to the species level on the basis of morphological traits without any special preparation, while preserving the mite DNA for subsequent genotyping. A panel of 16 specifically selected fluorescence microscopy images of mite features from available identification keys served for complete chigger morphological identification to the species level, and was paired with corresponding genotype data. We evaluated and validated this method for paired chigger morphological and genotypic ID using the mitochondrial cytochrome c oxidase subunit I gene (coi) in 113 chigger specimens representing 12 species and 7 genera (Leptotrombidium, Ascoschoengastia, Gahrliepia, Walchia, Blankaartia, Schoengastia and Schoutedenichia) from the Lao People’s Democratic Republic (Lao PDR) to the species level (complete characterization), and 153 chiggers from 5 genera (Leptotrombidium, Ascoschoengastia, Helenicula, Schoengastiella and Walchia) from Thailand, Cambodia and Lao PDR to the genus level. A phylogenetic tree constructed from 77 coi gene sequences (approximately 640 bp length, n = 52 new coi sequences and n = 25 downloaded from GenBank), demonstrated clear grouping of assigned morphotypes at the genus levels, although evidence of both genetic polymorphism and morphological plasticity was found. 
Conclusions/Significance: With this new methodology, we provided the largest collection of characterized coi gene sequences for trombiculid mites to date, and almost doubled the number of available characterized coi gene sequences with a single study. The ability to provide paired phenotypic-genotypic data is of central importance for future characterization of mites and dissecting the molecular epidemiology of mites transmitting diseases like scrub typhus. PMID:29494599
Accurate Sample Assignment in a Multiplexed, Ultrasensitive, High-Throughput Sequencing Assay for Minimal Residual Disease.

PubMed

Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike

2016-07-01

High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine the Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based demultiplexing pipeline, Error-Aware Demultiplexer, that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10⁶ copies of specific leukemic MRD.
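The dual-indexing idea in the abstract above can be illustrated with a minimal sketch. This is not the Error-Aware Demultiplexer described by the authors, which is probability-based; as a simpler stand-in, the sketch assigns a read to a sample only when both of its index reads lie within a small Hamming distance of exactly one expected index pair. The sample names, index sequences, and the max_mismatches threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length index sequences."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(
    observed: Tuple[str, str],
    sample_indexes: Dict[str, Tuple[str, str]],
    max_mismatches: int = 1,
) -> Optional[str]:
    """Assign a read to a sample from its observed (index 1, index 2) pair.

    A sample is a candidate only if BOTH indexes match within
    max_mismatches. The read is left unassigned (None) when no sample
    or more than one sample qualifies.
    """
    candidates = [
        name
        for name, (first, second) in sample_indexes.items()
        if hamming(observed[0], first) <= max_mismatches
        and hamming(observed[1], second) <= max_mismatches
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical sample sheet: sample name -> (index 1, index 2).
SAMPLES = {
    "patient_A": ("ACGTACGT", "TTGCAAGC"),
    "patient_B": ("GGATCCAA", "CATGCATG"),
}

exact = assign_sample(("ACGTACGT", "TTGCAAGC"), SAMPLES)
one_error = assign_sample(("ACGTACGA", "TTGCAAGC"), SAMPLES)
# Index 1 of patient_A paired with index 2 of patient_B: a mixed pair
# that single indexing could not detect.
mixed_pair = assign_sample(("ACGTACGT", "CATGCATG"), SAMPLES)
```

In this sketch the mixed pair is rejected rather than assigned, which is the kind of protection dual indexing offers over a single index; a probability-based approach would presumably weigh base qualities instead of using a fixed mismatch threshold.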
Divergence of Structure and Function in the Haloacid Dehalogenase Enzyme Superfamily: Bacteroides thetaiotaomicron BT2127 is an Inorganic Pyrophosphatase

PubMed Central

Huang, Hua; Patskovsky, Yury; Toro, Rafael; Farelli, Jeremiah D.; Pandya, Chetanya; Almo, Steven C.; Allen, Karen N.; Dunaway-Mariano, Debra

2012-01-01

The explosion of protein sequence information requires that current strategies for function assignment evolve to complement experimental approaches with computationally-based function prediction. This necessitates the development of strategies based on the identification of sequence markers in the form of specificity determinants and a more informed definition of orthologues. Herein, we have undertaken the function assignment of the unknown Haloalkanoate Dehalogenase superfamily member BT2127 (UniProt accession # Q8A5V9) from Bacteroides thetaiotaomicron using an integrated bioinformatics/structure/mechanism approach. The substrate specificity profile and steady-state rate constants of BT2127 (with a kcat/Km value for pyrophosphate of ∼1 × 10⁵ M⁻¹ s⁻¹), together with the gene context, support the assigned in vivo function as an inorganic pyrophosphatase. The X-ray structural analysis of the wild-type BT2127 and several variants generated by site-directed mutagenesis shows that substrate discrimination is based, in part, on active site space restrictions imposed by the cap domain (specifically by residues Tyr76 and Glu47). Structure-guided site-directed mutagenesis coupled with kinetic analysis of the mutant enzymes identified the residues required for catalysis, substrate binding, and domain-domain association. Based on this structure-function analysis, the catalytic residues Asp11, Asp13, Thr113, and Lys147 as well as the metal-binding residues Asp171, Asn172 and Glu47 were used as markers to confirm BT2127 orthologues identified via sequence searches. This bioinformatic analysis demonstrated that the biological range of BT2127 orthologues is restricted to the phylum Bacteroidetes/Chlorobi. The key structural determinants in the divergence of BT2127 and its closest homologue β-phosphoglucomutase control the leaving group size (phosphate vs. glucose-phosphate) and the position of the Asp acid/base in the open vs. closed conformations. 
HADSF pyrophosphatases represent a third mechanistic and fold type for bacterial pyrophosphatases. PMID:21894910
Using specific length amplified fragment sequencing to construct the high-density genetic map for Vitis (Vitis vinifera L. × Vitis amurensis Rupr.).

PubMed

Guo, Yinshan; Shi, Guangli; Liu, Zhendong; Zhao, Yuhui; Yang, Xiaoxu; Zhu, Junchi; Li, Kun; Guo, Xiuwu

2015-01-01

In this study, 149 F1 plants from the interspecific cross between 'Red Globe' (Vitis vinifera L.) and 'Shuangyou' (Vitis amurensis Rupr.) and the parents were used to construct a molecular genetic linkage map by using the specific length amplified fragment sequencing technique. DNA sequencing generated 41.282 Gb data consisting of 206,411,693 paired-end reads. The average sequencing depths were 68.35 for 'Red Globe,' 63.65 for 'Shuangyou,' and 8.01 for each progeny. In all, 115,629 high-quality specific length amplified fragments were detected, of which 42,279 were polymorphic. The genetic map was constructed using 7,199 of these polymorphic markers. These polymorphic markers were assigned to 19 linkage groups; the total length of the map was 1929.13 cM, with an average distance of 0.28 cM between adjacent markers. To our knowledge, the genetic maps constructed in this study contain the largest number of molecular markers. These high-density genetic maps might form the basis for the fine quantitative trait loci mapping and molecular-assisted breeding of grape.
Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies

PubMed Central

Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland

2013-01-01

The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other platforms. 
This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689
Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

PubMed Central

Dröge, J.; Gregor, I.; McHardy, A. C.

2015-01-01

Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150
ARResT/AssignSubsets: a novel application for robust subclassification of chronic lymphocytic leukemia based on B cell receptor IG stereotypy.

PubMed

Bystry, Vojtech; Agathangelidis, Andreas; Bikos, Vasilis; Sutton, Lesley Ann; Baliakas, Panagiotis; Hadzidimitriou, Anastasia; Stamatopoulos, Kostas; Darzentas, Nikos

2015-12-01

An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ∼30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution. ARResT/AssignSubsets is freely available on the web at http://bat.infspire.org/arrest/assignsubsets/. Contact: nikos.darzentas@gmail.com. Supplementary data are available at Bioinformatics online.
An In Vivo Study of Self-Regulated Study Sequencing in Introductory Psychology Courses

PubMed Central

de Leeuw, Joshua R.; Motz, Benjamin A.; Goldstone, Robert L.

2016-01-01

Study sequence can have a profound influence on learning. In this study we investigated how students decide to sequence their study in a naturalistic context and whether their choices result in improved learning. In the study reported here, 2061 undergraduate students enrolled in an Introductory Psychology course completed an online homework tutorial on measures of central tendency, a topic relevant to an exam that counted towards their grades. One group of students was enabled to choose their own study sequence during the tutorial (Self-Regulated group), while the other group of students studied the same materials in sequences chosen by other students (Yoked group). Students who chose their sequence of study showed a clear tendency to block their study by concept, and this tendency was positively associated with subsequent exam performance. In the Yoked group, study sequence had no effect on exam performance. These results suggest that despite findings that blocked study is maladaptive when assigned by an experimenter, it may actually be adaptive when chosen by the learner in a naturalistic context. PMID:27003164
An In Vivo Study of Self-Regulated Study Sequencing in Introductory Psychology Courses.

PubMed

Carvalho, Paulo F; Braithwaite, David W; de Leeuw, Joshua R; Motz, Benjamin A; Goldstone, Robert L

2016-01-01

Study sequence can have a profound influence on learning. In this study we investigated how students decide to sequence their study in a naturalistic context and whether their choices result in improved learning. In the study reported here, 2061 undergraduate students enrolled in an Introductory Psychology course completed an online homework tutorial on measures of central tendency, a topic relevant to an exam that counted towards their grades. One group of students was enabled to choose their own study sequence during the tutorial (Self-Regulated group), while the other group of students studied the same materials in sequences chosen by other students (Yoked group). Students who chose their sequence of study showed a clear tendency to block their study by concept, and this tendency was positively associated with subsequent exam performance. In the Yoked group, study sequence had no effect on exam performance. These results suggest that despite findings that blocked study is maladaptive when assigned by an experimenter, it may actually be adaptive when chosen by the learner in a naturalistic context.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

2007-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (www.ncbi.nlm.nih.gov).
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

2005-01-01

GenBank is a comprehensive database that contains publicly available DNA sequences for more than 165,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps to ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

2006-01-01

GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 205 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the Web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at www.ncbi.nlm.nih.gov.
Utilizing the International GeoSample Number Concept during ICDP Expedition COSC

NASA Astrophysics Data System (ADS)

Conze, Ronald; Lorenz, Henning; Ulbricht, Damian; Gorgas, Thomas; Elger, Kirsten

2016-04-01

The concept of the International GeoSample Number (IGSN) was introduced to uniquely identify and register geo-related sample material, and make it retrievable via electronic media (e.g., SESAR - http://www.geosamples.org/igsnabout). The general aim of the IGSN concept is to improve access to stored sample material worldwide and to enable exact identification of a sample, its origin and provenance, as well as exact and complete citation of acquired samples throughout the literature. The ICDP expedition COSC (Collisional Orogeny in the Scandinavian Caledonides, http://cosc.icdp-online.org) was the first in ICDP's history to assign and register IGSNs during an ongoing drilling campaign. ICDP drilling expeditions commonly use the Drilling Information System DIS (http://doi.org/10.2204/iodp.sd.4.07.2007) for the inventory of recovered sample material. During COSC, IGSNs were assigned to every drill hole, core run, core section, and sample taken from core material. The original IGSN specification was extended to achieve the required uniqueness of IGSNs with our offline procedure. The ICDP name space indicator and the Expedition ID (5054) form an extended prefix (ICDP5054). For every type of sample material, an encoded sequence of characters follows. This sequence is derived from the DIS naming convention, which is unique from the beginning. Thereby every ICDP expedition has an unlimited name space for IGSN assignments. This direct derivation of IGSNs from the DIS database context ensures the distinct parent-child hierarchy of the IGSNs among each other. In the case of COSC, this method of inventory-keeping of all drill cores was applied routinely using the ExpeditionDIS during field work and the subsequent sampling party. After completion of the field campaign, all sample material was transferred to the "Nationales Bohrkernlager" in Berlin-Spandau, Germany. 
The corresponding data were subsequently imported into the CurationDIS used at the aforementioned core storage facility. This CurationDIS assigns IGSNs to samples newly taken in the repository in the same fashion as in the field. Thereby, the parent-child linkage of the IGSNs is ensured consistently throughout the entire sampling process. The only difference between ExpeditionDIS and CurationDIS sample curation is the use of the name space ICDP and BGRB, respectively, as part of the corresponding ID string. To prepare the IGSN registry, a set of metadata is generated for every assigned IGSN using the DIS, which is then exported from the DIS into one common XML file. The XML file is based on the SESAR schema and a proposal of IGSN e.V. (http://schema.igsn.org). This scheme has recently been extended for drilling data to provide additional information for future retrieval options. The two allocation agents GFZ Potsdam and PANGAEA are currently involved in the registry of IGSNs in the case of COSC drill campaigns. An example of an IGSN registered for the COSC-1 drill hole A (5054_1_A) is "ICDP5054EEW1001", which can be resolved using the URL http://hdl.handle.net/10273/ICDP5054EEW1001. Opening the landing page for the complete COSC core material for this particular hole displays a hierarchical tree entitled "Sample Family". An example of an IGSN citation associated with a COSC sample set is featured on an EGU-2016 poster presentation by Ulrich Harms, Johannes Hierold et al. (EGU2016-8646).
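The identifier layout described in this abstract (name space indicator, then Expedition ID, then an encoded character sequence) can be sketched as simple string composition. The abstract does not give the rule that derives the encoded sequence from the DIS naming convention, so the sketch below treats that part as an opaque string supplied by the caller. The function names are hypothetical; only the example identifier and the resolver URL are taken from the abstract.

```python
# Resolver prefix as quoted in the abstract.
HANDLE_RESOLVER = "http://hdl.handle.net/10273/"


def make_igsn(name_space: str, expedition_id: int, encoded_sequence: str) -> str:
    """Compose an IGSN from its extended prefix and an encoded sequence.

    The extended prefix is the name space indicator (e.g. ICDP)
    followed by the Expedition ID (e.g. 5054). The encoded sequence is
    passed through unchanged because its derivation from the DIS naming
    convention is not specified in the abstract.
    """
    return f"{name_space}{expedition_id}{encoded_sequence}"


def resolver_url(igsn: str) -> str:
    """Return the handle URL under which a registered IGSN is resolved."""
    return HANDLE_RESOLVER + igsn


# Example from the abstract: COSC-1 drill hole A (5054_1_A).
hole_igsn = make_igsn("ICDP", 5054, "EEW1001")
hole_url = resolver_url(hole_igsn)
```

According to the abstract, samples taken later in the core repository would carry the BGRB name space instead of ICDP as part of the ID string.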
SPHINX--an algorithm for taxonomic binning of metagenomic sequences.

PubMed

Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Singh, Nitin Kumar; Mande, Sharmila S

2011-01-01

Compared with composition-based binning algorithms, the binning accuracy and specificity of alignment-based binning algorithms is significantly higher. However, being alignment-based, the latter class of algorithms require enormous amount of time and computing resources for binning huge metagenomic datasets. The motivation was to develop a binning approach that can analyze metagenomic datasets as rapidly as composition-based approaches, but nevertheless has the accuracy and specificity of alignment-based algorithms. This article describes a hybrid binning approach (SPHINX) that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms. Validation results with simulated sequence datasets indicate that SPHINX is able to analyze metagenomic sequences as rapidly as composition-based algorithms. Furthermore, the binning efficiency (in terms of accuracy and specificity of assignments) of SPHINX is observed to be comparable with results obtained using alignment-based algorithms. A web server for the SPHINX algorithm is available at http://metagenomics.atc.tcs.com/SPHINX/.
TOPPE: A framework for rapid prototyping of MR pulse sequences.

PubMed

Nielsen, Jon-Fredrik; Noll, Douglas C

2018-06-01

To introduce a framework for rapid prototyping of MR pulse sequences. We propose a simple file format, called "TOPPE", for specifying all details of an MR imaging experiment, such as gradient and radiofrequency waveforms and the complete scan loop. In addition, we provide a TOPPE file "interpreter" for GE scanners, which is a binary executable that loads TOPPE files and executes the sequence on the scanner. We also provide MATLAB scripts for reading and writing TOPPE files and previewing the sequence prior to hardware execution. With this setup, the task of the pulse sequence programmer is reduced to creating TOPPE files, eliminating the need for hardware-specific programming. No sequence-specific compilation is necessary; the interpreter only needs to be compiled once (for every scanner software upgrade). We demonstrate TOPPE in three different applications: k-space mapping, non-Cartesian PRESTO whole-brain dynamic imaging, and myelin mapping in the brain using inhomogeneous magnetization transfer. We successfully implemented and executed the three example sequences. By simply changing the various TOPPE sequence files, a single binary executable (interpreter) was used to execute several different sequences. The TOPPE file format is a complete specification of an MR imaging experiment, based on arbitrary sequences of a (typically small) number of unique modules. Along with the GE interpreter, TOPPE comprises a modular and flexible platform for rapid prototyping of new pulse sequences. Magn Reson Med 79:3128-3134, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Genes encoding calmodulin-binding proteins in the Arabidopsis genome

NASA Technical Reports Server (NTRS)

Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

2002-01-01

Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.
The primary structure of aspartate aminotransferase from pig heart muscle. Digestion with a proteinase having specificity for lysine residues.

PubMed Central

Doonan, S; Doonan, H J; Hanford, R; Vernon, C A; Walker, J M; da Airold, L P; Bossa, F; Barra, D; Carloni, M; Fasella, P; Riva, F

1975-01-01

Carboxymethylated aspartate aminotransferase was digested with a proteinase claimed to be specific for lysine residues. Complete cleavage occurred at 12 of the 19 lysine residues in the protein, but at the remaining seven residues cleavage was either restricted or absent. In addition, cleavage was observed at three of the 26 arginine residues. These results are discussed with reference to the amino acid residues adjacent to points of complete or restricted cleavage. The complete primary structure of aspartate aminotransferase, based on these and other studies, is given. Evidence for the assignment of some acid and amide side chains has been deposited as Supplementary Publication SUP 50050 (11 pp.) at the British Library (Lending Division), Boston Spa, Wetherby, W. Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1975) 145, 5. The evidence for the assignment of residue 366 was less conclusive than for the other acid and amide side chains and is, therefore, given in the main paper. PMID:1239277
Quality Control Test for Sequence-Phenotype Assignments

PubMed Central

Ortiz, Maria Teresa Lara; Rosario, Pablo Benjamín Leon; Luna-Nevarez, Pablo; Gamez, Alba Savin; Martínez-del Campo, Ana; Del Rio, Gabriel

2015-01-01

Relating a gene mutation to a phenotype is a common task in different disciplines such as protein biochemistry. In this endeavour, it is common to find false relationships arising from mutations introduced by cells; these may be filtered out using a phenotypic assay, yet such phenotypic assays may introduce additional false relationships arising from experimental errors. Here we introduce the use of high-throughput DNA sequencing and statistical analysis to identify incorrect DNA sequence-phenotype assignments, and we observe that a rate of 10–20% false assignments is expected in large screenings aimed at identifying residues critical for protein function. We further show that this level of incorrect DNA sequence-phenotype assignments may significantly alter our understanding of the structure-function relationship of proteins. We have made available an implementation of our method at http://bis.ifc.unam.mx/en/software/chispas. PMID:25700273
Longitudinal Evaluation of the Importance of Homework Assignment Completion for the Academic Performance of Middle School Students with ADHD

PubMed Central

Langberg, Joshua M.; Dvorsky, Melissa R.; Molitor, Stephen J.; Bourchtein, Elizaveta; Eddy, Laura D.; Smith, Zoe; Schultz, Brandon K.; Evans, Steven W.

2016-01-01

The primary goal of this study was to longitudinally evaluate the homework assignment completion patterns of middle school age adolescents with ADHD, their associations with academic performance, and malleable predictors of homework assignment completion. Analyses were conducted on a sample of 104 middle school students comprehensively diagnosed with ADHD and followed for 18 months. Multiple teachers for each student provided information about the percentage of homework assignments turned in at five separate timepoints and school grades were collected quarterly. Results showed that agreement between teachers with respect to students’ assignment completion was high, with an intraclass correlation of .879 at baseline. Students with ADHD were turning in an average of 12% fewer assignments each academic quarter in comparison to teacher-reported classroom averages. Regression analyses revealed a robust association between the percentage of assignments turned in at baseline and school grades 18 months later, even after controlling for baseline grades, achievement (reading and math), intelligence, family income, and race. Cross-lag analyses demonstrated that the association between assignment completion and grades was reciprocal, with assignment completion negatively impacting grades and low grades in turn being associated with decreased future homework completion. Parent ratings of homework materials management abilities at baseline significantly predicted the percentage of assignments turned in as reported by teachers 18 months later. These findings demonstrate that homework assignment completion problems are persistent across time and an important intervention target for adolescents with ADHD. PMID:26931065

FragIdent--automatic identification and characterisation of cDNA-fragments.

PubMed

Seelow, Dominik; Goehler, Heike; Hoffmann, Katrin

2009-03-02

Many genetic studies and functional assays are based on cDNA fragments. After the generation of cDNA fragments from an mRNA sample, their content is at first unknown and must be assigned by sequencing reactions or hybridisation experiments. Even in characterised libraries, a considerable number of clones are wrongly annotated. Furthermore, mix-ups can happen in the laboratory. It is therefore essential to the relevance of experimental results to confirm or determine the identity of the employed cDNA fragments. However, the manual approach for the characterisation of these fragments using BLAST web interfaces is not suited to larger numbers of sequences, and so far no user-friendly software is publicly available. Here we present the development of FragIdent, an application for the automatic identification of open reading frames (ORFs) within cDNA fragments. The software performs BLAST analyses to identify the genes represented by the sequences and suggests primers to complete the sequencing of the whole insert. Gene-specific information as well as the protein domains encoded by the cDNA fragment are retrieved from Internet-based databases and included in the output. The application features an intuitive graphical interface and is designed for researchers without any bioinformatics skills. It is suited for projects comprising up to several hundred different clones. We used FragIdent to identify 84 cDNA clones from a yeast two-hybrid experiment. Furthermore, we identified 131 protein domains within our analysed clones. The source code is freely available from our homepage at http://compbio.charite.de/genetik/FragIdent/.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

2010-01-01

GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

2009-01-01

GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank(R) staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
Characterization of the complete genome segments from BmCPV-SZ, a novel Bombyx mori cypovirus 1 isolate.

PubMed

Cao, Guangli; Meng, Xiangkun; Xue, Renyu; Zhu, Yuexiong; Zhang, Xiaorong; Pan, Zhonghua; Zheng, Xiaojian; Gong, Chengliang

2012-07-01

A novel Bombyx mori cypovirus 1 was isolated from infected silkworm larvae and tentatively assigned as Bombyx mori cypovirus 1 isolate Suzhou (BmCPV-SZ). The complete nucleotide sequences of genomic segments S1-S10 from BmCPV-SZ were determined. All segments possessed a single open reading frame; however, bioinformatic evidence suggested a short overlapping coding sequence in S1. Each BmCPV-SZ segment possessed the conserved terminal sequences AGUAA and GUUAGCC at the 5' and 3' ends, respectively. The conserved A/G at the -3 position in relation to the AUG codon could be found in the BmCPV-SZ genome, and it was postulated that this conserved A/G may be the most important nucleotide for efficient translation initiation in cypoviruses (CPVs). Examination of the putative amino acid sequences encoded by BmCPV-SZ revealed some characteristic motifs. Homology searches showed that viral structural proteins VP1, VP3, and VP4 had localized homologies with proteins of Rice ragged stunt virus, a member of the genus Oryzavirus within the family Reoviridae. A phylogenetic tree based on RNA-dependent RNA polymerase sequences demonstrated that CPV is more closely related to Rice ragged stunt virus and Aedes pseudoscutellaris reovirus than to other members of Reoviridae, suggesting that they may have originated from common ancestors.
BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS.

PubMed

Fosso, Bruno; Santamaria, Monica; Marzano, Marinella; Alonso-Alemany, Daniel; Valiente, Gabriel; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

2015-07-01

Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to Metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred to as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to suitable computational systems for the large-scale bioinformatic analysis of HTS data currently represents one of the major challenges in advanced Meta-barcoding projects. BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities with a completely automated workflow comprising all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented as a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). BioMaS is a user-friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills.
A comparative benchmark, carried out by using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.

PubMed

Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

2018-05-25

Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Evolutionary conservation, diversity and specificity of LTR retrotransposons in flowering plants: insights from genome-wide analysis and multi-specific comparison

USDA-ARS's Scientific Manuscript database

The availability of complete or nearly complete genome sequences from several plant species permits detailed discovery and cross-species comparison of transposable elements (TEs) at the whole genome level. We initially investigated 510 LTR-retrotransposon (LTR-RT) families that are comprised of 32,...
Inferring Higher Functional Information for RIKEN Mouse Full-Length cDNA Clones With FACTS

PubMed Central

Nagashima, Takeshi; Silva, Diego G.; Petrovsky, Nikolai; Socha, Luis A.; Suzuki, Harukazu; Saito, Rintaro; Kasukawa, Takeya; Kurochkin, Igor V.; Konagaya, Akihiko; Schönbach, Christian

2003-01-01

FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence inferred) with functional information extracted from text. Text-inferred information was extracted from keyword-based retrievals of MEDLINE abstracts and by matching of gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A significant number (23.5%) of disease MeSH-associated clones were also found to have a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system disease represented 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones, identical GO assignments were provided for that clone by either method, whereas for 21.8% of clones, the assignments differed. In contrast, for OMIM assignments, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies). PMID:12819151
Neo-sex Chromosomes in the Monarch Butterfly, Danaus plexippus

PubMed Central

Mongue, Andrew J.; Nguyen, Petr; Voleníková, Anna; Walters, James R.

2017-01-01

We report the discovery of a neo-sex chromosome in the monarch butterfly, Danaus plexippus, and several of its close relatives. Z-linked scaffolds in the D. plexippus genome assembly were identified via sex-specific differences in Illumina sequencing coverage. Additionally, a majority of the D. plexippus genome assembly was assigned to chromosomes based on counts of one-to-one orthologs relative to the butterfly Melitaea cinxia (with replication using two other lepidopteran species), in which genome scaffolds have been mapped to linkage groups. Sequencing coverage-based assessments of Z linkage combined with homology-based chromosomal assignments provided strong evidence for a Z-autosome fusion in the Danaus lineage, involving the autosome homologous to chromosome 21 in M. cinxia. Coverage analysis also identified three notable assembly errors resulting in chimeric Z-autosome scaffolds. Cytogenetic analysis further revealed a large W chromosome that is partially euchromatic, consistent with being a neo-W chromosome. The discovery of a neo-Z and the provisional assignment of chromosome linkage for >90% of D. plexippus genes lays the foundation for novel insights concerning sex chromosome evolution in this female-heterogametic model species for functional and evolutionary genomics. PMID:28839116
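The coverage-based step described in this abstract can be illustrated with a small sketch. In a female-heterogametic system (ZW females, ZZ males), a Z-linked scaffold is expected to show roughly twice the normalised sequencing coverage in males as in females, while autosomal scaffolds show a ratio near one. The function name `classify_scaffolds`, the median normalisation, and the log2 cutoff of 0.5 are assumptions chosen for illustration; they are not taken from the paper, and the sketch assumes every scaffold has non-zero coverage in both sexes.

```python
import math
from statistics import median


def classify_scaffolds(male_cov, female_cov, cutoff=0.5):
    """Label scaffolds as 'Z' or 'autosomal' from male and female coverage.

    male_cov and female_cov map scaffold name -> mean read depth.
    The cutoff is a hypothetical midpoint between the expected log2
    male:female ratio for autosomes (0) and for Z-linked scaffolds (1).
    """
    # Normalise each sample by its own median depth so that differences
    # in total sequencing effort between the sexes cancel out.
    male_median = median(male_cov.values())
    female_median = median(female_cov.values())

    calls = {}
    for scaffold, depth in male_cov.items():
        male_norm = depth / male_median
        female_norm = female_cov[scaffold] / female_median
        log_ratio = math.log2(male_norm / female_norm)
        calls[scaffold] = "Z" if log_ratio > cutoff else "autosomal"
    return calls


# Toy data: scaffold s4 has half the female depth of the other scaffolds.
male = {"s1": 30, "s2": 30, "s3": 30, "s4": 30}
female = {"s1": 20, "s2": 20, "s3": 20, "s4": 10}
calls = classify_scaffolds(male, female)
```

A scaffold whose ratio falls between the two expected values, or changes along its length, would be a candidate for the kind of chimeric Z-autosome assembly error the abstract mentions; detecting that would need windowed coverage rather than one value per scaffold.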
A pharmacy practice laboratory exercise to apply biochemistry concepts.

PubMed

Harrold, Marc W; McFalls, Marsha A

2010-10-11

To develop exercises that allow pharmacy students to apply foundational knowledge discussed in a first-professional year (P1) biochemistry course to specific disease states and patient scenarios. A pharmacy practice laboratory exercise was developed to accompany a lecture sequence pertaining to purine biosynthesis and degradation. The assignment required students to fill a prescription, provide patient counseling tips, and answer questions pertaining to the disease state, the underlying biochemical problem, and the prescribed medication. Students were graded on the accuracy with which they filled the prescription, provided patient counseling, and answered the questions provided. Overall, students displayed mastery in all of these areas. Additionally, students completed a course survey on which they rated this exercise favorably, noting that it helped them to integrate basic science concepts and pharmacy practice. A laboratory exercise provided an opportunity for P1 students to apply foundational pharmacy knowledge to a patient case and can serve as a template for the design of additional exercises.
[Multiplexing mapping of human cDNAs]. Final report, September 1, 1991--February 28, 1994

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

Using PCR with automated product analysis, 329 human brain cDNA sequences have been assigned to individual human chromosomes. Primers were designed from single-pass cDNA sequences, or expressed sequence tags (ESTs). Primers were used in PCR reactions with DNA from somatic cell hybrid mapping panels as templates, often with multiplexing. Many of the mapped ESTs match sequence database records. To evaluate these matches, the position of the primers relative to the matching region (In), the BLAST scores and the Poisson probability values of the EST/sequence record match were determined. In cases where the gene product stringently identified by the sequence match had already been mapped, the gene locus determined by EST was consistent with the previous position, which strongly supports the validity of assigning unknown genes to human chromosomes based on the EST sequence matches. In the present cases mapping the ESTs to a chromosome can also be considered to have mapped the known gene product: rolipram-sensitive cAMP phosphodiesterase, chromosome 1; protein phosphatase 2A{beta}, chromosome 4; alpha-catenin, chromosome 5; the ELE1 oncogene, chromosome 10q11.2 or q2.1-q23; MXII protein, chromosome 10q24-qter; ribosomal protein L18a homologue, chromosome 14; ribosomal protein L3, chromosome 17; and moesin, Xp11-cen. There were also ESTs mapped that were closely related to non-human sequence records. These matches therefore can be considered to identify human counterparts of known gene products, or members of known gene families. Examples of these include membrane proteins, translation-associated proteins, structural proteins, and enzymes. These data then demonstrate that single-pass sequence information is sufficient to design PCR primers useful for assigning cDNA sequences to human chromosomes.
When the EST sequence matches previous sequence database records, the chromosome assignments of the EST can be used to make preliminary assignments of the human gene to a chromosome.
Longitudinal evaluation of the importance of homework assignment completion for the academic performance of middle school students with ADHD.

PubMed

Langberg, Joshua M; Dvorsky, Melissa R; Molitor, Stephen J; Bourchtein, Elizaveta; Eddy, Laura D; Smith, Zoe; Schultz, Brandon K; Evans, Steven W

2016-04-01

The primary goal of this study was to longitudinally evaluate the homework assignment completion patterns of middle school age adolescents with ADHD, their associations with academic performance, and malleable predictors of homework assignment completion. Analyses were conducted on a sample of 104 middle school students comprehensively diagnosed with ADHD and followed for 18 months. Multiple teachers for each student provided information about the percentage of homework assignments turned in at five separate time points and school grades were collected quarterly. Results showed that agreement between teachers with respect to students' assignment completion was high, with an intraclass correlation of .879 at baseline. Students with ADHD were turning in an average of 12% fewer assignments each academic quarter in comparison to teacher-reported classroom averages. Regression analyses revealed a robust association between the percentage of assignments turned in at baseline and school grades 18 months later, even after controlling for baseline grades, achievement (reading and math), intelligence, family income, and race. Cross-lag analyses demonstrated that the association between assignment completion and grades was reciprocal, with assignment completion negatively impacting grades and low grades in turn being associated with decreased future homework completion. Parent ratings of homework materials management abilities at baseline significantly predicted the percentage of assignments turned in as reported by teachers 18 months later. These findings demonstrate that homework assignment completion problems are persistent across time and an important intervention target for adolescents with ADHD. Copyright © 2015 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
[Soil propagule bank of ectomycorrhizal fungi in natural forest of Pinus bungeana].

PubMed

Zhao, Nan Xing; Han, Qi Sheng; Huang, Jian

2017-12-01

To conserve and restore Pinus bungeana forest, we investigated the soil propagule bank of ectomycorrhizal (ECM) fungi in a severely disturbed natural forest of P. bungeana in Shaanxi Province, China. We used a seedling-bioassay method to bait the ECM fungal propagules in soils collected from the forest site. ECM fungi were identified by combining morphotyping with ITS-PCR sequencing. We obtained 73 unique sequences from the ECM associated with P. bungeana seedlings and assigned them to 12 ECM fungal OTUs at a 97% sequence-similarity threshold. The rarefaction curve indicated that almost all ECM fungi in the propagule bank were detected. The most frequent OTU (80%) showed poor similarity (75%) to existing sequences in the online database, which suggests it might be a new species. Cenococcum geophilum, Tomentella sp., and Tuber sp. were common species in the propagule bank. Although C. geophilum and Tomentella sp. are frequently detected in other soil propagule banks of pine forests, the most frequent OTU could not be assigned to a known genus or family, which indicates the host specificity of the ECM propagule bank associated with P. bungeana. This result confirms the importance of the special ECM propagule bank associated with P. bungeana for natural forest restoration.
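Grouping sequences into OTUs at a 97% similarity threshold can be sketched as a greedy, centroid-based clustering. The abstract does not state which clustering tool or algorithm the authors used, so the following is only an illustrative sketch: the function names `identity` and `greedy_otus` are hypothetical, and the sequences are assumed to be non-empty and already aligned to equal length so that identity can be computed position by position.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose centroid it matches.

    Returns a list of (centroid, members) pairs. A sequence that matches
    no existing centroid at the threshold founds a new OTU.
    """
    otus = []
    for seq in seqs:
        for centroid, members in otus:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No centroid was similar enough, so start a new OTU.
            otus.append((seq, [seq]))
    return otus


# Toy data: 100-nt reference, a 98%-identical variant, a 91%-identical one.
base = "ACGT" * 25
near = "TT" + base[2:]       # 2 mismatches
far = "T" * 12 + base[12:]   # 9 mismatches
otus = greedy_otus([base, near, far])
```

Greedy clustering depends on input order, because the first sequence of each group becomes its centroid; sorting by abundance or length beforehand is a common way to make the result more stable.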
One hertz repetitive transcranial magnetic stimulation over dorsal premotor cortex enhances offline motor memory consolidation for sequence-specific implicit learning.

PubMed

Meehan, S K; Zabukovec, J R; Dao, E; Cheung, K L; Linsdell, M A; Boyd, L A

2013-10-01

Consolidation of motor memories associated with skilled practice can occur both online, concurrent with practice, and offline, after practice has ended. The current study investigated the role of dorsal premotor cortex (PMd) in early offline motor memory consolidation of implicit sequence-specific learning. Thirty-three participants were assigned to one of three groups of repetitive transcranial magnetic stimulation (rTMS) over left PMd (5 Hz, 1 Hz or control) immediately following practice of a novel continuous tracking task. There was no additional practice following rTMS. This procedure was repeated for 4 days. The continuous tracking task contained a repeated sequence that could be learned implicitly and random sequences that could not. On a separate fifth day, a retention test was performed to assess implicit sequence-specific motor learning of the task. Tracking error was decreased for the group who received 1 Hz rTMS over the PMd during the early consolidation period immediately following practice compared with control or 5 Hz rTMS. Enhanced sequence-specific learning with 1 Hz rTMS following practice was due to greater offline consolidation, not differences in online learning between the groups within practice days. A follow-up experiment revealed that stimulation of PMd following practice did not differentially change motor cortical excitability, suggesting that changes in offline consolidation can be largely attributed to stimulation-induced changes in PMd. These findings support a differential role for the PMd in support of online and offline sequence-specific learning of a visuomotor task and offer converging evidence for competing memory systems. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Rescuing discarded spectra: Full comprehensive analysis of a minimal proteome.

PubMed

Lluch-Senar, Maria; Mancuso, Francesco M; Climente-González, Héctor; Peña-Paz, Marcia I; Sabido, Eduard; Serrano, Luis

2016-02-01

A common problem encountered when performing large-scale MS proteome analysis is the loss of information due to the high percentage of unassigned spectra. To determine the causes behind this loss we have analyzed the proteome of one of the smallest living bacteria that can be grown axenically, Mycoplasma pneumoniae (729 ORFs). The proteome of M. pneumoniae cells, grown in defined media, was analyzed by MS. An initial search with both Mascot and a species-specific NCBInr database with common contaminants (NCBImpn), resulted in around 79% of the acquired spectra not having an assignment. The percentage of non-assigned spectra was reduced to 27% after re-analysis of the data with the PEAKS software, thereby increasing the proteome coverage of M. pneumoniae from the initial 60% to over 76%. Nonetheless, 33,413 spectra with assigned amino acid sequences could not be mapped to any NCBInr database protein sequence. Approximately, 1% of these unassigned peptides corresponded to PTMs and 4% to M. pneumoniae protein variants (deamidation and translation inaccuracies). The most abundant peptide sequence variants (Phe-Tyr and Ala-Ser) could be explained by alterations in the editing capacity of the corresponding tRNA synthases. About another 1% of the peptides not associated to any protein had repetitions of the same aromatic/hydrophobic amino acid at the N-terminus, or had Arg/Lys at the C-terminus. Thus, in a model system, we have maximized the number of assigned spectra to 73% (51,453 out of the 70,040 initial acquired spectra). All MS data have been deposited in the ProteomeXchange with identifier PXD002779 (http://proteomecentral.proteomexchange.org/dataset/PXD002779). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Using specific length amplified fragment sequencing to construct the high-density genetic map for Vitis (Vitis vinifera L. × Vitis amurensis Rupr.)

PubMed Central

Guo, Yinshan; Shi, Guangli; Liu, Zhendong; Zhao, Yuhui; Yang, Xiaoxu; Zhu, Junchi; Li, Kun; Guo, Xiuwu

2015-01-01

In this study, 149 F1 plants from the interspecific cross between ‘Red Globe’ (Vitis vinifera L.) and ‘Shuangyou’ (Vitis amurensis Rupr.) and the two parents were used to construct a molecular genetic linkage map by using the specific length amplified fragment sequencing technique. DNA sequencing generated 41.282 Gb data consisting of 206,411,693 paired-end reads. The average sequencing depths were 68.35 for ‘Red Globe,’ 63.65 for ‘Shuangyou,’ and 8.01 for each progeny. In all, 115,629 high-quality specific length amplified fragments were detected, of which 42,279 were polymorphic. The genetic map was constructed using 7,199 of these polymorphic markers. These polymorphic markers were assigned to 19 linkage groups; the total length of the map was 1929.13 cM, with an average distance of 0.28 cM between adjacent markers. To our knowledge, the genetic maps constructed in this study contain the largest number of molecular markers. These high-density genetic maps might form the basis for the fine quantitative trait loci mapping and molecular-assisted breeding of grape. PMID:26089826
Specific minor groove solvation is a crucial determinant of DNA binding site recognition

PubMed Central

Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.

2014-01-01

The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976
On the persistence and detectability of ancient Beothuk mitochondrial DNA genomes in living First Nations peoples.

PubMed

Collier, Ashley; Carr, Steven M

2018-03-29

Claims have long been made as to the survival to the present day of descendants of the Newfoundland Beothuk, a group generally accepted to have become extinct with the death of the last known member, Shanawdithit, in 1829. Interest has recently been revived by the availability of commercial genetic testing, which some claim can assign living individuals to specific Native American groups. We compare complete mitogenome sequences (16569 bp) from aDNA of eight distinct Beothuk lineages, including Shanawdithit's uncle Nonosabasut and his wife Demasduit, with three Newfoundland Mi'kmaq lineages and 21 other living Native Americans drawn from GenBank. A Newfoundland Mi'kmaq lineage in Haplogroup A is more similar to three Native Americans (1-3 SNPs) than to the most closely related Beothuk (24 SNPs). Nonosabasut in Haplogroup X is identical to a non-Beothuk Native American. Demasduit in Haplogroup C differs from three other Native Americans by 1-4 substitutions. Within a 2168 bp region of the HVS sequences available from living Mi'kmaq of the Miawpukek First Nation in Newfoundland, lineages in Haplogroups C, X, and A differ by 1, 4, and 8 substitutions, from the most similar Beothuk, and are more similar to other Native Americans. MtDNA genome sequences in living persons identical or similar to those of Beothuk do not necessarily indicate Beothuk ancestry. Mi'kmaq lineages cannot at this time be associated with any Beothuk lineages more closely than those of other Native Americans.
Streaming fragment assignment for real-time analysis of sequencing experiments

PubMed Central

Roberts, Adam; Pachter, Lior

2013-01-01

We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
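The idea of assigning ambiguously mapping fragments in a single streaming pass can be illustrated with a simplified online update. This is a sketch under stated assumptions, not the eXpress implementation: the function names `stream_assign` and `abundances`, the uniform pseudo-count prior, and the input format (one dictionary of target -> alignment likelihood per fragment) are all hypothetical. Each fragment is seen once, and memory grows with the number of targets rather than the number of fragments.

```python
from collections import defaultdict


def stream_assign(fragments, prior=1.0):
    """Accumulate probabilistic fragment assignments in one pass.

    fragments is an iterable of dicts mapping target name -> likelihood
    of the fragment given that target. Every target starts with a
    pseudo-count mass of `prior`.
    """
    mass = defaultdict(lambda: prior)
    for compatible in fragments:
        # Weight each compatible target by its current mass times the
        # fragment likelihood, then normalise to get a posterior.
        weights = {t: mass[t] * lik for t, lik in compatible.items()}
        total = sum(weights.values())
        for target, weight in weights.items():
            # Each fragment contributes exactly one unit of mass,
            # split among targets according to the posterior.
            mass[target] += weight / total
    return dict(mass)


def abundances(mass):
    """Convert accumulated mass into relative abundances."""
    total = sum(mass.values())
    return {t: m / total for t, m in mass.items()}


# Toy data: 8 fragments unique to A, then 4 that map equally to A and B.
reads = [{"A": 1.0}] * 8 + [{"A": 1.0, "B": 1.0}] * 4
mass = stream_assign(reads)
rel = abundances(mass)
```

In this toy run the ambiguous fragments are split mostly toward A, because the unique fragments seen earlier have already raised its estimated abundance. Because the estimate is built up in one pass, the order in which fragments arrive can affect the result.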
Complete genome sequence of Lactobacillus johnsonii FI9785, a competitive exclusion agent against pathogens in poultry.

PubMed

Wegmann, Udo; Overweg, Karin; Horn, Nikki; Goesmann, Alexander; Narbad, Arjan; Gasson, Michael J; Shearman, Claire

2009-11-01

Lactobacillus johnsonii is a member of the acidophilus group of lactobacilli. Because of their probiotic properties, including attachment to epithelial cells, immunomodulation, and competitive exclusion of pathogens, representatives of this group are being intensively studied. Here we report the complete annotated genome sequence of Lactobacillus johnsonii FI9785, a strain which prevents the colonization of specific-pathogen-free chicks by Clostridium perfringens.

Practice Makes Progress? Homework Assignments and Outcome in Treatment of Cocaine Dependence

PubMed Central

Carroll, Kathleen M.; Nich, Charla; Ball, Samuel A.

2008-01-01

The relationship between treatment outcome and the extent to which participants completed homework assignments was evaluated among 60 cocaine-dependent individuals assigned to cognitive–behavioral therapy (CBT). Homework was assigned in 72% of all sessions and initiated by participants in 48% of the sessions in which it was assigned. Completion of homework was unrelated to participants' baseline characteristics and several indicators of treatment compliance. Participants who completed more homework assignments demonstrated significantly greater increases in the quantity and quality of their coping skills and used significantly less cocaine during treatment and through a 1-year follow-up. These data suggest that the extent to which participants are willing to complete extrasession assignments may be an important mediator of response to CBT. PMID:16173864
Isolation, sequence identification and tissue expression profiles of 3 novel porcine genes: ASPA, NAGA, and HEXA.

PubMed

Shu, Xianghua; Liu, Yonggang; Yang, Liangyu; Song, Chunlian; Hou, Jiafa

2008-01-01

The complete coding sequences of 3 porcine genes - ASPA, NAGA, and HEXA - were amplified by the reverse transcriptase polymerase chain reaction (RT-PCR) based on the conserved sequence information of the mouse or other mammals and referenced pig ESTs. These 3 novel porcine genes were then deposited in the NCBI database and assigned GeneIDs: 100142661, 100142664 and 100142667. The phylogenetic tree analysis revealed that the porcine ASPA, NAGA, and HEXA all have closer genetic relationships with the ASPA, NAGA, and HEXA of cattle. Tissue expression profile analysis was also carried out and results revealed that swine ASPA, NAGA, and HEXA genes were differentially expressed in various organs, including skeletal muscle, the heart, liver, fat, kidney, lung, and small and large intestines. Our experiment is the first one to establish the foundation for further research on these 3 swine genes.
Single Machine Scheduling and Due Date Assignment with Past-Sequence-Dependent Setup Time and Position-Dependent Processing Time

PubMed Central

Zhao, Chuan-Li; Hsu, Hua-Feng

2014-01-01

This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm. PMID:25258727
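A past-sequence-dependent setup time can be made concrete with a small sketch that computes completion times for a fixed job sequence. The abstract does not give the exact model, so the details here are assumptions: the setup before each job is `gamma` times the total actual processing time of the jobs already processed, and the position effect is taken to be the job-independent form `p * r**a` for a job in position `r`. The function name and parameters are hypothetical.

```python
def completion_times(normal_times, gamma=0.1, a=0.0):
    """Completion times for jobs processed in the given order.

    normal_times: normal processing times, listed in sequence order.
    gamma:        proportionality constant of the p-s-d setup time (assumed).
    a:            position-effect exponent (assumed form p * r**a;
                  a = 0 means no position effect).
    """
    completions = []
    clock = 0.0       # current time on the machine
    processed = 0.0   # total actual processing time of earlier jobs
    for position, p in enumerate(normal_times, start=1):
        actual = p * position ** a     # position-dependent processing time
        setup = gamma * processed      # past-sequence-dependent setup time
        clock += setup + actual
        completions.append(clock)
        processed += actual
    return completions


if __name__ == "__main__":
    print(completion_times([2, 3, 4], gamma=0.5))
```

With these assumptions the first job never incurs a setup, and later setups grow with the amount of work already done, which is why the choice of sequence affects earliness and tardiness.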
Single machine scheduling and due date assignment with past-sequence-dependent setup time and position-dependent processing time.

PubMed

Zhao, Chuan-Li; Hsu, Chou-Jung; Hsu, Hua-Feng

2014-01-01

This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm.
Visualization of genome signatures of eukaryote genomes by batch-learning self-organizing map with a special emphasis on Drosophila genomes.

PubMed

Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi

2014-01-01

A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.
Sequence Analysis of IncA/C and IncI1 Plasmids Isolated from Multidrug-Resistant Salmonella Newport Using Single-Molecule Real-Time Sequencing.

PubMed

Cao, Guojie; Allard, Marc; Hoffmann, Maria; Muruvanda, Tim; Luo, Yan; Payne, Justin; Meng, Kevin; Zhao, Shaohua; McDermott, Patrick; Brown, Eric; Meng, Jianghong

2018-06-01

Multidrug-resistant (MDR) plasmids play an important role in disseminating antimicrobial resistance genes. To elucidate the antimicrobial resistance gene compositions in A/C incompatibility complex (IncA/C) plasmids carried by animal-derived MDR Salmonella Newport, and to investigate the spread mechanism of IncA/C plasmids, this study characterizes the complete nucleotide sequences of IncA/C plasmids by comparative analysis. Complete nucleotide sequencing of plasmids and chromosomes of six MDR Salmonella Newport strains was performed using PacBio RSII. Open reading frames were assigned using prokaryotic genome annotation pipeline (PGAP). To understand genomic diversity and evolutionary relationships among Salmonella Newport IncA/C plasmids, we included three complete IncA/C plasmid sequences with similar backbones from Salmonella Newport and Escherichia coli: pSN254, pAM04528, and peH4H, and additional 200 draft chromosomes. With the exception of canine isolate CVM22462, which contained an additional IncI1 plasmid, each of the six MDR Salmonella Newport strains contained only the IncA/C plasmid. These IncA/C plasmids (including references) ranged in size from 80.1 (pCVM21538) to 176.5 kb (pSN254) and carried various resistance genes. Resistance genes floR, tetA, tetR, strA, strB, sul, and mer were identified in all IncA/C plasmids. Additionally, bla CMY-2 and sugE were present in all IncA/C plasmids, excepting pCVM21538. Plasmid pCVM22462 was capable of being transferred by conjugation. The IncI1 plasmid pCVM22462b in CVM22462 carried bla CMY-2 and sugE. Our data showed that MDR Salmonella Newport strains carrying similar IncA/C plasmids clustered together in the phylogenetic tree using chromosome sequences and the IncA/C plasmids from animal-derived Salmonella Newport contained diverse resistance genes. 
In the current study, we analyzed genomic diversity and phylogenetic relationships among MDR Salmonella Newport using complete plasmid and chromosome sequences and provided a possible spread mechanism of IncA/C plasmids in Salmonella Newport Lineage II.
Genomic sequences of murine gamma B- and gamma C-crystallin-encoding genes: promoter analysis and complete evolutionary pattern of mouse, rat and human gamma-crystallins.

PubMed

Graw, J; Liebstein, A; Pietrowski, D; Schmitt-John, T; Werner, T

1993-12-22

The murine genes, gamma B-cry and gamma C-cry, encoding the gamma B- and gamma C-crystallins, were isolated from a genomic DNA library. The complete nucleotide (nt) sequences of both genes were determined from 661 and 711 bp, respectively, upstream from the first exon to the corresponding polyadenylation sites, comprising more than 2650 and 2890 bp, respectively. The new sequences were compared to the partial cDNA sequences available for the murine gamma B-cry and gamma C-cry, as well as to the corresponding genomic sequences from rat and man, at both the nt and predicted amino acid (aa) sequence levels. In the gamma B-cry promoter region, a canonical CCAAT-box, a TATA-box, putative NF-I and C/EBP sites were detected. An R-repeat is inserted 366 bp upstream from the transcription start point. In contrast, the gamma C-cry promoter does not contain a CCAAT-box, but some other putative binding sites for transcription factors (AP-2, UBP-1, LBP-1) were located by computer analysis. The promoter regions of all six gamma-cry from mouse, rat and human, except human psi gamma F-cry, were analyzed for common sequence elements. A complex sequence element of about 70-80 bp was found in the proximal promoter, which contains a gamma-cry-specific and almost invariant sequence (crygpel) of 14 nt, and ends with the also invariant TATA-box. Within the complex sequence element, a minimum of three further features specific for the gamma A-, gamma B- and gamma D/E/F-cry genes can be defined, at least two of which were recently shown to be functional. In addition to these four sequence elements, a subtype-specific structure of inverted repeats with different-sized spacers can be deduced from the multiple sequence alignment. A phylogenetic analysis based on the promoter region, as well as the complete exon 3 of all gamma-cry from mouse, rat and man, suggests separation of only five gamma-cry subtypes (gamma A-, gamma B-, gamma C-, gamma D- and gamma E/F-cry) prior to species separation.
Amyloid cores in prion domains: Key regulators for prion conformational conversion.

PubMed

Fernández, María Rosario; Batlle, Cristina; Gil-García, Marcos; Ventura, Salvador

2017-01-02

Despite the significant efforts devoted to deciphering the particular protein features that encode prion or prion-like behavior, these features are still poorly understood. The well-characterized yeast prions constitute an ideal model system to address this question, because, in these proteins, the prion activity can be univocally assigned to a specific region of their sequence, known as the prion forming domain (PFD). These PFDs are intrinsically disordered, relatively long and, in many cases, of low complexity, being enriched in glutamine/asparagine residues. Computational analyses have identified a significant number of proteins having similar domains in the human proteome. The compositional bias of these regions plays an important role in the transition of the prions to the amyloid state. However, it is difficult to explain how composition alone can account for the formation of specific contacts that position PFDs correctly and provide the enthalpic force to compensate for the large entropic cost of immobilizing these domains in the initial assemblies. We have hypothesized that short, sequence-specific, amyloid cores embedded in PFDs can perform these functions and, accordingly, act as preferential nucleation centers in both spontaneous and seeded aggregation. We have shown that implementing this concept in a prediction algorithm allows the prion propensities of putative PFDs to be scored with high accuracy. Recently, we have provided experimental evidence for the existence of such amyloid cores in the PFDs of Sup35, Ure2, Swi1, and Mot3 yeast prions. The fibrils formed by these short stretches may recognize and promote the aggregation of the complete proteins inside cells, being thus a promising tool for targeted protein inactivation.
Trends in genome dynamics among major orders of insects revealed through variations in protein families.

PubMed

Rappoport, Nadav; Linial, Michal

2015-08-07

Insects belong to a class that accounts for the majority of animals on earth. With over one million identified species, insects display a huge diversity and occupy extreme environments. At present, there are dozens of fully sequenced insect genomes that cover a range of habitats, social behavior and morphologies. In view of such diverse collection of genomes, revealing evolutionary trends and charting functional relationships of proteins remain challenging. We analyzed the relatedness of 17 complete proteomes representative of proteomes from insects including louse, bee, beetle, ants, flies and mosquitoes, as well as an out-group from the crustaceans. The analyzed proteomes mostly represented the orders of Hymenoptera and Diptera. The 287,405 protein sequences from the 18 proteomes were automatically clustered into 20,933 families, including 799 singletons. A comprehensive analysis based on statistical considerations identified the families that were significantly expanded or reduced in any of the studied organisms. Among all the tested species, ants are characterized by an exceptionally high rate of family gain and loss. By assigning annotations to hundreds of species-specific families, the functional diversity among species and between the major clades (Diptera and Hymenoptera) is revealed. We found that many species-specific families are associated with receptor signaling, stress-related functions and proteases. The highest variability among insects associates with the function of transposition and nucleic acids processes (collectively coined TNAP). Specifically, the wasp and ants have an order of magnitude more TNAP families and proteins relative to species that belong to Diptera (mosquitoes and flies). An unsupervised clustering methodology combined with a comparative functional analysis unveiled proteomic signatures in the major clades of winged insects. 
We propose that the expansion of TNAP families in Hymenoptera potentially contributes to the accelerated genome dynamics that characterize the wasp and ants.
Conformation-Specific Infrared and Ultraviolet Spectroscopy of Cold [YAPAA+H]^{+} and [YGPAA+H]^{+} Ions: a Stereochemical "twist" on the β-HAIRPIN Turn

NASA Astrophysics Data System (ADS)

DeBlase, Andrew F.; Harrilal, Christopher P.; Lawler, John T.; Burke, Nicole L.; McLuckey, Scott A.; Zwier, Timothy S.

2017-06-01

Incorporation of the unnatural D-proline (^{D}P) stereoisomer into a polypeptide sequence is a typical strategy to encourage formation of β-hairpin loops because natural sequences are often unstructured in solution. Using conformation-specific IR and UV spectroscopy of cold (10 K) gas-phase ions, we probe the inherent conformational preferences of the ^{D}P and ^{L}P diastereomers in the protonated peptide [YAPAA+H]^{+}, where only intramolecular interactions are possible. Consistent with the solution phase studies, one of the conformers of [YADPAA+H]^{+} is folded into a charge-stabilized β-hairpin turn. However, a second predominant conformer family containing two sequential γ-turns is also identified, with similar energetic stability. A single conformational isomer of the ^{L}P diastereomer, [YALPAA+H]^{+}, is found and assigned to a structure that is not the anticipated "mirror image" β-turn. Instead, the ^{L}P stereo center promotes a cis alanine-proline amide bond. The assigned structures contain clues that the preference of the ^{D}P diastereomer to support a trans-amide bond and the proclivity of ^{L}P for a cis-amide bond is sterically driven and can be reversed by substituting glycine for alanine in position 2, forming [YGLPAA+H]^{+}. These results provide a basis for understanding the residue-specific and stereo-specific alterations in the potential energy surface that underlie these changing preferences, providing insights to the origin of β-hairpin formation.
Rapid measurement of 3J(H N-H alpha) and 3J(N-H beta) coupling constants in polypeptides.

PubMed

Barnwal, Ravi Pratap; Rout, Ashok K; Chary, Kandala V R; Atreya, Hanudatta S

2007-12-01

We present two NMR experiments, (3,2)D HNHA and (3,2)D HNHB, for rapid and accurate measurement of 3J(H N-H alpha) and 3J(N-H beta) coupling constants in polypeptides based on the principle of G-matrix Fourier transform NMR spectroscopy and quantitative J-correlation. These experiments, which facilitate fast acquisition of three-dimensional data with high spectral/digital resolution and chemical shift dispersion, will provide renewed opportunities to utilize them for sequence specific resonance assignments, estimation/characterization of secondary structure with/without prior knowledge of resonance assignments, stereospecific assignment of prochiral groups and 3D structure determination, refinement and validation. Taken together, these experiments have a wide range of applications from structural genomics projects to studying structure and folding in polypeptides.
GenBank.

PubMed

Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

2011-01-01

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.
Identification and subspecific differentiation of Mycobacterium scrofulaceum by automated sequencing of a region of the gene (hsp65) encoding a 65-kilodalton heat shock protein.

PubMed Central

Swanson, D S; Pan, X; Musser, J M

1996-01-01

Mycobacterium scrofulaceum is most commonly recovered from children with cervical lymphadenitis, although it also accounts for approximately 2% of the mycobacterial infections in AIDS patients. Species assignment of M. scrofulaceum isolated by conventional techniques can be difficult and time-consuming. To develop a strategy for rapid species assignment of these organisms, a 360-bp region of the gene (hsp65) encoding a 65-kDa heat shock protein in 37 isolates from diverse sources was sequenced. Eight hsp65 alleles were identified, and these sequences formed phylogenetic clusters and lineages largely distinct from other Mycobacterium species. There was incomplete correlation between serovar designation and hsp65 allele assignment. The hsp65 data correlated strongly with the results of sequence analysis of the gene coding for 16S rRNA. Automated DNA sequencing of a 360-bp region of the hsp65 gene provides a rapid and unambiguous method for species assignment of these acid-fast organisms for diagnostic purposes. PMID:8940463
VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

PubMed

Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

2018-01-01

Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provide access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
Complete genome sequence of Tomato mosaic virus isolated from jasmine in the United States

USDA-ARS's Scientific Manuscript database

Tomato mosaic virus (ToMV) was first identified in jasmine in the U.S. in Florida in 1999. This report provides the first full genome sequence of a ToMV isolate from jasmine. The full genome sequence of this virus will enable research scientists to develop additional specific diagnostic tests for ...
Performance Based Traffic Safety Education Course. Two-Phase Program.

ERIC Educational Resources Information Center

Washington State Board of Education, Olympia.

This course for high school highway traffic safety education is intended to help students learn to make good driving decisions. It consists of twenty-one modules--ten sequenced, two not in specific sequence but intended to be completed in the earlier part of the course, and nine non-sequenced modules. Each module begins with an outline providing…
MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

PubMed

Suckau, Detlev; Resemann, Anja

2009-12-01

The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS allowed protein sequences to be matched quickly and efficiently and recombinant protein structures, in particular protein termini, to be validated at the level of undigested proteins.
Learning cellular sorting pathways using protein interactions and sequence motifs.

PubMed

Lin, Tien-Ho; Bar-Joseph, Ziv; Murphy, Robert F

2011-11-01

Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.
Classification of DNA nucleotides with transverse tunneling currents

NASA Astrophysics Data System (ADS)

Nyvold Pedersen, Jonas; Boynton, Paul; Di Ventra, Massimiliano; Jauho, Antti-Pekka; Flyvbjerg, Henrik

2017-01-01

It has been theoretically suggested and experimentally demonstrated that fast and low-cost sequencing of DNA, RNA, and peptide molecules might be achieved by passing such molecules between electrodes embedded in a nanochannel. The experimental realization of this scheme faces major challenges, however. In realistic liquid environments, typical currents in tunneling devices are of the order of picoamps. This corresponds to only six electrons per microsecond, and this number affects the integration time required to do current measurements in real experiments. This limits the speed of sequencing, though current fluctuations due to Brownian motion of the molecule average out during the required integration time. Moreover, data acquisition equipment introduces noise, and electronic filters create correlations in time-series data. We discuss how these effects must be included in the analysis of, e.g., the assignment of specific nucleobases to current signals. As the signals from different molecules overlap, unambiguous classification is impossible with a single measurement. We argue that the assignment of molecules to a signal is a standard pattern classification problem and calculation of the error rates is straightforward. The ideas presented here can be extended to other sequencing approaches of current interest.
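The figure of roughly six electrons per microsecond follows directly from dividing a picoamp current by the elementary charge. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the shot-noise estimate assumes Poisson counting statistics, which the abstract does not state.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)


def electrons_per_microsecond(current_amps):
    """Mean number of electrons transferred in one microsecond."""
    return current_amps / ELEMENTARY_CHARGE * 1e-6


def relative_shot_noise(current_amps, integration_seconds):
    """Relative fluctuation 1/sqrt(N), assuming Poisson statistics."""
    n_electrons = current_amps * integration_seconds / ELEMENTARY_CHARGE
    return 1.0 / math.sqrt(n_electrons)


if __name__ == "__main__":
    # About 6.2 electrons per microsecond at 1 pA.
    print(electrons_per_microsecond(1e-12))
    print(relative_shot_noise(1e-12, 1e-6))
```

Under the Poisson assumption, increasing the integration time by a factor of 100 reduces the relative fluctuation by a factor of 10, which illustrates why the small electron count constrains sequencing speed.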
Tissue-Specific Transcriptome Profiling of Plutella Xylostella Third Instar Larval Midgut

PubMed Central

Xie, Wen; Lei, Yanyuan; Fu, Wei; Yang, Zhongxia; Zhu, Xun; Guo, Zhaojiang; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Zhou, Xuguo; Zhang, Youjun

2012-01-01

The larval midgut of diamondback moth, Plutella xylostella, is a dynamic tissue that interfaces with a diverse array of physiological and toxicological processes, including nutrient digestion and allocation, xenobiotic detoxification, innate and adaptive immune response, and pathogen defense. Despite its enormous agricultural importance, the genomic resources for P. xylostella are surprisingly scarce. In this study, a Bt resistant P. xylostella strain was subjected to the in-depth transcriptome analysis to identify genes and gene networks putatively involved in various physiological and toxicological processes in the P. xylostella larval midgut. Using Illumina deep sequencing, we obtained roughly 40 million reads containing approximately 3.6 gigabases of sequence data. De novo assembly generated 63,312 ESTs with an average read length of 416 bp, and approximately half of the P. xylostella sequences (45.4%, 28,768) showed similarity to the non-redundant database in GenBank with a cut-off E-value below 10^-5. Among them, 11,092 unigenes were assigned to one or multiple GO terms and 16,732 unigenes were assigned to 226 specific pathways. In-depth analysis identified genes putatively involved in insecticide resistance, nutrient digestion, and innate immune defense. Besides conventional detoxification enzymes and insecticide targets, novel genes, including 28 chymotrypsins and 53 ABC transporters, have been uncovered in the P. xylostella larval midgut transcriptome, which are potentially linked to the Bt toxicity and resistance. Furthermore, an unexpectedly high number of ESTs, including 46 serpins and 7 lysozymes, were predicted to be involved in the immune defense. As the first tissue-specific transcriptome analysis of P. xylostella, this study sheds light on the molecular understanding of insecticide resistance, especially Bt resistance in an agriculturally important insect pest, and lays the foundation for future functional genomics research.
In addition, current sequencing effort greatly enriched the existing P. xylostella EST database, and makes RNAseq a viable option in the future genomic analysis. PMID:23091412

Tissue-specific transcriptome profiling of Plutella xylostella third instar larval midgut.

PubMed

Xie, Wen; Lei, Yanyuan; Fu, Wei; Yang, Zhongxia; Zhu, Xun; Guo, Zhaojiang; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Zhou, Xuguo; Zhang, Youjun

2012-01-01

The larval midgut of diamondback moth, Plutella xylostella, is a dynamic tissue that interfaces with a diverse array of physiological and toxicological processes, including nutrient digestion and allocation, xenobiotic detoxification, innate and adaptive immune response, and pathogen defense. Despite its enormous agricultural importance, the genomic resources for P. xylostella are surprisingly scarce. In this study, a Bt resistant P. xylostella strain was subjected to the in-depth transcriptome analysis to identify genes and gene networks putatively involved in various physiological and toxicological processes in the P. xylostella larval midgut. Using Illumina deep sequencing, we obtained roughly 40 million reads containing approximately 3.6 gigabases of sequence data. De novo assembly generated 63,312 ESTs with an average read length of 416 bp, and approximately half of the P. xylostella sequences (45.4%, 28,768) showed similarity to the non-redundant database in GenBank with a cut-off E-value below 10^-5. Among them, 11,092 unigenes were assigned to one or multiple GO terms and 16,732 unigenes were assigned to 226 specific pathways. In-depth analysis identified genes putatively involved in insecticide resistance, nutrient digestion, and innate immune defense. Besides conventional detoxification enzymes and insecticide targets, novel genes, including 28 chymotrypsins and 53 ABC transporters, have been uncovered in the P. xylostella larval midgut transcriptome, which are potentially linked to the Bt toxicity and resistance. Furthermore, an unexpectedly high number of ESTs, including 46 serpins and 7 lysozymes, were predicted to be involved in the immune defense. As the first tissue-specific transcriptome analysis of P. xylostella, this study sheds light on the molecular understanding of insecticide resistance, especially Bt resistance in an agriculturally important insect pest, and lays the foundation for future functional genomics research.
In addition, current sequencing effort greatly enriched the existing P. xylostella EST database, and makes RNAseq a viable option in the future genomic analysis.
Species-specific Typing of DNA Based on Palindrome Frequency Patterns

PubMed Central

Lamprea-Burgunder, Estelle; Ludin, Philipp; Mäser, Pascal

2011-01-01

DNA in its natural, double-stranded form may contain palindromes, sequences which read the same from either side because they are identical to their reverse complement on the sister strand. Short palindromes are underrepresented in all kinds of genomes. The frequency distribution of short palindromes exhibits more than twice the inter-species variance of non-palindromic sequences, which renders palindromes optimally suited for the typing of DNA. Here, we show that based on palindrome frequency, DNA sequences can be discriminated to the level of species of origin. By plotting the ratios of actual occurrence to expectancy, we generate palindrome frequency patterns that allow different sequences of the same genome to be clustered and plasmids, and in some cases even viruses, to be assigned to their respective host genomes. This finding will be of use in the growing field of metagenomics. PMID:21429991
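The ratio of actual occurrence to expectancy described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how expectancy is computed, so the sketch assumes a simple zero-order model (expected count derived from the base frequencies of the sequence itself). The function names and the default palindrome length of 4 are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(kmer):
    """A DNA palindrome equals its own reverse complement."""
    return kmer == reverse_complement(kmer)


def palindrome_ratios(seq, k=4):
    """Observed/expected ratio for every palindromic k-mer in seq.

    Assumption: the expected count comes from a zero-order model,
    i.e. (number of windows) * product of the base frequencies.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        return {}
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    observed = Counter(seq[i:i + k] for i in range(n_windows))
    ratios = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        if not is_palindrome(kmer):
            continue
        expected = n_windows
        for base in kmer:
            expected *= base_freq[base]
        if expected > 0:
            ratios[kmer] = observed[kmer] / expected
    return ratios
```

A ratio below 1 marks a palindrome that is underrepresented relative to this simple expectation. The vector of ratios over all palindromes of a given length is the kind of pattern that could then be compared between sequences.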
Ultraviolet photodissociation enhances top-down mass spectrometry as demonstrated on green fluorescent protein variants.

PubMed

Dang, Xibei; Young, Nicolas L

2014-05-01

Ultraviolet photodissociation (UVPD) is a compelling fragmentation technique with great potential to enhance proteomics generally and top-down MS specifically. In this issue, Cannon et al. (Proteomics 2014, 14, XXXX-XXXX) use UVPD to perform top-down MS on several sequence variants of green fluorescent protein and compare the results to CID, higher energy collision induced dissociation, and electron transfer dissociation. As compared to the other techniques UVPD produces a wider variety of fragment ion types that are relatively evenly distributed across the protein sequences. Overall, their results demonstrate enhanced sequence coverage and higher confidence in sequence assignment via UVPD MS. Based on these and other recent results UVPD is certain to become an increasingly widespread and valuable tool for top-down proteomics. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Stable isotope, site-specific mass tagging for protein identification

DOEpatents

Chen, Xian

2006-10-24

Proteolytic peptide mass mapping as measured by mass spectrometry provides an important method for the identification of proteins, which are usually identified by matching the measured and calculated m/z values of the proteolytic peptides. A unique identification is, however, heavily dependent upon the mass accuracy and sequence coverage of the fragment ions generated by peptide ionization. The present invention describes a method for increasing the specificity, accuracy and efficiency of the assignments of particular proteolytic peptides and consequent protein identification, by the incorporation of selected amino acid residue(s) enriched with stable isotope(s) into the protein sequence without the need for ultrahigh instrumental accuracy. Selected amino acid(s) are labeled with ¹³C/¹⁵N/²H and incorporated into proteins in a sequence-specific manner during cell culturing. Each of these labeled amino acids carries a defined mass change encoded in its monoisotopic distribution pattern. Through their characteristic patterns, the peptides with mass tag(s) can then be readily distinguished from other peptides in mass spectra. The present method of identifying unique proteins can also be extended to protein complexes and will significantly increase data search specificity, efficiency and accuracy for protein identifications.
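The "defined mass change" carried by a labeled residue can be illustrated with a short Python sketch. The isotope mass differences are standard physical constants. The labeling scheme in the usage note (six ¹³C atoms per leucine) and the function names are hypothetical illustrations, not details taken from the patent.

```python
# Mass differences in daltons between the heavy and light isotope.
DELTA_13C = 1.0033548  # 13C minus 12C
DELTA_15N = 0.9970349  # 15N minus 14N
DELTA_2H = 1.0062767   # 2H minus 1H


def mass_shift(peptide, residue, n13c=0, n15n=0, n2h=0):
    """Total mass increase of a peptide when every copy of `residue`
    carries the given number of heavy atoms (hypothetical label)."""
    per_residue = n13c * DELTA_13C + n15n * DELTA_15N + n2h * DELTA_2H
    return peptide.count(residue) * per_residue


def mz_shift(peptide, residue, charge=1, **label):
    """Shift observed on the m/z axis for an ion of the given charge."""
    return mass_shift(peptide, residue, **label) / charge
```

For example, under the assumed label of six ¹³C atoms per leucine, `mass_shift("LLK", "L", n13c=6)` gives a shift of about 12.04 Da, whereas a peptide without leucine is unshifted. The shift therefore also reports how many labeled residues the peptide contains.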
Genomic organization of human fetal specific P-450IIIA7 (cytochrome P-450HFLa)-related gene(s) and interaction of transcriptional regulatory factor with its DNA element in the 5' flanking region.

PubMed

Itoh, S; Yanagimoto, T; Tagawa, S; Hashimoto, H; Kitamura, R; Nakajima, Y; Okochi, T; Fujimoto, S; Uchino, J; Kamataki, T

1992-03-24

P-450IIIA7 is a form of cytochrome P-450 which was isolated from human fetal livers and termed P-450HFLa. This form has been shown to be expressed specifically during fetal life (Komori, M., Nishio, K., Kitada, M., Shiramatsu, K., Muroya, K., Soma, M., Nagashima, K. and Kamataki, T. (1990) Biochemistry 29, 4430-4433). In the present study, we isolated five independent clones which probably corresponded to the human P-450IIIA7 gene. These clones were completely sequenced, including all exons, exon-intron junctions and the 5' flanking region from the cap site to -869. Although the sequences in the coding region were completely identical to P-450IIIA7, it is possible that the genomic fragments sequenced in this study encode portions of other P-450IIIA7-related genes, since we could not obtain a complete overlapping set of genomic clones. The 5' flanking sequence contained putative binding sites for several transcriptional regulatory factors. Among them, a basic transcription element binding factor (BTEB) was shown to interact with the 5' flanking region of this gene.
Comprehensive analysis of orthologous protein domains using the HOPS database.

PubMed

Storm, Christian E V; Sonnhammer, Erik L L

2003-10-01

One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.
Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

PubMed Central

Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

2013-01-01

Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121
Detection of nucleic acids by multiple sequential invasive cleavages

DOEpatents

Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

1999-01-01

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
Nucleic acid detection kits

DOEpatents

Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann; Kwiatkowski, Robert W.; Vavra, Stephanie H.

2005-03-29

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of nucleic acid from various viruses in a sample.
Detection of nucleic acids by multiple sequential invasive cleavages 02

DOEpatents

Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

2002-01-01

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
Detection of nucleic acids by multiple sequential invasive cleavages

DOEpatents

Hall, Jeff G; Lyamichev, Victor I; Mast, Andrea L; Brow, Mary Ann D

2012-10-16

The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.
A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

PubMed

Thakur, Shalabh; Guttman, David S

2016-06-30

Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While a number of excellent tools have been developed for this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package is accompanied by a script for automated installation of the necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems once the necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/.
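The difference between quadratic and linear scaling can be made concrete with a small Python sketch. It counts genome-level comparison steps only. It assumes, as a simplification not stated in the abstract, that the iterative approach needs one pass per genome against the current set of models. The function names are illustrative and are not part of DeNoGAP.

```python
def pairwise_comparisons(n_genomes):
    """All-against-all strategy: every unordered pair of genomes
    is compared once, giving n * (n - 1) / 2 comparisons."""
    return n_genomes * (n_genomes - 1) // 2


def iterative_comparisons(n_genomes):
    """Iterative strategy (simplified assumption): each genome is
    scanned once against the current model set, giving n passes."""
    return n_genomes


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n, pairwise_comparisons(n), iterative_comparisons(n))
```

Under these assumptions, doubling the number of genomes from 100 to 200 roughly quadruples the pairwise workload (4,950 to 19,900 comparisons) but only doubles the number of iterative passes.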
Complete genome sequences of three Erwinia amylovora phages isolated in North America and a bacteriophage induced from an Erwinia tasmaniensis strain.

PubMed

Müller, I; Kube, M; Reinhardt, R; Jelkmann, W; Geider, K

2011-02-01

Fire blight, a plant disease of economic importance caused by Erwinia amylovora, may be controlled by the application of bacteriophages. Here, we provide the complete genome sequences and the annotation of three E. amylovora-specific phages isolated in North America and genomic information about a bacteriophage induced by mitomycin C treatment of an Erwinia tasmaniensis strain that is antagonistic for E. amylovora. The American phages resemble two already-described viral genomes, whereas the E. tasmaniensis phage displays a singular genomic sequence in BLAST searches.
Species-specific identification of Dekkera/Brettanomyces yeasts by fluorescently labeled DNA probes targeting the 26S rRNA.

PubMed

Röder, Christoph; König, Helmut; Fröhlich, Jürgen

2007-09-01

Sequencing of the complete 26S rRNA genes of all Dekkera/Brettanomyces species colonizing different beverages revealed the potential for a specific primer and probe design to support diagnostic PCR approaches and FISH. By analysis of the complete 26S rRNA genes of all five currently known Dekkera/Brettanomyces species (Dekkera bruxellensis, D. anomala, Brettanomyces custersianus, B. nanus and B. naardenensis), several regions with high nucleotide sequence variability yet distinct from the D1/D2 domains were identified. FISH species-specific probes targeting the 26S rRNA gene's most variable regions were designed. Accessibility of probe targets for hybridization was facilitated by the construction of partially complementary 'side'-labeled probes, based on secondary structure models of the rRNA sequences. The specificity and routine applicability of the FISH-based method for yeast identification were tested by analyzing different wine isolates. Investigation of the prevalence of Dekkera/Brettanomyces yeasts in the German viticultural regions Wonnegau, Nierstein and Bingen (Rhinehesse, Rhineland-Palatinate) resulted in the isolation of 37 D. bruxellensis strains from 291 wine samples.
Collaborative Learning through Formative Peer Review with Technology

ERIC Educational Resources Information Center

Eaton, Carrie Diaz; Wade, Stephanie

2014-01-01

This paper describes a collaboration between a mathematician and a compositionist who developed a sequence of collaborative writing assignments for calculus. This sequence of developmentally appropriate assignments presents peer review as a collaborative process that promotes reflection, deepens understanding, and improves exposition. First, we…
Complete genomic sequences for hepatitis C virus subtypes 4b, 4c, 4d, 4g, 4k, 4l, 4m, 4n, 4o, 4p, 4q, 4r and 4t.

PubMed

Li, Chunhua; Lu, Ling; Wu, Xianghong; Wang, Chuanxi; Bennett, Phil; Lu, Teng; Murphy, Donald

2009-08-01

In this study, we characterized the full-length genomic sequences of 13 distinct hepatitis C virus (HCV) genotype 4 isolates/subtypes: QC264/4b, QC381/4c, QC382/4d, QC193/4g, QC383/4k, QC274/4l, QC249/4m, QC97/4n, QC93/4o, QC139/4p, QC262/4q, QC384/4r and QC155/4t. These were amplified, using RT-PCR, from the sera of patients now residing in Canada, 11 of whom were African immigrants. The resulting genomes varied between 9421 and 9475 nt in length and each contains a single ORF of 9018-9069 nt. The sequences showed nucleotide similarities of 77.3-84.3 % in comparison with subtypes 4a (GenBank accession no. Y11604) and 4f (EF589160) and 70.6-72.8 % in comparison with genotype 1 (M62321/1a, M58335/1b, D14853/1c, and 1?/AJ851228) reference sequences. These similarities were often higher than those currently defined by HCV classification criteria for subtype (75.0-80.0 %) and genotype (67.0-70.0 %) division, respectively. Further analyses of the complete and partial E1 and partial NS5B sequences confirmed these 13 'provisionally assigned subtypes'.
An expanded mammal mitogenome dataset from Southeast Asia

PubMed Central

Ramos-Madrigal, Jazmín; Peñaloza, Fernando; Liu, Shanlin; Mikkel-Holger, S. Sinding; Riddhi, P. Patel; Martins, Renata; Lenz, Dorina; Fickel, Jörns; Roos, Christian; Shamsir, Mohd Shahir; Azman, Mohammad Shahfiz; Burton, K. Lim; Stephen, J. Rossiter; Wilting, Andreas

2017-01-01

Southeast (SE) Asia is one of the most biodiverse regions in the world, and it holds approximately 20% of all mammal species. Despite this, the majority of SE Asia's genetic diversity is still poorly characterized. The growing interest in using environmental DNA to assess and monitor SE Asian species, in particular threatened mammals, has created the urgent need to expand the available reference database of mitochondrial barcode and complete mitogenome sequences. We have partially addressed this need by generating 72 new mitogenome sequences reconstructed from DNA isolated from a range of historical and modern tissue samples. Approximately 55 gigabases of raw sequence were generated. From these data, we assembled 72 complete mitogenome sequences, with an average depth of coverage of ×102.9 and ×55.2 for modern samples and historical samples, respectively. This dataset represents 52 species, of which 30 species had no previous mitogenome data available. The mitogenomes were geotagged to their sampling location, where known, to display a detailed geographical distribution of the species. Our new database of 52 taxa will strongly enhance the utility of environmental DNA approaches for monitoring mammals in SE Asia, as it greatly increases the likelihood that metabarcoding sequencing reads can be assigned to reference sequences. This magnifies the confidence in species detections and thus allows more robust surveys and monitoring programmes of SE Asia's threatened mammal biodiversity. The extensive collections of historical samples from SE Asia in western and SE Asian museums should serve as additional valuable material to further enrich this reference database. PMID:28873965
An expanded mammal mitogenome dataset from Southeast Asia.

PubMed

Mohd Salleh, Faezah; Ramos-Madrigal, Jazmín; Peñaloza, Fernando; Liu, Shanlin; Mikkel-Holger, S Sinding; Riddhi, P Patel; Martins, Renata; Lenz, Dorina; Fickel, Jörns; Roos, Christian; Shamsir, Mohd Shahir; Azman, Mohammad Shahfiz; Burton, K Lim; Stephen, J Rossiter; Wilting, Andreas; Gilbert, M Thomas P

2017-08-01

Southeast (SE) Asia is one of the most biodiverse regions in the world, and it holds approximately 20% of all mammal species. Despite this, the majority of SE Asia's genetic diversity is still poorly characterized. The growing interest in using environmental DNA to assess and monitor SE Asian species, in particular threatened mammals, has created the urgent need to expand the available reference database of mitochondrial barcode and complete mitogenome sequences. We have partially addressed this need by generating 72 new mitogenome sequences reconstructed from DNA isolated from a range of historical and modern tissue samples. Approximately 55 gigabases of raw sequence were generated. From these data, we assembled 72 complete mitogenome sequences, with an average depth of coverage of ×102.9 and ×55.2 for modern samples and historical samples, respectively. This dataset represents 52 species, of which 30 species had no previous mitogenome data available. The mitogenomes were geotagged to their sampling location, where known, to display a detailed geographical distribution of the species. Our new database of 52 taxa will strongly enhance the utility of environmental DNA approaches for monitoring mammals in SE Asia, as it greatly increases the likelihood that metabarcoding sequencing reads can be assigned to reference sequences. This magnifies the confidence in species detections and thus allows more robust surveys and monitoring programmes of SE Asia's threatened mammal biodiversity. The extensive collections of historical samples from SE Asia in western and SE Asian museums should serve as additional valuable material to further enrich this reference database. © The Author 2017. Published by Oxford University Press.
Activity-based protein profiling: from enzyme chemistry to proteomic chemistry.

PubMed

Cravatt, Benjamin F; Wright, Aaron T; Kozarich, John W

2008-01-01

Genome sequencing projects have provided researchers with a complete inventory of the predicted proteins produced by eukaryotic and prokaryotic organisms. Assignment of functions to these proteins represents one of the principal challenges for the field of proteomics. Activity-based protein profiling (ABPP) has emerged as a powerful chemical proteomic strategy to characterize enzyme function directly in native biological systems on a global scale. Here, we review the basic technology of ABPP, the enzyme classes addressable by this method, and the biological discoveries attributable to its application.
BAIT: Organizing genomes and mapping rearrangements in single cells.

PubMed

Hills, Mark; O'Neill, Kieran; Falconer, Ester; Brinkman, Ryan; Lansdorp, Peter M

2013-01-01

Strand-seq is a single-cell sequencing technique to finely map sister chromatid exchanges (SCEs) and other rearrangements. To analyze these data, we introduce BAIT, software which assigns templates and identifies and localizes SCEs. We demonstrate BAIT can refine completed reference assemblies, identifying approximately 21 Mb of incorrectly oriented fragments and placing over half (2.6 Mb) of the orphan fragments in mm10/GRCm38. BAIT also stratifies scaffold-stage assemblies, potentially accelerating the assembling and finishing of reference genomes. BAIT is available at http://sourceforge.net/projects/bait/.

Molecular characterization and combined genotype association study of bovine cluster of differentiation 14 gene with clinical mastitis in crossbred dairy cattle

PubMed Central

Selvan, A. Sakthivel; Gupta, I. D.; Verma, A.; Chaudhari, M. V.; Magotra, A.

2016-01-01

Aim: The present study was undertaken to characterize the cluster of differentiation 14 (CD14) gene and to analyze its combined genotypes in order to explore their association with clinical mastitis in Karan Fries (KF) cows maintained in the National Dairy Research Institute herd, Karnal. Materials and Methods: Genomic DNA was extracted from the blood of 94 randomly selected lactating KF cattle by the phenol-chloroform method. After checking its quality and quantity, polymerase chain reaction (PCR) was carried out using six sets of reported gene-specific primers to amplify the complete KF CD14 gene. The forward and reverse sequences for each PCR fragment were assembled to form the complete sequence for the respective region of the KF CD14 gene. Multiple sequence alignments of the edited sequence with the corresponding reported Bos taurus reference sequence (EU148610.1) were performed with ClustalW software to identify single nucleotide polymorphisms (SNPs). Basic Local Alignment Search Tool analysis was performed to compare the sequence identity of the KF CD14 gene with other species. Restriction fragment length polymorphism (RFLP) analysis was carried out in all KF cows using the Helicobacter pylori 188I (Hpy188I) (contig 2) and Haemophilus influenzae I (HinfI) (contig 4) restriction enzymes (RE). Cows were assigned genotypes obtained by PCR-RFLP analysis, and the association study was done using the Chi-square (χ²) test. The genotypes of contigs (loci) 2 and 4 were combined for each animal to construct combined genotype patterns. Results: Two types of KF sequences were obtained: one of 2630 bp with one insertion at nucleotide (nt) position 616 and one deletion at nt position 1117, and another of 2629 bp with only one deletion at nt position 615. ClustalW multiple alignment of the KF CD14 gene sequence with the B. taurus sequence (EU148610.1) revealed 24 nt changes (SNPs). Cows were also screened using PCR-RFLP with Hpy188I (contig 2) and HinfI (contig 4) RE, which revealed three genotypes each that differed significantly regarding mastitis incidence. The two loci allow a maximum of nine combined genotype patterns, of which only eight were observed: AACC, AACD, AADD, ABCD, ABDD, BBCC, BBCD, and BBDD. The combined genotype ABCC was not observed in the studied population of KF cows. Out of 94 animals, those with the AACD combined genotype (10.63%) were found not to be affected by mastitis, and those with the ABDD combined genotype had the highest mastitis incidence, 15.96%. Conclusion: AACD-typed cows were found to be the least susceptible to mastitis compared with the other combined genotypes. PMID:27536026
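The construction of combined genotype patterns from the two loci can be sketched in a few lines of Python. The per-locus genotype labels (AA, AB, BB for contig 2 and CC, CD, DD for contig 4) are inferred from the combined patterns listed in the abstract. The variable and function names are illustrative.

```python
from itertools import product

# Genotypes per locus, inferred from the combined patterns in the abstract.
CONTIG2_GENOTYPES = ("AA", "AB", "BB")  # Hpy188I locus
CONTIG4_GENOTYPES = ("CC", "CD", "DD")  # HinfI locus

# Combined genotypes reported as observed in the 94 cows.
OBSERVED = {"AACC", "AACD", "AADD", "ABCD", "ABDD", "BBCC", "BBCD", "BBDD"}


def combined_genotypes(locus_a, locus_b):
    """Every possible pairing of one genotype from each locus."""
    return [a + b for a, b in product(locus_a, locus_b)]


all_patterns = combined_genotypes(CONTIG2_GENOTYPES, CONTIG4_GENOTYPES)
missing = [p for p in all_patterns if p not in OBSERVED]
```

Three genotypes at each of two loci give 3 × 3 = 9 possible patterns. Comparing them with the eight observed patterns leaves ABCC as the only one absent, matching the abstract.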
Molecular characterization and combined genotype association study of bovine cluster of differentiation 14 gene with clinical mastitis in crossbred dairy cattle.

PubMed

Selvan, A Sakthivel; Gupta, I D; Verma, A; Chaudhari, M V; Magotra, A

2016-07-01

The present study was undertaken to characterize the cluster of differentiation 14 (CD14) gene and to analyze its combined genotypes in order to explore their association with clinical mastitis in Karan Fries (KF) cows maintained in the National Dairy Research Institute herd, Karnal. Genomic DNA was extracted from the blood of 94 randomly selected lactating KF cattle by the phenol-chloroform method. After checking its quality and quantity, polymerase chain reaction (PCR) was carried out using six sets of reported gene-specific primers to amplify the complete KF CD14 gene. The forward and reverse sequences for each PCR fragment were assembled to form the complete sequence for the respective region of the KF CD14 gene. Multiple sequence alignments of the edited sequence with the corresponding reported Bos taurus reference sequence (EU148610.1) were performed with ClustalW software to identify single nucleotide polymorphisms (SNPs). Basic Local Alignment Search Tool analysis was performed to compare the sequence identity of the KF CD14 gene with other species. Restriction fragment length polymorphism (RFLP) analysis was carried out in all KF cows using the Helicobacter pylori 188I (Hpy188I) (contig 2) and Haemophilus influenzae I (HinfI) (contig 4) restriction enzymes (RE). Cows were assigned genotypes obtained by PCR-RFLP analysis, and the association study was done using the Chi-square (χ²) test. The genotypes of contigs (loci) 2 and 4 were combined for each animal to construct combined genotype patterns. Two types of KF sequences were obtained: one of 2630 bp with one insertion at nucleotide (nt) position 616 and one deletion at nt position 1117, and another of 2629 bp with only one deletion at nt position 615. ClustalW multiple alignment of the KF CD14 gene sequence with the B. taurus sequence (EU148610.1) revealed 24 nt changes (SNPs). Cows were also screened using PCR-RFLP with Hpy188I (contig 2) and HinfI (contig 4) RE, which revealed three genotypes each that differed significantly regarding mastitis incidence. The two loci allow a maximum of nine combined genotype patterns, of which only eight were observed: AACC, AACD, AADD, ABCD, ABDD, BBCC, BBCD, and BBDD. The combined genotype ABCC was not observed in the studied population of KF cows. Out of 94 animals, those with the AACD combined genotype (10.63%) were found not to be affected by mastitis, and those with the ABDD combined genotype had the highest mastitis incidence, 15.96%. AACD-typed cows were found to be the least susceptible to mastitis compared with the other combined genotypes.
Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

PubMed

Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

2006-01-01

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
An algebraic hypothesis about the primeval genetic code architecture.

PubMed

Sánchez, Robersy; Grau, Ricardo

2009-09-01

A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U, together with the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Moreover, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present coding DNA sequences could also have existed in ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
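To make the idea of a triplet vector space over a five-element field concrete, here is a small Python sketch. The mapping of bases to field elements (D=0, A=1, C=2, G=3, U=4, taken from the order in which the alphabet is listed) is an assumption made for illustration, and the function names are hypothetical; the paper's own definitions of sum and product may differ. Under this assumed mapping, Watson-Crick partners are additive inverses, so a codon plus its base-wise complement gives DDD.

```python
# Assumed mapping: position in this string = element of GF(5) (illustrative only).
BASES = "DACGU"

def add_codons(x, y):
    """Component-wise sum of two triplets, modulo 5."""
    return "".join(BASES[(BASES.index(a) + BASES.index(b)) % 5] for a, b in zip(x, y))

def scale_codon(k, x):
    """Multiply each component of a triplet by the scalar k, modulo 5."""
    return "".join(BASES[(k * BASES.index(a)) % 5] for a in x)

# A+U = 1+4 = 0 and G+C = 3+2 = 0 (mod 5), so complementary triplets cancel.
zero = add_codons("AUG", "UAC")
doubled = scale_codon(2, "ACG")
print(zero, doubled)
```

Because 5 is prime, arithmetic modulo 5 is a field, which is what allows the 125 triplets to be treated as a three-dimensional vector space.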
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

PubMed

2004-12-09

We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing.

PubMed

Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan; Miller, Christopher A; Fulton, Robert; Fulton, Lucinda L; Eades, William C; Elliott, Kevin; Heath, Sharon; Westervelt, Peter; Ding, Li; Conrad, Donald F; White, Brian S; Shao, Jin; Link, Daniel C; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J; Walter, Matthew J; Graubert, Timothy A

2014-07-01

Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions (the population frequency of individual clones, their genetic composition, and their evolutionary relationships), which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Sequence-structural features and evolutionary relationships of family GH57 α-amylases and their putative α-amylase-like homologues.

PubMed

Janeček, Stefan; Blesák, Karol

2011-08-01

The glycoside hydrolase family 57 (GH57) contains α-amylase and a few other amylolytic specificities. It counts ~400 members from Archaea (1/4) and Bacteria (3/4), mostly of extremophilic prokaryotes. Only 17 GH57 enzymes have been biochemically characterized. The main goal of the present bioinformatics study was to analyze sequences having the clear GH57 α-amylase features. Of the 107 GH57 sequences, 59 were evaluated as α-amylases (containing both GH57 catalytic residues), whereas 48 were assigned as GH57 α-amylase-like proteins (having a substitution in one or both catalytic residues). Forty-eight of 59 α-amylases were from Archaea, but 42 of 48 α-amylase-like proteins were of bacterial origin. The catalytic residues were substituted in most cases in Bacteroides and Prevotella by serine (instead of catalytic nucleophile glutamate) and glutamate (instead of proton donor aspartate). The GH57 α-amylase specificity has thus been evolved and kept enzymatically active mainly in Archaea.
Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

PubMed

Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

2016-03-02

Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Automatic Assignment of Methyl-NMR Spectra of Supramolecular Machines Using Graph Theory.

PubMed

Pritišanac, Iva; Degiacomi, Matteo T; Alderson, T Reid; Carneiro, Marta G; Ab, Eiso; Siegal, Gregg; Baldwin, Andrew J

2017-07-19

Methyl groups are powerful probes for the analysis of structure, dynamics and function of supramolecular assemblies, using both solution- and solid-state NMR. Widespread application of the methodology has been limited due to the challenges associated with assigning spectral resonances to specific locations within a biomolecule. Here, we present Methyl Assignment by Graph Matching (MAGMA), for the automatic assignment of methyl resonances. A graph matching protocol examines all possibilities for each resonance in order to determine an exact assignment that includes a complete description of any ambiguity. MAGMA gives 100% accuracy in confident assignments when tested against both synthetic data, and 9 cross-validated examples using both solution- and solid-state NMR data. We show that this remarkable accuracy enables a user to distinguish between alternative protein structures. In a drug discovery application on HSP90, we show the method can rapidly and efficiently distinguish between possible ligand binding modes. By providing an exact and robust solution to methyl resonance assignment, MAGMA can facilitate significantly accelerated studies of supramolecular machines using methyl-based NMR spectroscopy.
Therapist Behaviors as Predictors of Immediate Homework Engagement in Cognitive Therapy for Depression.

PubMed

Conklin, Laren R; Strunk, Daniel R; Cooper, Andrew A

2018-02-01

Homework assignments are an integral part of cognitive therapy (CT) for depression, though facilitating homework engagement in patients with depression can be a challenge. We sought to examine three classes of therapist behaviors as predictors of homework engagement in early sessions of CT: therapist behaviors related to the review of homework, the assignment of homework, and efforts to help patients overcome obstacles to completing homework. In a sample of 66 depressed outpatients participating in CT, therapist behaviors involved in assigning homework predicted both CT-specific homework engagement and more general homework engagement. Therapist behaviors involved in homework review were not predictive of homework engagement. Our findings are consistent with the possibility that therapists' emphasis of key elements of the homework assignment process enhances patients' engagement in homework in early sessions of CT.
A taxonomic framework for cable bacteria and proposal of the candidate genera Electrothrix and Electronema.

PubMed

Trojan, Daniela; Schreiber, Lars; Bjerg, Jesper T; Bøggild, Andreas; Yang, Tingting; Kjeldsen, Kasper U; Schramm, Andreas

2016-07-01

Cable bacteria are long, multicellular filaments that can conduct electric currents over centimeter-scale distances. All cable bacteria identified to date belong to the deltaproteobacterial family Desulfobulbaceae and have not been isolated in pure culture yet. Their taxonomic delineation and exact phylogeny are uncertain, as most studies so far have reported only short partial 16S rRNA sequences or have relied on identification by a combination of filament morphology and 16S rRNA-targeted fluorescence in situ hybridization with a Desulfobulbaceae-specific probe. In this study, nearly full-length 16S rRNA gene sequences of 16 individual cable bacteria filaments from freshwater, salt marsh, and marine sites of four geographic locations are presented. These sequences formed a distinct, monophyletic sister clade to the genus Desulfobulbus and could be divided into six coherent, species-level clusters, arranged as two genus-level groups. The same grouping was retrieved by phylogenetic analysis of full or partial dsrAB genes encoding the dissimilatory sulfite reductase. Based on these results, it is proposed to accommodate cable bacteria within two novel candidate genera: the mostly marine "Candidatus Electrothrix", with four candidate species, and the mostly freshwater "Candidatus Electronema", with two candidate species. This taxonomic framework can be used to assign environmental sequences confidently to the cable bacteria clade, even without morphological information. Database searches revealed 185 16S rRNA gene sequences that affiliated within the clade formed by the proposed cable bacteria genera, of which 120 sequences could be assigned to one of the six candidate species, while the remaining 65 sequences indicated the existence of up to five additional species. Copyright © 2016 The Author(s). Published by Elsevier GmbH. All rights reserved.
Triangulating the provenance of African elephants using mitochondrial DNA

PubMed Central

Ishida, Yasuko; Georgiadis, Nicholas J; Hondo, Tomoko; Roca, Alfred L

2013-01-01

African elephant mitochondrial (mt) DNA follows a distinctive evolutionary trajectory. As females do not migrate between elephant herds, mtDNA exhibits low geographic dispersal. We therefore examined the effectiveness of mtDNA for assigning the provenance of African elephants (or their ivory). For 653 savanna and forest elephants from 22 localities in 13 countries, 4258 bp of mtDNA was sequenced. We detected eight mtDNA subclades, of which seven had regionally restricted distributions. Among 108 unique haplotypes identified, 72% were found at only one locality and 84% were country specific, while 44% of individuals carried a haplotype detected only at their sampling locality. We combined 316 bp of our control region sequences with those generated by previous trans-national surveys of African elephants. Among 101 unique control region haplotypes detected in African elephants across 81 locations in 22 countries, 62% were present in only a single country. Applying our mtDNA results to a previous microsatellite-based assignment study would improve estimates of the provenance of elephants in 115 of 122 mis-assigned cases. Nuclear partitioning followed species boundaries and not mtDNA subclade boundaries. For taxa such as elephants in which nuclear and mtDNA markers differ in phylogeography, combining the two markers can triangulate the origins of confiscated wildlife products. PMID:23798975
{sup 1}H assignments and secondary structure determination of the soybean trypsin/chymotrypsin Bowman-Birk inhibitor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Werner, M.H.; Wemmer, D.E.

1991-04-09

The {sup 1}H resonance assignments and secondary structure of the trypsin/chymotrypsin Bowman-Birk inhibitor from soybeans were determined by nuclear magnetic resonance spectroscopy (NMR) at 600 MHz in an 18% acetonitrile-d{sub 3}/aqueous cosolvent. Resonances from 69 of 71 amino acids were assigned sequence specifically. Residues Q11-T15 form an antiparallel {beta}-sheet with residues Q21-S25 in the tryptic inhibitory domain and an analogous region of antiparallel sheet forms between residues S38-A42 and Q48-V52 in the chymotryptic inhibitory domain. The inhibitory sites of each fragment (K16-S17 for trypsin, L43-S44 for chymotrypsin) are each part of a type VI like turn at one end of their respective region of the antiparallel {beta}-sheet. These structural elements are compared to those found in other Bowman-Birk inhibitors.
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach

PubMed Central

Watson, Mick; Minot, Samuel S.; Rivera, Maria C.; Franklin, Rima B.

2017-01-01

Background: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. PMID:28327976
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

PubMed

Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B

2017-03-01

Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. © The Author 2017. Published by Oxford University Press.
Unprecedented genomic diversity of AhR1 and AhR2 genes in Atlantic salmon (Salmo salar L.).

PubMed

Hansson, Maria C; Wittzell, Håkan; Persson, Kerstin; von Schantz, Torbjörn

2004-06-24

Aryl hydrocarbon receptor (AhR) genes encode proteins involved in mediating the toxic responses induced by several environmental pollutants. Here, we describe the identification of the first two AhR1 (alpha and beta) genes and two additional AhR2 (alpha and beta) genes in the tetraploid species Atlantic salmon (Salmo salar L.) from a cosmid library screening. Cosmid clones containing genomic salmon AhR sequences were isolated using a cDNA clone containing the coding region of the Atlantic salmon AhR2gamma as a probe. Screening revealed 14 positive clones, from which four were chosen for further analyses. One of the cosmids contained genomic AhR sequences that were highly similar to the rainbow trout (Oncorhynchus mykiss) AhR2alpha and beta genes. SMART RACE amplified two complete, highly similar but not identical AhR type 2 sequences from salmon cDNA, which from phylogenetic analyses were determined as the rainbow trout AhR2alpha and beta orthologs. The salmon AhR2alpha and beta encode proteins of 1071 and 1058 residues, respectively, and encompass characteristic AhR sequence elements like a basic-helix-loop-helix (bHLH) and two PER-ARNT-SIM (PAS) domains. Both genes are transcribed in liver, spleen and muscle tissues of adult salmon. A second cosmid contained partial sequences, which were identical to the previously characterized AhR2gamma gene. The last two cosmids contained partial genomic AhR sequences, which were more similar to other AhR type 1 fish genes than the four characterized salmon AhR2 genes. However, attempts to amplify the corresponding complete cDNA sequences of the inserts proved very difficult, suggesting that these genes are non-functional or very weakly transcribed in the examined tissues. Phylogenetic analyses of the conserved regions did, however, clearly indicate that these two AhRs belong to the AhR type 1 clade and have been assigned as the Atlantic salmon AhR1alpha and AhR1beta genes. 
Taken together, these findings demonstrate that multiple AhR genes are present in the Atlantic salmon genome, which is likely a consequence of previous genome duplications in the evolutionary past of salmonids. Plausible explanations for the high incidence of AhR genes in fish and more specifically in salmonids, like rapid divergences in specialized functions, are discussed.
Derivational Suffixes as Cues to Stress Position in Reading Greek

ERIC Educational Resources Information Center

Grimani, Aikaterini; Protopapas, Athanassios

2017-01-01

Background: In languages with lexical stress, reading aloud must include stress assignment. Stress information sources across languages include word-final letter sequences. Here, we examine whether such sequences account for stress assignment in Greek and whether this is attributable to absolute rules involving accenting morphemes or to…
Sustainable Design of EPA's Campus in Research Triangle Park, NC—Environmental Performance Specifications in Construction Contracts—Section 01450 Sequence of Finishes Installation

EPA Pesticide Factsheets

Learn more about the special construction scheduling/sequencing requirements and procedures necessary to assure achievement of designed Indoor Air Quality (IAQ) levels for the completed project required by the EPA IAQ Program.
Chip-based in situ hybridization for identification of bacteria from the human microbiome.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Light, Yooli Kim; Meagher, Robert J.; Singh, Anup K.

2010-11-01

The emerging field of metagenomics seeks to assess the genetic diversity of complex mixed populations of bacteria, such as those found at different sites within the human body. A single person's mouth typically harbors up to 100 bacterial species, while surveys of many people have found more than 700 different species, of which approximately 50% have never been cultivated. In typical metagenomics studies, the cells themselves are destroyed in the process of gathering sequence information, and thus the connection between genotype and phenotype is lost. A great deal of sequence information may be generated, but it is impossible to assign any given sequence to a specific cell. We seek non-destructive, culture-independent means of gathering sequence information from selected individual cells from mixed populations. As a first step, we have developed a microfluidic device for concentrating and specifically labeling bacteria from a mixed population. Bacteria are electrophoretically concentrated against a photopolymerized membrane element, and then incubated with a specific fluorescent label, which can include antibodies as well as specific or non-specific nucleic acid stains. Unbound stain is washed away, and the labeled bacteria are released from the membrane. The stained cells can then be observed via epifluorescence microscopy, or counted via flow cytometry. We have tested our device with three representative bacteria from the human microbiome: E. coli (gut, Gram-negative), Lactobacillus acidophilus (mouth, Gram-positive), and Streptococcus mutans (mouth, Gram-positive), with results comparable to off-chip labeling techniques.
Flexible taxonomic assignment of ambiguous sequencing reads

PubMed Central

2011-01-01

Background: To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match them. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results: We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions: The assignment strategy for sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, which is controlled by the value of the q parameter. Validation of our results on an artificial dataset confirms that a combination of values of q produces the most accurate results. PMID:21211059
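The false-positive problem of plain lowest-common-ancestor (LCA) assignment can be illustrated with a short Python sketch. The toy taxonomy, the matching species, and all function names below are hypothetical, and the code shows only the baseline LCA strategy the abstract criticizes, not the authors' penalty-score method (whose scoring function is not given in the abstract). The ancestor lookup here walks parent links and is therefore not the constant-time LCA query the abstract refers to.

```python
# Toy taxonomy as child -> parent links (hypothetical example data).
PARENT = {
    "E. coli": "Escherichia",
    "E. fergusonii": "Escherichia",
    "S. enterica": "Salmonella",
    "S. bongori": "Salmonella",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
}

def path_to_root(node, parent=PARENT):
    """Return [node, its parent, ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lowest_common_ancestor(nodes, parent=PARENT):
    """Deepest node shared by every node-to-root path (simple walk)."""
    paths = [path_to_root(n, parent) for n in nodes]
    shared = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in shared)

def leaves_under(node, parent=PARENT):
    """All leaf taxa in the subtree rooted at node, sorted by name."""
    leaves = set(parent) - set(parent.values())
    return sorted(leaf for leaf in leaves if node in path_to_root(leaf, parent))

# An ambiguous read that matches one species in each genus.
matches = ["E. coli", "S. enterica"]
lca = lowest_common_ancestor(matches)
# Species implied by the LCA assignment that the read never matched.
false_positives = [s for s in leaves_under(lca) if s not in matches]
print(lca, false_positives)
```

In this toy case the read is pushed up to the family node, which implicitly covers two species it did not match; a precision/recall trade-off such as the q parameter described above is one way to decide when a lower node is preferable.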

Specificity of homework compliance effects on treatment outcome in CBT: evidence from a controlled trial on panic disorder and agoraphobia.

PubMed

Cammin-Nowak, Sandra; Helbig-Lang, Sylvia; Lang, Thomas; Gloster, Andrew T; Fehm, Lydia; Gerlach, Alexander L; Ströhle, Andreas; Deckert, Jürgen; Kircher, Tilo; Hamm, Alfons O; Alpers, Georg W; Arolt, Volker; Wittchen, H-U

2013-06-01

Although homework assignments are an integral component of cognitive-behavioral therapy (CBT) and relate to positive therapy outcomes, it is unclear whether specific homework types and their completion have specific effects on outcome. Data from N = 292 patients (75% female, mean age 36 years) with panic disorder and agoraphobia and treated with standardized CBT were analyzed with homework compliance quality and quantity for different types of homework serving as predictors for different outcome variables. Quality ratings of homework completion were stronger outcome predictors than quantitative compliance ratings. Exposure homework was a better outcome predictor than homework relating to psychoeducation and self-monitoring. Different aspects of homework compliance and specific homework types might differentially relate to CBT outcome. © 2013 Wiley Periodicals, Inc.
Communicating with parents with full disclosure: a case of cloacal extrophy with genital ambiguity.

PubMed

Myers, Catherine; Lee, Peter A

2004-03-01

Full disclosure and complete involvement of parents in decisions concerning assignment of sex and genital surgery must be part of medical care for children presenting with findings consistent with disorders of intersex. Intersex most commonly involves disorders of steroidogenesis or gonadal function, but may include multiple cloacal anomalies, such as the case presented here. We describe full disclosure of medical findings by a multi-disciplinary medical team, as they became available over a period of weeks, in an infant originally assigned male but eventually assigned female. The infant, born at 24 weeks of gestation after prenatal ultrasound showed a distended bladder, ascites, and bilateral hydroureters, was found to have an imperforate anus and a tubular structure appearing as a thin penis, without palpable corpora. Events concerning this case are discussed in relation to full disclosure of medical information to parents, guidelines for management of intersex, and the diagnosis (cloacal anomaly, cloacal extrophy, ano-rectal anomalies or uro-rectal septum malformation sequence). Full disclosure with involvement of parents in medical decisions is not only mandated currently, but also can be an effective approach in intersex care.
The BG21 isoform of Golli myelin basic protein is intrinsically disordered with a highly flexible amino-terminal domain.

PubMed

Ahmed, Mumdooh A M; Bamm, Vladimir V; Harauz, George; Ladizhansky, Vladimir

2007-08-28

The genes of the oligodendrocyte lineage (Golli) encode a family of developmentally regulated isoforms of myelin basic protein. The "classic" MBP isoforms arise from transcription start site 3, whereas Golli-specific isoforms arise from transcription start site 1, and comprise both Golli-specific and classic MBP sequences. The Golli isoform BG21 has been suggested to play roles in myelination and T cell activation pathways. It is an intrinsically disordered protein, thereby presenting a large effective surface area for interaction with other proteins such as Golli-interacting protein. We have used multidimensional heteronuclear NMR spectroscopy to achieve sequence-specific resonance assignments of the recombinant murine BG21 in physiologically relevant buffer, to analyze its secondary structure using chemical shift indexing (CSI), and to investigate its backbone dynamics using 15N spin relaxation measurements. We have assigned 184 out of 199 residues unambiguously. The CSI analysis revealed little ordered secondary structure under these conditions, with only some small fragments having a slight tendency toward alpha-helicity, which may represent putative recognition motifs. The 15N relaxation and NOE measurements confirmed the general behavior of the protein as an extended polypeptide chain, with the N-terminal Golli-specific portion (residues S5-T69) being exceptionally flexible, even in comparison to other intrinsically disordered proteins that have been studied this way. The high degree of flexibility of this N-terminal region may be to provide additional plasticity, or conformational adaptability, in protein-protein interactions. Another highly mobile segment, A126-S127-G128-G129, may function as a hinge.
Identification of the polypeptides encoded in the unassigned reading frames 2, 4, 4L, and 5 of human mitochondrial DNA.

PubMed Central

Mariottini, P; Chomyn, A; Riley, M; Cottrell, B; Doolittle, R F; Attardi, G

1986-01-01

In previous work, antibodies prepared against chemically synthesized peptides predicted from the DNA sequence were used to identify the polypeptides encoded in three of the eight unassigned reading frames (URFs) of human mitochondrial DNA (mtDNA). In the present study, this approach has been extended to other human mtDNA URFs. In particular, antibodies directed against the NH2-terminal octapeptide of the putative URF2 product specifically precipitated component 11 of the HeLa cell mitochondrial translation products, the reaction being inhibited by the specific peptide. Similarly, antibodies directed against the COOH-terminal nonapeptide of the putative URF4 product reacted specifically with components 4 and 5, and antibodies against a COOH-terminal heptapeptide of the presumptive URF4L product reacted specifically with component 26. Antibodies against the NH2-terminal heptapeptide of the putative product of URF5 reacted with component 1, but only to a marginal extent; however, the results of a trypsin fingerprinting analysis of component 1 point strongly to this component as being the authentic product of URF5. The polypeptide assignments to the mtDNA URFs analyzed here are supported by the relative electrophoretic mobilities of proteins 11, 4-5, 26, and 1, which are those expected for the molecular weights predicted from the DNA sequence for the products of URF2, URF4, URF4L, and URF5, respectively. With the present assignment, seven of the eight human mtDNA URFs have been shown to be expressed in HeLa cells. PMID:3456601
Conservation and variability of West Nile virus proteins.

PubMed

Koo, Qi Ying; Khan, Asif M; Jung, Keun-Ok; Ramdas, Shweta; Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir; Salmon, Jerome; August, J Thomas

2009-01-01

West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value ≤ 1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was generally low (≤ 10% of the WNV sequences analyzed). Eighty-eight fragments of length 9-29 amino acids, representing approximately 34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and the majority are potential epitopes. The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide-range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.
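The entropy-based screen described in this abstract can be illustrated with a minimal sketch. It assumes Shannon entropy in bits (log base 2) over the nonamers observed at each alignment position; the abstract does not state the exact formula, and the function names and toy sequences below are hypothetical, not WNV data. Gap and ambiguity handling are omitted.

```python
import math
from collections import Counter


def kmers_at(aligned_seqs, start, k=9):
    """Collect the k-mer that starts at column `start` in each aligned sequence."""
    return [s[start:start + k] for s in aligned_seqs if len(s) >= start + k]


def nonamer_entropy(aligned_seqs, start, k=9):
    """Shannon entropy in bits of the k-mers observed at one alignment position."""
    counts = Counter(kmers_at(aligned_seqs, start, k))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def variant_fraction(aligned_seqs, start, k=9):
    """Fraction of sequences whose k-mer differs from the predominant k-mer."""
    kmers = kmers_at(aligned_seqs, start, k)
    top_count = Counter(kmers).most_common(1)[0][1]
    return 1 - top_count / len(kmers)


# Toy alignment (hypothetical peptides): only the last residue varies.
seqs = ["MKTAYIAKQRQ", "MKTAYIAKQRQ", "MKTAYIAKQRQ", "MKTAYIAKQRE"]
for pos in range(len(seqs[0]) - 9 + 1):
    print(pos, round(nonamer_entropy(seqs, pos), 3), variant_fraction(seqs, pos))
```

Under these assumptions, a position would count as stable when its entropy is at most 1 and the variant fraction is at most 0.10, mirroring the thresholds quoted in the abstract.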
Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

PubMed Central

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-01-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
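The N50 values quoted for contigs and scaffolds follow the usual assembly-statistics definition: the length L such that sequences of length at least L together cover at least half of the total assembled length. A minimal sketch, assuming that standard definition (the abstract does not spell it out) and using a hypothetical helper name `n50` with made-up lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths in bp (hypothetical, not the carnation assembly).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```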
Phylogenomics databases for facilitating functional genomics in rice.

PubMed

Jung, Ki-Hong; Cao, Peijian; Sharma, Rita; Jain, Rashmi; Ronald, Pamela C

2015-12-01

The completion of the whole genome sequence of rice (Oryza sativa) has significantly accelerated functional genomics studies. Prior to the release of the sequence, only a few genes were assigned a function each year. Since sequencing was completed in 2005, the rate has exponentially increased. As of 2014, 1,021 genes have been described and added to the collection at The Overview of functionally characterized Genes in Rice online database (OGRO). Despite this progress, that number is still very low compared with the total number of genes estimated in the rice genome. One limitation to progress is the presence of functional redundancy among members of the same rice gene family, which covers 51.6 % of all non-transposable element-encoding genes. There remains a significant portion of rice genes that are not functionally redundant, as reflected in the recovery of loss-of-function mutants. To more accurately analyze functional redundancy in the rice genome, we have developed phylogenomics databases for six large gene families in rice, including those for glycosyltransferases, glycoside hydrolases, kinases, transcription factors, transporters, and cytochrome P450 monooxygenases. In this review, we introduce key features and applications of these databases. We expect that they will serve as a very useful guide in the post-genomics era of research.
Pax1, a member of the paired box-containing class of developmental control genes, is mapped to human chromosome 20p11.2 by in situ hybridization (ISH and FISH).

PubMed

Schnittger, S; Rao, V V; Deutsch, U; Gruss, P; Balling, R; Hansmann, I

1992-11-01

Pax-1, a member of a murine multigene family, belongs to the paired box-containing class of developmental control genes first identified in Drosophila. The Pax-1 gene encodes a sequence-specific DNA-binding protein with transcriptional activating properties and has been found to be mutated in the autosomal recessive mutation undulated (un) on mouse chromosome 2 with vertebral anomalies along the entire rostrocaudal axis. By radioactive in situ hybridization (ISH) using a fragment from the murine Pax-1 paired box that is almost identical to the respective sequences from the cognate human gene HuP48 and fluorescence in situ hybridization (FISH) using a complete mouse Pax-1 cDNA, we have assigned the human homologue of murine Pax-1, the PAX1 locus, to chromosome 20p. The map position of PAX1 after FISH (FL-pter value of 0.34 +/- 0.04) corresponds to band p11.2. These results confirm the exceptional homology between human chromosome 20 and the distal segment of mouse chromosome 2, extending from bands F to G, and add PAX1 to the group of genes on 20p like PTPA, PRNP, SCG1, BMP2A, which are located in proximity on both chromosomes.
The Gypsy Database (GyDB) of mobile genetic elements: release 2.0

PubMed Central

Llorens, Carlos; Futami, Ricardo; Covelli, Laura; Domínguez-Escribá, Laura; Viu, Jose M.; Tamarit, Daniel; Aguilar-Rodríguez, Jose; Vicente-Ripolles, Miguel; Fuster, Gonzalo; Bernet, Guillermo P.; Maumus, Florian; Munoz-Pomer, Alfonso; Sempere, Jose M.; Latorre, Amparo; Moya, Andres

2011-01-01

This article introduces the second release of the Gypsy Database of Mobile Genetic Elements (GyDB 2.0): a research project devoted to the evolutionary dynamics of viruses and transposable elements based on their phylogenetic classification (per lineage and protein domain). The Gypsy Database (GyDB) is a long-term project that is continuously progressing and that, owing to the high molecular diversity of mobile elements, must be completed in several stages. GyDB 2.0 has been powered with a wiki to allow other researchers to participate in the project. The current database stage and scope are long terminal repeat (LTR) retroelements and relatives. GyDB 2.0 is an update based on the analysis of Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants. Among other features, in terms of the aforementioned topics, this update adds: (i) a variety of descriptions and reviews distributed in multiple web pages; (ii) protein-based phylogenies, where phylogenetic levels are assigned to distinct classified elements; (iii) a collection of multiple alignments, lineage-specific hidden Markov models and consensus sequences, called GyDB collection; (iv) updated RefSeq databases and BLAST and HMM servers to facilitate sequence characterization of new LTR retroelement and caulimovirus queries; and (v) a bibliographic server. GyDB 2.0 is available at http://gydb.org. PMID:21036865
A Teaching-Learning Sequence about Weather Map Reading

ERIC Educational Resources Information Center

Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

2017-01-01

In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a…
Hepatitis E virus genotype 3 diversity: phylogenetic analysis and presence of subtype 3b in wild boar in Europe.

PubMed

Vina-Rodriguez, Ariel; Schlosser, Josephine; Becher, Dietmar; Kaden, Volker; Groschup, Martin H; Eiden, Martin

2015-05-22

An increasing number of indigenous cases of hepatitis E caused by genotype 3 viruses (HEV-3) have been diagnosed all around the world, particularly in industrialized countries. Hepatitis E is a zoonotic disease and accumulating evidence indicates that domestic pigs and wild boars are the main reservoirs of HEV-3. A detailed analysis of HEV-3 subtypes could help to determine the interplay of human activity, the role of animals as reservoirs and cross species transmission. Although complete genome sequences are most appropriate for HEV subtype determination, in most cases only partial genomic sequences are available. We therefore carried out a subtype classification analysis, which uses regions from all three open reading frames of the genome. Using this approach, more than 1000 published HEV-3 isolates were subtyped. Newly recovered HEV partial sequences from hunted German wild boars were also included in this study. These sequences were assigned to genotype 3 and clustered within subtypes 3a, 3i and, unexpectedly, one of them within subtype 3b, the first non-human report of this subtype in Europe.
[Taxonomic status of the Tyulek virus (TLKV) (Orthomyxoviridae, Quaranjavirus, Quaranfil group) isolated from the ticks Argas vulgaris Filippova, 1961 (Argasidae) from the birds burrow nest biotopes in the Kyrgyzstan].

PubMed

L'vov, D K; Al'khovskiĭ, S V; Shchelkanov, M Iu; Shchetinin, A M; Deriabin, P G; Aristova, V A; Gitel'man, A K; Samokhvalov, E I; Botikov, A G

2014-01-01

The Tyulek virus (TLKV) was isolated from the ticks Argas vulgaris Filippova, 1961 (Argasidae), collected from the burrow biotopes in a multispecies bird colony in the Aksu river floodplain near Tyulek village (northern part of Chu Valley, Kyrgyzstan). Recently, the TLKV was assigned to the Quaranfil group (including the Quaranfil virus (QRFV), Johnston Atoll virus (JAV), Lake Chad virus) of the novel genus Quaranjavirus in the Orthomyxoviridae family. In this work, the complete genome (ID GenBank KJ438647-8) sequence of the TLKV was determined using next-generation sequencing (Illumina platform). Comparison of deduced amino acid sequences shows a close relationship of the TLKV with QRFV and JAV (86% and 84% identity for PB1 and about 70% for PB2 and PA, respectively). The identity level of the TLKV and QRFV in outer glycoprotein GP is 72% and 80% for nucleotide and amino acid sequences, respectively. The phylogenetic analysis showed that the TLKV belongs to the genus Quaranjavirus in the family Orthomyxoviridae.
The draft genome sequence of cork oak

PubMed Central

Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M.; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B.; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J. M.; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M.; Oliveira, M. Margarida; Ricardo, Cândido P.; Gonçalves, Sónia

2018-01-01

Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species. PMID:29786699
NMR assignments of the N-terminal domain of Nephila clavipes spidroin 1

PubMed Central

Parnham, Stuart; Gaines, William A.; Duggan, Brendan M.; Marcotte, William R.

2011-01-01

The building blocks of spider dragline silk are two fibrous proteins secreted from the major ampullate gland named spidroins 1 and 2 (MaSp1, MaSp2). These proteins consist of a large central domain composed of approximately 100 tandem copies of a 35–40 amino acid repeat sequence. Non-repetitive N and C-terminal domains, of which the C-terminal domain has been implicated in the transition from soluble to insoluble states during spinning, flank the repetitive core. The N-terminal domain until recently has been largely unknown due to difficulties in cloning and expression. Here, we report nearly complete assignment for all 1H, 13C, and 15N resonances in the 14 kDa N-terminal domain of major ampullate spidroin 1 (MaSp1-N) of the golden orb-web spider Nephila clavipes. PMID:21152998
Complete genome sequence of Terriglobus saanensis type strain SP1PR4T, an Acidobacteria from tundra soil

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rawat, Suman R.; Mannisto, Minna; Starovoytov, Valentin

2012-01-01

Terriglobus saanensis SP1PR4T is a novel species of the genus Terriglobus. T. saanensis is of ecological interest because it is a representative of the phylum Acidobacteria, which are dominant members of bacterial soil microbiota in Arctic ecosystems. T. saanensis is a cold-adapted acidophile and a versatile heterotroph utilizing a suite of simple sugars and complex polysaccharides. The genome contained an abundance of genes assigned to metabolism and transport of carbohydrates including gene modules encoding for carbohydrate-active enzyme (CAZyme) family involved in breakdown, utilization and biosynthesis of diverse structural and storage polysaccharides. T. saanensis SP1PR4T represents the first member of genus Terriglobus with a completed genome sequence, consisting of a single replicon of 5,095,226 base pairs (bp), 54 RNA genes and 4,279 protein-coding genes. We infer that the physiology and metabolic potential of T. saanensis is adapted to allow for resilience to the nutrient-deficient conditions and fluctuating temperatures of Arctic tundra soils.
QUICR-learning for Multi-Agent Coordination

NASA Technical Reports Server (NTRS)

Agogino, Adrian K.; Tumer, Kagan

2006-01-01

Coordinating multiple agents that need to perform a sequence of actions to maximize a system level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t' > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning. The second credit assignment problem is typically addressed by creating custom reward functions. To address both credit assignment problems simultaneously, we propose the "Q Updates with Immediate Counterfactual Rewards-learning" (QUICR-learning) designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. QUICR-learning is based on previous work on single-time-step counterfactual rewards described by the collectives framework. Results on a traffic congestion problem show that QUICR-learning is significantly better than a Q-learner using collectives-based (single-time-step counterfactual) rewards. In addition, QUICR-learning provides significant gains over conventional and local Q-learning. Additional results on a multi-agent grid-world problem show that the improvements due to QUICR-learning are not domain specific and can provide up to a tenfold increase in performance over existing methods.
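The two credit assignment problems named in this abstract can be made concrete with a small sketch. It shows the standard tabular Q-learning temporal difference update and a generic single-step counterfactual ("difference") reward in which one agent's action is replaced by a null action. This is not the QUICR-learning algorithm itself, whose details the abstract does not give; the function names, the null-action convention, and the toy lane-usage reward are all assumptions for illustration.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]


def difference_reward(global_reward, joint_action, agent, null_action=None):
    """Counterfactual credit for one agent: G(z) minus G(z with the agent's action nulled)."""
    counterfactual = list(joint_action)
    counterfactual[agent] = null_action
    return global_reward(joint_action) - global_reward(counterfactual)


def lanes_used(joint_action):
    """Toy system reward (hypothetical): number of distinct lanes in use."""
    return len({a for a in joint_action if a is not None})


Q = defaultdict(float)
print(q_update(Q, "s0", "left", 1.0, "s1", ["left", "right"]))
print(difference_reward(lanes_used, [0, 0, 1], agent=2))  # sole user of lane 1
print(difference_reward(lanes_used, [0, 0, 1], agent=0))  # lane 0 is still used without agent 0
```

In this toy setting, feeding each agent its difference reward instead of the shared system reward is one way to combine the two ideas; how QUICR-learning actually does so is described in the report itself.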
Measuring homework completion in behavioral activation.

PubMed

Busch, Andrew M; Uebelacker, Lisa A; Kalibatseva, Zornitsa; Miller, Ivan W

2010-07-01

The aim of this study was to develop and validate an observer-based coding system for the characterization and completion of homework assignments during Behavioral Activation (BA). Existing measures of homework completion are generally unsophisticated, and there is no current measure of homework completion designed to capture the particularities of BA. The tested scale sought to capture the type of assignment, realm of functioning targeted, extent of completion, and assignment difficulty. Homework assignments were drawn from 12 (mean age = 48, 83% female) clients in two trials of a 10-session BA manual targeting treatment-resistant depression in primary care. The two coders demonstrated acceptable or better reliability on most codes, and unreliable codes were dropped from the proposed scale. In addition, correlations between homework completion and outcome were strong, providing some support for construct validity. Ultimately, this line of research aims to develop a user-friendly, reliable measure of BA homework completion that can be completed by a therapist during session.
Targeted sequencing for high-resolution evolutionary analyses following genome duplication in salmonid fish: Proof of concept for key components of the insulin-like growth factor axis.

PubMed

Lappin, Fiona M; Shaw, Rebecca L; Macqueen, Daniel J

2016-12-01

High-throughput sequencing has revolutionised comparative and evolutionary genome biology. It has now become relatively commonplace to generate multiple genomes and/or transcriptomes to characterize the evolution of large taxonomic groups of interest. Nevertheless, such efforts may be unsuited to some research questions or remain beyond the scope of some research groups. Here we show that targeted high-throughput sequencing offers a viable alternative to study genome evolution across a vertebrate family of great scientific interest. Specifically, we exploited sequence capture and Illumina sequencing to characterize the evolution of key components from the insulin-like growth factor (IGF) signalling axis of salmonid fish at unprecedented phylogenetic resolution. The IGF axis represents a central governor of vertebrate growth and its core components were expanded by whole genome duplication in the salmonid ancestor ~95 Ma. Using RNA baits synthesised to genes encoding the complete family of IGF binding proteins (IGFBP) and an IGF hormone (IGF2), we captured, sequenced and assembled orthologous and paralogous exons from species representing all ten salmonid genera. This approach generated 299 novel sequences, most as complete or near-complete protein-coding sequences. Phylogenetic analyses confirmed congruent evolutionary histories for all nineteen recognized salmonid IGFBP family members and identified novel salmonid-specific IGF2 paralogues. Moreover, we reconstructed the evolution of duplicated IGF axis paralogues across a replete salmonid phylogeny, revealing complex historic selection regimes - both ancestral to salmonids and lineage-restricted - that frequently involved asymmetric paralogue divergence under positive and/or relaxed purifying selection. Our findings add to an emerging literature highlighting diverse applications for targeted sequencing in comparative-evolutionary genomics. We also set out a viable approach to obtain large sets of nuclear genes for any member of the salmonid family, which should enable insights into the evolutionary role of whole genome duplication before additional nuclear genome sequences become available. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

Metagenomic analysis of Sichuan takin fecal sample viromes reveals novel enterovirus and astrovirus.

PubMed

Guan, Tian-Pei; Teng, Jade L L; Yeong, Kai-Yan; You, Zhang-Qiang; Liu, Hao; Wong, Samson S Y; Lau, Susanna K P; Woo, Patrick C Y

2018-06-07

The Sichuan takin inhabits the bamboo forests in the Eastern Himalayas and is considered a national treasure of China, with the highest level of legal protection and a conservation status of vulnerable according to The IUCN Red List of Threatened Species. In this study, fecal samples of 71 Sichuan takins were pooled and deep sequenced. Among the 103,553 viral sequences, 21,961 were assigned to mammalian viruses. De novo assembly revealed genomes of an enterovirus and an astrovirus and contigs of circoviruses and genogroup I picobirnaviruses. Complete genome sequencing and phylogenetic analysis showed that Sichuan takin enterovirus is a novel serotype/genotype of the species Enterovirus G, with evidence of recombination. Sichuan takin astrovirus is a new subtype of bovine astrovirus, probably belonging to a new genogroup in the genus Mamastrovirus. Further studies will reveal whether these viruses can also be found in Mishmi takin and Shaanxi takin and their pathogenic potentials. Copyright © 2018 Elsevier Inc. All rights reserved.
Investigation of the Evolutionary Development of the Genus Bifidobacterium by Comparative Genomics

PubMed Central

Lugli, Gabriele Andrea; Milani, Christian; Turroni, Francesca; Duranti, Sabrina; Ferrario, Chiara; Viappiani, Alice; Mancabelli, Leonardo; Mangifesta, Marta; Taminiau, Bernard; Delcenserie, Véronique; van Sinderen, Douwe

2014-01-01

The Bifidobacterium genus currently encompasses 48 recognized taxa, which have been isolated from different ecosystems. However, the current phylogeny of bifidobacteria is hampered by the relative paucity of genotypic data. Here, we reassessed the taxonomy of this bacterial genus using genome-based approaches, which demonstrated that the previous taxonomic view of bifidobacteria contained several inconsistencies. In particular, high levels of genetic relatedness were shown to exist between particular Bifidobacterium taxa which would not justify their status as separate species. The results presented here are based on average nucleotide identity analysis involving the genome sequences for each type strain of the 48 bifidobacterial taxa, as well as phylogenetic comparative analysis of the predicted core genome of the Bifidobacterium genus. The results of this study demonstrate that the availability of complete genome sequences allows the reconstruction of a more robust bifidobacterial phylogeny than that obtained from a single gene-based sequence comparison, thus discouraging the assignment of a new or separate bifidobacterial taxon without such a genome-based validation. PMID:25107967
Attitudes toward, and Use of, Textbooks among Marketing Undergraduates: An Exploratory Study

ERIC Educational Resources Information Center

Vafeas, Mario

2013-01-01

While textbooks remain a key part of the teaching and learning process, evidence suggests that student completion of reading assignments is lower than teacher expectations. Although there is a small body of literature examining textbook use, studies relating specifically to marketing textbooks are sparse. This article seeks to explore how…
Learning Cellular Sorting Pathways Using Protein Interactions and Sequence Motifs

PubMed Central

Lin, Tien-Ho; Bar-Joseph, Ziv

2011-01-01

Abstract Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/. PMID:21999284
Structure-based functional annotation: yeast ymr099c codes for a D-hexose-6-phosphate mutarotase.

PubMed

Graille, Marc; Baltaze, Jean-Pierre; Leulliot, Nicolas; Liger, Dominique; Quevillon-Cheruel, Sophie; van Tilbeurgh, Herman

2006-10-06

Despite the generation of a large amount of sequence information over the last decade, more than 40% of well characterized enzymatic functions still lack associated protein sequences. Assigning protein sequences to documented biochemical functions is an interesting challenge. We illustrate here that structural genomics may be a reasonable approach in addressing these questions. We present the crystal structure of the Saccharomyces cerevisiae YMR099cp, a protein of unknown function. YMR099cp adopts the same fold as galactose mutarotase and shares the same catalytic machinery necessary for the interconversion of the alpha and beta anomers of galactose. The structure revealed the presence in the active site of a sulfate ion attached by an arginine clamp made by the side chain from two strictly conserved arginine residues. This sulfate is ideally positioned to mimic the phosphate group of hexose 6-phosphate. We have subsequently successfully demonstrated that YMR099cp is a hexose-6-phosphate mutarotase with broad substrate specificity. We solved high resolution structures of some substrate enzyme complexes, further confirming our functional hypothesis. The metabolic role of a hexose-6-phosphate mutarotase is discussed. This work illustrates that structural information has been crucial to assign YMR099cp to the orphan EC activity: hexose-phosphate mutarotase.
A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

PubMed

Masters, N; Christie, M; Katouli, M; Stratton, H

2015-06-01

We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed that 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment, with 92 (82.1%) human and 204 (99%) animal strains assigned to their respective clusters. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.
Geographically widespread swordfish barcode stock identification: a case study of its application.

PubMed

Pappalardo, Anna Maria; Guarino, Francesca; Reina, Simona; Messina, Angela; De Pinto, Vito

2011-01-01

The swordfish (Xiphias gladius) is a cosmopolitan large pelagic fish inhabiting temperate and tropical waters and it is a target species for fisheries all around the world. The present study investigated the ability of COI barcoding to reliably identify swordfish and particularly specific stocks of this commercially important species. We applied the classical DNA barcoding technology, upon a 682 bp segment of COI, and compared swordfish sequences from different geographical sources (Atlantic, Indian Oceans and Mediterranean Sea). The sequences of the 5' hyper-variable fragment of the control region (5'dloop) were also used to validate the efficacy of COI as a stock-specific marker. This information was successfully applied to the discrimination of unknown samples from the market, detecting in some cases mislabeled seafood products. The NJ distance-based phenogram (K2P model) obtained with COI sequences allowed us to correlate the swordfish haplotypes to the different geographical stocks. Similar results were obtained with 5'dloop. Our preliminary data in swordfish Xiphias gladius confirm that Cytochrome Oxidase I can be proposed as an efficient species-specific marker that also has the potential to assign geographical provenance. This information might speed up sample analysis in commercial applications of barcoding.
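The K2P (Kimura 2-parameter) model named in the abstract above turns the observed proportions of transitions (P) and transversions (Q) between two aligned sequences into a distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The following is a minimal sketch of that standard formula, not the authors' pipeline; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguous bases are ignored
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such pairwise distances is the usual input to neighbour-joining tree construction, which is the kind of phenogram the abstract refers to.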
Structural impact of complete CpG methylation within target DNA on specific complex formation of the inducible transcription factor Egr-1.

PubMed

Zandarashvili, Levani; White, Mark A; Esadze, Alexandre; Iwahara, Junji

2015-07-08

The inducible transcription factor Egr-1 binds specifically to 9-bp target sequences containing two CpG sites that can potentially be methylated at four cytosine bases. Although it appears that complete CpG methylation would make an unfavorable steric clash in the previous crystal structures of the complexes with unmethylated or partially methylated DNA, our affinity data suggest that DNA recognition by Egr-1 is insensitive to CpG methylation. We have determined, at a 1.4-Å resolution, the crystal structure of the Egr-1 zinc-finger complex with completely methylated target DNA. Structural comparison of the three different methylation states reveals why Egr-1 can recognize the target sequences regardless of CpG methylation. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data.

PubMed

Sebastian, Alvaro; Herdegen, Magdalena; Migalska, Magdalena; Radwan, Jacek

2016-03-01

Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AMPLISAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AMPLISAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in excel spreadsheet format, making them easy to interpret. AMPLISAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies. © 2015 John Wiley & Sons Ltd.
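The three-step pipeline described above (read demultiplexing, unique sequence clustering, erroneous sequence filtering) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the AMPLISAS implementation: the function names, the exact-prefix barcode match, and the `min_fraction` frequency threshold are all hypothetical.

```python
from collections import Counter, defaultdict

def demultiplex(reads, barcodes):
    """Step (i): group reads by sample using a leading barcode, which is trimmed."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):  # assumption: exact barcode match
                by_sample[sample].append(read[len(barcode):])
                break
    return by_sample

def cluster_unique(seqs):
    """Step (ii): collapse identical sequences and count their depth."""
    return Counter(seqs)

def filter_errors(counts, min_fraction=0.1):
    """Step (iii): drop variants below a per-sample frequency threshold."""
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items() if n / total >= min_fraction}

def genotype(reads, barcodes, min_fraction=0.1):
    """Run all three steps; returns {sample: {sequence: frequency}}."""
    return {
        sample: filter_errors(cluster_unique(seqs), min_fraction)
        for sample, seqs in demultiplex(reads, barcodes).items()
    }
```

A real tool would also tolerate barcode mismatches, merge near-identical variants, and detect chimeras; the sketch only shows how the three stages feed one another.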
Optimized co-solute paramagnetic relaxation enhancement for the rapid NMR analysis of a highly fibrillogenic peptide.

PubMed

Oktaviani, Nur Alia; Risør, Michael W; Lee, Young-Ho; Megens, Rik P; de Jong, Djurre H; Otten, Renee; Scheek, Ruud M; Enghild, Jan J; Nielsen, Niels Chr; Ikegami, Takahisa; Mulder, Frans A A

2015-06-01

Co-solute paramagnetic relaxation enhancement (PRE) is an attractive way to speed up data acquisition in NMR spectroscopy by shortening the T1 relaxation time of the nucleus of interest and thus the necessary recycle delay. Here, we present the rationale to utilize high-spin iron(III) as the optimal transition metal for this purpose and characterize the properties of its neutral chelate form Fe(DO3A) as a suitable PRE agent. Fe(DO3A) effectively reduces the T1 values across the entire sequence of the intrinsically disordered protein α-synuclein with negligible impact on line width. The agent is better suited than currently used alternatives, shows no specific interaction with the polypeptide chain and, due to its high relaxivity, is effective at low concentrations and in 'proton-less' NMR experiments. By using Fe(DO3A) we were able to complete the backbone resonance assignment of a highly fibrillogenic peptide from α1-antitrypsin by acquiring the necessary suite of multidimensional NMR datasets in 3 h.
Large-scale parallel 454 sequencing reveals host ecological group specificity of arbuscular mycorrhizal fungi in a boreonemoral forest.

PubMed

Opik, M; Metsis, M; Daniell, T J; Zobel, M; Moora, M

2009-10-01

Knowledge of the diversity of arbuscular mycorrhizal fungi (AMF) in natural ecosystems is a major bottleneck in mycorrhizal ecology. Here, we aimed to apply 454 sequencing, which provides a new level of descriptive power, to assess the AMF diversity in a boreonemoral forest. 454 sequencing reads of the small subunit ribosomal RNA (SSU rRNA) gene of Glomeromycota were assigned to sequence groups by BLAST searches against a custom-made annotated sequence database. We detected 47 AMF taxa in the roots of 10 plant species in a 10 x 10 m plot, which is almost the same as the number of plant species in the whole studied forest. There was a significant difference between AMF communities in the roots of forest specialist plant species and in the roots of habitat generalist plant species. Forest plant species hosted 22 specialist AMF taxa, and the generalist plants shared all but one AMF taxon with forest plants, including globally distributed generalist fungi. These AMF taxa that have been globally recorded only in forest ecosystems were significantly over-represented in the roots of forest plant species. Our findings suggest that partner specificity in AM symbiosis may occur at the level of ecological groups, rather than at the species level, of both plant and fungal partners.
Efficacy of memory aids after traumatic brain injury: A single case series.

PubMed

Bos, Hannah R; Babbage, Duncan R; Leathem, Janet M

2017-01-01

Individuals living with traumatic brain injury commonly have difficulties with prospective memory, the ability to remember a planned action at the intended time. Traditionally a memory notebook has been recommended as a compensatory memory aid. Electronic devices have the advantage of providing a cue at the appropriate time to remind participants to refer to the memory aid and complete tasks. Research suggests these have potential benefit in neurorehabilitation. This study aimed to investigate the efficacy of a memory notebook and specifically a smartphone as a compensatory memory aid. A single case series design was used to assess seven participants. A no-intervention baseline was followed by training and intervention with either the smartphone alone, or a memory notebook and later the smartphone. Memory was assessed with weekly assigned memory tasks. Participants using a smartphone showed improvements in their ability to complete assigned memory tasks accurately and within the assigned time periods. Use of a smartphone provided additional benefits over and above those already seen for those who received a memory notebook first. Smartphones have the potential to be a useful and cost effective tool in neurorehabilitation practice.
Complete amino acid sequence of the myoglobin from the Pacific sei whale, Balaenoptera borealis.

PubMed

Jones, B N; Rothgeb, T M; England, R D; Gurd, F R

1979-04-25

The complete amino acid sequence of the major component myoglobin from Pacific sei whale, Balaenoptera borealis, was determined by specific cleavage of the protein to obtain large peptides which are readily degraded by the automatic sequencer. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. From the sequence analysis of four of these peptides and the apomyoglobin, over 75% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by the sequence analysis of peptides that resulted from further digestion of the amino-terminal and central cyanogen bromide fragments. The amino-terminal fragment was specifically cleaved at its two tryptophanyl residues with N-chlorosuccinimide and the central cyanogen bromide fragment was cleaved at its glutamyl residues with staphylococcal protease and at its single tyrosyl residue with N-bromosuccinimide. The primary structure of this myoglobin proved identical with that from the gray whale but differs from that of the finback whale at four positions, from that of the minke whale at three positions and from the myoglobin of the humpback whale at one position. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea.
Application of the MIDAS approach for analysis of lysine acetylation sites.

PubMed

Evans, Caroline A; Griffiths, John R; Unwin, Richard D; Whetton, Anthony D; Corfe, Bernard M

2013-01-01

Multiple Reaction Monitoring Initiated Detection and Sequencing (MIDAS™) is a mass spectrometry-based technique for the detection and characterization of specific post-translational modifications (Unwin et al. 4:1134-1144, 2005), for example acetylated lysine residues (Griffiths et al. 18:1423-1428, 2007). The MIDAS™ technique has application for discovery and analysis of acetylation sites. It is a hypothesis-driven approach that requires a priori knowledge of the primary sequence of the target protein and a proteolytic digest of this protein. MIDAS essentially performs a targeted search for the presence of modified, for example acetylated, peptides. The detection is based on the combination of the predicted molecular weight (measured as mass-charge ratio) of the acetylated proteolytic peptide and a diagnostic fragment (product ion of m/z 126.1), which is generated by specific fragmentation of acetylated peptides during collision induced dissociation performed in tandem mass spectrometry (MS) analysis. Sequence information is subsequently obtained which enables acetylation site assignment. The technique of MIDAS was later trademarked by ABSciex for targeted protein analysis where an MRM scan is combined with full MS/MS product ion scan to enable sequence confirmation.
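The detection logic described above pairs a predicted precursor m/z for the acetylated peptide with the diagnostic product ion at m/z 126.1. A rough sketch of how such transitions could be enumerated is given below. It assumes standard monoisotopic residue masses, a single acetyl group (+42.01056 Da) per peptide, and a hypothetical helper name `acetyl_transitions`; it is not the vendor's MIDAS software.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056        # added once per peptide (terminal H and OH)
PROTON = 1.00728        # mass of each charge-carrying proton
ACETYL = 42.01056       # mass shift from acetylating one lysine
DIAGNOSTIC_ION = 126.1  # product ion m/z quoted in the abstract

def acetyl_transitions(peptide, charges=(2, 3)):
    """Return (precursor m/z, product m/z) pairs for a peptide
    assumed to carry exactly one acetylated lysine."""
    if "K" not in peptide:
        return []  # no lysine, so no acetyl-lysine hypothesis to test
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + ACETYL
    return [(round((mass + z * PROTON) / z, 4), DIAGNOSTIC_ION) for z in charges]
```

Each returned pair corresponds to one MRM transition; in the hypothesis-driven workflow the abstract describes, a list like this would be built for every lysine-containing peptide predicted from the digest of the target protein.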
Improving Care for Veterans with PTSD: Comparing Risks and Benefits of Antipsychotics Versus Other Medications to Augment First-Line Pharmacologic Therapy

DTIC Science & Technology

2017-10-01

for all project Aims. Timeline- months 3-6. Status: completed. Task 6: Complete primary analyses and hypothesis testing for Aim 2, including...glucose. For each of these lab tests, each VA site can name them something different and can change names over times. Labs should be linked to Logical...Observation Identifiers Names (LOINC) codes, an international standard system that assigns a numeric code to specific lab tests. However, VA data
Comparison of dkgB-linked intergenic sequence ribotyping to DNA microarray hybridization for assigning serotype to Salmonella enterica

PubMed Central

Guard, Jean; Sanchez-Ingunza, Roxana; Morales, Cesar; Stewart, Tod; Liljebjelke, Karen; Kessel, JoAnn; Ingram, Kim; Jones, Deana; Jackson, Charlene; Fedorka-Cray, Paula; Frye, Jonathan; Gast, Richard; Hinton, Arthur

2012-01-01

Two DNA-based methods were compared for the ability to assign serotype to 139 isolates of Salmonella enterica ssp. I. Intergenic sequence ribotyping (ISR) evaluated single nucleotide polymorphisms occurring in a 5S ribosomal gene region and flanking sequences bordering the gene dkgB. A DNA microarray hybridization method that assessed the presence and the absence of sets of genes was the second method. Serotype was assigned for 128 (92.1%) of submissions by the two DNA methods. ISR detected mixtures of serotypes within single colonies and it cost substantially less than Kauffmann–White serotyping and DNA microarray hybridization. Decreasing the cost of serotyping S. enterica while maintaining reliability may encourage routine testing and research. PMID:22998607
Fast Timing Study of the β- Decay of 63Mn to 63Fe

NASA Astrophysics Data System (ADS)

Olaizola, B.; Fraile, L. M.; Mach, H.; Briz, J. A.; Cal-González, J.; Ghita, D.; Köster, U.; Kurcewicz, W.; Lesher, S. R.; Pauwels, D.; Picado, E.; Poves, A.; Radulov, D.; Simpson, G. S.; Udias, J. M.

The β- decay of 63Mn to 63Fe has been studied in an experiment at ISOLDE, CERN. The previously known 63Fe level scheme has been confirmed and greatly expanded, to a total of 31 levels and 73 γ lines. The energy of the 9/2+ isomer state has been measured for the first time at 475.0 keV, completing the systematics of such states in odd-Fe isotopes below 68Ni. In addition, the lifetimes of the low-lying states have been measured, allowing the tentative assignment of the spin-parity sequence for those levels.
Detailed analysis of metagenome datasets obtained from biogas-producing microbial communities residing in biogas reactors does not indicate the presence of putative pathogenic microorganisms

PubMed Central

2013-01-01

Background In recent years biogas plants in Germany have been supposed to be involved in amplification and dissemination of pathogenic bacteria causing severe infections in humans and animals. In particular, biogas plants are discussed to contribute to the spreading of Escherichia coli infections in humans or chronic botulism in cattle caused by Clostridium botulinum. Metagenome datasets of microbial communities from an agricultural biogas plant as well as from anaerobic lab-scale digesters operating at different temperatures and conditions were analyzed for the presence of putative pathogenic bacteria and virulence determinants by various bioinformatic approaches. Results All datasets featured a low abundance of reads that were taxonomically assigned to the genus Escherichia or further selected genera comprising pathogenic species. Higher numbers of reads were taxonomically assigned to the genus Clostridium. However, only very few sequences were predicted to originate from pathogenic clostridial species. Moreover, mapping of metagenome reads to complete genome sequences of selected pathogenic bacteria revealed that not the pathogenic species itself, but only species that are more or less related to pathogenic ones are present in the fermentation samples analyzed. Likewise, known virulence determinants could hardly be detected. Only a marginal number of reads showed similarity to sequences described in the Microbial Virulence Database MvirDB such as those encoding protein toxins, virulence proteins or antibiotic resistance determinants. Conclusions Findings of this first study of metagenomic sequence reads of biogas producing microbial communities suggest that the risk of dissemination of pathogenic bacteria by application of digestates from biogas fermentations as fertilizers is low, because obtained results do not indicate the presence of putative pathogenic microorganisms in the samples analyzed. PMID:23557021
Motivating Reading Compliance: Adaptation of Monte Carlo Quizzes for Online Delivery.

PubMed

Azzarello, Jo; Ogans, Judy; Robertson, Victoria

Getting students to complete reading assignments is often a source of frustration for nurse educators. Monte Carlo Quizzes (MCQs) were adapted for online delivery in a hybrid nursing course to encourage timely completion and deep processing of readings. Students indicated that MCQs motivated them to complete the assigned readings and to read more carefully. However, there were no significant differences on scores for other course assignments between those who completed readings and those who did not.
Preliminary catalog of pictures taken on the lunar surface during the Apollo 16 mission

NASA Technical Reports Server (NTRS)

Batson, R. M.; Carson, K. B.; Reed, V. S.; Tyner, R. L.

1972-01-01

A catalog of all pictures taken from the lunar module or the lunar surface during the Apollo 16 lunar stay is presented. The tabulations are arranged for the following specific uses: (1) given the number of a particular frame, find its location in the sequence of lunar surface activity, the station from which it was taken and the subject matter of the picture; (2) given a particular location or activity within the sequence of lunar surface activity, find the pictures taken at that time and their subject matter; and (3) given a sample number from the voice transcript listed, find the designation assigned to the same sample by the lunar receiving laboratory.

First-order and higher order sequence learning in specific language impairment.

PubMed

Clark, Gillian M; Lum, Jarrad A G

2017-02-01

A core claim of the procedural deficit hypothesis of specific language impairment (SLI) is that the disorder is associated with poor implicit sequence learning. This study investigated whether implicit sequence learning problems in SLI are present for first-order conditional (FOC) and higher order conditional (HOC) sequences. Twenty-five children with SLI and 27 age-matched, nonlanguage-impaired children completed 2 serial reaction time tasks. On 1 version, the sequence to be implicitly learnt comprised a FOC sequence and on the other a HOC sequence. Results showed that the SLI group learned the HOC sequence (ηp² = .285, p = .005) but not the FOC sequence (ηp² = .099, p = .118). The control group learned both sequences (FOC ηp² = .497, HOC ηp² = .465, ps < .001). The SLI group's difficulty learning the FOC sequence is consistent with the procedural deficit hypothesis. However, the study provides new evidence that multiple mechanisms may underpin the learning of FOC and HOC sequences. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Fragment assignment in the cloud with eXpress-D

PubMed Central

2013-01-01

Background Probabilistic assignment of ambiguously mapped fragments produced by high-throughput sequencing experiments has been demonstrated to greatly improve accuracy in the analysis of RNA-Seq and ChIP-Seq, and is an essential step in many other sequence census experiments. A maximum likelihood method using the expectation-maximization (EM) algorithm for optimization is commonly used to solve this problem. However, batch EM-based approaches do not scale well with the size of sequencing datasets, which have been increasing dramatically over the past few years. Thus, current approaches to fragment assignment rely on heuristics or approximations for tractability. Results We present an implementation of a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that can scale by leveraging compute clusters within datacenters ("the cloud"). We demonstrate that our implementation easily scales to billions of sequenced fragments, while providing the exact maximum likelihood assignment of ambiguous fragments. The accuracy of the method is shown to be an improvement over the most widely used tools available and can be run in a constant amount of time when cluster resources are scaled linearly with the amount of input data. Conclusions The cloud offers one solution for the difficulties faced in the analysis of massive high-throughput sequencing data, which continue to grow rapidly. Researchers in bioinformatics must follow developments in distributed systems, such as new frameworks like Spark, for ways to port existing methods to the cloud and help them scale to the datasets of the future. Our software, eXpress-D, is freely available at: http://github.com/adarob/express-d. PMID:24314033
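The batch EM idea behind fragment assignment can be illustrated on a single machine. The sketch below is a deliberately simplified model and not eXpress-D itself: it assumes every fragment lists at least one compatible target, ignores target lengths and alignment probabilities, and uses the hypothetical name `em_fragment_assignment`.

```python
def em_fragment_assignment(compatibility, n_targets, n_iter=200):
    """Estimate relative target abundances by batch EM.

    compatibility: one list per fragment holding the indices of the
    targets that fragment maps to (assumed non-empty).
    """
    abundance = [1.0 / n_targets] * n_targets  # uniform starting point
    for _ in range(n_iter):
        expected = [0.0] * n_targets
        # E-step: split each fragment across its compatible targets
        # in proportion to the current abundance estimates.
        for targets in compatibility:
            norm = sum(abundance[t] for t in targets)
            for t in targets:
                expected[t] += abundance[t] / norm
        # M-step: renormalise the expected counts into abundances.
        total = sum(expected)
        abundance = [count / total for count in expected]
    return abundance
```

With 6 fragments unique to target 0, 2 unique to target 1, and 4 mapping to both, the estimates converge to the maximum likelihood values 0.75 and 0.25. The E-step is an independent computation per fragment, which is what makes the problem a natural fit for a distributed framework of the kind the abstract describes.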
Molecular detection and species identification of Alexandrium (Dinophyceae) causing harmful algal blooms along the Chilean coastline

PubMed Central

Jedlicki, Ana; Fernández, Gonzalo; Astorga, Marcela; Oyarzún, Pablo; Toro, Jorge E.; Navarro, Jorge M.; Martínez, Víctor

2012-01-01

Background and aims On the basis of morphological evidence, the species involved in South American Pacific coast harmful algal blooms (HABs) has been traditionally recognized as Alexandrium catenella (Dinophyceae). However, these observations have not been confirmed using evidence based on genomic sequence variability. Our principal objective was to accurately determine the species of Alexandrium involved in local HABs in order to implement a real-time polymerase chain reaction (PCR) assay for its rapid and easy detection on filter-feeding shellfish, such as mussels. Methodology For species-specific determination, the intergenic spacer 1 (ITS1), 5.8S subunit, ITS2 and the hypervariable genomic regions D1–D5 of the large ribosomal subunit of local strains were sequenced and compared with two data sets of other Alexandrium sequences. Species-specific primers were used to amplify signature sequences within the genomic DNA of the studied species by conventional and real-time PCR. Principal results Phylogenetic analysis determined that the Chilean strain falls into Group I of the tamarensis complex. Our results support the allocation of the Chilean Alexandrium species as a toxic Alexandrium tamarense rather than A. catenella, as currently defined. Once local species were determined to belong to Group I of the tamarensis complex, a highly sensitive and accurate real-time PCR procedure was developed to detect dinoflagellate presence in Mytilus spp. (Bivalvia) samples after being fed (challenged) in vitro with the Chilean Alexandrium strain. The results show that real-time PCR is useful to detect Alexandrium intake in filter-feeding molluscs. Conclusions It has been shown that the classification of local Alexandrium using morphological evidence is not very accurate. Molecular methods enabled the HAB dinoflagellate species of the Chilean coast to be assigned as A. tamarense rather than A. catenella. Real-time PCR analysis based on A. tamarense primers allowed the detection of dinoflagellate DNA in Mytilus spp. samples exposed to this alga. Through the specific assignment of dinoflagellate species involved in HABs, more reliable preventive policies can be implemented. PMID:23259043
Supervised DNA Barcodes species classification: analysis, comparisons and results

PubMed Central

2014-01-01

Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results Software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for successfully handling the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333
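The conversion step mentioned above, turning barcode FASTA records into Weka's input format, can be sketched as follows. This is an illustrative stand-in and not the converter released by the authors. It assumes that all sequences are already aligned to the same length, that each FASTA header is just a species label without spaces or commas, and that any non-ACGT symbol should become the ARFF missing-value marker `?`; the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def fasta_to_arff(text, relation="barcodes"):
    """Emit one nominal attribute per alignment column plus a class attribute."""
    records = read_fasta(text)
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must be aligned to the same length")
    species = sorted({header for header, _ in records})
    lines = [f"@relation {relation}", ""]
    for i in range(1, length + 1):
        lines.append(f"@attribute pos{i} {{A,C,G,T}}")
    lines.append("@attribute species {" + ",".join(species) + "}")
    lines += ["", "@data"]
    for header, seq in records:
        # Non-ACGT symbols (gaps, N) are written as ARFF missing values.
        bases = [b if b in "ACGT" else "?" for b in seq]
        lines.append(",".join(bases) + "," + header)
    return "\n".join(lines) + "\n"
```

Encoding each alignment position as its own nominal attribute is what lets rule-based learners report species-specific positions and nucleotides, as the abstract notes for RIPPER.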
Systematic Analysis of Primary Sequence Domain Segments for the Discrimination Between Class C GPCR Subtypes.

PubMed

König, Caroline; Alquézar, René; Vellido, Alfredo; Giraldo, Jesús

2018-03-01

G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signal. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes necessary the use of their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results from the previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.
Waves and Particles, The Orbital Atom, Parts One and Two of an Integrated Science Sequence, Teacher's Guide, 1973 Edition.

ERIC Educational Resources Information Center

Portland Project Committee, OR.

This teacher's guide includes parts one and two of the four-part third year Portland Project, a three-year integrated secondary science curriculum sequence. The Harvard Project Physics textbook is used for reading assignments for part one. Assignments relate to waves, light, electricity, magnetic fields, Faraday and the electrical age,…
Deep sequencing reveals exceptional diversity and modes of transmission for bacterial sponge symbionts.

PubMed

Webster, Nicole S; Taylor, Michael W; Behnam, Faris; Lücker, Sebastian; Rattei, Thomas; Whalan, Stephen; Horn, Matthias; Wagner, Michael

2010-08-01

Marine sponges contain complex bacterial communities of considerable ecological and biotechnological importance, with many of these organisms postulated to be specific to sponge hosts. Testing this hypothesis in light of the recent discovery of the rare microbial biosphere, we investigated three Australian sponges by massively parallel 16S rRNA gene tag pyrosequencing. Here we show bacterial diversity that is unparalleled in an invertebrate host, with more than 250,000 sponge-derived sequence tags being assigned to 23 bacterial phyla and revealing up to 2996 operational taxonomic units (95% sequence similarity) per sponge species. Of the 33 previously described 'sponge-specific' clusters that were detected in this study, 48% were found exclusively in adults and larvae - implying vertical transmission of these groups. The remaining taxa, including 'Poribacteria', were also found at very low abundance among the 135,000 tags retrieved from surrounding seawater. Thus, members of the rare seawater biosphere may serve as seed organisms for widely occurring symbiont populations in sponges and their host association might have evolved much more recently than previously thought. © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd.
A high-throughput Sanger strategy for human mitochondrial genome sequencing

PubMed Central

2013-01-01

Background A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. Results We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. Conclusions The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing. PMID:24341507
Two-dimensional {sup 1}H nuclear magnetic resonance study of AaH IT, an anti-insect toxin from the scorpion Androctonus australis Hector. Sequential resonance assignments and folding of the polypeptide chain

DOE Office of Scientific and Technical Information (OSTI.GOV)

Darbon, H.; Weber, C.; Braun, W.

1991-02-19

Sequence-specific nuclear magnetic resonance assignments for the polypeptide backbone and for most of the amino acid side-chain protons, as well as the general folding of AaH IT, are described. AaH IT is a neurotoxin purified from the venom of the scorpion Androctonus australis Hector and is specifically active on the insect nervous system. The secondary structure and the hydrogen-bonding patterns in the regular secondary structure elements are deduced from nuclear Overhauser effects and the sequence locations of the slowly exchanging amide protons. The backbone folding is determined by distance geometry calculations with the DISMAN program. The regular secondary structure includes two and a half turns of {alpha}-helix running from residues 21 to 30 and a three-stranded antiparallel {beta}-sheet including peptides 3-5, 34-38, and 41-46. Two tight turns are present, one connecting the end of the {alpha}-helix to an external strand of the {beta}-sheet, i.e., turn 31-34, and another connecting this same strand to the central one, i.e., turn 38-41. The differences in the specificity of these related proteins, which are able to discriminate between mammalian and insect voltage-dependent sodium channels of excitable tissues, are most probably brought about by the position of the C-terminal peptide with regard to a hydrophobic surface common to all scorpion toxins examined thus far. Thus, the interaction of a given scorpion toxin with its receptor might well be governed by the presence of this solvent-exposed hydrophobic surface, whereas adjacent areas modulate the specificity of the interaction.
The Two Worlds of School: Differences in the Photographs of Black and White Adolescents.

ERIC Educational Resources Information Center

Damico, Sandra Bowman

This paper presents a study conducted to document adolescents' visual perceptions of school. Specifically, an attempt was made to determine whether black and white adolescents, when given cameras, an entire school day, and complete freedom from class assignments, would select different physical and social aspects of their school environment to…
Sequence-specific backbone resonance assignments and microsecond timescale molecular dynamics simulation of human eosinophil-derived neurotoxin.

PubMed

Gagné, Donald; Narayanan, Chitra; Bafna, Khushboo; Charest, Laurie-Anne; Agarwal, Pratul K; Doucet, Nicolas

2017-10-01

Eight active canonical members of the pancreatic-like ribonuclease A (RNase A) superfamily have been identified in human. All structural homologs share similar RNA-degrading functions, while also cumulating other various biological activities in different tissues. The functional homologs eosinophil-derived neurotoxin (EDN, or RNase 2) and eosinophil cationic protein (ECP, or RNase 3) are known to be expressed and secreted by eosinophils in response to infection, and have thus been postulated to play an important role in host defense and inflammatory response. We recently initiated the biophysical and dynamical investigation of several vertebrate RNase homologs and observed that clustering residue dynamics appear to be linked with the phylogeny and biological specificity of several members. Here we report the 1H, 13C and 15N backbone resonance assignments of human EDN (RNase 2) and its molecular dynamics simulation on the microsecond timescale, providing means to pursue this comparative atomic-scale functional and dynamical analysis by NMR and computation over multiple time frames.
Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

PubMed

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-06-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
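The N50 and coverage figures quoted in this record follow standard assembly arithmetic. The sketch below is illustrative only and is not taken from the paper: `n50` is a hypothetical helper that implements the usual definition (the length at which the longest sequences, summed in descending order, first reach half of the total assembly length), and the toy lengths are invented. Only the 568,887,315 bp and 622 Mb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Standard definition (an assumption about how the study computed it):
    sort lengths in descending order and return the length at which the
    running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy example with invented lengths: total 30, half 15, 10 + 6 = 16 >= 15.
toy_n50 = n50([2, 3, 4, 5, 6, 10])

# Coverage reported in the abstract: assembled bases over estimated genome size.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000
coverage_percent = round(100 * assembled_bp / estimated_genome_bp)
```

With the abstract's numbers, `coverage_percent` evaluates to 91, matching the reported 91% coverage.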
Metagenomic Analysis of a Biphenyl-Degrading Soil Bacterial Consortium Reveals the Metabolic Roles of Specific Populations

PubMed Central

Garrido-Sanz, Daniel; Manzano, Javier; Martín, Marta; Redondo-Nieto, Miguel; Rivilla, Rafael

2018-01-01

Polychlorinated biphenyls (PCBs) are widespread persistent pollutants that cause several adverse health effects. Aerobic bioremediation of PCBs involves the activity of either one bacterial species or a microbial consortium. Using multiple species will enhance the range of PCB congeners co-metabolized since different PCB-degrading microorganisms exhibit different substrate specificity. We have isolated a bacterial consortium by successive enrichment culture using biphenyl (analog of PCBs) as the sole carbon and energy source. This consortium is able to grow on biphenyl, benzoate, and protocatechuate. Whole-community DNA extracted from the consortium was used to analyze biodiversity by Illumina sequencing of a 16S rRNA gene amplicon library and to determine the metagenome by whole-genome shotgun Illumina sequencing. Biodiversity analysis shows that the consortium consists of 24 operational taxonomic units (≥97% identity). The consortium is dominated by strains belonging to the genus Pseudomonas, but also contains betaproteobacteria and Rhodococcus strains. Whole-genome shotgun (WGS) analysis resulted in contigs containing 78.3 Mbp of sequenced DNA, representing around 65% of the expected DNA in the consortium. Bioinformatic analysis of this metagenome has identified the genes encoding the enzymes implicated in three pathways for the conversion of biphenyl to benzoate and five pathways from benzoate to tricarboxylic acid (TCA) cycle intermediates, allowing us to model the whole biodegradation network. By genus assignment of coding sequences, we have also been able to determine that the three biphenyl to benzoate pathways are carried out by Rhodococcus strains. In turn, strains belonging to Pseudomonas and Bordetella are the main responsible of three of the benzoate to TCA pathways while the benzoate conversion into TCA cycle intermediates via benzoyl-CoA and the catechol meta-cleavage pathways are carried out by beta proteobacteria belonging to genera such as Achromobacter and Variovorax. We have isolated a Rhodococcus strain WAY2 from the consortium which contains the genes encoding the three biphenyl to benzoate pathways indicating that this strain is responsible for all the biphenyl to benzoate transformations. The presented results show that metagenomic analysis of consortia allows the identification of bacteria active in biodegradation processes and the assignment of specific reactions and pathways to specific bacterial groups. PMID:29497412
Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

PubMed

Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

2009-06-01

The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex

PubMed Central

Garrido-Sanz, Daniel; Meier-Kolthoff, Jan P.; Göker, Markus; Martín, Marta; Rivilla, Rafael; Redondo-Nieto, Miguel

2016-01-01

The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA, MLSA and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs, show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a low number of strains were enough to explain most of the CDSs diversity at core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genome and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR. PMID:26915094
The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

PubMed Central

Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

2012-01-01

The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293
The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata.

PubMed

Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M; Kyrpides, Nikos C

2012-01-01

The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11,472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond.
Web-based sex diaries and young adult men who have sex with men: assessing feasibility, reactivity, and data agreement.

PubMed

Glick, Sara Nelson; Winer, Rachel L; Golden, Matthew R

2013-10-01

We compared quantitative diary data with retrospective survey data collected from a cohort of young adult men who have sex with men (MSM) in Seattle, Washington. Ninety-five MSM, aged 16-30 years, completed web-based surveys every 3 months and were randomized to 4 diary submission schedules: every 2 weeks, once a week, twice a week, or never. We calculated diary completion rates and assessed agreement between daily diary data and aggregate retrospective survey data for sexual behavior measures. Over 6 months, 78 % of participants completed at least 80 % of their diary days, and the 2-week schedule had the highest and most consistent completion rate. The majority of sexual behavior and substance use measures had strong agreement between the diary and retrospective survey data (i.e., kappa >0.80 or concordance correlation coefficient ≥0.75), although we observed poorer agreement for some measures of numbers of anal sex acts. There were no significant differences in mean responses across diary schedules. We observed some evidence of reactivity (i.e., a difference in behavior associated with diary completion). Participants not assigned diaries reported significantly more unprotected anal sex acts and were more likely to be newly diagnosed with HIV or another sexually transmitted infection compared to those assigned active diary schedules. This study suggests that sexual behavior data collected from young adult MSM during 3-month retrospective survey--an interval commonly used in sexual behavior research--are likely valid. Diaries, however, may have greater utility in sexual behavioral research in which counts, timing, sequence, or within-person variation over time are of particular import.
Maximizing Completion and Comprehension of Reading Assignments

ERIC Educational Resources Information Center

Owen, Leanne R.

2017-01-01

The author presents self-report data from students in three upper-level undergraduate courses to illustrate the comparative effectiveness of different out-of-class assessment approaches in promoting completion and comprehension of reading assignments. Students reported agreeing or strongly agreeing that all three assignments motivated them to…
Evaluating multiplexed next-generation sequencing as a method in palynology for mixed pollen samples.

PubMed

Keller, A; Danner, N; Grimmer, G; Ankenbrand, M; von der Ohe, K; von der Ohe, W; Rost, S; Härtel, S; Steffan-Dewenter, I

2015-03-01

The identification of pollen plays an important role in ecology, palaeo-climatology, honey quality control and other areas. Currently, expert knowledge and reference collections are essential to identify pollen origin through light microscopy. Pollen identification through molecular sequencing and DNA barcoding has been proposed as an alternative approach, but the assessment of mixed pollen samples originating from multiple plant species is still a tedious and error-prone task. Next-generation sequencing has been proposed to avoid this hindrance. In this study we assessed mixed pollen probes through next-generation sequencing of amplicons from the highly variable, species-specific internal transcribed spacer 2 region of nuclear ribosomal DNA. Further, we developed a bioinformatic workflow to analyse these high-throughput data with a newly created reference database. To evaluate the feasibility, we compared results from classical identification based on light microscopy from the same samples with our sequencing results. We assessed in total 16 mixed pollen samples, 14 originated from honeybee colonies and two from solitary bee nests. The sequencing technique resulted in higher taxon richness (deeper assignments and more identified taxa) compared to light microscopy. Abundance estimations from sequencing data were significantly correlated with counted abundances through light microscopy. Simulation analyses of taxon specificity and sensitivity indicate that 96% of taxa present in the database are correctly identifiable at the genus level and 70% at the species level. Next-generation sequencing thus presents a useful and efficient workflow to identify pollen at the genus and species level without requiring specialised palynological expert knowledge. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.

The practical evaluation of DNA barcode efficacy.

PubMed

Spouge, John L; Mariño-Ramírez, Leonardo

2012-01-01

This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman-Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, "the probability of correct identification" (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
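The abstract defines the overall PCI as the average of the species PCIs. A minimal sketch of that averaging step follows. It assumes, since the abstract does not spell this out, that a species PCI is the fraction of that species' specimens assigned to the correct species; the function name `overall_pci` and the toy labels are hypothetical.

```python
from collections import defaultdict


def overall_pci(true_species, assigned_species):
    """Average per-species identification rates into an overall PCI.

    Assumption: a species PCI is the fraction of its specimens whose
    assigned species equals the true species. Each species then counts
    equally in the average, regardless of how many specimens it has.
    """
    counts = defaultdict(lambda: [0, 0])  # species -> [correct, total]
    for truth, guess in zip(true_species, assigned_species):
        counts[truth][1] += 1
        if truth == guess:
            counts[truth][0] += 1
    species_pci = {s: correct / total for s, (correct, total) in counts.items()}
    return sum(species_pci.values()) / len(species_pci), species_pci


# Invented example: four specimens of species A, two of species B.
truth = ["A", "A", "A", "A", "B", "B"]
assigned = ["A", "A", "A", "B", "B", "A"]
pci, per_species = overall_pci(truth, assigned)
```

Here species A scores 0.75 and species B scores 0.5, so the overall PCI is 0.625. Pooling all specimens would instead give 4/6, which shows how averaging over species keeps heavily sampled species from dominating the measure.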
Nuclear localization signal targeting to macronucleus and micronucleus in binucleated ciliate Tetrahymena thermophila.

PubMed

Iwamoto, Masaaki; Mori, Chie; Osakada, Hiroko; Koujin, Takako; Hiraoka, Yasushi; Haraguchi, Tokuko

2018-06-08

Ciliated protozoa possess two morphologically and functionally distinct nuclei: a macronucleus (MAC) and a micronucleus (MIC). The MAC is transcriptionally active and functions in all cellular events. The MIC is transcriptionally inactive during cell growth, but functions in meiotic events to produce progeny nuclei. Thus, these two nuclei must be distinguished by the nuclear proteins required for their distinct functions during cellular events such as cell proliferation and meiosis. To understand the mechanism of the nuclear transport specific to either MAC or MIC, we identified specific nuclear localization signals (NLSs) in two MAC- and MIC-specific nuclear proteins, macronuclear histone H1 and micronuclear linker histone-like protein (Mlh1), respectively. By expressing GFP-fused fragments of these proteins in Tetrahymena thermophila cells, two distinct regions in macronuclear histone H1 protein were assigned as independent MAC-specific NLSs and two distinct regions in Mlh1 protein were assigned as independent MIC-specific NLSs. These NLSs contain several essential lysine residues responsible for the MAC- and MIC-specific nuclear transport, but neither contains any consensus sequence with known monopartite or bipartite NLSs in other model organisms. Our findings contribute to understanding how specific nuclear targeting is achieved to perform distinct nuclear functions in binucleated ciliates. © 2018 The Authors. Genes to Cells published by Molecular Biology Society of Japan and John Wiley & Sons Australia, Ltd.
Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing

PubMed Central

2011-01-01

Background Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results A total of 386 isolates, collected over 2008 - June 2011 from 378 animals (amphibians, reptiles, birds, and mammals) underwent PCR and sequencing of a ~ 711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
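The ≥ 98% similarity rule in this record can be sketched as a simple threshold test. The code below is an assumption-laden illustration rather than the authors' pipeline: it takes sequences that are already aligned to equal length, uses plain percent identity as the similarity measure, and only flags sub-threshold isolates for clade placement (the study placed them using a neighbor-joining tree, which is not reproduced here). All names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def assign_isolate(query, references, threshold=98.0):
    """Return (level, best_reference, identity) for one isolate.

    level is "species" when the best reference reaches the threshold,
    otherwise "clade", meaning the isolate needs tree-based placement.
    """
    best_name, best_identity = max(
        ((name, percent_identity(query, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )
    if best_identity >= threshold:
        return ("species", best_name, best_identity)
    return ("clade", None, best_identity)


# Invented 100-base reference sequences and two invented isolates.
references = {"ref_a": "ACGT" * 25, "ref_b": "TTGCA" * 20}
close_isolate = "T" + references["ref_a"][1:]        # one mismatch to ref_a
distant_isolate = "TTTTT" + references["ref_a"][5:]  # four mismatches to ref_a
close_call = assign_isolate(close_isolate, references)
distant_call = assign_isolate(distant_isolate, references)
```

The proportions reported in the abstract follow from the same kind of tally: 310 of 386 isolates is about 80%, and 167 of 236 is about 70%.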
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

PubMed Central

Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.

2015-01-01

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151
[Sequence-based typing of environmental Legionella pneumophila isolates in Guangzhou].

PubMed

Zhang, Ying; Qu, Pinghua; Zhang, Jian; Chen, Shouyi

2011-03-01

To characterize the genes of Legionella pneumophila isolated from different water sources in Guangzhou from 2006 to 2009, and to genotype the strains by using the sequence-based typing (SBT) scheme. In total, 44 L. pneumophila strains were typed by SBT with the 7 genes flaA, asd, mip, pilE, mompS, proA and neuA. The amplicon sequences were analyzed against the European Working Group for Legionella Infections (EWGLI) international SBT database to obtain the allelic profiles and sequence types (STs). Serogroups were typed by latex agglutination test. Data from SBT revealed a high diversity among the strains, and ST01 accounts for 30% (13/44). Fifteen new STs were discovered from 20 STs and 2 of them were newly assigned (ST887 and ST888) by EWGLI. An SBT phylogenetic tree was generated by the SplitsTree and BURST programs. High diversity and specificity were observed among the L. pneumophila strains in Guangzhou. SBT is useful for L. pneumophila genomic study and epidemiological surveillance.
MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing

PubMed Central

Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella

2014-01-01

Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726
RaptorX server: a resource for template-based protein structure modeling.

PubMed

Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo

2014-01-01

Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server (http://raptorx.uchicago.edu), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.
Characterizing visible and invisible cell wall mutant phenotypes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carpita, Nicholas C.; McCann, Maureen C.

2015-04-06

About 10% of a plant's genome is devoted to generating the protein machinery to synthesize, remodel, and deconstruct the cell wall. High-throughput genome sequencing technologies have enabled a reasonably complete inventory of wall-related genes that can be assembled into families of common evolutionary origin. Assigning function to each gene family member has been aided immensely by identification of mutants with visible phenotypes or by chemical and spectroscopic analysis of mutants with ‘invisible’ phenotypes of modified cell wall composition and architecture that do not otherwise affect plant growth or development. This review connects the inference of gene function on the basis of deviation from the wild type in genetic functional analyses to insights provided by modern analytical techniques that have brought us ever closer to elucidating the sequence structures of the major polysaccharide components of the plant cell wall.
Identification of the Coumermycin A1 Biosynthetic Gene Cluster of Streptomyces rishiriensis DSM 40489

PubMed Central

Wang, Zhao-Xin; Li, Shu-Ming; Heide, Lutz

2000-01-01

The biosynthetic gene cluster of the aminocoumarin antibiotic coumermycin A1 was cloned by screening of a cosmid library of Streptomyces rishiriensis DSM 40489 with heterologous probes from a dTDP-glucose 4,6-dehydratase gene, involved in deoxysugar biosynthesis, and from the aminocoumarin resistance gyrase gene gyrBr. Sequence analysis of a 30.8-kb region upstream of gyrBr revealed the presence of 28 complete open reading frames (ORFs). Fifteen of the identified ORFs showed, on average, 84% identity to corresponding ORFs in the biosynthetic gene cluster of novobiocin, another aminocoumarin antibiotic. Possible functions of 17 ORFs in the biosynthesis of coumermycin A1 could be assigned by comparison with sequences in GenBank. Experimental proof for the function of the identified gene cluster was provided by an insertional gene inactivation experiment, which resulted in an abolishment of coumermycin A1 production. PMID:11036020
Development and characterization of a complete set of Triticum aestivum-Roegneria ciliaris disomic addition lines.

PubMed

Kong, Lingna; Song, Xinying; Xiao, Jin; Sun, Haojie; Dai, Keli; Lan, Caixia; Singh, Pawan; Yuan, Chunxia; Zhang, Shouzhong; Singh, Ravi; Wang, Haiyan; Wang, Xiue

2018-05-31

A complete set of wheat-R. ciliaris disomic addition lines (DALs) was characterized, and the homoeologous groups and genome affinities of R. ciliaris chromosomes were determined. Wild relatives are rich gene resources for cultivated wheat. The development of alien addition chromosome lines not only greatly broadens the genetic diversity, but also provides genetic stocks for comparative genomics studies. Roegneria ciliaris (genome ScScYcYc), a tetraploid wild relative of wheat, is tolerant or resistant to many abiotic and biotic stresses. To develop a complete set of wheat-R. ciliaris disomic addition lines (DALs), we undertook a euplasmic backcrossing program to overcome allocytoplasmic effects and preferential chromosome transmission. To improve the efficiency of identifying chromosomes from the Sc and Yc genomes, we established techniques including sequential genomic in situ hybridization/fluorescence in situ hybridization (FISH) and molecular marker analysis. Fourteen DALs of wheat, each containing one pair of R. ciliaris chromosomes, were characterized by FISH using four repetitive sequences [pTa794, pTa71, RcAfa and (GAA)10] as probes. One hundred and sixty-two R. ciliaris-specific markers were developed. FISH and marker analysis enabled us to assign the homoeologous groups and genome affinities of R. ciliaris chromosomes. FHB resistance evaluation in five successive growing seasons showed that the amphiploid, DA2Yc, DA5Yc and DA6Sc had improved FHB resistance, indicating their potential value in wheat improvement. The 14 DALs are likely new gene resources and will be phenotyped for additional agronomic performance traits.
The complete nucleotide sequence of the barley yellow dwarf GPV isolate from China shows that it is a new member of the genus Polerovirus.

PubMed

Zhang, Wenwei; Cheng, Zhuomin; Xu, Lei; Wu, Maosen; Waterhouse, Peter; Zhou, Guanghe; Li, Shifang

2009-01-01

The complete nucleotide sequence of the ssRNA genome of a Chinese GPV isolate of barley yellow dwarf virus (BYDV) was determined. It comprised 5673 nucleotides, and the deduced genome organization resembled that of members of the genus Polerovirus. It was most closely related to cereal yellow dwarf virus-RPV (77% nt identity over the entire genome; coat protein amino acid identity 79%). The GPV isolate also differs in vector specificity from other BYDV strains. Biological properties, phylogenetic analyses and detailed sequence comparisons suggest that GPV should be considered a member of a new species within the genus, and the name Wheat yellow dwarf virus-GPV is proposed.
Gene calling and bacterial genome annotation with BG7.

PubMed

Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

2015-01-01

New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence error tolerant maintaining their capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (as those obtained through metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).
A bioinformatic pipeline for identifying informative SNP panels for parentage assignment from RADseq data.

PubMed

Andrews, Kimberly R; Adams, Jennifer R; Cassirer, E Frances; Plowright, Raina K; Gardner, Colby; Dwire, Maggie; Hohenlohe, Paul A; Waits, Lisette P

2018-06-05

The development of high-throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site-associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with >95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. 
Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci. This article is protected by copyright. All rights reserved.
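To make the idea of SNP-based parentage assignment concrete, the following is a minimal sketch of one common screening step, exclusion by opposing homozygotes. It is not the pipeline described in the record above; the genotype coding (0, 1, 2 copies of the alternate allele, None for missing), the function names and the mismatch tolerance are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def opposing_homozygotes(parent: Sequence[Genotype],
                         offspring: Sequence[Genotype]) -> int:
    """Count loci where parent and offspring are opposite homozygotes (0 vs 2).

    Such loci are incompatible with a parent-offspring relationship unless
    a genotyping error occurred.
    """
    mismatches = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # skip loci with a missing call in either individual
        if abs(p - o) == 2:
            mismatches += 1
    return mismatches


def candidate_parents(offspring: Sequence[Genotype],
                      candidates: Dict[str, Sequence[Genotype]],
                      max_mismatches: int = 1) -> List[str]:
    """Return candidates that are not excluded, fewest mismatches first.

    max_mismatches is an illustrative tolerance for genotyping error.
    """
    scored = [(opposing_homozygotes(g, offspring), name)
              for name, g in candidates.items()]
    return [name for n, name in sorted(scored) if n <= max_mismatches]
```

Likelihood-based methods are generally preferred when many close relatives are present, as in the populations described above; the exclusion count is only a first-pass filter.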
Coordinating Environmental Genomics and Geochemistry Reveals Metabolic Transitions in a Hot Spring Ecosystem

PubMed Central

Swingley, Wesley D.; Meyer-Dombard, D’Arcy R.; Shock, Everett L.; Alsop, Eric B.; Falenski, Heinz D.; Havig, Jeff R.; Raymond, Jason

2012-01-01

We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ∼75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. 
The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability. PMID:22675512
Mouse mammary tumor virus-like gene sequences are present in lung patient specimens

PubMed Central

2011-01-01

Background: Previous studies have reported on the presence of Murine Mammary Tumor Virus (MMTV)-like gene sequences in human cancer tissue specimens. Here, we search for MMTV-like gene sequences in lung disease specimens, including carcinomas, from a Mexican population. This study was based on our previous study reporting that the INER51 lung cancer cell line, from a pleural effusion of a Mexican patient, contains MMTV-like env gene sequences. Results: The MMTV-like env gene sequences have been detected in three out of 18 specimens studied, by PCR using a specific set of MMTV-like primers. The three identified MMTV-like gene sequences, which were assigned as INER6, HZ101, and HZ14, were 99%, 98%, and 97% homologous, respectively, as compared to GenBank sequence accession number AY161347. The INER6 and HZ-101 samples were isolated from lung cancer specimens, and the HZ-14 was isolated from an acute inflammatory lung infiltrate sample. Two of the env sequences exhibited disruption of the reading frame due to mutations. Conclusion: In summary, we identified the presence of MMTV-like gene sequences in 2 out of 11 (18%) of the lung carcinomas and 1 out of 7 (14%) of the acute inflammatory lung infiltrate specimens studied from a Mexican population. PMID:21943279
Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo) genome assembly and analysis

USDA-ARS's Scientific Manuscript database

Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...
Determination of the Structural Basis of Antibody Diversity Using NMR

DTIC Science & Technology

1989-06-15

Tomasello, J., & Whitaker, the time - in terms of the true first-order off rate constants M. (1987) Biochemistry 26, 6058-6064. kso and kDO and the fractional...Levitt, M., spectra is unlikely to yield sequence-specific assignments for McConnell, H. M., Rule, G. S., Tomasello, J. & Whittaker, M. AN02. The...Leahy, D. J., Levitt, M., McConnell, H. M., Rule, G. S., Tomasello, J., & Whittaker, M. (1987) Biochemistry 26, 6058-6064. Leahy, D. J., Rule, G. S
Hepatitis E Virus Genotype 3 Diversity: Phylogenetic Analysis and Presence of Subtype 3b in Wild Boar in Europe

PubMed Central

Vina-Rodriguez, Ariel; Schlosser, Josephine; Becher, Dietmar; Kaden, Volker; Groschup, Martin H.; Eiden, Martin

2015-01-01

An increasing number of indigenous cases of hepatitis E caused by genotype 3 viruses (HEV-3) have been diagnosed all around the world, particularly in industrialized countries. Hepatitis E is a zoonotic disease and accumulating evidence indicates that domestic pigs and wild boars are the main reservoirs of HEV-3. A detailed analysis of HEV-3 subtypes could help to determine the interplay of human activity, the role of animals as reservoirs and cross-species transmission. Although complete genome sequences are most appropriate for HEV subtype determination, in most cases only partial genomic sequences are available. We therefore carried out a subtype classification analysis, which uses regions from all three open reading frames of the genome. Using this approach, more than 1000 published HEV-3 isolates were subtyped. Newly recovered HEV partial sequences from hunted German wild boars were also included in this study. These sequences were assigned to genotype 3 and clustered within subtypes 3a and 3i and, unexpectedly, one of them within subtype 3b, the first non-human report of this subtype in Europe. PMID:26008708
HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts

PubMed Central

Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas

2016-01-01

Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy, and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG, and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564
Tissue-specific Proteogenomic Analysis of Plutella xylostella Larval Midgut Using a Multialgorithm Pipeline

PubMed Central

Zhu, Xun; Xie, Shangbo; Armengaud, Jean; Xie, Wen; Guo, Zhaojiang; Kang, Shi; Wu, Qingjun; Wang, Shaoli; Xia, Jixing; He, Rongjun; Zhang, Youjun

2016-01-01

The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimentally independent approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement genome annotation of P. xylostella larval midgut based on shotgun HPLC-ESI-MS/MS data by means of a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome search specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To get the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes, i.e. 235 of 439, were further validated after RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicing events. Finally, a total of 6764 proteins were identified, resulting in one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomics analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest. PMID:26902207
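The six-frame translation database mentioned in the record above can be illustrated with a short sketch. It assumes the standard genetic code and an unambiguous DNA alphabet; the function names and frame labels are hypothetical, and this is not the software used in the study.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Translate a DNA sequence in all three frames on both strands."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

In a proteogenomic search the six translations are typically split at stop codons ('*') into candidate peptides before spectra are matched against them; that step is omitted here.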

The histidine permease gene (HIP1) of Saccharomyces cerevisiae.

PubMed

Tanaka, J; Fink, G R

1985-01-01

The histidine-specific permease gene (HIP1) of Saccharomyces cerevisiae has been mapped, cloned, and sequenced. The HIP1 gene maps to the right arm of chromosome VII, approx. 11 cM distal to the ADE3 gene. The gene was isolated as an 8.6-kb BamHI-Sau3A fragment by complementation of the histidine-specific permease deficiency in recipient yeast cells. We sequenced a 2.4-kb subfragment of this BamHI-Sau3A fragment containing the HIP1 gene and identified a 1596-bp open reading frame (ORF). We confirmed the assignment of the 1596-bp ORF as the HIP1 coding sequence by sequencing a hip1 nonsense mutation. Analysis of the amino acid (aa) sequence of the HIP1 gene reveals several hydrophobic stretches, but shows no obvious N-terminal signal peptide. We have constructed a deletion of the HIP1 gene in vitro and replaced the wild-type copy of the gene with this deletion. The hip1 deletion mutant can grow when it is supplemented with 30 mM histidine, 50 times the amount required for the growth of HIP1 cells. Revertants of this deletion mutant able to grow on a normal level of histidine arise by mutation in unlinked genes. Both these observations suggest that there are additional, low-affinity pathways for histidine uptake.
DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses.

PubMed

Zepeda-Mendoza, Marie Lisandra; Bohmann, Kristine; Carmona Baez, Aldo; Gilbert, M Thomas P

2016-05-03

DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. 
It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
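The filtering criteria listed above (minimum copy number and reproducibility across PCR replicates) can be sketched in a few lines. This is an illustration of the idea only, not DAMe's own code or command-line interface; the input layout (one dictionary of sequence counts per replicate), the function name and the default thresholds are assumptions.

```python
from collections import Counter
from typing import Dict, List


def filter_by_replicates(replicates: List[Dict[str, int]],
                         min_copies: int = 2,
                         min_replicates: int = 2) -> Dict[str, int]:
    """Keep sequences that are reproducible across PCR replicates.

    replicates: one mapping per PCR replicate, unique sequence -> copy number.
    A replicate supports a sequence only if it holds at least min_copies copies.
    A sequence is retained if at least min_replicates replicates support it.
    Returns retained sequences with copy numbers summed over supporting replicates.
    """
    support = Counter()  # number of replicates supporting each sequence
    total = Counter()    # summed copy number over supporting replicates
    for counts in replicates:
        for seq, n in counts.items():
            if n >= min_copies:
                support[seq] += 1
                total[seq] += n
    return {seq: total[seq] for seq, k in support.items() if k >= min_replicates}
```

Stricter thresholds remove more sequencing and PCR errors at the cost of discarding rare but genuine sequences, which is why such parameters are usually tuned per dataset.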
A gene-specific non-enhancer sequence is critical for expression from the promoter of the small heat shock protein gene αB-crystallin

PubMed Central

2014-01-01

Background: Deciphering of the information content of eukaryotic promoters has remained confined to universal landmarks and conserved sequence elements such as enhancers and transcription factor binding motifs, which are considered sufficient for gene activation and regulation. Gene-specific sequences, interspersed between the canonical transacting factor binding sites or adjoining them within a promoter, are generally taken to be devoid of any regulatory information and have therefore been largely ignored. An unanswered question therefore is, do gene-specific sequences within a eukaryotic promoter have a role in gene activation? Here, we present an exhaustive experimental analysis of a gene-specific sequence adjoining the heat shock element (HSE) in the proximal promoter of the small heat shock protein gene, αB-crystallin (cryab). These sequences are highly conserved between rodents and humans. Results: Using human retinal pigment epithelial cells in culture as the host, we have identified a 10-bp gene-specific promoter sequence (GPS), which, unlike an enhancer, controls expression from the promoter of this gene, only when in appropriate position and orientation. Notably, the data suggest that GPS in comparison with the HSE works in a context-independent fashion. Additionally, when moved upstream, about a nucleosome length of DNA (−154 bp) from the transcription start site (TSS), the activity of the promoter is markedly inhibited, suggesting its involvement in local promoter access. Importantly, we demonstrate that deletion of the GPS results in complete loss of cryab promoter activity in transgenic mice. Conclusions: These data suggest that gene-specific sequences such as the GPS, identified here, may have critical roles in regulating gene-specific activity from eukaryotic promoters. PMID:24589182
Cryogenic terahertz spectrum of (+)-methamphetamine hydrochloride and assignment using solid-state density functional theory.

PubMed

Hakey, Patrick M; Allis, Damian G; Ouellette, Wayne; Korter, Timothy M

2009-04-30

The cryogenic terahertz spectrum of (+)-methamphetamine hydrochloride from 10.0 to 100.0 cm(-1) is presented, as is the complete structural analysis and vibrational assignment of the compound using solid-state density functional theory. This cryogenic investigation reveals multiple spectral features that were not previously reported in room-temperature terahertz studies of the title compound. Modeling of the compound employed eight density functionals utilizing both solid-state and isolated-molecule methods. The results clearly indicate the necessity of solid-state simulations for the accurate assignment of solid-state THz spectra. Assignment of the observed spectral features to specific atomic motions is based on the BP density functional, which provided the best-fit solid-state simulation of the experimental spectrum. The seven experimental spectral features are the result of thirteen infrared-active vibrational modes predicted at a BP/DNP level of theory with more than 90% of the total spectral intensity associated with external crystal vibrations.
NMR studies on the structure and dynamics of lac operator DNA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, S.C.

Nuclear magnetic resonance (NMR) spectroscopy was used to elucidate the relationships between structure, dynamics and function of the gene regulatory sequence corresponding to the lactose operon operator of Escherichia coli. The length of the DNA fragments examined varied from 13 to 36 base pairs, containing all or part of the operator sequence. These DNA fragments were either derived genetically or synthesized chemically. Resonances of the imino protons were assigned by one-dimensional inter-base pair nuclear Overhauser enhancement (NOE) measurements. Imino proton exchange rates were measured by saturation recovery methods. Results from the kinetic measurements show an interesting dynamic heterogeneity with a maximum opening rate centered about a GTG/CAC sequence which correlates with the biological function of the operator DNA. This particular three base pair sequence occurs frequently and often symmetrically in prokaryotic and eukaryotic DNA sites where one anticipates specific protein interaction for gene regulation. The observed sequence-dependent imino proton exchange rate may be a reflection of variation of the local structure of regulatory DNA. The results also indicate that the observed imino proton exchange rates are length dependent.
Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs.

PubMed

Sanders, Ashley D; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Lansdorp, Peter M

2017-06-01

The ability to distinguish between genome sequences of homologous chromosomes in single cells is important for studies of copy-neutral genomic rearrangements (such as inversions and translocations), building chromosome-length haplotypes, refining genome assemblies, mapping sister chromatid exchange events and exploring cellular heterogeneity. Strand-seq is a single-cell sequencing technology that resolves the individual homologs within a cell by restricting sequence analysis to the DNA template strands used during DNA replication. This protocol, which takes up to 4 d to complete, relies on the directionality of DNA, in which each single strand of a DNA molecule is distinguished based on its 5'-3' orientation. Culturing cells in a thymidine analog for one round of cell division labels nascent DNA strands, allowing for their selective removal during genomic library construction. To preserve directionality of template strands, genomic preamplification is bypassed and labeled nascent strands are nicked and not amplified during library preparation. Each single-cell library is multiplexed for pooling and sequencing, and the resulting sequence data are aligned, mapping to either the minus or plus strand of the reference genome, to assign template strand states for each chromosome in the cell. The major adaptations to conventional single-cell sequencing protocols include harvesting of daughter cells after a single round of BrdU incorporation, bypassing of whole-genome amplification, and removal of the BrdU + strand during Strand-seq library preparation. By sequencing just template strands, the structure and identity of each homolog are preserved.
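The final step described above, assigning a template strand state to each chromosome from the orientation of its aligned reads, can be sketched as a simple threshold rule. This is an illustrative assumption rather than the published analysis procedure: the function name, state labels, purity cut-off and minimum read count are all hypothetical.

```python
def strand_state(plus_reads: int, minus_reads: int,
                 purity: float = 0.8, min_reads: int = 20) -> str:
    """Classify a chromosome by the orientation of its aligned reads.

    Returns "plus/plus" or "minus/minus" when reads map almost exclusively
    to one strand (both homologs inherited the same template strand),
    "plus/minus" when both strands are represented, and "undetermined"
    when too few reads are available. Thresholds are illustrative only.
    """
    total = plus_reads + minus_reads
    if total < min_reads:
        return "undetermined"
    fraction_plus = plus_reads / total
    if fraction_plus >= purity:
        return "plus/plus"
    if fraction_plus <= 1 - purity:
        return "minus/minus"
    return "plus/minus"
```

A real analysis would apply such a rule in windows along each chromosome, since a change of state within a chromosome is what marks events such as sister chromatid exchanges or inversions.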
Genome structure of Rosa multiflora, a wild ancestor of cultivated roses

PubMed Central

Nakamura, Noriko; Hirakawa, Hideki; Sato, Shusei; Otagaki, Shungo; Matsumoto, Shogo; Tabata, Satoshi; Tanaka, Yoshikazu

2018-01-01

Abstract The draft genome sequence of a wild rose (Rosa multiflora Thunb.) was determined using Illumina MiSeq and HiSeq platforms. The total length of the scaffolds was 739,637,845 bp, consisting of 83,189 scaffolds, which was close to the 711 Mbp length estimated by k-mer analysis. N50 length of the scaffolds was 90,830 bp, and extent of the longest was 1,133,259 bp. The average GC content of the scaffolds was 38.9%. After gene prediction, 67,380 candidates exhibiting sequence homology to known genes and domains were extracted, which included complete and partial gene structures. This large number of genes for a diploid plant may reflect heterogeneity of the genome originating from self-incompatibility in R. multiflora. According to CEGMA analysis, 91.9% and 98.0% of the core eukaryotic genes were completely and partially conserved in the scaffolds, respectively. Genes presumably involved in flower color, scent and flowering are assigned. The results of this study will serve as a valuable resource for fundamental and applied research in the rose, including breeding and phylogenetic study of cultivated roses. PMID:29045613
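For readers unfamiliar with the N50 statistic quoted above, the following Python sketch shows the usual definition: the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are illustrative and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy assembly: total 30, half 15; 10 + 8 = 18 reaches the halfway point.
toy_n50 = n50([10, 8, 6, 4, 2])  # 8
```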
Evaluation of a shared-work program for reducing assistance provided to supported workers with severe multiple disabilities.

PubMed

Parsons, Marsha B; Reid, Dennis H; Green, Carolyn W; Browning, Leah B; Hensley, Mary B

2002-01-01

Concern has been expressed recently regarding the need to enhance the performance of individuals with highly significant disabilities in community-based, supported jobs. We evaluated a shared-work program for reducing job coach assistance provided to three workers with severe multiple disabilities in a publishing company. Following systematic observations of the assistance provided as each worker worked on entire job tasks, steps comprising the tasks were then re-assigned across workers. The re-assignment involved assigning each worker only those task steps for which the respective worker received the least amount of assistance (e.g., re-assigning steps that a worker could not complete due to physical disabilities), and ensuring the entire tasks were still completed by combining steps performed by all three workers. The shared-work program was accompanied by reductions in job coach assistance provided to each worker. Work productivity of the supported workers initially decreased but then increased to a level equivalent to the higher ranges of baseline productivity. These results suggested that the shared-work program appears to represent a viable means of enhancing supported work performance of people with severe multiple disabilities in some types of community jobs. Future research needs discussed focus on evaluating shared-work approaches with other jobs, and developing additional community work models specifically for people with highly significant disabilities.
A genomewide survey of basic helix–loop–helix factors in Drosophila

PubMed Central

Moore, Adrian W.; Barbel, Sandra; Jan, Lily Yeh; Jan, Yuh Nung

2000-01-01

The basic helix–loop–helix (bHLH) transcription factors play important roles in the specification of tissue type during the development of animals. We have used the information contained in the recently published genomic sequence of Drosophila melanogaster to identify 12 additional bHLH proteins. By sequence analysis we have assigned these proteins to families defined by Atonal, Hairy-Enhancer of Split, Hand, p48, Mesp, MYC/USF, and the bHLH-Per, Arnt, Sim (PAS) domain. In addition, one single protein represents a unique family of bHLH proteins. mRNA in situ analysis demonstrates that the genes encoding these proteins are expressed in several tissue types but are particularly concentrated in the developing nervous system and mesoderm. PMID:10973473
The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat

PubMed Central

Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji

2016-01-01

Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species whose genome has also been sequenced, but not fully analysed yet. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organizations, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, involved in its extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and is specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. PMID:26645327
Identification of Sinorhizobium (Ensifer) medicae based on a specific genomic sequence unveiled by M13-PCR fingerprinting.

PubMed

Dourado, Ana Catarina; Alves, Paula I L; Tenreiro, Tania; Ferreira, Eugénio M; Tenreiro, Rogério; Fareleira, Paula; Crespo, M Teresa Barreto

2009-12-01

A collection of nodule isolates from Medicago polymorpha obtained from southern and central Portugal was evaluated by M13-PCR fingerprinting and hierarchical cluster analysis. Several genomic clusters were obtained which, by 16S rRNA gene sequencing of selected representatives, were shown to be associated with particular taxonomic groups of rhizobia and other soil bacteria. The method provided a clear separation between rhizobia and co-isolated non-symbiotic soil contaminants. Ten M13-PCR groups were assigned to Sinorhizobium (Ensifer) medicae and included all isolates responsible for the formation of nitrogen-fixing nodules upon re-inoculation of M. polymorpha test-plants. In addition, enterobacterial repetitive intergenic consensus (ERIC)-PCR fingerprinting indicated a high genomic heterogeneity within the major M13-PCR clusters of S. medicae isolates. Based on nucleotide sequence data of an M13-PCR amplicon of ca. 1500 bp, observed only in S. medicae isolates and spanning locus Smed_3707 to Smed_3709 from the pSMED01 plasmid sequence of the S. medicae WSM419 genome, a pair of PCR primers was designed and used for direct PCR amplification of a 1399-bp sequence within this fragment. Additional in silico and in vitro experiments, as well as phylogenetic analysis, confirmed the specificity of this primer combination and therefore the reliability of this approach in the prompt identification of S. medicae isolates and their distinction from other soil bacteria.
Molecular and functional characterization of novel fructosyltransferases and invertases from Agave tequilana.

PubMed

Cortés-Romero, Celso; Martínez-Hernández, Aída; Mellado-Mojica, Erika; López, Mercedes G; Simpson, June

2012-01-01

Fructans are the main storage polysaccharides found in Agave species. The synthesis of these complex carbohydrates relies on the activities of specific fructosyltransferase enzymes closely related to the hydrolytic invertases. Analysis of Agave tequilana transcriptome data led to the identification of ESTs encoding putative fructosyltransferases and invertases. Based on sequence alignments and structure/function relationships, two different genes were predicted to encode 1-SST and 6G-FFT type fructosyltransferases; in addition, 4 genes encoding putative cell wall invertases and 4 genes encoding putative vacuolar invertases were identified. Probable functions for each gene were assigned based on conserved amino acid sequences and confirmed for 2 fructosyltransferases and one invertase by analyzing the enzymatic activity of recombinant Agave proteins expressed and purified from Pichia pastoris. The genome organization of the fructosyltransferase/invertase genes, for which the corresponding cDNA contained the complete open reading frame, was found to be well conserved since all genes were shown to carry a 9 bp mini-exon and all showed a similar structure of 8 exons/7 introns with the exception of a cell wall invertase gene which has 7 exons and 6 introns. Fructosyltransferase genes were strongly expressed in the storage organs of the plants, especially in vegetative stages of development and to lower levels in photosynthetic tissues, in contrast to the invertase genes where higher levels of expression were observed in leaf tissues and in mature plants.
Molecular and Functional Characterization of Novel Fructosyltransferases and Invertases from Agave tequilana

PubMed Central

Cortés-Romero, Celso; Martínez-Hernández, Aída; Mellado-Mojica, Erika; López, Mercedes G.; Simpson, June

2012-01-01

Fructans are the main storage polysaccharides found in Agave species. The synthesis of these complex carbohydrates relies on the activities of specific fructosyltransferase enzymes closely related to the hydrolytic invertases. Analysis of Agave tequilana transcriptome data led to the identification of ESTs encoding putative fructosyltransferases and invertases. Based on sequence alignments and structure/function relationships, two different genes were predicted to encode 1-SST and 6G-FFT type fructosyltransferases; in addition, 4 genes encoding putative cell wall invertases and 4 genes encoding putative vacuolar invertases were identified. Probable functions for each gene were assigned based on conserved amino acid sequences and confirmed for 2 fructosyltransferases and one invertase by analyzing the enzymatic activity of recombinant Agave proteins expressed and purified from Pichia pastoris. The genome organization of the fructosyltransferase/invertase genes, for which the corresponding cDNA contained the complete open reading frame, was found to be well conserved since all genes were shown to carry a 9 bp mini-exon and all showed a similar structure of 8 exons/7 introns with the exception of a cell wall invertase gene which has 7 exons and 6 introns. Fructosyltransferase genes were strongly expressed in the storage organs of the plants, especially in vegetative stages of development and to lower levels in photosynthetic tissues, in contrast to the invertase genes where higher levels of expression were observed in leaf tissues and in mature plants. PMID:22558253
"Barriers to Cognitive Behavioral Therapy Homework Completion Scale- Depression Version": Development and Psychometric Evaluation.

PubMed

Callan, Judith A; Dunbar-Jacob, Jacqueline; Sereika, Susan M; Stone, Clement; Fasiczka, Amy; Jarrett, Robin B; Thase, Michael E

2012-01-01

We conducted a two-phase study to develop and evaluate the psychometric properties of an instrument to identify barriers to Cognitive Behavioral Therapy (CBT) homework completion in a depressed sample. In Phase I, we developed an item pool by interviewing 20 depressed patients and 20 CBT therapists. In Phase II, we created and administered a draft instrument to 56 people with depression. Exploratory Factor Analysis revealed a 2-factor oblique solution of "Patient Factors" and "Therapy/Task Factors." Internal consistency coefficients ranged from .80 to .95. Temporal stability was demonstrated through Pearson correlations of .72 (for the therapist/task subscale) to .95 (for the patient subscale) over periods of time that ranged from 2 days to 3 weeks. The patient subscale was able to satisfactorily classify patients (75% to 79%) with low and high adherence at both sessions. Specificity was .66 at both time points. Sensitivity was .80 at session B and .77 at session C. There were no consistent predictors of assignment compliance when measured by the Assignment Compliance Rating Scale (Primakoff, Epstein, & Covi, 1986). The Rating Scale and subscale scores did, however, correlate significantly with assignment non-compliance (.32 to .46).
“Barriers to Cognitive Behavioral Therapy Homework Completion Scale- Depression Version”: Development and Psychometric Evaluation

PubMed Central

Callan, Judith A.; Dunbar-Jacob, Jacqueline; Sereika, Susan M.; Stone, Clement; Fasiczka, Amy; Jarrett, Robin B.; Thase, Michael E.

2013-01-01

We conducted a two-phase study to develop and evaluate the psychometric properties of an instrument to identify barriers to Cognitive Behavioral Therapy (CBT) homework completion in a depressed sample. In Phase I, we developed an item pool by interviewing 20 depressed patients and 20 CBT therapists. In Phase II, we created and administered a draft instrument to 56 people with depression. Exploratory Factor Analysis revealed a 2-factor oblique solution of “Patient Factors” and “Therapy/Task Factors.” Internal consistency coefficients ranged from .80 to .95. Temporal stability was demonstrated through Pearson correlations of .72 (for the therapist/task subscale) to .95 (for the patient subscale) over periods of time that ranged from 2 days to 3 weeks. The patient subscale was able to satisfactorily classify patients (75% to 79%) with low and high adherence at both sessions. Specificity was .66 at both time points. Sensitivity was .80 at session B and .77 at session C. There were no consistent predictors of assignment compliance when measured by the Assignment Compliance Rating Scale (Primakoff, Epstein, & Covi, 1986). The Rating Scale and subscale scores did, however, correlate significantly with assignment non-compliance (.32 to .46). PMID:24049556
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans.

PubMed

Tully, Benjamin J; Graham, Elaina D; Heidelberg, John F

2018-01-16

Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large.
Germline viral "fossils" guide in silico reconstruction of a mid-Cenozoic era marsupial adeno-associated virus.

PubMed

Smith, Richard H; Hallwirth, Claus V; Westerman, Michael; Hetherington, Nicola A; Tseng, Yu-Shan; Cecchini, Sylvain; Virag, Tamas; Ziegler, Mona-Larissa; Rogozin, Igor B; Koonin, Eugene V; Agbandje-McKenna, Mavis; Kotin, Robert M; Alexander, Ian E

2016-07-05

Germline endogenous viral elements (EVEs) genetically preserve viral nucleotide sequences useful to the study of viral evolution, gene mutation, and the phylogenetic relationships among host organisms. Here, we describe a lineage-specific, adeno-associated virus (AAV)-derived endogenous viral element (mAAV-EVE1) found within the germline of numerous closely related marsupial species. Molecular screening of a marsupial DNA panel indicated that mAAV-EVE1 occurs specifically within the marsupial suborder Macropodiformes (present-day kangaroos, wallabies, and related macropodoids), to the exclusion of other Diprotodontian lineages. Orthologous mAAV-EVE1 locus sequences from sixteen macropodoid species, representing a speciation history spanning an estimated 30 million years, facilitated compilation of an inferred ancestral sequence that recapitulates the genome of an ancient marsupial AAV that circulated among Australian metatherian fauna sometime during the late Eocene to early Oligocene. In silico gene reconstruction and molecular modelling indicate remarkable conservation of viral structure over a geologic timescale. Characterisation of AAV-EVE loci among disparate species affords insight into AAV evolution and, in the case of macropodoid species, may offer an additional genetic basis for assignment of phylogenetic relationships among the Macropodoidea. From an applied perspective, the identified AAV "fossils" provide novel capsid sequences for use in translational research and clinical applications.
Identification and Characterization of Sites Where Persistent Atrial Fibrillation Is Terminated by Localized Ablation.

PubMed

Zaman, Junaid A B; Sauer, William H; Alhusseini, Mahmood I; Baykaner, Tina; Borne, Ryan T; Kowalewski, Christopher A B; Busch, Sonia; Zei, Paul C; Park, Shirley; Viswanathan, Mohan N; Wang, Paul J; Brachmann, Johannes; Krummen, David E; Miller, John M; Rappel, Wouter Jan; Narayan, Sanjiv M; Peters, Nicholas S

2018-01-01

The mechanisms by which persistent atrial fibrillation (AF) terminates via localized ablation are not well understood. To address the hypothesis that sites where localized ablation terminates persistent AF have characteristics identifiable with activation mapping during AF, we systematically examined activation patterns acquired only in cases of unequivocal termination by ablation. We recruited 57 patients with persistent AF undergoing ablation, in whom localized ablation terminated AF to sinus rhythm or organized tachycardia. For each site, we performed an offline analysis of unprocessed unipolar electrograms collected during AF from multipolar basket catheters using the maximum -dV/dt assignment to construct isochronal activation maps for multiple cycles. Additional computational modeling and phase analysis were used to study mechanisms of map variability. At all sites of AF termination, localized repetitive activation patterns were observed. Partial rotational circuits were observed in 26 of 57 (46%) cases, focal patterns in 19 of 57 (33%), and complete rotational activity in 12 of 57 (21%) cases. In computer simulations, incomplete segments of partial rotations coincided with areas of slow conduction characterized by complex, multicomponent electrograms, and variations in assigning activation times at such sites substantially altered mapped mechanisms. Local activation mapping at sites of termination of persistent AF showed repetitive patterns of rotational or focal activity. In computer simulations, complete rotational activation sequence was observed but was sensitive to assignment of activation timing particularly in segments of slow conduction. The observed phenomena of repetitive localized activation and the mechanism by which local ablation terminates putative AF drivers require further investigation. © 2018 American Heart Association, Inc.
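The maximum -dV/dt assignment mentioned above marks local activation at the steepest downstroke of a unipolar electrogram. The Python sketch below illustrates that idea on a single short trace; the function name, the forward-difference slope estimate, and the sample values are assumptions made for illustration and do not reproduce the study's analysis pipeline.

```python
def activation_time(voltages, sample_interval_ms=1.0):
    """Return (time_ms, slope) at the steepest negative deflection.

    voltages: equally spaced unipolar electrogram samples.
    The slope is estimated by a simple forward difference, which is an
    assumption of this sketch; the time is that of the first sample of
    the steepest interval.
    """
    if len(voltages) < 2:
        raise ValueError("need at least two samples")
    slopes = [
        (voltages[i + 1] - voltages[i]) / sample_interval_ms
        for i in range(len(voltages) - 1)
    ]
    # Most negative slope corresponds to maximum -dV/dt.
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return steepest * sample_interval_ms, slopes[steepest]


# Hypothetical trace with a sharp downstroke between samples 2 and 3.
trace = [0.0, 0.5, 1.0, -2.0, -2.5, -1.0]
time_ms, slope = activation_time(trace)
```

In multicomponent electrograms several deflections can have similar slopes, which is consistent with the abstract's observation that the assigned activation time, and therefore the mapped mechanism, can vary at sites of slow conduction.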
Using Self-Recording, Evaluation, and Graphing to Increase Completion of Homework Assignments.

ERIC Educational Resources Information Center

Trammel, Diana Lynn; And Others

1994-01-01

Self-monitoring procedures were effective in increasing the number of daily homework assignments completed by eight secondary level students with learning disabilities. A daily listing of all assignments given by regular classroom teachers was used. Goal setting and self-graphing of data appeared to increase self-monitoring effectiveness. (DB)
Studies on tridecaptin B(1), a lipopeptide with activity against multidrug resistant Gram-negative bacteria.

PubMed

Cochrane, Stephen A; Lohans, Christopher T; van Belkum, Marco J; Bels, Manon A; Vederas, John C

2015-06-07

Previously other groups had reported that Paenibacillus polymyxa NRRL B-30507 produces SRCAM 37, a type IIA bacteriocin with antimicrobial activity against Campylobacter jejuni. Genome sequencing and isolation of antimicrobial compounds from this P. polymyxa strain show that the antimicrobial activity is due to polymyxins and tridecaptin B1. The complete structural assignment, synthesis, and antimicrobial profile of tridecaptin B1 is reported, as well as the putative gene cluster responsible for its biosynthesis. This peptide displays strong activity against multidrug resistant Gram-negative bacteria, a finding that is timely to the current problem of antibiotic resistance.

The Comprehensive Microbial Resource.

PubMed

Peterson, J D; Umayam, L A; Dickinson, T; Hickey, E K; White, O

2001-01-01

One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.
Neratinib Efficacy and Circulating Tumor DNA Detection of HER2 Mutations in HER2 Nonamplified Metastatic Breast Cancer.

PubMed

Ma, Cynthia X; Bose, Ron; Gao, Feng; Freedman, Rachel A; Telli, Melinda L; Kimmick, Gretchen; Winer, Eric; Naughton, Michael; Goetz, Matthew P; Russell, Christy; Tripathy, Debu; Cobleigh, Melody; Forero, Andres; Pluard, Timothy J; Anders, Carey; Niravath, Polly Ann; Thomas, Shana; Anderson, Jill; Bumb, Caroline; Banks, Kimberly C; Lanman, Richard B; Bryce, Richard; Lalani, Alshad S; Pfeifer, John; Hayes, Daniel F; Pegram, Mark; Blackwell, Kimberly; Bedard, Philippe L; Al-Kateb, Hussam; Ellis, Matthew J C

2017-10-01

Purpose: Based on promising preclinical data, we conducted a single-arm phase II trial to assess the clinical benefit rate (CBR) of neratinib, defined as complete/partial response (CR/PR) or stable disease (SD) ≥24 weeks, in HER2 mut nonamplified metastatic breast cancer (MBC). Secondary endpoints included progression-free survival (PFS), toxicity, and circulating tumor DNA (ctDNA) HER2 mut detection. Experimental Design: Tumor tissue positive for HER2 mut was required for eligibility. Neratinib was administered 240 mg daily with prophylactic loperamide. ctDNA sequencing was performed retrospectively for 54 patients (14 positive and 40 negative for tumor HER2 mut). Results: Nine of 381 tumors (2.4%) sequenced centrally harbored HER2 mut (lobular 7.8% vs. ductal 1.6%; P = 0.026). Thirteen additional HER2 mut cases were identified locally. Twenty-one of these 22 HER2 mut cases were estrogen receptor positive. Sixteen patients [median age 58 (31-74) years and three (2-10) prior metastatic regimens] received neratinib. The CBR was 31% [90% confidence interval (CI), 13%-55%], including one CR, one PR, and three SD ≥24 weeks. Median PFS was 16 (90% CI, 8-31) weeks. Diarrhea (grade 2, 44%; grade 3, 25%) was the most common adverse event. Baseline ctDNA sequencing identified the same HER2 mut in 11 of 14 tumor-positive cases (sensitivity, 79%; 90% CI, 53%-94%) and correctly assigned 32 of 32 informative negative cases (specificity, 100%; 90% CI, 91%-100%). In addition, ctDNA HER2 mut variant allele frequency decreased in nine of 11 paired samples at week 4, followed by an increase upon progression. Conclusions: Neratinib is active in HER2 mut, nonamplified MBC. ctDNA sequencing offers a noninvasive strategy to identify patients with HER2 mut cancers for clinical trial participation. Clin Cancer Res; 23(19); 5687-95. ©2017 AACR.
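The point estimates quoted above follow directly from the reported counts, as the short Python check below shows. Only the proportions are recomputed; the confidence intervals are not, because the abstract does not state which interval method was used. The helper name `percent` is hypothetical.

```python
def percent(numerator, denominator):
    """Proportion expressed as a percentage."""
    return 100 * numerator / denominator


# Clinical benefit: one CR + one PR + three SD >= 24 weeks among 16 treated patients.
cbr = percent(1 + 1 + 3, 16)          # 31.25, reported as 31%

# ctDNA versus tumor tissue as reference.
sensitivity = percent(11, 14)         # about 78.6, reported as 79%
specificity = percent(32, 32)         # 100.0

# Centrally sequenced tumors harboring a HER2 mutation.
central_rate = percent(9, 381)        # about 2.36, reported as 2.4%
```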
Analysis of the complete genome of subgroup A' hepatitis B virus isolates from South Africa.

PubMed

Kramvis, Anna; Weitzmann, Louise; Owiredu, William K B A; Kew, Michael C

2002-04-01

A phylogenetic analysis is presented of six complete and seven pre-S1/S2/S gene sequences of hepatitis B virus (HBV) isolates from South Africa. Five of the full-length sequences and all of the pre-S2/S sequences have been previously reported. Four of the six complete genomes and three of the five incomplete sequences clustered with subgroup A', a unique segment of genotype A of HBV previously identified in 60% of South African isolates using analysis of the pre-S2/S region alone. This separation was also evident when the polymerase open reading frame was analysed, but not on analysis of either the X or pre-core/core genes. Amino acids were identified in the pre-S1 and polymerase regions specific to subgroup A'. In common with genotype D, 10 of 11 genotype A South African isolates had an 11 amino acid deletion in the amino end of the pre-S1 region. This deletion is also found in hepadnaviruses from non-human primates.
Effectiveness of a Rapid Lumbar Spine MRI Protocol Using 3D T2-Weighted SPACE Imaging Versus a Standard Protocol for Evaluation of Degenerative Changes of the Lumbar Spine.

PubMed

Sayah, Anousheh; Jay, Ann K; Toaff, Jacob S; Makariou, Erini V; Berkowitz, Frank

2016-09-01

Reducing lumbar spine MRI scanning time while retaining diagnostic accuracy can benefit patients and reduce health care costs. This study compares the effectiveness of a rapid lumbar MRI protocol using 3D T2-weighted sampling perfection with application-optimized contrast with different flip-angle evolutions (SPACE) sequences with a standard MRI protocol for evaluation of lumbar spondylosis. Two hundred fifty consecutive unenhanced lumbar MRI examinations performed at 1.5 T were retrospectively reviewed. Full, rapid, and complete versions of each examination were interpreted for spondylotic changes at each lumbar level, including herniations and neural compromise. The full examination consisted of sagittal T1-weighted, T2-weighted turbo spin-echo (TSE), and STIR sequences; and axial T1- and T2-weighted TSE sequences (time, 18 minutes 40 seconds). The rapid examination consisted of sagittal T1- and T2-weighted SPACE sequences, with axial SPACE reformations (time, 8 minutes 46 seconds). The complete examination consisted of the full examination plus the T2-weighted SPACE sequence. Sensitivities and specificities of the full and rapid examinations were calculated using the complete study as the reference standard. The rapid and full studies had sensitivities of 76.0% and 69.3%, with specificities of 97.2% and 97.9%, respectively, for all degenerative processes. Rapid and full sensitivities were 68.7% and 66.3% for disk herniation, 85.2% and 81.5% for canal compromise, 82.9% and 69.1% for lateral recess compromise, and 76.9% and 69.7% for foraminal compromise, respectively. Isotropic SPACE T2-weighted imaging provides high-quality imaging of lumbar spondylosis, with multiplanar reformatting capability. Our SPACE-based rapid protocol had sensitivities and specificities for herniations and neural compromise comparable to those of the protocol without SPACE. This protocol fits within a 15-minute slot, potentially reducing costs and discomfort for a large subgroup of patients.
Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level.

PubMed

Brunak, S; Engelbrecht, J

1996-06-01

A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.
Neural Correlates of Temporal Credit Assignment in the Parietal Lobe

PubMed Central

Eisenberg, Ian; Gottlieb, Jacqueline

2014-01-01

Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the “F” step) but ignore changes in this reward at the remaining step (the “I” step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting. PMID:24523935
Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

PubMed

Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

2014-01-01

Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence across the genome, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3'-terminal of the papain-like cysteine protease domain, were detected to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed the identical performances between these regions and the full-length genome in genotyping, in which the HEV isolates involved could be divided into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
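The abstract does not name the statistical method, so the following Python sketch is only one plausible way to compare a partial region with the full-length genome: compute pairwise distances for both and correlate them. The use of uncorrected p-distances and Pearson correlation, all function names, and the toy aligned sequences are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(seqs, start=0, end=None):
    """Pairwise p-distances over columns [start, end) for every pair of isolates."""
    names = sorted(seqs)
    return [
        p_distance(seqs[m][start:end], seqs[n][start:end])
        for m, n in combinations(names, 2)
    ]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant distances")
    return cov / sqrt(var_x * var_y)


# Toy alignment of three hypothetical isolates (not HEV data).
alignment = {
    "a": "ACGTACGTACGT",
    "b": "ACGTACGTACGA",
    "c": "TCGAACGAACGA",
}
full = pairwise_distances(alignment)
region = pairwise_distances(alignment, start=0, end=4)
region_vs_full = pearson(region, full)
```

Sliding the `start` and `end` window along the alignment and ranking windows by this correlation would give a crude version of the genome-wide scan the abstract describes.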
Three genes in the human MHC class III region near the junction with the class II: Gene for receptor of advanced glycosylation end products, PBX2 homeobox gene and a notch homolog, human counterpart of mouse mammary tumor gene int-3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sugaya, K.; Fukagawa, T.; Matsumoto, K.

Cosmid walking of about 250 kb from MHC class III gene CYP21 to class II was conducted. The gene for receptor of advanced glycosylation end products of proteins (RAGE, a member of immunoglobulin super-family molecules), the PBX2 homeobox gene designated HOX12, and the human counterpart of the mouse mammary tumor gene int-3 were found. The contiguous RAGE and HOX12 genes were completely sequenced, and the human int-3 counterpart was partially sequenced and assigned to a Notch homolog. This human Notch homolog, designated NOTCH3, showed both the intracellular portion present in the mouse int-3 sequence and the extracellular portion absent in the int-3. It thus corresponds to the intact form of a Notch-type transmembrane protein. About 20 kb of dense Alu clustering was found just centromeric to the NOTCH3. 48 refs., 9 figs., 2 tabs.
Microbial culturomics to isolate halophilic bacteria from table salt: genome sequence and description of the moderately halophilic bacterium Bacillus salis sp. nov.

PubMed

Seck, E H; Diop, A; Armstrong, N; Delerce, J; Fournier, P-E; Raoult, D; Khelaifia, S

2018-05-01

Bacillus salis strain ES3 T (= CSUR P1478 = DSM 100598) is the type strain of B. salis sp. nov. It is an aerobic, Gram-positive, moderately halophilic, motile and spore-forming bacterium. It was isolated from commercial table salt as part of a broad culturomics study aiming to maximize the culture conditions for the in-depth exploration of halophilic bacteria in salty food. Here we describe the phenotypic characteristics of this isolate, its complete genome sequence and annotation, together with a comparison with closely related bacteria. Phylogenetic analysis based on 16S rRNA gene sequences indicated 97.5% similarity with Bacillus aquimaris, the closest species. The 8 329 771 bp long genome (one chromosome, no plasmids) exhibits a G+C content of 39.19%. It is composed of 18 scaffolds with 29 contigs. Of the 8303 predicted genes, 8109 were protein-coding genes and 194 were RNAs. A total of 5778 genes (71.25%) were assigned a putative function.
AutoFACT: An Automatic Functional Annotation and Classification Tool

PubMed Central

Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud

2005-01-01

Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857
GASP: Gapped Ancestral Sequence Prediction for proteins

PubMed Central

Edwards, Richard J; Shields, Denis C

2004-01-01

Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
Cloning and sequencing of the pheP gene, which encodes the phenylalanine-specific transport system of Escherichia coli.

PubMed Central

Pi, J; Wookey, P J; Pittard, A J

1991-01-01

The phenylalanine-specific permease gene (pheP) of Escherichia coli has been cloned and sequenced. The gene was isolated on a 6-kb Sau3AI fragment from a chromosomal library, and its presence was verified by complementation of a mutant lacking the functional phenylalanine-specific permease. Subcloning from this fragment localized the pheP gene on a 2.7-kb HindIII-HindII fragment. The nucleotide sequence of this 2.7-kb region was determined. An open reading frame was identified which extends from a putative start point of translation (GTG at position 636) to a termination signal (TAA at position 2010). The assignment of the GTG as the initiation codon was verified by site-directed mutagenesis of the initiation codon and by introducing a chain termination mutation into the pheP-lacZ fusion construct. A single initiation site of transcription 30 bp upstream of the start point of translation was identified by the primer extension analysis. The pheP structural gene consists of 1,374 nucleotides specifying a protein of 458 amino acid residues. The PheP protein is very hydrophobic (71% nonpolar residues). A topological model predicted from the sequence analysis defines 12 transmembrane segments. This protein is highly homologous with the AroP (general aromatic transport) system of E. coli (59.6% identity) and to a lesser extent with the yeast permeases CAN1 (arginine), PUT4 (proline), and HIP1 (histidine) of Saccharomyces cerevisiae. Images PMID:1711024
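The coordinates quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes 1-based positions in which 636 is the first base of the GTG start codon and 2010 is the first base of the TAA stop codon. That reading is an assumption, chosen because it reproduces the reported 1,374 nucleotides and 458 residues; the function names are illustrative only.

```python
# Assumed convention (not stated in the abstract): 1-based coordinates,
# each position marking the first base of its codon.
START_CODON_POS = 636   # first base of GTG
STOP_CODON_POS = 2010   # first base of TAA


def coding_length(start_pos: int, stop_pos: int) -> int:
    """Nucleotides from the start codon up to, but excluding, the stop codon."""
    return stop_pos - start_pos


def residue_count(n_nucleotides: int) -> int:
    """Number of amino acids encoded by a coding region of this length."""
    if n_nucleotides % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return n_nucleotides // 3


nt = coding_length(START_CODON_POS, STOP_CODON_POS)
print(nt, residue_count(nt))  # 1374 458
```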
openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

PubMed

Rudd, Stephen

2005-01-01

The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.
Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis

PubMed Central

Aslam, Luqman; Beal, Kathryn; Ann Blomberg, Le; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian (Jake); Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.

2010-01-01

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest. PMID:20838655
Construction of random sheared fosmid library from Chinese cabbage and its use for Brassica rapa genome sequencing project.

PubMed

Park, Tae-Ho; Park, Beom-Seok; Kim, Jin-A; Hong, Joon Ki; Jin, Mina; Seol, Young-Joo; Mun, Jeong-Hwan

2011-01-01

As a part of the Multinational Genome Sequencing Project of Brassica rapa, linkage group R9 and R3 were sequenced using a bacterial artificial chromosome (BAC) by BAC strategy. The current physical contigs are expected to cover approximately 90% euchromatins of both chromosomes. As the project progresses, BAC selection for sequence extension becomes more limited because BAC libraries are restriction enzyme-specific. To support the project, a random sheared fosmid library was constructed. The library consists of 97536 clones with average insert size of approximately 40 kb corresponding to seven genome equivalents, assuming a Chinese cabbage genome size of 550 Mb. The library was screened with primers designed at the end of sequences of nine points of scaffold gaps where BAC clones cannot be selected to extend the physical contigs. The selected positive clones were end-sequenced to check the overlap between the fosmid clones and the adjacent BAC clones. Nine fosmid clones were selected and fully sequenced. The sequences revealed two completed gap filling and seven sequence extensions, which can be used for further selection of BAC clones confirming that the fosmid library will facilitate the sequence completion of B. rapa. Copyright © 2011. Published by Elsevier Ltd.
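The "seven genome equivalents" figure follows from the clone count, the average insert size and the assumed genome size. A minimal sketch of that calculation, using only the numbers given in the abstract (the function name is hypothetical):

```python
def genome_equivalents(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Total cloned DNA divided by genome size (library coverage in genome equivalents)."""
    return n_clones * mean_insert_bp / genome_size_bp


# Values from the abstract: 97536 clones, ~40 kb inserts, 550 Mb genome.
coverage = genome_equivalents(97_536, 40_000, 550_000_000)
print(f"{coverage:.2f} genome equivalents")  # 7.09 genome equivalents
```

Because the insert size is quoted as approximate, the result should be read as roughly sevenfold coverage rather than an exact value.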
Functional assignment of gene AAC16202.1 from Rhodobacter capsulatus SB1003: new insights into the bacterial SDR sorbitol dehydrogenases family.

PubMed

Sola-Carvajal, Agustín; García-García, María Inmaculada; Sánchez-Carrón, Guiomar; García-Carmona, Francisco; Sánchez-Ferrer, Alvaro

2012-11-01

Short-chain dehydrogenases/reductases (SDR) constitute one of the largest enzyme superfamilies with over 60,000 non-redundant sequences in the database, many of which need a correct functional assignment. Among them, the gene AAC16202.1 (NCBI) from Rhodobacter capsulatus SB1003 has been assigned in Uniprot both as a sorbitol dehydrogenase (#D5AUY1) and as an N-acetyl-d-mannosamine dehydrogenase (#O66112), both enzymes being of biotechnological interest. When the gene was overexpressed in Escherichia coli Rosetta (DE3)pLys, the purified enzyme was not active toward N-acetyl-d-mannosamine, whereas it was active toward d-sorbitol and d-fructose. However, the relative activities toward xylitol and l-iditol (0.45 and 6.9%, respectively) were low compared with that toward d-sorbitol. Thus, the enzyme could be considered a sorbitol dehydrogenase (SDH) with very low activity toward xylitol, which could increase its biotechnological interest for determining sorbitol without the unspecific cross-determination of added xylitol in food and pharma compositions. The tetrameric enzyme (120 kDa) showed similar catalytic efficiency (2.2 × 10^3 M^-1 s^-1) to other sorbitol dehydrogenases for d-sorbitol, with an optimum pH of 9.0 and an optimum temperature of 37 °C. The enzyme was also more thermostable than other reported SDH, ammonium sulfate being the best stabilizer in this respect, increasing the melting temperature (Tm) up to 52.9 °C. The enzyme can also be considered a new member of the Zn2+-independent SDH family since no effect on activity was detected in the presence of divalent cations or chelating agents. Finally, its in silico analysis enabled the specific conserved sequence blocks that are the fingerprints of bacterial sorbitol dehydrogenases, mainly located at the C-terminus of the protein, to be determined for the first time. This knowledge will facilitate future data curation of present databases and a better functional assignment of newly described sequences. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

PubMed

Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

2009-11-01

Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
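The graph representation described in this abstract (residue names as vertex labels, edges for geometric proximity) can be illustrated with a small sketch. The abstract does not say how proximity is defined, so the plain distance cutoff below, the 8.0 Å default and the function name are all assumptions made for illustration, not the authors' construction.

```python
from itertools import combinations
from math import dist


def residue_graph(residues, cutoff=8.0):
    """Build a labeled proximity graph.

    residues: list of (residue_name, (x, y, z)) tuples, one point per residue.
    cutoff:   assumed distance threshold in angstroms (hypothetical choice).
    Returns (labels, edges): labels maps vertex index to residue name,
    edges is a set of (i, j) index pairs with i < j.
    """
    labels = {i: name for i, (name, _) in enumerate(residues)}
    edges = set()
    for i, j in combinations(range(len(residues)), 2):
        if dist(residues[i][1], residues[j][1]) <= cutoff:
            edges.add((i, j))
    return labels, edges


# Toy coordinates, invented for the example.
toy = [("ALA", (0.0, 0.0, 0.0)), ("GLY", (3.0, 4.0, 0.0)), ("HIS", (20.0, 0.0, 0.0))]
labels, edges = residue_graph(toy)
print(labels, edges)
```

A subgraph miner or subgraph isomorphism search would then operate on `labels` and `edges`; those steps are outside the scope of this sketch.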
Fingerprinting and quantification of GMOs in the agro-food sector.

PubMed

Taverniers, I; Van Bockstaele, E; De Loose, M

2003-01-01

Most strategies for analyzing GMOs in plants and derived food and feed products are based on the polymerase chain reaction (PCR) technique. In conventional PCR methods, a 'known' sequence between two specific primers is amplified. In contrast, with the 'anchor PCR' technique, unknown sequences adjacent to a known sequence can be amplified. Because T-DNA/plant border sequences are being amplified, anchor PCR is the perfect tool for unique identification of transgenes, including non-authorized GMOs. In this work, anchor PCR was applied to characterize the 'transgene locus' and to clarify the complete molecular structure of at least six different commercial transgenic plants. Based on sequences of T-DNA/plant border junctions, obtained by anchor PCR, event-specific primers were developed. The junction fragments, together with endogenous reference gene targets, were cloned in plasmids. The latter were then used as event-specific calibrators in real-time PCR, a new technique for the accurate relative quantification of GMOs. We demonstrate here the importance of anchor PCR for identification and the usefulness of plasmid DNA calibrators in quantification strategies for GMOs, throughout the agro-food sector.
Phylogenetic utility, and variability in structure and content, of complete mitochondrial genomes among genetic lineages of the Hawaiian anchialine shrimp Halocaridina rubra Holthuis 1963 (Atyidae:Decapoda).

PubMed

Justice, Joshua L; Weese, David A; Santos, Scott Ross

2016-07-01

The Atyidae are caridean shrimp possessing hair-like setae on their claws and are important contributors to ecological services in tropical and temperate fresh and brackish water ecosystems. Complete mitochondrial genomes have only been reported from five of the 449 species in the family, thus limiting understanding of mitochondrial genome evolution and the phylogenetic utility of complete mitochondrial sequences in the Atyidae. Here, comparative analyses of complete mitochondrial genomes from eight genetic lineages of Halocaridina rubra, an atyid endemic to the anchialine ecosystem of the Hawaiian Archipelago, are presented. Although gene number, order, and orientation were syntenic among genomes, three regions were identified and further quantified where conservation was substantially lower: (1) high length and sequence variability in the tRNA-Lys and tRNA-Asp intergenic region; (2) a 317-bp insertion between the NAD6 and CytB genes confined to a single lineage and representing a partial duplication of CytB; and (3) the putative control region. Phylogenetic analyses utilizing complete mitochondrial sequences provided new insights into relationships among the H. rubra genetic lineages, with the topology of one clade correlating to the geologic sequence of the islands. However, deeper nodes in the phylogeny lacked bootstrap support. Overall, our results from H. rubra suggest intra-specific mitochondrial genomic diversity could be underestimated across the Metazoa since the vast majority of complete genomes are from just a single individual of a species.
Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome.

PubMed

Montaña, José Salvador; Jiménez, Diego Javier; Hernández, Mónica; Angel, Tatiana; Baena, Sandra

2012-02-01

Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clones (in the plasmid p-Bluescript II SK+) with an average insert size of 4 Kb, covering 80 Mb of the total metagenomic DNA. Metagenomic sequences near the plasmid cloning site were sequenced and then trimmed and assembled, obtaining 299 reads and 31 contigs (0.3 Mb). Taxonomic assignment of total sequences was performed by BLASTX, resulting in 68.8, 44.8 and 24.5% classification into taxonomic groups using the metagenomic RAST server v2.0, WebCARMA v1.0 online system and MetaGenome Analyzer v3.8 software, respectively. Most clone sequences were classified as Bacteria belonging to phyla Actinobacteria, Proteobacteria and Acidobacteria. Among the most represented orders were Actinomycetales (34% average), Rhizobiales, Burkholderiales and Myxococcales and with a greater number of sequences in the genus Mycobacterium (7% average), Frankia, Streptomyces and Bradyrhizobium. The vast majority of sequences were associated with the metabolism of carbohydrates, proteins, lipids and catalytic functions, such as phosphatases, glycosyltransferases, dehydrogenases, methyltransferases, dehydratases and epoxide hydrolases. In this study we compared different methods of taxonomic and functional assignment of metagenomic clone sequences to evaluate microbial diversity in an unexplored soil ecosystem, searching for putative enzymes of biotechnological interest and generating important information for further functional screening of clone libraries.

Structure and Distribution of Centromeric Retrotransposons at Diploid and Allotetraploid Coffea Centromeric and Pericentromeric Regions

PubMed Central

de Castro Nunes, Renata; Orozco-Arias, Simon; Crouzillat, Dominique; Mueller, Lukas A.; Strickler, Suzy R.; Descombes, Patrick; Fournier, Coralie; Moine, Deborah; de Kochko, Alexandre; Yuyama, Priscila M.; Vanzela, André L. L.; Guyot, Romain

2018-01-01

Centromeric regions of plants are generally composed of large array of satellites from a specific lineage of Gypsy LTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial function on kinetochore formation. To study the structure and composition of centromeric regions in the genus Coffea, we annotated and classified Centromeric Retrotransposons sequences from the allotetraploid C. arabica genome and its two diploid ancestors: Coffea canephora and C. eugenioides. Ten distinct CRC (Centromeric Retrotransposons in Coffea) families were found. The sequence mapping and FISH experiments of CRC Reverse Transcriptase domains in C. canephora, C. eugenioides, and C. arabica clearly indicate a strong and specific targeting mainly onto proximal chromosome regions, which can be associated also with heterochromatin. PacBio genome sequence analyses of putative centromeric regions on C. arabica and C. canephora chromosomes showed an exceptional density of one family of CRC elements, and the complete absence of satellite arrays, contrasting with usual structure of plant centromeres. Altogether, our data suggest a specific centromere organization in Coffea, contrasting with other plant genomes. PMID:29497436
Solution 1H NMR characterization of the axial bonding of the two His in oxidized human cytoglobin

PubMed Central

Bondarenko, Vasyl; Dewilde, Sylvia; Moens, Luc; La Mar, Gerd N.

2008-01-01

Solution 1H NMR spectroscopy has been used to determine the relative strengths (covalency) of the two axial His-Fe bonds in paramagnetic, S = 1/2, human met-cytoglobin. The sequence specific assignments of crucial portions of the proximal and distal helices, together with the magnitude of hyperfine shifts and paramagnetic relaxation, establish that His81 and His113, at the canonical positions E7 and F8 in the myoglobin fold, respectively, are ligated to the iron. The characterized complex (~90%) in solution has protohemin oriented as in crystals, with the remaining ~10% exhibiting the hemin orientation rotated 180° about the α-, γ-meso axis. No evidence could be obtained for any five-coordinate complex (<1%) in equilibrium with the six-coordinate complexes. Extensive sequence-specific assignments on other dipolar shifted helical fragments and loops, together with available alternate crystal coordinates for the complex, allowed the robust determination of the orientation and anisotropies of the paramagnetic susceptibility tensor. The tilt of the major axis is controlled by the His-Fe-His vector, and the rhombic axes by the mean of the imidazole orientations for the two His. The anisotropy of the paramagnetic susceptibility tensor allowed the quantitative factoring of the hyperfine shifts for the two axial His to reveal indistinguishable pattern and magnitudes of the contact shifts or π spin densities, and hence, indistinguishable Fe-imidazole covalency for both Fe-His bonds. PMID:17002396
Research progress of plant population genomics based on high-throughput sequencing.

PubMed

Wang, Yun-sheng

2016-08-01

Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical system of population genetics and improves our understanding of microevolution through identification of site-specific and genome-wide effects using genome-wide genotyping of polymorphic sites. With the appearance and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as existing problems of plant population genomics in order to provide references for related studies.
Comparison of simple sequence repeats in 19 Archaea.

PubMed

Trivedi, S

2006-12-05

All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
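To make the notion of di- to penta-nucleotide repeats concrete, here is a minimal sketch that finds perfect tandem repeats with a regular expression. It is not the SPUTNIK program used in the study, which applies its own scoring scheme; the function name, the `min_repeats` threshold and the rule for skipping non-primitive units are assumptions made for this example.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=3):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Only exact repeats are reported; units that are themselves repeats of a
    shorter motif (e.g. 'AA' or 'ACAC') are skipped so each tract is reported
    under its shortest unit of length >= 2.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # One unit of length k followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("TTACACACACGG"))   # [(2, 'AC', 4)]
print(find_ssrs("GGCAGCAGCAGTT"))  # [(1, 'GCA', 3)]
```

Counting hits separately for coding and non-coding intervals, as the abstract describes, would require gene coordinates that are not part of this sketch.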
Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases

PubMed Central

Assmus, Jens; Kleffe, Jürgen; Schmitt, Armin O.; Brockmann, Gudrun A.

2013-01-01

There is considerable interest in studying sequenced variations. However, while the positions of substitutions are uniquely identifiable by sequence alignment, the location of insertions and deletions still poses problems. Each insertion and deletion causes a change of sequence. Yet, due to low complexity or repetitive sequence structures, the same indel can sometimes be annotated in different ways. Two indels which differ in allele sequence and position can be one and the same, i.e. the alternative sequence of the whole chromosome is identical in both cases and, therefore, the two deletions are biologically equivalent. In such a case, it is impossible to identify the exact position of an indel merely based on sequence alignment. Thus, variation entries in a mutation database are not necessarily uniquely defined. We prove the existence of a contiguous region around an indel in which all deletions of the same length are biologically identical. Databases often show only one of several possible locations for a given variation. Furthermore, different data base entries can represent equivalent variation events. We identified 1,045,590 such problematic entries of insertions and deletions out of 5,860,408 indel entries in the current human database of Ensembl. Equivalent indels are found in sequence regions of different functions like exons, introns or 5' and 3' UTRs. One and the same variation can be assigned to several different functional classifications of which only one is correct. We implemented an algorithm that determines for each indel database entry its complete set of equivalent indels which is uniquely characterized by the indel itself and a given interval of the reference sequence. PMID:23658777
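The contiguous region of equivalent deletions can be illustrated with a short sketch. It relies on one observation: deleting `length` bases at start `s` and at start `s + 1` gives the same sequence exactly when `ref[s] == ref[s + length]`. The code below covers deletions only, uses 0-based coordinates, and is an illustration of the idea rather than the authors' implementation; all names are hypothetical.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Sequence obtained by removing ref[start:start + length]."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_starts(ref: str, start: int, length: int):
    """Return (leftmost, rightmost) start positions of equivalent deletions.

    Every deletion of the same length starting anywhere in this closed
    interval produces the identical alternative sequence.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion does not fit inside the reference")
    left = start
    # Shifting left by one is neutral if the base entering the deleted block
    # equals the base leaving it on the right.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Shifting right by one is neutral under the mirrored condition.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


ref = "GACACACT"
lo, hi = equivalent_deletion_starts(ref, 3, 2)
print(lo, hi)  # 1 5
print({apply_deletion(ref, s, 2) for s in range(lo, hi + 1)})  # {'GACACT'}
```

Reporting the interval instead of a single position is one way to give each indel a canonical description, for example by always storing the leftmost start.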
Bioinformatics Analysis of the Complete Genome Sequence of the Mango Tree Pathogen Pseudomonas syringae pv. syringae UMAF0158 Reveals Traits Relevant to Virulence and Epiphytic Lifestyle

PubMed Central

Arrebola, Eva; Carrión, Víctor J.; Gutiérrez-Barranquero, José Antonio; Pérez-García, Alejandro; Ramos, Cayo; Cazorla, Francisco M.; de Vicente, Antonio

2015-01-01

The genomes of more than 100 Pseudomonas syringae strains have been sequenced to date; however, only a few of them have been fully assembled, including P. syringae pv. syringae B728a. Different strains of pv. syringae cause different diseases and have different host specificities; UMAF0158 is a P. syringae pv. syringae strain related to B728a, but instead of being a bean pathogen it causes apical necrosis of mango trees, and the two strains belong to different phylotypes of pv. syringae and clades of P. syringae. In this study we report the complete sequence and annotation of P. syringae pv. syringae UMAF0158 chromosome and plasmid pPSS158. A comparative analysis with the available sequenced genomes of 25 other P. syringae strains, both closed (the reference genomes DC3000, 1448A and B728a) and draft genomes, was performed. The 5.8 Mb UMAF0158 chromosome has 59.3% GC content and comprises 5017 predicted protein-coding genes. Bioinformatics analysis revealed the presence of genes potentially implicated in the virulence and epiphytic fitness of this strain. We identified several genetic features, which are absent in B728a, that may explain the ability of UMAF0158 to colonize and infect mango trees: the mangotoxin biosynthetic operon mbo, a gene cluster for cellulose production, two different type III and two type VI secretion systems, and a particular T3SS effector repertoire. A mutant strain defective in the rhizobial-like T3SS Rhc showed no differences compared to wild-type during its interaction with host and non-host plants and worms. Here we report the first complete sequence of the chromosome of a pv. syringae strain pathogenic to a woody plant host. Our data also shed light on the genetic factors that possibly determine the pathogenic and epiphytic lifestyle of UMAF0158. This work provides the basis for further analysis on specific mechanisms that enable this strain to infect woody plants and for the functional analysis of host specificity in the P. syringae complex. PMID:26313942
Adaptive Discrete Hypergraph Matching.

PubMed

Yan, Junchi; Li, Changsheng; Li, Yin; Cao, Guitao

2018-02-01

This paper addresses the problem of hypergraph matching using higher-order affinity information. We propose a solver that iteratively updates the solution in the discrete domain by linear assignment approximation. The proposed method is guaranteed to converge to a stationary discrete solution and avoids the annealing procedure and ad-hoc post-binarization step that are required in several previous methods. Specifically, we start with a simple iterative discrete gradient assignment solver. This solver can be trapped in an m-circle sequence under moderate conditions, where m is the order of the graph matching problem. We then devise an adaptive relaxation mechanism to jump out of this degenerate case and show that the resulting new path will converge to a fixed solution in the discrete domain. The proposed method is tested on both synthetic and real-world benchmarks. The experimental results corroborate the efficacy of our method.
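The iterative discrete update described in this abstract can be illustrated with a much simpler second-order (pairwise) analogue. The sketch below is not the authors' solver: it assumes a pairwise affinity dictionary instead of a higher-order tensor, uses brute-force enumeration in place of a real linear assignment routine, and omits the adaptive relaxation step entirely. The names `build_affinity`, `best_assignment`, and `discrete_iteration` are hypothetical. It only shows how a discrete gradient step followed by linear assignment either reaches a fixed point or revisits an earlier solution (a cycle).

```python
from itertools import permutations


def build_affinity(w1, w2):
    """Toy pairwise affinity (an assumption for illustration): matching
    i->a together with j->b scores 1.0 when edge weights agree."""
    n = len(w1)
    return {
        (i, a, j, b): 1.0
        for i in range(n) for a in range(n)
        for j in range(n) for b in range(n)
        if i != j and a != b and w1[i][j] == w2[a][b]
    }


def best_assignment(score):
    """Brute-force linear assignment: the permutation p that maximizes
    sum(score[i][p[i]]). Only practical for very small n."""
    n = len(score)
    return max(
        permutations(range(n)),
        key=lambda p: sum(score[i][p[i]] for i in range(n)),
    )


def discrete_iteration(affinity, start, max_iter=50):
    """Repeat: take the gradient of the quadratic matching objective at the
    current discrete solution, then project it back onto the permutations
    by linear assignment. Stop at a fixed point or a revisited solution."""
    n = len(start)
    current = tuple(start)
    seen = {current}
    for _ in range(max_iter):
        grad = [
            [
                sum(affinity.get((i, a, j, current[j]), 0.0) for j in range(n))
                for a in range(n)
            ]
            for i in range(n)
        ]
        nxt = best_assignment(grad)
        if nxt == current:
            return current, "fixed point"
        if nxt in seen:
            return nxt, "cycle"  # the degenerate case the paper relaxes
        seen.add(nxt)
        current = nxt
    return current, "max_iter"


# Two identical 3-node weighted graphs; the true matching is the identity.
weights = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
affinity = build_affinity(weights, weights)
print(discrete_iteration(affinity, (0, 1, 2)))
```

A production solver would replace `best_assignment` with a polynomial-time linear assignment algorithm and add a mechanism for escaping the cycle case, which is the contribution the abstract describes.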
DOE Office of Scientific and Technical Information (OSTI.GOV)

Wemmer, D.E.; Kumar, N.V.; Metrione, R.M.

Toxin II from Radianthus paumotensis (Rp II) has been investigated by high-resolution NMR and chemical sequencing methods. Resonance assignments have been obtained for this protein by the sequential approach. NMR assignments could not be made consistent with the previously reported primary sequence for this protein, and chemical methods have been used to determine a sequence with which the NMR data are consistent. Analysis of the 2D NOE spectra shows that the protein secondary structure is comprised of two sequences of β-sheet, probably joined into a distorted continuous sheet, connected by turns and extended loops, without any regular α-helical segments. The residues previously implicated in activity in this class of proteins, D8 and R13, occur in a loop region.
Comparative analysis of the complete sequence of the plastid genome of Parthenium argentatum and identification of DNA barcodes to differentiate Parthenium species and lines

PubMed Central

2009-01-01

Background Parthenium argentatum (guayule) is an industrial crop that produces latex, which was recently commercialized as a source of latex rubber safe for people with Type I latex allergy. The complete plastid genome of P. argentatum was sequenced. The sequence provides important information useful for genetic engineering strategies. Comparison to the sequences of plastid genomes from three other members of the Asteraceae, Lactuca sativa, Guizotia abyssinica and Helianthus annuus, revealed details of the evolution of the four genomes. Chloroplast-specific DNA barcodes were developed for identification of Parthenium species and lines. Results The complete plastid genome of P. argentatum is 152,803 bp. Based on the overall comparison of individual protein coding genes with those in L. sativa, G. abyssinica and H. annuus, we demonstrate that the P. argentatum chloroplast genome sequence is most closely related to that of H. annuus. Similar to chloroplast genomes in G. abyssinica, L. sativa and H. annuus, the plastid genome of P. argentatum has a large 23 kb inversion with a smaller 3.4 kb inversion nested within it. Using the matK and psbA-trnH spacer chloroplast DNA barcodes, three of the four Parthenium species tested, P. tomentosum, P. hysterophorus and P. schottii, can be differentiated from P. argentatum. In addition, we identified lines within P. argentatum. Conclusion The genome sequence of the P. argentatum chloroplast will enrich the sequence resources of plastid genomes in commercial crops. The availability of the complete plastid genome sequence may facilitate transformation efficiency by using the precise sequence of endogenous flanking sequences and regulatory elements in chloroplast transformation vectors. The DNA barcoding study forms the foundation for genetic identification of commercially significant lines of P. argentatum that are important for producing latex. PMID:19917140
Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

PubMed Central

Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

2013-01-01

The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392
Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

PubMed

Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

2013-01-01

The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.

PubMed

Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J; Szatkiewicz, Jin P

2015-08-18

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
First Pass Annotation of Promoters on Human Chromosome 22

PubMed Central

Scherf, Matthias; Klingenhoff, Andreas; Frech, Kornelie; Quandt, Kerstin; Schneider, Ralf; Grote, Korbinian; Frisch, Matthias; Gailus-Durner, Valérie; Seidel, Alexander; Brack-Werner, Ruth; Werner, Thomas

2001-01-01

The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for “chromosomal scaffolding”, by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. PMID:11230158
A proteomic analysis of leaf sheaths from rice.

PubMed

Shen, Shihua; Matsubae, Masami; Takao, Toshifumi; Tanaka, Naoki; Komatsu, Setsuko

2002-10-01

The proteins extracted from the leaf sheaths of rice seedlings were separated by 2-D PAGE, and analyzed by Edman sequencing and mass spectrometry, followed by database searching. Image analysis revealed 352 protein spots on 2-D PAGE after staining with Coomassie Brilliant Blue. The amino acid sequences of 44 of 84 proteins were determined; for 31 of these proteins, a clear function could be assigned, whereas for 12 proteins, no function could be assigned. Forty proteins did not yield amino acid sequence information, because they were N-terminally blocked, or the obtained sequences were too short and/or did not give unambiguous results. Fifty-nine proteins were analyzed by mass spectrometry; all of these proteins were identified by matching to the protein database. The amino acid sequences of 19 of 27 proteins analyzed by mass spectrometry were similar to the results of Edman sequencing. These results suggest that 2-D PAGE combined with Edman sequencing and mass spectrometry analysis can be effectively used to identify plant proteins.
Reconstructing Ancient Forms of Life

NASA Technical Reports Server (NTRS)

Benner, Steven A.

1998-01-01

Progress in the past three months has occurred in two areas, reconstruction of ancestral proteins and improved understanding of chemical features that are likely to be universal in generic matter regardless of its genesis. Ancestral ribonucleases have been reconstructed, and an example has been developed that shows how physiological function can be assigned to in vitro behaviors observed in biological systems. Sequence data have been collected to permit the reconstruction of src homology 2 domains that underwent radiative divergence at the time of the radiative divergence of chordates. New studies have been completed that show how genetic matter (or its remnants) might be detected on Mars (or other non-terrean locations.) Last, the first in vitro selection experiments have been completed using a nucleoside library carrying positively charged functionality, illustrating the importance of non-standard nucleotides to those attempting to obtain evidence for an "RNA world" as an early episode of life on earth.
Sex Genotyping of Archival Fixed and Immunolabeled Guinea Pig Cochleas.

PubMed

Depreux, Frédéric F; Czech, Lyubov; Whitlon, Donna S

2018-03-26

For decades, outbred guinea pigs (GP) have been used as research models. Various past research studies using guinea pigs used measures that, unknown at the time, may be sex-dependent, but from which today, archival tissues may be all that remain. We aimed to provide a protocol for sex-typing archival guinea pig tissue, whereby past experiments could be re-evaluated for sex effects. No PCR sex-genotyping protocols existed for GP. We found that published sequence of the GP Sry gene differed from that in two separate GP stocks. We used sequences from other species to deduce PCR primers for Sry. After developing a genomic DNA extraction for archival, fixed, decalcified, immunolabeled, guinea pig cochlear half-turns, we used a multiplex assay (Y-specific Sry; X-specific Dystrophin) to assign sex to tissue as old as 3 years. This procedure should allow reevaluation of prior guinea pig studies in various research areas for the effects of sex on experimental outcomes.
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

PubMed Central

Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

2009-01-01

Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicate genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, to characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
First observation of rotational structures in Re 168

DOE PAGES

Hartley, D. J.; Janssens, R. V. F.; Riedinger, L. L.; ...

2016-11-30

We assigned the first rotational sequences to the odd-odd nucleus 168Re. Coincidence relationships of these structures with rhenium x rays confirm the isotopic assignment, while arguments based on the γ-ray multiplicity (K-fold) distributions observed with the new bands lead to the mass assignment. Configurations for the two bands were determined through analysis of the rotational alignments of the structures and a comparison of the experimental B(M1)/B(E2) ratios with theory. Tentative spin assignments are proposed for the πh11/2νi13/2 band, based on energy level systematics for other known sequences in neighboring odd-odd rhenium nuclei, as well as on systematics seen for the signature inversion feature that is well known in this region. Furthermore, the spin assignment for the πh11/2ν(h9/2/f7/2) structure provides additional validation of the proposed spins and configurations for isomers in the 176Au → 172Ir → 168Re α-decay chain.
Selective excitation for spectral editing and assignment in separated local field experiments of oriented membrane proteins

NASA Astrophysics Data System (ADS)

Koroloff, Sophie N.; Nevzorov, Alexander A.

2017-01-01

Spectroscopic assignment of NMR spectra for oriented uniformly labeled membrane proteins embedded in their native-like bilayer environment is essential for their structure determination. However, sequence-specific assignment in oriented-sample (OS) NMR is often complicated by insufficient resolution and spectral crowding. Therefore, the assignment process is usually done by a laborious and expensive "shotgun" method involving multiple selective labeling of amino acid residues. Presented here is a strategy to overcome poor spectral resolution in crowded regions of 2D spectra by selecting resolved "seed" residues via soft Gaussian pulses inserted into spin-exchange separated local-field experiments. The Gaussian pulse places the selected polarization along the z-axis while dephasing the other signals before the evolution of the 1H-15N dipolar couplings. The transfer of magnetization is accomplished via mismatched Hartmann-Hahn conditions to the nearest-neighbor peaks via the proton bath. By optimizing the length and amplitude of the Gaussian pulse, one can also achieve a phase inversion of the closest peaks, thus providing an additional phase contrast. From the superposition of the selective spin-exchanged SAMPI4 onto the fully excited SAMPI4 spectrum, the 15N sites that are directly adjacent to the selectively excited residues can be easily identified, thereby providing a straightforward method for initiating the assignment process in oriented membrane proteins.

Histoimmunogenetics Markup Language 1.0: Reporting next generation sequencing-based HLA and KIR genotyping.

PubMed

Milius, Robert P; Heuer, Michael; Valiga, Daniel; Doroschak, Kathryn J; Kennedy, Caleb J; Bolon, Yung-Tsi; Schneider, Joel; Pollack, Jane; Kim, Hwa Ran; Cereb, Nezih; Hollenbach, Jill A; Mack, Steven J; Maiers, Martin

2015-12-01

We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
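The hierarchical delimiters mentioned for Genotype List Strings can be illustrated with a small parser. This is a minimal sketch rather than the HML or GL String reference implementation. It assumes the commonly documented delimiter precedence (`^` between loci, `|` between ambiguous genotypes, `+` between gene copies, `~` between phased alleles, `/` between ambiguous alleles) and does no validation. The function name `parse_gl_string` and the example alleles are illustrative choices, not taken from the abstract.

```python
def parse_gl_string(gl_string):
    """Split a GL String into nested lists, outermost delimiter first:
    loci (^) -> genotype alternatives (|) -> gene copies (+)
    -> phased alleles (~) -> allele alternatives (/)."""
    return [
        [
            [
                [phased.split("/") for phased in gene_copy.split("~")]
                for gene_copy in genotype.split("+")
            ]
            for genotype in locus.split("|")
        ]
        for locus in gl_string.split("^")
    ]


# Hypothetical example: an ambiguous first HLA-A allele, two loci.
example = "HLA-A*01:01/HLA-A*01:02+HLA-A*24:02^HLA-B*08:01+HLA-B*44:02"
parsed = parse_gl_string(example)
print(len(parsed))          # number of loci
print(parsed[0][0][0][0])   # allele alternatives for the first HLA-A copy
```

Splitting on the lowest-precedence delimiter first keeps each level of ambiguity in its own list layer, so a consumer can tell allele-level ambiguity from genotype-level ambiguity.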
Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

PubMed Central

Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar

2018-01-01

The use of high throughput next generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections using Illumina MiSeq sequencing platform in Egyptian patients. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile # 15K file and paper points. DNA was extracted using Mo Bio power soil DNA isolation extraction kit followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable region of the 16S rRNA gene by using paired-end sequencing on Illumina MiSeq device. MOTHUR software was used in sequence filtration and analysis of sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes were predominant in all samples. At genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646
Mapping and characterization of wheat stem rust resistance genes SrTm5 and Sr60 from Triticum monococcum.

PubMed

Chen, Shisheng; Guo, Yan; Briggs, Jordan; Dubach, Felix; Chao, Shiaoman; Zhang, Wenjun; Rouse, Matthew N; Dubcovsky, Jorge

2018-03-01

The new stem rust resistance gene Sr60 was fine-mapped to the distal region of chromosome arm 5AmS, and the TTKSK-effective gene SrTm5 could be a new allele of Sr22. The emergence and spread of new virulent races of the wheat stem rust pathogen (Puccinia graminis f. sp. tritici; Pgt), including the Ug99 race group, is a serious threat to global wheat production. In this study, we mapped and characterized two stem rust resistance genes from diploid wheat Triticum monococcum accession PI 306540. We mapped SrTm5, a previously postulated gene effective to Ug99, on chromosome arm 7AmL, completely linked to Sr22. SrTm5 displayed a different race specificity compared to Sr22 indicating that they are distinct. Sequencing of the Sr22 homolog in PI 306540 revealed a novel haplotype. Characterization of the segregating populations with Pgt race QFCSC revealed an additional resistance gene on chromosome arm 5AmS that was assigned the official name Sr60. This gene was also effective against races QTHJC and SCCSC but not against TTKSK (a Ug99 group race). Using two large mapping populations (4046 gametes), we mapped Sr60 within a 0.44 cM interval flanked by sequenced-based markers GH724575 and CJ942731. These two markers delimit a 54.6-kb region in Brachypodium distachyon chromosome 4 and a 430-kb region in the Chinese Spring reference genome. Both regions include a leucine-rich repeat protein kinase (LRRK123.1) that represents a potential candidate gene. Three CC-NBS-LRR genes were found in the colinear Brachypodium region but not in the wheat genome. We are currently developing a Bacterial Artificial Chromosome library of PI 306540 to determine which of these candidate genes are present in the T. monococcum genome and to complete the cloning of Sr60.
Taxonomic Characterization of Honey Bee (Apis mellifera) Pollen Foraging Based on Non-Overlapping Paired-End Sequencing of Nuclear Ribosomal Loci.

PubMed

Cornman, R Scott; Otto, Clint R V; Iwanowicz, Deborah; Pettis, Jeffery S

2015-01-01

Identifying plant taxa that honey bees (Apis mellifera) forage upon is of great apicultural interest, but traditional methods are labor intensive and may lack resolution. Here we evaluate a high-throughput genetic barcoding approach to characterize trap-collected pollen from multiple North Dakota apiaries across multiple years. We used the Illumina MiSeq platform to generate sequence scaffolds from non-overlapping 300-bp paired-end sequencing reads of the ribosomal internal transcribed spacers (ITS). Full-length sequence scaffolds represented ~530 bp of ITS sequence after adapter trimming, drawn from the 5' of ITS1 and the 3' of ITS2, while skipping the uninformative 5.8S region. Operational taxonomic units (OTUs) were picked from scaffolds clustered at 97% identity, searched by BLAST against the nt database, and given taxonomic assignments using the paired-read lowest common ancestor approach. Taxonomic assignments and quantitative patterns were consistent with known plant distributions, phenology, and observational reports of pollen foraging, but revealed an unexpected contribution from non-crop graminoids and wetland plants. The mean number of plant species assignments per sample was 23.0 (+/- 5.5) and the mean species diversity (effective number of equally abundant species) was 3.3 (+/- 1.2). Bray-Curtis similarities showed good agreement among samples from the same apiary and sampling date. Rarefaction plots indicated that fewer than 50,000 reads are typically needed to characterize pollen samples of this complexity. Our results show that a pre-compiled, curated reference database is not essential for genus-level assignments, but species-level assignments are hindered by database gaps, reference length variation, and probable errors in the taxonomic assignment, requiring post-hoc evaluation. 
Although the effective per-sample yield achieved using custom MiSeq amplicon primers was less than the machine maximum, primarily due to lower "read2" quality, further protocol optimization and/or a modest reduction in multiplex scale should offset this difficulty. As small quantities of pollen are sufficient for amplification, our approach might be extendable to other questions or species for which large pollen samples are not available.
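Two of the summary statistics reported in this abstract, the effective number of equally abundant species and the Bray-Curtis similarity between samples, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. The abstract does not say which diversity order was used, so the sketch assumes the exponential of Shannon entropy. The function names and the read counts are hypothetical.

```python
import math


def effective_species(counts):
    """Effective number of equally abundant species, computed here as
    exp(Shannon entropy) of the relative abundances (an assumed definition)."""
    total = sum(counts)
    shannon = -sum(
        (c / total) * math.log(c / total) for c in counts if c > 0
    )
    return math.exp(shannon)


def bray_curtis_similarity(sample_a, sample_b):
    """Bray-Curtis similarity between two {taxon: read count} samples:
    twice the shared abundance divided by the total abundance."""
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    return 2 * shared / (sum(sample_a.values()) + sum(sample_b.values()))


# Hypothetical read counts per plant genus for two pollen samples.
june = {"Melilotus": 600, "Brassica": 300, "Sonchus": 100}
july = {"Melilotus": 500, "Helianthus": 400, "Sonchus": 100}

print(effective_species(list(june.values())))
print(bray_curtis_similarity(june, july))
```

A sample dominated by one taxon has an effective species number close to 1 even when many taxa are detected, which is consistent with the abstract reporting far fewer effective species than species assignments per sample.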
Taxonomic Characterization of Honey Bee (Apis mellifera) Pollen Foraging Based on Non-Overlapping Paired-End Sequencing of Nuclear Ribosomal Loci

PubMed Central

Cornman, R. Scott; Otto, Clint R. V.; Iwanowicz, Deborah; Pettis, Jeffery S.

2015-01-01

Identifying plant taxa that honey bees (Apis mellifera) forage upon is of great apicultural interest, but traditional methods are labor intensive and may lack resolution. Here we evaluate a high-throughput genetic barcoding approach to characterize trap-collected pollen from multiple North Dakota apiaries across multiple years. We used the Illumina MiSeq platform to generate sequence scaffolds from non-overlapping 300-bp paired-end sequencing reads of the ribosomal internal transcribed spacers (ITS). Full-length sequence scaffolds represented ~530 bp of ITS sequence after adapter trimming, drawn from the 5’ of ITS1 and the 3’ of ITS2, while skipping the uninformative 5.8S region. Operational taxonomic units (OTUs) were picked from scaffolds clustered at 97% identity, searched by BLAST against the nt database, and given taxonomic assignments using the paired-read lowest common ancestor approach. Taxonomic assignments and quantitative patterns were consistent with known plant distributions, phenology, and observational reports of pollen foraging, but revealed an unexpected contribution from non-crop graminoids and wetland plants. The mean number of plant species assignments per sample was 23.0 (+/- 5.5) and the mean species diversity (effective number of equally abundant species) was 3.3 (+/- 1.2). Bray-Curtis similarities showed good agreement among samples from the same apiary and sampling date. Rarefaction plots indicated that fewer than 50,000 reads are typically needed to characterize pollen samples of this complexity. Our results show that a pre-compiled, curated reference database is not essential for genus-level assignments, but species-level assignments are hindered by database gaps, reference length variation, and probable errors in the taxonomic assignment, requiring post-hoc evaluation. 
Although the effective per-sample yield achieved using custom MiSeq amplicon primers was less than the machine maximum, primarily due to lower “read2” quality, further protocol optimization and/or a modest reduction in multiplex scale should offset this difficulty. As small quantities of pollen are sufficient for amplification, our approach might be extendable to other questions or species for which large pollen samples are not available. PMID:26700168
Taxonomic characterization of honey bee (Apis mellifera) pollen foraging based on non-overlapping paired-end sequencing of nuclear ribosomal loci

USGS Publications Warehouse

Cornman, Robert S.; Otto, Clint R.; Iwanowicz, Deborah; Pettis, Jeffery S

2015-01-01

Identifying plant taxa that honey bees (Apis mellifera) forage upon is of great apicultural interest, but traditional methods are labor intensive and may lack resolution. Here we evaluate a high-throughput genetic barcoding approach to characterize trap-collected pollen from multiple North Dakota apiaries across multiple years. We used the Illumina MiSeq platform to generate sequence scaffolds from non-overlapping 300-bp paired-end sequencing reads of the ribosomal internal transcribed spacers (ITS). Full-length sequence scaffolds represented ~530 bp of ITS sequence after adapter trimming, drawn from the 5’ of ITS1 and the 3’ of ITS2, while skipping the uninformative 5.8S region. Operational taxonomic units (OTUs) were picked from scaffolds clustered at 97% identity, searched by BLAST against the nt database, and given taxonomic assignments using the paired-read lowest common ancestor approach. Taxonomic assignments and quantitative patterns were consistent with known plant distributions, phenology, and observational reports of pollen foraging, but revealed an unexpected contribution from non-crop graminoids and wetland plants. The mean number of plant species assignments per sample was 23.0 (+/- 5.5) and the mean species diversity (effective number of equally abundant species) was 3.3 (+/- 1.2). Bray-Curtis similarities showed good agreement among samples from the same apiary and sampling date. Rarefaction plots indicated that fewer than 50,000 reads are typically needed to characterize pollen samples of this complexity. Our results show that a pre-compiled, curated reference database is not essential for genus-level assignments, but species-level assignments are hindered by database gaps, reference length variation, and probable errors in the taxonomic assignment, requiring post-hoc evaluation. 
Although the effective per-sample yield achieved using custom MiSeq amplicon primers was less than the machine maximum, primarily due to lower “read2” quality, further protocol optimization and/or a modest reduction in multiplex scale should offset this difficulty. As small quantities of pollen are sufficient for amplification, our approach might be extendable to other questions or species for which large pollen samples are not available.
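The two summary statistics quoted in this abstract can be illustrated with a minimal sketch. It assumes that "effective number of equally abundant species" means the exponential of Shannon entropy (the abstract does not state which diversity order was used) and that Bray-Curtis similarity is computed on raw read counts. The read counts below are hypothetical and are not data from the study.

```python
import math

def effective_species(counts):
    """Effective number of equally abundant species: exp(Shannon entropy).

    Assumption: a Shannon-based measure; the abstract does not say which one.
    """
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in props)
    return math.exp(shannon)

def bray_curtis_similarity(a, b):
    """1 minus Bray-Curtis dissimilarity for two count vectors over the same taxa."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 2 * shared / (sum(a) + sum(b))

# Hypothetical read counts per plant taxon for two pollen samples.
sample_1 = [500, 300, 150, 50]
sample_2 = [450, 350, 100, 100]

print(effective_species([25, 25, 25, 25]))   # four equally abundant taxa
print(effective_species(sample_1))           # uneven sample, fewer effective taxa
print(bray_curtis_similarity(sample_1, sample_2))
```

A sample dominated by a few taxa yields an effective number well below its raw species count, which is one way a mean of 23.0 assigned species can coexist with a mean diversity of 3.3.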
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sikorski, Johannes; Lapidus, Alla L.; Copeland, A

Segniliparus rotundus Butler 2005 is the type species of the genus Segniliparus, which is currently the only genus in the corynebacterial family Segniliparaceae. This family is of large interest because of a novel late-emerging genus-specific mycolate pattern. The type strain has been isolated from human sputum and is probably an opportunistic pathogen. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Segniliparaceae. The 3,157,527 bp long genome with its 3,081 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Differentiating Visual from Response Sequencing during Long-term Skill Learning.

PubMed

Lynch, Brighid; Beukema, Patrick; Verstynen, Timothy

2017-01-01

The dual-system model of sequence learning posits that during early learning there is an advantage for encoding sequences in sensory frames; however, it remains unclear whether this advantage extends to long-term consolidation. Using the serial RT task, we set out to distinguish the dynamics of learning sequential orders of visual cues from learning sequential responses. On each day, most participants learned a new mapping between a set of symbolic cues and responses made with one of four fingers, after which they were exposed to trial blocks of either randomly ordered cues or deterministic ordered cues (12-item sequence). Participants were randomly assigned to one of four groups (n = 15 per group): Visual sequences (same sequence of visual cues across training days), Response sequences (same order of key presses across training days), Combined (same serial order of cues and responses on all training days), and a Control group (a novel sequence each training day). Across 5 days of training, sequence-specific measures of response speed and accuracy improved faster in the Visual group than any of the other three groups, despite no group differences in explicit awareness of the sequence. The two groups that were exposed to the same visual sequence across days showed a marginal improvement in response binding that was not found in the other groups. These results indicate that there is an advantage, in terms of rate of consolidation across multiple days of training, for learning sequences of actions in a sensory representational space, rather than as motoric representations.
A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii)

PubMed Central

2011-01-01

Background The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing genome sequence assembly and gene ordering in this species. Results A second-generation anchored tammar wallaby genetic linkage map has been constructed based on a total of 148 loci. The linkage map contains the original 64 loci included in the first-generation map, plus an additional 84 microsatellite loci that were chosen specifically to increase coverage and assist with the anchoring and orientation of linkage groups to chromosomes. These additional loci were derived from (a) sequenced BAC clones that had been previously mapped to tammar wallaby chromosomes by fluorescence in situ hybridization (FISH), (b) End sequence from BACs subsequently FISH-mapped to tammar wallaby chromosomes, and (c) tammar wallaby genes orthologous to opossum genes predicted to fill gaps in the tammar wallaby linkage map as well as three X-linked markers from a published study. Based on these 148 loci, eight linkage groups were formed. These linkage groups were assigned (via FISH-mapped markers) to all seven autosomes and the X chromosome. The sex-pooled map size is 1402.4 cM, which is estimated to provide 82.6% total coverage of the genome, with an average interval distance of 10.9 cM between adjacent markers. The overall ratio of female/male map length is 0.84, which is comparable to the ratio of 0.78 obtained for the first-generation map. 
Conclusions Construction of this second-generation genetic linkage map is a significant step towards complete coverage of the tammar wallaby genome and considerably extends that of the first-generation map. It will be a valuable resource for ongoing tammar wallaby genetic research and assembling the genome sequence. The sex-pooled map is available online at http://compldb.angis.org.au/. PMID:21854616
A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii).

PubMed

Wang, Chenwei; Webley, Lee; Wei, Ke-jun; Wakefield, Matthew J; Patel, Hardip R; Deakin, Janine E; Alsop, Amber; Marshall Graves, Jennifer A; Cooper, Desmond W; Nicholas, Frank W; Zenger, Kyall R

2011-08-19

The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing genome sequence assembly and gene ordering in this species. A second-generation anchored tammar wallaby genetic linkage map has been constructed based on a total of 148 loci. The linkage map contains the original 64 loci included in the first-generation map, plus an additional 84 microsatellite loci that were chosen specifically to increase coverage and assist with the anchoring and orientation of linkage groups to chromosomes. These additional loci were derived from (a) sequenced BAC clones that had been previously mapped to tammar wallaby chromosomes by fluorescence in situ hybridization (FISH), (b) End sequence from BACs subsequently FISH-mapped to tammar wallaby chromosomes, and (c) tammar wallaby genes orthologous to opossum genes predicted to fill gaps in the tammar wallaby linkage map as well as three X-linked markers from a published study. Based on these 148 loci, eight linkage groups were formed. These linkage groups were assigned (via FISH-mapped markers) to all seven autosomes and the X chromosome. The sex-pooled map size is 1402.4 cM, which is estimated to provide 82.6% total coverage of the genome, with an average interval distance of 10.9 cM between adjacent markers. The overall ratio of female/male map length is 0.84, which is comparable to the ratio of 0.78 obtained for the first-generation map. 
Construction of this second-generation genetic linkage map is a significant step towards complete coverage of the tammar wallaby genome and considerably extends that of the first-generation map. It will be a valuable resource for ongoing tammar wallaby genetic research and assembling the genome sequence. The sex-pooled map is available online at http://compldb.angis.org.au/.
Report number codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, R.N.

This publication lists all report number codes processed by the Office of Scientific and Technical Information. The report codes are substantially based on the American National Standards Institute, Standard Technical Report Number (STRN)-Format and Creation Z39.23-1983. The Standard Technical Report Number (STRN) provides one of the primary methods of identifying a specific technical report. The STRN consists of two parts: The report code and the sequential number. The report code identifies the issuing organization, a specific program, or a type of document. The sequential number, which is assigned in sequence by each report issuing entity, is not included in this publication. Part I of this compilation is alphabetized by report codes followed by issuing installations. Part II lists the issuing organization followed by the assigned report code(s). In both Parts I and II, the names of issuing organizations appear for the most part in the form used at the time the reports were issued. However, for some of the more prolific installations which have had name changes, all entries have been merged under the current name.
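The two-part structure described above can be sketched as a small parser. The abstract does not say how the two parts are joined, so the double-hyphen separator used here is an assumption, and both the function name `split_strn` and the example number are hypothetical.

```python
def split_strn(strn, separator="--"):
    """Split a report number into (report code, sequential number).

    Assumption: the two parts are joined by `separator`; this is
    illustrative and is not taken from the abstract.
    """
    report_code, found, sequential = strn.partition(separator)
    if not found or not report_code or not sequential:
        raise ValueError(f"not a two-part report number: {strn!r}")
    return report_code, sequential

# Hypothetical report number, for illustration only.
print(split_strn("ABC/XY--1234"))
```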
TypeLoader: A fast and efficient automated workflow for the annotation and submission of novel full-length HLA alleles.

PubMed

Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V

2017-07-01

Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which deliver sequence data often covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of specific formats required by the sequence repositories are error-prone and time-consuming. We have developed TypeLoader to address both these facets. With only the full-length sequence as a starting point, TypeLoader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
A teaching-learning sequence about weather map reading

NASA Astrophysics Data System (ADS)

Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine

2017-07-01

In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a weather forecast. Sixty PET capabilities and difficulties in understanding weather maps were investigated, using inquiry-based learning activities. The results show that most PET became more capable of reading weather maps and assigning wind direction and speed on them. Our results also show that PET could be guided to understand meteorology concepts useful in everyday life and in teaching their future students.
Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

PubMed

Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

2016-03-01

Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken cumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
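The classification idea summarized in this abstract (short subsequences mapped to a lowest ancestor, then weighted taxon nodes) can be illustrated with a toy sketch. It is not Kraken's implementation: the taxonomy, the k-mer table, and the k-mer length of 4 are all hypothetical, and handling of the reverse strand and of tied paths is omitted.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Vibrio": "Bacteria",
    "Vibrio cholerae": "Vibrio",
    "Vibrio harveyi": "Vibrio",
}

# Hypothetical database: each k-mer maps to the lowest common ancestor
# of all reference sequences assumed to contain it.
KMER_TO_TAXON = {
    "ACGT": "Bacteria",
    "CGTA": "Vibrio",
    "GTAC": "Vibrio cholerae",
    "TACG": "Vibrio cholerae",
    "ACGG": "Vibrio harveyi",
}

def path_to_root(taxon):
    """Yield the taxon and each of its ancestors up to the root."""
    while taxon is not None:
        yield taxon
        taxon = PARENT[taxon]

def classify(read, k=4):
    """Return the hit taxon whose root-to-node path carries the most k-mer hits."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxon = KMER_TO_TAXON.get(read[i:i + k])
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # no k-mer matched: the read stays unclassified

    def path_weight(taxon):
        return sum(hits[ancestor] for ancestor in path_to_root(taxon))

    return max(hits, key=path_weight)

print(classify("ACGTACG"))
```

Because every k-mer contributes a vote, a read can still be classified when only part of it matches the database, which is consistent with the abstract's observation that unassembled reads remained informative.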
Sma3s: A universal tool for easy functional annotation of proteomes and transcriptomes.

PubMed

Casimiro-Soriguer, Carlos S; Muñoz-Mérida, Antonio; Pérez-Pulido, Antonio J

2017-06-01

The current cheapening of next-generation sequencing has led to an enormous growth in the number of sequenced genomes and transcriptomes, allowing wet labs to get the sequences from their organisms of study. To make the most of these data, one of the first things that should be done is the functional annotation of the protein-coding genes. But it used to be a slow and tedious step that can involve the characterization of thousands of sequences. Sma3s is an accurate computational tool for annotating proteins in an unattended way. Now, we have developed a completely new version, which includes functionalities that will be of utility for fundamental and applied science. Currently, the results provide functional categories such as biological processes, which become useful for both characterizing particular sequence datasets and comparing results from different projects. But one of the most important implemented innovations is that it now has low computational requirements, and the complete annotation of a simple proteome or transcriptome usually takes around 24 hours on a personal computer. Sma3s has been tested with a large number of complete proteomes and transcriptomes, and it has demonstrated its potential in health science and other specific projects. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat.

PubMed

Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji

2016-02-01

Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species whose genome has also been sequenced, but not fully analysed yet. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organizations, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, involved in its extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and is specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Incidence of dentinal defects after root canal preparation: reciprocating versus rotary instrumentation.

PubMed

Bürklein, Sebastian; Tsotsis, Polymnia; Schäfer, Edgar

2013-04-01

The purpose of this study was to evaluate the incidence of dentinal defects after root canal preparation with reciprocating instruments (Reciproc and WaveOne) and rotary instruments. One hundred human central mandibular incisors were randomly assigned to 5 groups (n = 20 teeth per group). The root canals were instrumented by using the reciprocating single-file systems Reciproc and WaveOne and the full-sequence rotary Mtwo and ProTaper instruments. One group was left unprepared as control. Roots were sectioned horizontally at 3, 6, and 9 mm from the apex and evaluated under a microscope by using 25-fold magnification. The presence of dentinal defects (complete/incomplete cracks and craze lines) was noted and analyzed by using the chi-square test. No defects were observed in the controls. All canal preparation created dentinal defects. Overall, instrumentation with Reciproc was associated with more complete cracks than the full-sequence files (P = .021). Although both reciprocating files produced more incomplete cracks apically (3 mm) compared with the rotary files (P = .001), no statistically significant differences were obtained concerning the summarized values of all cross sections (P > .05). Under the conditions of this study, root canal preparation with both rotary and reciprocating instruments resulted in dentinal defects. At the apical level of the canals, reciprocating files produced significantly more incomplete dentinal cracks than full-sequence rotary systems (P < .05). Copyright © 2013 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
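The chi-square comparison named in this abstract can be sketched for the simplest case, a 2x2 table comparing two groups. The counts below are hypothetical and are not data from the study; the sketch also assumes a plain Pearson statistic without continuity correction, which the abstract does not specify.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows are two instrument groups,
# columns are (sections with a defect, sections without a defect).
table = [[12, 48], [4, 56]]
stat, p = chi_square_2x2(table)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```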
Looking into flowering time in almond (Prunus dulcis (Mill) D. A. Webb): the candidate gene approach.

PubMed

Silva, C; Garcia-Mas, J; Sánchez, A M; Arús, P; Oliveira, M M

2005-03-01

Blooming time is one of the most important agronomic traits in almond. Biochemical and molecular events underlying flowering regulation must be understood before methods to stimulate late flowering can be developed. Attempts to elucidate the genetic control of this process have led to the identification of a major gene (Lb) and quantitative trait loci (QTLs) linked to observed phenotypic differences, but although this gene and these QTLs have been placed on the Prunus reference genetic map, their sequences and specific functions remain unknown. The aim of our investigation was to associate these loci with known genes using a candidate gene approach. Two almond cDNAs and eight Prunus expressed sequence tags were selected as candidate genes (CGs) since their sequences were highly identical to those of flowering regulatory genes characterized in other species. The CGs were amplified from both parental lines of the mapping population using specific primers. Sequence comparison revealed DNA polymorphisms between the parental lines, mainly of the single nucleotide type. Polymorphisms were used to develop co-dominant cleaved amplified polymorphic sequence markers or length polymorphisms based on insertion/deletion events for mapping the candidate genes on the Prunus reference map. Ten candidate genes were assigned to six linkage groups in the Prunus genome. The positions of two of these were compatible with the regions where two QTLs for blooming time were detected. One additional candidate was localized close to the position of the Evergrowing gene, which determines a non-deciduous behaviour in peach.
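How a single-nucleotide polymorphism becomes a cleaved amplified polymorphic sequence (CAPS) marker can be shown with a minimal sketch: if the SNP creates or destroys a restriction site, digestion of the amplicon yields different fragment lengths in the two parents. The sequences are hypothetical, and the choice of EcoRI (recognition site GAATTC, cut after the first G) is an assumption, since the abstract names no enzyme.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of `site`.

    `cut_offset` is the position of the cut within the recognition site.
    """
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments

ECORI = ("GAATTC", 1)  # EcoRI cuts G^AATTC

# Hypothetical amplicons from the two parental lines, differing by one SNP
# (A in parent A, G in parent B) inside the recognition site.
parent_a = "ATGCCGTAGAATTCGGCTAAGCTT"
parent_b = "ATGCCGTAGAGTTCGGCTAAGCTT"

print(digest(parent_a, *ECORI))  # site present: amplicon is cut
print(digest(parent_b, *ECORI))  # site absent: amplicon stays intact
```

The marker is co-dominant because a heterozygote would show the fragments of both patterns at once.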
Improved serial analysis of V1 ribosomal sequence tags (SARST-V1) provides a rapid, comprehensive, sequence-based characterization of bacterial diversity and community composition.

PubMed

Yu, Zhongtang; Yu, Marie; Morrison, Mark

2006-04-01

Serial analysis of ribosomal sequence tags (SARST) is a recently developed technology that can generate large 16S rRNA gene (rrs) sequence data sets from microbiomes, but there are numerous enzymatic and purification steps required to construct the ribosomal sequence tag (RST) clone libraries. We report here an improved SARST method, which still targets the V1 hypervariable region of rrs genes, but reduces the number of enzymes, oligonucleotides, reagents, and technical steps needed to produce the RST clone libraries. The new method, hereafter referred to as SARST-V1, was used to examine the eubacterial diversity present in community DNA recovered from the microbiome resident in the ovine rumen. The 190 sequenced clones contained 1055 RSTs and no less than 236 unique phylotypes (based on ≥ 95% sequence identity) that were assigned to eight different eubacterial phyla. Rarefaction and monomolecular curve analyses predicted that the complete RST clone library contains 99% of the 353 unique phylotypes predicted to exist in this microbiome. When compared with ribosomal intergenic spacer analysis (RISA) of the same community DNA sample, as well as a compilation of nine previously published conventional rrs clone libraries prepared from the same type of samples, the RST clone library provided a more comprehensive characterization of the eubacterial diversity present in rumen microbiomes. As such, SARST-V1 should be a useful tool applicable to comprehensive examination of diversity and composition in microbiomes and offers an affordable, sequence-based method for diversity analysis.
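A rarefaction curve of the kind mentioned in this abstract can be computed exactly with the classical hypergeometric (Hurlbert) formula. The sketch below is illustrative only: the abstract does not say which rarefaction software or formula was used, and the tag counts are hypothetical, not the rumen data.

```python
from math import comb

def expected_phylotypes(counts, n):
    """Expected number of phylotypes in a random subsample of n tags
    drawn without replacement from a library with the given tag counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the library size")
    denom = comb(total, n)
    # For each phylotype, 1 minus the probability that it is missed entirely.
    return sum(1 - comb(total - c, n) / denom for c in counts)

# Hypothetical tag counts per phylotype.
counts = [40, 25, 10, 5, 3, 1, 1]
for n in (0, 10, 40, 85):
    print(n, round(expected_phylotypes(counts, n), 2))
```

The curve flattens as the subsample approaches the library size; how early it flattens indicates how much additional sequencing would still uncover new phylotypes.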
A meta-analysis of bacterial diversity in the feces of cattle

USDA-ARS's Scientific Manuscript database

In this study, we conducted a meta-analysis on 16S rRNA gene sequences of bovine fecal origin that are publicly available in the RDP database. A total of 13663 sequences including 603 isolate sequences were identified in the RDP database (Release 11, Update 1), where 13447 sequences were assigned t...

Complete genome sequence of a potyvirus infecting yam beans (Pachyrhizus spp.) in Peru.

PubMed

Fuentes, Segundo; Heider, Bettina; Tasso, Ruby Carolina; Romero, Elisa; Zum Felde, Thomas; Kreuze, Jan Frederik

2012-04-01

In 2010, yam beans in a field trial in Peru showed viral disease symptoms. Graft-transmission and positive ELISA results using potyvirus-specific antibodies suggested that the symptoms could be the result of a potyviral infection. Small interfering RNA (siRNA) were extracted from one of the samples and sent for high-throughput sequencing. The full genome of a new potyvirus could be assembled from the resulting siRNA sequences, and it was sufficiently different from other sequences to be considered a member of a new species, which we have designated Yam bean mosaic virus (YBMV). Sequence similarity suggests that YBMV has also been detected in yam beans in Indonesia.
OrthoSelect: a protocol for selecting orthologous groups in phylogenomics.

PubMed

Schreiber, Fabian; Pick, Kerstin; Erpenbeck, Dirk; Wörheide, Gert; Morgenstern, Burkhard

2009-07-16

Phylogenetic studies using expressed sequence tags (EST) are becoming a standard approach to answer evolutionary questions. Such studies are usually based on large sets of newly generated, unannotated, and error-prone EST sequences from different species. A first crucial step in EST-based phylogeny reconstruction is to identify groups of orthologous sequences. From these data sets, appropriate target genes are selected, and redundant sequences are eliminated to obtain suitable sequence sets as input data for tree-reconstruction software. Generating such data sets manually can be very time consuming. Thus, software tools are needed that carry out these steps automatically. We developed a flexible and user-friendly software pipeline, running on desktop machines or computer clusters, that constructs data sets for phylogenomic analyses. It automatically searches assembled EST sequences against databases of orthologous groups (OG), assigns ESTs to these predefined OGs, translates the sequences into proteins, eliminates redundant sequences assigned to the same OG, creates multiple sequence alignments of identified orthologous sequences and offers the possibility to further process this alignment in a last step by excluding potentially homoplastic sites and selecting sufficiently conserved parts. Our software pipeline can be used as it is, but it can also be adapted by integrating additional external programs. This makes the pipeline useful for non-bioinformaticians as well as to bioinformatic experts. The software pipeline is especially designed for ESTs, but it can also handle protein sequences. OrthoSelect is a tool that produces orthologous gene alignments from assembled ESTs. Our tests show that OrthoSelect detects orthologs in EST libraries with high accuracy. In the absence of a gold standard for orthology prediction, we compared predictions by OrthoSelect to a manually created and published phylogenomic data set. 
Our tool was not only able to rebuild the data set with a specificity of 98%, but it detected four percent more orthologous sequences. Furthermore, the results OrthoSelect produces are in absolute agreement with the results of other programs, but our tool offers a significant speedup and additional functionality, e.g. handling of ESTs, computing sequence alignments, and refining them. To our knowledge, there is currently no fully automated and freely available tool for this purpose. Thus, OrthoSelect is a valuable tool for researchers in the field of phylogenomics who deal with large quantities of EST sequences. OrthoSelect is written in Perl and runs on Linux/Mac OS X. The tool can be downloaded at (http://gobics.de/fabian/orthoselect.php).
Complete (1)H resonance assignment of beta-maltose from (1)H-(1)H DQ-SQ CRAMPS and (1)H (DQ-DUMBO)-(13)C SQ refocused INEPT 2D solid-state NMR spectra and first principles GIPAW calculations.

PubMed

Webber, Amy L; Elena, Bénédicte; Griffin, John M; Yates, Jonathan R; Pham, Tran N; Mauri, Francesco; Pickard, Chris J; Gil, Ana M; Stein, Robin; Lesage, Anne; Emsley, Lyndon; Brown, Steven P

2010-07-14

A disaccharide is a challenging case for high-resolution (1)H solid-state NMR because of the 24 distinct protons (14 aliphatic and 10 OH) having (1)H chemical shifts that all fall within a narrow range of approximately 3 to 7 ppm. High-resolution (1)H (500 MHz) double-quantum (DQ) combined rotation and multiple pulse sequence (CRAMPS) solid-state NMR spectra of beta-maltose monohydrate are presented. (1)H-(1)H DQ-SQ CRAMPS spectra are presented together with (1)H (DQ)-(13)C correlation spectra obtained with a new pulse sequence that correlates a high-resolution (1)H DQ dimension with a (13)C single quantum (SQ) dimension using the refocused INEPT pulse-sequence element to transfer magnetization via one-bond (13)C-(1)H J couplings. Compared to the observation of only a single broad peak in a (1)H DQ spectrum recorded at 30 kHz magic-angle spinning (MAS), the use of DUMBO (1)H homonuclear decoupling in the (1)H DQ CRAMPS experiment allows the resolution of distinct DQ correlation peaks which, in combination with first-principles chemical shift calculations based on the GIPAW (Gauge Including Projector Augmented Waves) plane-wave pseudopotential approach, enables the assignment of the (1)H resonances to the 24 distinct protons. We believe this to be the first experimental solid-state NMR determination of the hydroxyl OH (1)H chemical shifts for a simple sugar. Variable-temperature (1)H-(1)H DQ CRAMPS spectra reveal small increases in the (1)H chemical shifts of the OH resonances upon decreasing the temperature from 348 K to 248 K.
FFPred 2.0: Improved Homology-Independent Prediction of Gene Ontology Terms for Eukaryotic Protein Sequences

PubMed Central

Minneci, Federico; Piovesan, Damiano; Cozzetto, Domenico; Jones, David T.

2013-01-01

To understand fully cell behaviour, biologists are making progress towards cataloguing the functional elements in the human genome and characterising their roles across a variety of tissues and conditions. Yet, functional information – either experimentally validated or computationally inferred by similarity – remains completely missing for approximately 30% of human proteins. FFPred was initially developed to bridge this gap by targeting sequences with distant or no homologues of known function and by exploiting clear patterns of intrinsic disorder associated with particular molecular activities and biological processes. Here, we present an updated and improved version, which builds on larger datasets of protein sequences and annotations, and uses updated component feature predictors as well as revised training procedures. FFPred 2.0 includes support vector regression models for the prediction of 442 Gene Ontology (GO) terms, which largely expand the coverage of the ontology and of the biological process category in particular. The GO term list mainly revolves around macromolecular interactions and their role in regulatory, signalling, developmental and metabolic processes. Benchmarking experiments on newly annotated proteins show that FFPred 2.0 provides more accurate functional assignments than its predecessor and the ProtFun server do; also, its assignments can complement information obtained using BLAST-based transfer of annotations, improving especially prediction in the biological process category. Furthermore, FFPred 2.0 can be used to annotate proteins belonging to several eukaryotic organisms with a limited decrease in prediction quality. We illustrate all these points through the use of both precision-recall plots and of the COGIC scores, which we recently proposed as an alternative numerical evaluation measure of function prediction accuracy. PMID:23717476
PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples

PubMed Central

2014-01-01

Background Recent innovations in sequencing technologies have provided researchers with the ability to rapidly characterize the microbial content of an environmental or clinical sample with unprecedented resolution. These approaches are producing a wealth of information that is providing novel insights into the microbial ecology of the environment and human health. However, these sequencing-based approaches produce large and complex datasets that require efficient and sensitive computational analysis workflows. Many tools for analyzing metagenomic sequencing data have recently emerged; however, these approaches often suffer from issues of specificity and efficiency, and typically do not include a complete metagenomic analysis framework. Results We present PathoScope 2.0, a complete bioinformatics framework for rapidly and accurately quantifying the proportions of reads from individual microbial strains present in metagenomic sequencing data from environmental or clinical samples. The pipeline performs all necessary computational analysis steps, including reference genome library extraction and indexing, read quality control and alignment, strain identification, and summarization and annotation of results. We rigorously evaluated PathoScope 2.0 using simulated data and data from the 2011 outbreak of Shiga-toxigenic Escherichia coli O104:H4. Conclusions The results show that PathoScope 2.0 is a complete, highly sensitive, and efficient approach for metagenomic analysis that outperforms alternative approaches in scope, speed, and accuracy. The PathoScope 2.0 pipeline software is freely available for download at: http://sourceforge.net/projects/pathoscope/. PMID:25225611
Development of a Multiplex Single Base Extension Assay for Mitochondrial DNA Haplogroup Typing

PubMed Central

Nelson, Tahnee M.; Just, Rebecca S.; Loreille, Odile; Schanfield, Moses S.; Podini, Daniele

2007-01-01

Aim To provide a screening tool to reduce time and sample consumption when attempting mtDNA haplogroup typing. Methods A single base primer extension assay was developed to enable typing, in a single reaction, of twelve mtDNA haplogroup specific polymorphisms. For validation purposes a total of 147 samples were tested including 73 samples successfully haplogroup typed using mtDNA control region (CR) sequence data, 21 samples inconclusively haplogroup typed by CR data, 20 samples previously haplogroup typed using restriction fragment length polymorphism (RFLP) analysis, and 31 samples of known ancestral origin without previous haplogroup typing. Additionally, two highly degraded human bones embalmed and buried in the early 1950s were analyzed using the single nucleotide polymorphisms (SNP) multiplex. Results When the SNP multiplex was used to type the 96 previously CR sequenced specimens, an increase in haplogroup or macrohaplogroup assignment relative to conventional CR sequence analysis was observed. The single base extension assay was also successfully used to assign a haplogroup to decades-old, embalmed skeletal remains dating to World War II. Conclusion The SNP multiplex was successfully used to obtain haplogroup status of highly degraded human bones, and demonstrated the ability to eliminate possible contributors. The SNP multiplex provides a low-cost, high throughput method for typing of mtDNA haplogroups A, B, C, D, E, F, G, H, L1/L2, L3, M, and N that could be useful for screening purposes for human identification efforts and anthropological studies. PMID:17696300
Evolutionary and Functional Relationships in the Truncated Hemoglobin Family

PubMed Central

Bustamante, Juan P.; Radusky, Leandro; Boechi, Leonardo; Estrin, Darío A.; ten Have, Arjen; Martí, Marcelo A.

2016-01-01

Predicting function from sequence is an important goal in current biological research, and although broad functional assignment is possible when a protein is assigned to a family, predicting functional specificity with accuracy is not straightforward. If function is provided by key structural properties and the relevant properties can be computed using the sequence as the starting point, it should in principle be possible to predict function in detail. The truncated hemoglobin family presents an interesting benchmark study due to their ubiquity, sequence diversity in the context of a conserved fold and the number of characterized members. Their functions are tightly related to O2 affinity and reactivity, as determined by the association and dissociation rate constants, both of which can be predicted and analyzed using in-silico based tools. In the present work we have applied a strategy, which combines homology modeling with molecular based energy calculations, to predict and analyze function of all known truncated hemoglobins in an evolutionary context. Our results show that truncated hemoglobins present conserved family features, but that their structure is flexible enough to allow the switch from high to low affinity in a few evolutionary steps. Most proteins display moderate to high oxygen affinities and multiple ligand migration paths, which, besides some minor trends, show heterogeneous distributions throughout the phylogenetic tree, again suggesting fast functional adaptation. Our data not only deepen our comprehension of the structural basis governing ligand affinity, but also highlight some interesting functional evolutionary trends. PMID:26788940
Application of Faecalibacterium 16S rDNA genetic marker for accurate identification of duck faeces.

PubMed

Sun, Da; Duan, Chuanren; Shang, Yaning; Ma, Yunxia; Tan, Lili; Zhai, Jun; Gao, Xu; Guo, Jingsong; Wang, Guixue

2016-04-01

The aim of this study was to judge the legal duty of pollution liabilities by assessing a duck faeces-specific marker, which can exclude distractions of residual bacteria from earlier contamination accidents. Using gene sequencing and bioinformatic methods, we carried out a comparative analysis of Faecalibacterium sequences associated with ducks and other animal species, and found sequences unique to duck faeces. Polymerase chain reaction (PCR) and agarose gel electrophoresis were used to verify the reliability of both human and duck faeces-specific primers. The duck faeces-specific primers generated an amplicon of 141 bp from 43.3% of duck faecal samples, 0% of control samples and 100% of sewage wastewater samples that contained duck faeces. We present here the initial evidence that a Faecalibacterium-based marker is applicable as a human faeces-specific marker in China. Meanwhile, this study represents the initial report of a Faecalibacterium marker for duck faeces and suggests an independent or supplementary environmental biotechnology of microbial source tracking (MST).
Sma3s: a three-step modular annotator for large sequence datasets.

PubMed

Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J

2014-08-01

Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
The human myelin oligodendrocyte glycoprotein (MOG) gene: Complete nucleotide sequence and structural characterization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paule Roth, M.; Malfroy, L.; Offer, C.

1995-07-20

Human myelin oligodendrocyte glycoprotein (MOG), a myelin component of the central nervous system, is a candidate target antigen for autoimmune-mediated demyelination. We have isolated and sequenced part of a cosmid clone that contains the entire human MOG gene. The primary nuclear transcript, extending from the putative start of transcription to the site of poly(A) addition, is 15,561 nucleotides in length. The human MOG gene contains 8 exons, separated by 7 introns; canonical intron/exon boundary sites are observed at each junction. The introns vary in size from 242 to 6484 bp and contain numerous repetitive DNA elements, including 14 Alu sequences within 3 introns. Another Alu element is located in the 3′-untranslated region of the gene. Alu sequences were classified with respect to subfamily assignment. Seven hundred sixty-three nucleotides 5′ of the transcription start and 1214 nucleotides 3′ of the poly(A) addition sites were also sequenced. The 5′-flanking region revealed the presence of several consensus sequences that could be relevant in the transcription of the MOG gene, in particular binding sites in common with other myelin gene promoters. Two polymorphic intragenic dinucleotide (CA)n and tetranucleotide (TAAA)n repeats were identified and may provide genetic marker tools for association and linkage studies. 50 refs., 3 figs., 3 tabs.
Multidimensional oriented solid-state NMR experiments enable the sequential assignment of uniformly 15N labeled integral membrane proteins in magnetically aligned lipid bilayers.

PubMed

Mote, Kaustubh R; Gopinath, T; Traaseth, Nathaniel J; Kitchen, Jason; Gor'kov, Peter L; Brey, William W; Veglia, Gianluigi

2011-11-01

Oriented solid-state NMR is the most direct methodology to obtain the orientation of membrane proteins with respect to the lipid bilayer. The method consists of measuring ¹H-¹⁵N dipolar couplings (DC) and ¹⁵N anisotropic chemical shifts (CSA) for membrane proteins that are uniformly aligned with respect to the membrane bilayer. A significant advantage of this approach is that tilt and azimuthal (rotational) angles of the protein domains can be directly derived from analytical expression of DC and CSA values, or, alternatively, obtained by refining protein structures using these values as harmonic restraints in simulated annealing calculations. The Achilles' heel of this approach is the lack of suitable experiments for sequential assignment of the amide resonances. In this Article, we present a new pulse sequence that integrates proton driven spin diffusion (PDSD) with sensitivity-enhanced PISEMA in a 3D experiment ([¹H,¹⁵N]-SE-PISEMA-PDSD). The incorporation of 2D ¹⁵N/¹⁵N spin diffusion experiments into this new 3D experiment leads to the complete and unambiguous assignment of the ¹⁵N resonances. The feasibility of this approach is demonstrated for the membrane protein sarcolipin reconstituted in magnetically aligned lipid bicelles. Taken with low electric field probe technology, this approach will propel the determination of sequential assignment as well as structure and topology of larger integral membrane proteins in aligned lipid bilayers. © Springer Science+Business Media B.V. 2011
Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules.

PubMed

Turatsinze, Jean-Valery; Thomas-Chollier, Morgane; Defrance, Matthieu; van Helden, Jacques

2008-01-01

This protocol shows how to detect putative cis-regulatory elements and regions enriched in such elements with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors, whose binding specificity is represented by position-specific scoring matrices, using the program matrix-scan. The detection of individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P value, and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a study case, the upstream sequence of the Drosophila melanogaster gene even-skipped. This protocol is also tested on random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server. The complete protocol can be executed in about one hour.
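The matrix-based scanning step described in this abstract can be illustrated with a small sketch. This is not RSAT's matrix-scan code: the count matrix, the uniform background, the pseudocount and the function names are all hypothetical, only the forward strand is scanned, and the P-value estimation mentioned in the abstract is not shown.

```python
import math

# Hypothetical 4-position count matrix (one row per base); not an RSAT matrix.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def build_pssm(counts, background, pseudocount=1.0):
    """Convert per-position counts into log-odds weights against the background."""
    width = len(next(iter(counts.values())))
    pssm = []
    for pos in range(width):
        total = sum(counts[b][pos] for b in "ACGT") + pseudocount
        column = {}
        for b in "ACGT":
            freq = (counts[b][pos] + pseudocount * background[b]) / total
            column[b] = math.log(freq / background[b])
        pssm.append(column)
    return pssm


def scan(sequence, pssm):
    """Return (offset, score) for every window; expects uppercase A/C/G/T only."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


pssm = build_pssm(COUNTS, BACKGROUND)
hits = scan("TTAGCTAA", pssm)
best = max(hits, key=lambda hit: hit[1])
print(best)
```

In this toy input the highest-scoring window is the one matching the matrix consensus (AGCT). Ranking raw scores alone is what produces many false predictions, which is why the protocol adds P-value estimates and site combinations.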
[Cloning and sequencing of KIR2DL1 framework gene cDNA and identification of a novel allele].

PubMed

Sun, Ge; Wang, Chang; Zhen, Jianxin; Zhang, Guobin; Xu, Yunping; Deng, Zhihui

2016-10-01

To develop an assay for cDNA cloning and haplotype sequencing of KIR2DL1 framework gene and determine the genotype of an ethnic Han from southern China. Total RNA was isolated from peripheral blood sample, and complementary DNA (cDNA) transcript was synthesized by RT-PCR. The entire coding sequence of the KIR2DL1 framework gene was amplified with a pair of KIR2DL1-specific PCR primers. The PCR products with a length of approximately 1.2 kb were then subjected to cloning and haplotype sequencing. A specific target fragment of the KIR2DL1 framework gene was obtained. Following allele separation, a wild-type KIR2DL1*00302 allele and a novel variant allele, KIR2DL1*031, were identified. Sequence alignment with KIR2DL1 alleles from the IPD-KIR Database showed that the novel allele KIR2DL1*031 differs from the closest allele KIR2DL1*00302 by a non-synonymous mutation at CDS nt 188A>G (codon 42 GAG>GGG) in exon 4, which causes an amino acid change Glu42Gly. The sequence of the novel allele KIR2DL1*031 was submitted to GenBank under the accession number KP025960 and to the IPD-KIR Database under the submission number IWS40001982. The name KIR2DL1*031 has been officially assigned by the World Health Organization (WHO) Nomenclature Committee. An assay for cDNA cloning and haplotype sequencing of KIR2DL1 has been established, which has broad applications in KIR studies at allelic level.
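The variant description above (CDS nt 188A>G, codon 42 GAG>GGG, Glu42Gly) can be checked with a little arithmetic. The sketch below assumes that codons are numbered from the mature protein after a 21-residue leader peptide while CDS nucleotides are counted from the start codon; that offset is an assumption made for illustration, not something the abstract states, and the function names are hypothetical.

```python
# Assumption: codon numbers refer to the mature protein, after a 21-residue
# leader peptide, while CDS nucleotide positions count from the start codon.
LEADER_CODONS = 21

CODON_TABLE = {"GAG": "Glu", "GGG": "Gly"}  # only the two codons needed here


def mature_codon(cds_nt, leader_codons=LEADER_CODONS):
    """Map a 1-based CDS nucleotide to (mature codon number, position in codon)."""
    codon_from_start = (cds_nt - 1) // 3 + 1
    position_in_codon = (cds_nt - 1) % 3 + 1
    return codon_from_start - leader_codons, position_in_codon


def substitute(codon, position, new_base):
    """Return the codon with the base at a 1-based position replaced."""
    bases = list(codon)
    bases[position - 1] = new_base
    return "".join(bases)


codon_number, position = mature_codon(188)
variant = substitute("GAG", position, "G")
print(codon_number, position, CODON_TABLE["GAG"], "->", CODON_TABLE[variant])
```

Under that assumption, nucleotide 188 falls on the second base of codon 42, and replacing that base in GAG gives GGG, which agrees with the Glu-to-Gly change reported.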
Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

NASA Astrophysics Data System (ADS)

Dick, G. J.; Andersson, A.; Banfield, J. F.

2007-12-01

Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing dynamics of how microorganisms interact with each other and their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on identification and assembly of DNA fragments from low-abundance community members. The Richmond mine hosts an extensive, relatively low diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are not expected to reflect the tetranucleotide frequency signature of the host genome. Four unknown tetranucleotide frequency clusters with significant sequence (6 Mb total) were noted and analyzed further. Based on phylogenetic markers and BLAST results, these clusters represent low abundance bacteria including Actinobacteria, Firmicutes, and Proteobacteria. Functional analysis of these clusters revealed that the low-abundance bacteria harbor genes that could potentially encode important ecosystem functions such as sulfur utilization (e.g. polysulfide reductase) and polymer degradation (e.g. chitinase and glycoside hydrolase). We conclude that ESOM clustering of tetranucleotide frequency patterns is an effective method for rapidly binning shotgun community genomic sequences and a valuable tool for analyzing minor community members, which despite their low abundance may play crucial ecological roles.
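The tetranucleotide frequency signature used for binning can be sketched as follows. This is only the feature-extraction step under simplifying assumptions: the function names and toy contigs are hypothetical, reverse complements are not merged, no sliding windows are used, and the self-organizing map clustering itself is not shown.

```python
from collections import Counter
from itertools import product

# All 256 possible four-letter words over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_frequencies(contig):
    """Return a 256-element vector of normalized tetranucleotide frequencies."""
    contig = contig.upper()
    counts = Counter(contig[i:i + 4] for i in range(len(contig) - 3))
    # Words containing ambiguous bases (e.g. N) are not in TETRAMERS and are skipped.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def euclidean(u, v):
    """Euclidean distance between two frequency vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Toy contigs: the first two share composition, the third does not.
vec_a = tetra_frequencies("ACGTACGTACGT")
vec_b = tetra_frequencies("ACGTACGTACGA")
vec_c = tetra_frequencies("GGGGCCCCGGGGCCCC")
print(euclidean(vec_a, vec_b), euclidean(vec_a, vec_c))
```

Contigs with similar composition yield nearby vectors, which is the property a clustering method can exploit. Fragments dominated by conserved genes or mobile elements may not carry the host signature, as the abstract notes for the misassigned cases.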
The Comprehensive Microbial Resource

PubMed Central

Peterson, Jeremy D.; Umayam, Lowell A.; Dickinson, Tanja; Hickey, Erin K.; White, Owen

2001-01-01

One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes. PMID:11125067
A Single Multilocus Sequence Typing (MLST) Scheme for Seven Pathogenic Leptospira Species

PubMed Central

Amornchai, Premjit; Wuthiekanun, Vanaporn; Bailey, Mark S.; Holden, Matthew T. G.; Zhang, Cuicai; Jiang, Xiugao; Koizumi, Nobuo; Taylor, Kyle; Galloway, Renee; Hoffmaster, Alex R.; Craig, Scott; Smythe, Lee D.; Hartskeerl, Rudy A.; Day, Nicholas P.; Chantratita, Narisara; Feil, Edward J.; Aanensen, David M.; Spratt, Brian G.; Peacock, Sharon J.

2013-01-01

Background The available Leptospira multilocus sequence typing (MLST) scheme supported by a MLST website is limited to L. interrogans and L. kirschneri. Our aim was to broaden the utility of this scheme to incorporate a total of seven pathogenic species. Methodology and Findings We modified the existing scheme by replacing one of the seven MLST loci (fadD was changed to caiB), as the former gene did not appear to be present in some pathogenic species. Comparison of the original and modified schemes using data for L. interrogans and L. kirschneri demonstrated that the discriminatory power of the two schemes was not significantly different. The modified scheme was used to further characterize 325 isolates (L. alexanderi [n = 5], L. borgpetersenii [n = 34], L. interrogans [n = 222], L. kirschneri [n = 29], L. noguchii [n = 9], L. santarosai [n = 10], and L. weilii [n = 16]). Phylogenetic analysis using concatenated sequences of the 7 loci demonstrated that each species corresponded to a discrete clade, and that no strains were misclassified at the species level. Comparison between genotype and serovar was possible for 254 isolates. Of the 31 sequence types (STs) represented by at least two isolates, 18 STs included isolates assigned to two or three different serovars. Conversely, 14 serovars were identified that contained between 2 and 10 different STs. New observations were made on the global phylogeography of Leptospira spp., and the utility of MLST in making associations between human disease and specific maintenance hosts was demonstrated. Conclusion The new MLST scheme, supported by an updated MLST website, allows the characterization and species assignment of isolates of the seven major pathogenic species associated with leptospirosis. PMID:23359622
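Sequence type assignment in an MLST scheme amounts to looking up a seven-locus allelic profile in a table of known profiles. The sketch below is illustrative only: apart from caiB, which the abstract names, the locus names are placeholders, and the profile table, allele numbers and ST numbers are invented for the example rather than taken from the MLST website.

```python
# Seven loci: only caiB is named in the abstract; the other names are placeholders.
LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "caiB")

# Hypothetical profile table: allele numbers in LOCI order -> sequence type (ST).
PROFILES = {
    (1, 1, 2, 2, 10, 4, 8): 17,
    (1, 4, 2, 2, 10, 4, 8): 34,
}


def assign_st(alleles, profiles=PROFILES):
    """Return the ST for a complete allelic profile, or None if it is not in the table."""
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)


isolate = {"locus1": 1, "locus2": 1, "locus3": 2, "locus4": 2,
           "locus5": 10, "locus6": 4, "caiB": 8}
novel = dict(isolate, caiB=9)  # same profile except for a different caiB allele

print(assign_st(isolate), assign_st(novel))
```

A profile that differs at a single locus gets no match here; in practice such a profile would be a candidate for a new ST. Because the lookup uses allele numbers only, it says nothing about serovar, consistent with the observation that one ST can span several serovars.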
Sequence Analysis of Leuconostoc mesenteroides Bacteriophage Φ1-A4 Isolated from an Industrial Vegetable Fermentation

PubMed Central

Lu, Z.; Altermann, E.; Breidt, F.; Kozyavkin, S.

2010-01-01

Vegetable fermentations rely on the proper succession of a variety of lactic acid bacteria (LAB). Leuconostoc mesenteroides initiates fermentation. As fermentation proceeds, L. mesenteroides dies off and other LAB complete the fermentation. Phages infecting L. mesenteroides may significantly influence the die-off of L. mesenteroides. However, no L. mesenteroides phages have been previously genetically characterized. Knowledge of more phage genome sequences may provide new insights into phage genomics, phage evolution, and phage-host interactions. We have determined the complete genome sequence of L. mesenteroides phage Φ1-A4, isolated from an industrial sauerkraut fermentation. The phage possesses a linear, double-stranded DNA genome consisting of 29,508 bp with a G+C content of 36%. Fifty open reading frames (ORFs) were predicted. Putative functions were assigned to 26 ORFs (52%), including 5 ORFs of structural proteins. The phage genome was modularly organized, containing DNA replication, DNA-packaging, head and tail morphogenesis, cell lysis, and DNA regulation/modification modules. In silico analyses showed that Φ1-A4 is a unique lytic phage with a large-scale genome inversion (∼30% of the genome). The genome inversion encompassed the lysis module, part of the structural protein module, and a cos site. The endolysin gene was flanked by two holin genes. The tail morphogenesis module was interspersed with cell lysis genes and other genes with unknown functions. The predicted amino acid sequences of the phage proteins showed little similarity to other phages, but functional analyses showed that Φ1-A4 clusters with several Lactococcus phages. To our knowledge, Φ1-A4 is the first genetically characterized L. mesenteroides phage. PMID:20118355
Expressed sequence tags from the plant trypanosomatid Phytomonas serpens.

PubMed

Pappas, Georgios J; Benabdellah, Karim; Zingales, Bianca; González, Antonio

2005-08-01

We have generated 2190 expressed sequence tags (ESTs) from a cDNA library of the plant trypanosomatid Phytomonas serpens. Upon processing and clustering the set of 1893 accepted sequences was reduced to 697 clusters consisting of 452 singletons and 245 contigs. Functional categories were assigned based on BLAST searches against a database of the eukaryotic orthologous groups of proteins (KOG). Thirty six percent of the generated sequences showed no hits against the KOG database and 39.6% presented similarity to the KOG classes corresponding to translation, ribosomal structure and biogenesis. The most populated cluster contained 45 ESTs homologous to members of the glucose transporter family. This fact can be immediately correlated to the reported Phytomonas dependence on anaerobic glycolytic ATP production due to the lack of cytochrome-mediated respiratory chain. In this context, not only a number of enzymes of the glycolytic pathway were identified but also of the Krebs cycle as well as specific components of the respiratory chain. The data here reported, including a few hundred unique sequences and the description of tandemly repeated motifs and putative transcript stability motifs at untranslated mRNA ends, represent an initial approach to overcome the lack of information on the molecular biology of this organism.
The specificity of public stigma: A comparison of suicide and depression-related stigma.

PubMed

Sheehan, Lindsay; Dubke, Rachel; Corrigan, Patrick W

2017-10-01

Each year, approximately 1.3 million Americans survive a suicide attempt. While stigma has been reported by suicide attempt survivors, limited research has examined how suicide stigma may differ from the stigma of mental illness. U.S. adults (n = 440) completed an online survey in which they were randomly assigned to one of four vignettes. Vignettes depicted a target individual with either past depression, past suicide attempt, death by suicide, or no information on suicide or mental illness (control). Participants completed a general measure of stigma, a suicide-specific stigma measure, and were surveyed on the recovery potential of individuals with mental illness and suicide attempt. While the general stigma measure failed to distinguish between groups, significant differences on the suicide stigma scale (SSAS-44) emerged between participants assigned in the depression and suicide conditions, especially for stereotype and prejudice subscales. Across conditions, participants believed that recovery was more realistic for someone described as having a mental illness than it was for someone described as having attempted suicide. These findings suggest that individuals who have attempted suicide are subject to differential stigma content from those with depression. Implications are discussed for combating stigma for suicide attempt survivors. Copyright © 2017 Elsevier B.V. All rights reserved.

Single Stage Tandem Mass Spectrometry Assignment of the C-5 Uronic Acid Stereochemistry in Heparan Sulfate Tetrasaccharides using Electron Detachment Dissociation

NASA Astrophysics Data System (ADS)

Agyekum, Isaac; Zong, Chengli; Boons, Geert-Jan; Amster, I. Jonathan

2017-09-01

The analysis of heparan sulfate (HS) glycosaminoglycans presents many challenges, due to the high degree of structural heterogeneity arising from their non-template biosynthesis. Complete structural elucidation of glycosaminoglycans necessitates the unambiguous assignments of sulfo modifications and the C-5 uronic acid stereochemistry. Efforts to develop tandem mass spectrometric-based methods for the structural analysis of glycosaminoglycans have focused on the assignment of sulfo positions. The present work focuses on the assignment of the C-5 stereochemistry of the uronic acid that lies closest to the reducing end. Prior work with electron-based tandem mass spectrometry methods, specifically electron detachment dissociation (EDD), has shown great promise in providing stereo-specific product ions, such as the B3′-CO2, which has been found to distinguish glucuronic acid (GlcA) from iduronic acid (IdoA) in some HS tetrasaccharides. The previously observed diagnostic ions are generally not observed with 2-O-sulfo uronic acids or for more highly sulfated heparan sulfate tetrasaccharides. A recent study using electron detachment dissociation and principal component analysis revealed a series of ions that correlate with GlcA versus IdoA for a set of 2-O-sulfo HS tetrasaccharide standards. The present work comprehensively investigates the efficacy of these ions for assigning the C-5 stereochemistry of the reducing end uronic acid in 33 HS tetrasaccharides. A diagnostic ratio can be computed from the sum of the ions that correlate to GlcA to those that correlate to IdoA.
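The diagnostic ratio mentioned at the end of this abstract is a ratio of two summed intensities. A minimal sketch follows, assuming the ratio is the GlcA-correlated sum divided by the IdoA-correlated sum; the ion labels, intensities and function name are hypothetical placeholders, and the abstract gives no threshold for interpreting the value.

```python
def diagnostic_ratio(intensities, glca_ions, idoa_ions):
    """Sum of GlcA-correlated ion intensities divided by the IdoA-correlated sum."""
    glca = sum(intensities.get(ion, 0.0) for ion in glca_ions)
    idoa = sum(intensities.get(ion, 0.0) for ion in idoa_ions)
    if idoa == 0:
        raise ValueError("no IdoA-correlated ion intensity observed")
    return glca / idoa


# Hypothetical ion labels and intensities (arbitrary units), for illustration only.
spectrum = {"ion_a": 30, "ion_b": 10, "ion_c": 5, "ion_d": 15}
ratio = diagnostic_ratio(spectrum, glca_ions=["ion_a", "ion_b"],
                         idoa_ions=["ion_c", "ion_d"])
print(ratio)
```

Ions missing from a spectrum are treated as zero intensity, so the same ion lists can be applied across a set of tetrasaccharide spectra.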
Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization.

PubMed

Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru

2007-01-01

The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.

PubMed

Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M

2015-10-01

The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website (http://www.sanger.ac.uk/resources/mouse/genomes/). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Testing of Badminton-Specific Endurance.

PubMed

Madsen, Christian M; Højlyng, Mads; Nybo, Lars

2016-09-01

Madsen, CM, Højlyng, M, and Nybo, L. Testing of badminton-specific endurance. J Strength Cond Res 30(9): 2582-2590, 2016-In the present study, a novel intermittent badminton endurance (B-ENDURANCE) test was developed and tested in elite (n = 17) and skilled (n = 9) badminton players and in age-matched physically active men (nonbadminton players; n = 8). In addition, B-ENDURANCE test-retest reproducibility was evaluated in 9 badminton players. The B-ENDURANCE test is an incremental test where each level consists of repeated sequences of badminton-specific actions toward the 4 corners of the court. The subject starts in the center of the court in front of a computer screen and within each sequence, he must, in a randomized order, complete 8 actions as dictated by the computer, providing the audiovisual input and verifying that the appropriate sensor is activated within the allocated time. Recovery time between each sequence is 10 seconds throughout the test, but the time to complete each sequence is gradually decreased until the subjects cannot follow the dictated tempo. The B-ENDURANCE test performance for elite players was better (p ≤ 0.05) compared with the skilled players and nonbadminton players. In addition, the B-ENDURANCE test performance correlated (r = 0.8 and p < 0.0001) with elite players' national single rankings. Test-retest coefficient of variation was 7.9% between the first 2 trials (i.e., without a familiarization trial) but reduced to 2.5% when comparing the second and third trials. In conclusion, the B-ENDURANCE test is relevant for the evaluation of badminton-specific endurance but at least 1 familiarization trial is recommended if the test is used for evaluation of longitudinal changes, e.g., tracking training effects.
Symbolically Modeling Concurrent MCAPI Executions

NASA Technical Reports Server (NTRS)

Fischer, Topher; Mercer, Eric; Rungta, Neha

2011-01-01

Improper use of Inter-Process Communication (IPC) within concurrent systems often creates data races which can lead to bugs that are challenging to discover. Techniques that use Satisfiability Modulo Theories (SMT) problems to symbolically model possible executions of concurrent software have recently been proposed for use in the formal verification of software. In this work we describe a new technique for modeling executions of concurrent software that use a message passing API called MCAPI. Our technique uses an execution trace to create an SMT problem that symbolically models all possible concurrent executions and follows the same sequence of conditional branch outcomes as the provided execution trace. We check if there exists a satisfying assignment to the SMT problem with respect to specific safety properties. If such an assignment exists, it provides the conditions that lead to the violation of the property. We show how our method models behaviors of MCAPI applications that are ignored in previously published techniques.
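The idea of searching for a satisfying assignment that violates a safety property can be sketched in a few lines of Python. This is not the authors' SMT encoding: brute-force enumeration stands in for the solver, no real MCAPI calls are used, and the function and variable names are hypothetical. The sketch assumes each sender issues exactly one message to a shared endpoint, so every pairing of sends with receives is feasible.

```python
from itertools import permutations


def find_violation(sends, receives, safety_property):
    """Try every matching of sends to receives; return a violating assignment or None.

    sends:    list of (sender, value) pairs, one message per sender (assumption).
    receives: list of variable names filled by successive receive calls.
    """
    for order in permutations(sends):
        assignment = {var: value for var, (_, value) in zip(receives, order)}
        if not safety_property(assignment):
            return assignment  # conditions under which the property fails
    return None


# Two threads each send one value; a third receives twice and asserts a == 1.
sends = [("t1", 1), ("t2", 2)]
receives = ["a", "b"]
violation = find_violation(sends, receives, lambda env: env["a"] == 1)
print(violation)  # {'a': 2, 'b': 1}
```

The observed trace (a = 1, b = 2) satisfies the assertion, but the alternative matching does not, which is the kind of behaviour a single execution trace alone would miss.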
An Evaluation of the Effects of an Oven Timer Study Behavior and Concurrent Completion and Accuracy of Assignments for a First Grade Repeater: A Case Study.

ERIC Educational Resources Information Center

Riegelman, Elizabeth D.; And Others

The effects of an oven timer as an antecedent stimulus on study behavior and concurrent completion and accuracy of reading and writing assignments were investigated for an 8-year-old first grade repeater who lacked motivation. Following baseline observations during which the teacher recorded study behavior and collected assignments with no…
The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

PubMed Central

Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

2013-01-01

Background: Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that frequently undergo intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. Methodology/Principal Findings: We used 454 sequencing combined with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion: The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520
The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

PubMed

Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

2013-01-01

Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that frequently undergo intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. We used 454 sequencing combined with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
The COG database: a tool for genome-scale analysis of protein functions and evolution

PubMed Central

Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.

2000-01-01

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175
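The abstract does not spell out how "consistency of genome-specific best hits" is tested. One common way to picture it is as triangles of mutual best hits across three genomes, and the Python sketch below assumes that formulation in a simplified, fully symmetric form. The protein identifiers, genome labels and best-hit table are hypothetical.

```python
from itertools import combinations


def best_hit_triangles(best_hit, genome_of):
    """Return triples of proteins from three different genomes that are mutual best hits.

    best_hit:  {(protein, target_genome): best-scoring protein in target_genome}
    genome_of: {protein: genome it is encoded in}
    """
    def mutual(p, q):
        return (best_hit.get((p, genome_of[q])) == q
                and best_hit.get((q, genome_of[p])) == p)

    triangles = []
    for trio in combinations(sorted(genome_of), 3):
        if len({genome_of[p] for p in trio}) < 3:
            continue  # need one protein from each of three genomes
        if all(mutual(p, q) for p, q in combinations(trio, 2)):
            triangles.append(trio)
    return triangles


genome_of = {"a1": "A", "b1": "B", "c1": "C", "c2": "C"}
best_hit = {
    ("a1", "B"): "b1", ("b1", "A"): "a1",
    ("a1", "C"): "c1", ("c1", "A"): "a1",
    ("b1", "C"): "c1", ("c1", "B"): "b1",
    ("c2", "A"): "a1", ("c2", "B"): "b1",  # c2 hits a1/b1, but they prefer c1
}
triangles = best_hit_triangles(best_hit, genome_of)
print(triangles)  # [('a1', 'b1', 'c1')]
```

Here c2 is left out of the triangle because its best hits are not reciprocated. A fuller treatment would also have to handle paralogs and the merging of triangles into larger groups.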
Complete genome sequence of Enterobacter sp. IIT-BT 08: A potential microbial strain for high rate hydrogen production.

PubMed

Khanna, Namita; Ghosh, Ananta Kumar; Huntemann, Marcel; Deshpande, Shweta; Han, James; Chen, Amy; Kyrpides, Nikos; Mavrommatis, Kostas; Szeto, Ernest; Markowitz, Victor; Ivanova, Natalia; Pagani, Ioanna; Pati, Amrita; Pitluck, Sam; Nolan, Matt; Woyke, Tanja; Teshima, Hazuki; Chertkov, Olga; Daligault, Hajnalka; Davenport, Karen; Gu, Wei; Munk, Christine; Zhang, Xiaojing; Bruce, David; Detter, Chris; Xu, Yan; Quintana, Beverly; Reitenga, Krista; Kunde, Yulia; Green, Lance; Erkkila, Tracy; Han, Cliff; Brambilla, Evelyne-Marie; Lang, Elke; Klenk, Hans-Peter; Goodwin, Lynne; Chain, Patrick; Das, Debabrata

2013-12-20

Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. For further enhancement of hydrogen production by strain development, complete genome sequence analysis was carried out. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome encodes 4,393 protein-coding genes and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence as well as discuss the putative pathway of hydrogen production by this organism.
Swine and Poultry Pathogens: the Complete Genome Sequences of Two Strains of Mycoplasma hyopneumoniae and a Strain of Mycoplasma synoviae

PubMed Central

Vasconcelos, Ana Tereza R.; Ferreira, Henrique B.; Bizarro, Cristiano V.; Bonatto, Sandro L.; Carvalho, Marcos O.; Pinto, Paulo M.; Almeida, Darcy F.; Almeida, Luiz G. P.; Almeida, Rosana; Alves-Filho, Leonardo; Assunção, Enedina N.; Azevedo, Vasco A. C.; Bogo, Maurício R.; Brigido, Marcelo M.; Brocchi, Marcelo; Burity, Helio A.; Camargo, Anamaria A.; Camargo, Sandro S.; Carepo, Marta S.; Carraro, Dirce M.; de Mattos Cascardo, Júlio C.; Castro, Luiza A.; Cavalcanti, Gisele; Chemale, Gustavo; Collevatti, Rosane G.; Cunha, Cristina W.; Dallagiovanna, Bruno; Dambrós, Bibiana P.; Dellagostin, Odir A.; Falcão, Clarissa; Fantinatti-Garboggini, Fabiana; Felipe, Maria S. S.; Fiorentin, Laurimar; Franco, Gloria R.; Freitas, Nara S. A.; Frías, Diego; Grangeiro, Thalles B.; Grisard, Edmundo C.; Guimarães, Claudia T.; Hungria, Mariangela; Jardim, Sílvia N.; Krieger, Marco A.; Laurino, Jomar P.; Lima, Lucymara F. A.; Lopes, Maryellen I.; Loreto, Élgion L. S.; Madeira, Humberto M. F.; Manfio, Gilson P.; Maranhão, Andrea Q.; Martinkovics, Christyanne T.; Medeiros, Sílvia R. B.; Moreira, Miguel A. M.; Neiva, Márcia; Ramalho-Neto, Cicero E.; Nicolás, Marisa F.; Oliveira, Sergio C.; Paixão, Roger F. C.; Pedrosa, Fábio O.; Pena, Sérgio D. J.; Pereira, Maristela; Pereira-Ferrari, Lilian; Piffer, Itamar; Pinto, Luciano S.; Potrich, Deise P.; Salim, Anna C. M.; Santos, Fabrício R.; Schmitt, Renata; Schneider, Maria P. C.; Schrank, Augusto; Schrank, Irene S.; Schuck, Adriana F.; Seuanez, Hector N.; Silva, Denise W.; Silva, Rosane; Silva, Sérgio C.; Soares, Célia M. A.; Souza, Kelly R. L.; Souza, Rangel C.; Staats, Charley C.; Steffens, Maria B. R.; Teixeira, Santuza M. R.; Urmenyi, Turan P.; Vainstein, Marilene H.; Zuccherato, Luciana W.; Simpson, Andrew J. G.; Zaha, Arnaldo

2005-01-01

This work reports the results of analyses of three complete mycoplasma genomes, a pathogenic (7448) and a nonpathogenic (J) strain of the swine pathogen Mycoplasma hyopneumoniae and a strain of the avian pathogen Mycoplasma synoviae; the genome sizes of the three strains were 920,079 bp, 897,405 bp, and 799,476 bp, respectively. These genomes were compared with other sequenced mycoplasma genomes reported in the literature to examine several aspects of mycoplasma evolution. Strain-specific regions, including integrative and conjugal elements, and genome rearrangements and alterations in adhesin sequences were observed in the M. hyopneumoniae strains, and all of these were potentially related to pathogenicity. Genomic comparisons revealed that reduction in genome size implied loss of redundant metabolic pathways, with maintenance of alternative routes in different species. Horizontal gene transfer was consistently observed between M. synoviae and Mycoplasma gallisepticum. Our analyses indicated a likely transfer event of hemagglutinin-coding DNA sequences from M. gallisepticum to M. synoviae. PMID:16077101
Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

Treesearch

K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

1998-01-01

DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...
Metagenomic Assessment of a Dynamic Microbial Population from Subseafloor Aquifer Fluids in the Cold, Oxygenated Crust

NASA Astrophysics Data System (ADS)

Tully, B. J.; Heidelberg, J. F.; Kraft, B.; Girguis, P. R.; Huber, J. A.

2016-12-01

The oceanic crust contains the largest aquifer on Earth with a volume approximately 2% of the global ocean. Ongoing research at the North Pond (NP) site, west of the Mid-Atlantic Ridge, provides an environment representative of oxygenated crustal aquifers beneath oligotrophic surface waters. Using subseafloor CORK observatories for multiple sampling depths beneath the seafloor, crustal fluids were sampled along the predicted aquifer fluid flow path over a two-year period. DNA was extracted and sequenced for metagenomic analysis from 22 crustal fluid samples, along with the overlying bottom water. At broad taxonomic groupings, the aquifer system is highly dynamic over time and space, with shifts in dominant taxa and "blooms" of transient groups that appear at discrete time points and sample depths. We were able to reconstruct 194 high-quality, low-contamination bacterial and archaeal metagenomic-assembled genomes (MAGs) with estimated completeness >50% (429 MAGs >20% complete). Environmental genomes were assigned to phylogenies from the major bacterial phyla, putative novel groups, and poorly sampled phylogenetic groups, including the Marinimicrobia, Candidate Phyla Radiation, and Planctomycetes. Biogeochemically relevant processes were assigned to MAGs, including denitrification, dissimilatory sulfur and hydrogen cycling, and carbon fixation. Collectively, the oxic NP aquifer system represents a diverse, dynamic microbial habitat with the metabolic potential to impact multiple globally relevant biogeochemical cycles, including nitrogen, sulfur, and carbon.
BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads.

PubMed

Hong, Lewis Z; Hong, Shuzhen; Wong, Han Teng; Aw, Pauline P K; Cheng, Yan; Wilm, Andreas; de Sessions, Paola F; Lim, Seng Gee; Nagarajan, Niranjan; Hibberd, Martin L; Quake, Stephen R; Burkholder, William F

2014-01-01

We present a method for obtaining long haplotypes, of over 3 kb in length, using a short-read sequencer, Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq). BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq on mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.
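The core idea of assembling barcoded short reads into per-template haplotypes can be sketched as follows. This is a toy illustration under strong assumptions, not the published BAsE-Seq pipeline: reads are taken to be already aligned (each has a known start position on the template), and a simple per-position majority vote stands in for real variant calling. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def assemble_haplotypes(reads, template_length):
    """Group aligned reads by barcode and build one consensus string per barcode.

    reads: iterable of (barcode, start, bases), with start 0-based on the template.
    Positions with no coverage are reported as 'N'.
    """
    columns = defaultdict(lambda: [Counter() for _ in range(template_length)])
    for barcode, start, bases in reads:
        for offset, base in enumerate(bases):
            columns[barcode][start + offset][base] += 1
    return {
        barcode: "".join(c.most_common(1)[0][0] if c else "N" for c in cols)
        for barcode, cols in columns.items()
    }


reads = [
    ("BC1", 0, "ACG"), ("BC1", 2, "GTT"),  # two reads from template BC1
    ("BC2", 0, "ACA"), ("BC2", 3, "TT"),   # two reads from template BC2
]
haplotypes = assemble_haplotypes(reads, template_length=5)
print(haplotypes)  # {'BC1': 'ACGTT', 'BC2': 'ACATT'}
```

Because each barcode marks a single template molecule, the variant at position 2 (G versus A) stays linked to the rest of its own haplotype instead of being averaged across the population.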
Sequence-based analysis of pQBR103; a representative of a unique, transfer-proficient mega plasmid resident in the microbial community of sugar beet

PubMed Central

Tett, Adrian; Spiers, Andrew J; Crossman, Lisa C; Ager, Duane; Ciric, Lena; Dow, J Maxwell; Fry, John C; Harris, David; Lilley, Andrew; Oliver, Anna; Parkhill, Julian; Quail, Michael A; Rainey, Paul B; Saunders, Nigel J; Seeger, Kathy; Snyder, Lori AS; Squares, Rob; Thomas, Christopher M; Turner, Sarah L; Zhang, Xue-Xian; Field, Dawn; Bailey, Mark J

2009-01-01

The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp., and the majority of the remainder showed similarity to other γ-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonads than is currently appreciated or understood. PMID:18043644
Global Identification of Three Major Genotypes of Varicella-Zoster Virus: Longitudinal Clustering and Strategies for Genotyping

PubMed Central

Loparev, Vladimir N.; Gonzalez, Antonio; Deleon-Carnes, Marlene; Tipples, Graham; Fickenscher, Helmut; Torfason, Einar G.; Schmid, D. Scott

2004-01-01

By analysis of a single, variable, and short DNA sequence of 447 bp located within open reading frame 22 (ORF22), we discriminated three major varicella-zoster virus (VZV) genotypes. VZV isolates from all six inhabited continents that showed nearly complete homology to ORF22 of the European reference strain Dumas were assigned to the European (E) genotype. All Japanese isolates, defined as the Japanese (J) genotype, were identical in the respective genomic region and proved the most divergent from the E strains, carrying four distinct variations. The remaining isolates carried a combination of E- and J-specific variations in the target sequence and thus were collectively termed the mosaic (M) genotype. Three hundred twenty-six isolates collected in 27 countries were genotyped. A distinctive longitudinal distribution of VZV genotypes supports this approach. Among 111 isolates collected from European patients, 96.4% were genotype E. Consistent with this observation, approximately 80% of the VZV strains from the United States were also genotype E. Similarly, genotype E viruses were dominant in the Asian part of Russia and in eastern Australia. M genotype viruses were strongly dominant in tropical regions of Africa, Indochina, and Central America, and they were common in western Australia. However, genotype M viruses were also identified as a minority in several countries worldwide. Two major intertypic variations of genotype M strains were identified, suggesting that the M genotype can be further differentiated into subgenotypes. These data highlight the direction for future VZV genotyping efforts. This approach provides the first simple genotyping method for VZV strains in clinical samples. PMID:15254207
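The E/J/M assignment logic described in the abstract can be written as a short decision rule. In this Python sketch the four site names and their E- and J-specific bases are hypothetical placeholders, since the abstract does not list the actual ORF22 positions. The "unassigned" fallback is an added assumption for bases that match neither reference.

```python
# Hypothetical marker sites; the real ORF22 positions are not given in the abstract.
E_ALLELES = {"site1": "A", "site2": "C", "site3": "G", "site4": "T"}
J_ALLELES = {"site1": "G", "site2": "T", "site3": "A", "site4": "C"}


def call_genotype(observed):
    """Classify an isolate from its bases at the four marker sites."""
    e_like = [observed[site] == E_ALLELES[site] for site in E_ALLELES]
    j_like = [observed[site] == J_ALLELES[site] for site in E_ALLELES]
    if all(e_like):
        return "E"
    if all(j_like):
        return "J"
    if all(e or j for e, j in zip(e_like, j_like)):
        return "M"  # mixture of E- and J-specific variations
    return "unassigned"


mosaic_isolate = {"site1": "A", "site2": "T", "site3": "G", "site4": "C"}
print(call_genotype(mosaic_isolate))  # M
```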
Multiplex APLP System for High-Resolution Haplogrouping of Extremely Degraded East-Asian Mitochondrial DNAs

PubMed Central

Kakuda, Tsuneo; Shojo, Hideki; Tanaka, Mayumi; Nambiar, Phrabhakaran; Minaguchi, Kiyoshi; Umetsu, Kazuo; Adachi, Noboru

2016-01-01

Mitochondrial DNA (mtDNA) serves as a powerful tool for exploring matrilineal phylogeographic ancestry, as well as for analyzing highly degraded samples, because of its polymorphic nature and high copy numbers per cell. The recent advent of complete mitochondrial genome sequencing has led to improved techniques for phylogenetic analyses based on mtDNA, and many multiplex genotyping methods have been developed for the hierarchical analysis of phylogenetically important mutations. However, few high-resolution multiplex genotyping systems for analyzing East-Asian mtDNA can be applied to extremely degraded samples. Here, we present a multiplex system for analyzing mitochondrial single nucleotide polymorphisms (mtSNPs), which relies on a novel amplified product-length polymorphisms (APLP) method that uses inosine-flapped primers and is specifically designed for the detailed haplogrouping of extremely degraded East-Asian mtDNAs. We used fourteen 6-plex polymerase chain reactions (PCRs) and subsequent electrophoresis to examine 81 haplogroup-defining SNPs and 3 insertion/deletion sites, and we were able to securely assign the studied mtDNAs to relevant haplogroups. Our system requires only 1×10⁻¹³ g (100 fg) of crude DNA to obtain a full profile. Owing to its small amplicon size (<110 bp), this new APLP system was successfully applied to extremely degraded samples for which direct sequencing of hypervariable segments using mini-primer sets was unsuccessful, and proved to be more robust than conventional APLP analysis. Thus, our new APLP system is effective for retrieving reliable data from extremely degraded East-Asian mtDNAs. PMID:27355212
Multiplex APLP System for High-Resolution Haplogrouping of Extremely Degraded East-Asian Mitochondrial DNAs.

PubMed

Kakuda, Tsuneo; Shojo, Hideki; Tanaka, Mayumi; Nambiar, Phrabhakaran; Minaguchi, Kiyoshi; Umetsu, Kazuo; Adachi, Noboru

2016-01-01

Mitochondrial DNA (mtDNA) serves as a powerful tool for exploring matrilineal phylogeographic ancestry, as well as for analyzing highly degraded samples, because of its polymorphic nature and high copy numbers per cell. The recent advent of complete mitochondrial genome sequencing has led to improved techniques for phylogenetic analyses based on mtDNA, and many multiplex genotyping methods have been developed for the hierarchical analysis of phylogenetically important mutations. However, few high-resolution multiplex genotyping systems for analyzing East-Asian mtDNA can be applied to extremely degraded samples. Here, we present a multiplex system for analyzing mitochondrial single nucleotide polymorphisms (mtSNPs), which relies on a novel amplified product-length polymorphisms (APLP) method that uses inosine-flapped primers and is specifically designed for the detailed haplogrouping of extremely degraded East-Asian mtDNAs. We used fourteen 6-plex polymerase chain reactions (PCRs) and subsequent electrophoresis to examine 81 haplogroup-defining SNPs and 3 insertion/deletion sites, and we were able to securely assign the studied mtDNAs to relevant haplogroups. Our system requires only 1×10⁻¹³ g (100 fg) of crude DNA to obtain a full profile. Owing to its small amplicon size (<110 bp), this new APLP system was successfully applied to extremely degraded samples for which direct sequencing of hypervariable segments using mini-primer sets was unsuccessful, and proved to be more robust than conventional APLP analysis. Thus, our new APLP system is effective for retrieving reliable data from extremely degraded East-Asian mtDNAs.
Gene Polymorphism Studies in a Teaching Laboratory

NASA Astrophysics Data System (ADS)

Shultz, Jeffry

2009-02-01

I present a laboratory procedure for illustrating transcription, post-transcriptional modification, gene conservation, and comparative genetics for use in undergraduate biology education. Students are individually assigned genes in a targeted biochemical pathway, for which they design and test polymerase chain reaction (PCR) primers. In this example, students used genes annotated for the steroid biosynthesis pathway in soybean. The authoritative Kyoto Encyclopedia of Genes and Genomes (KEGG) interactive database and other online resources were used to design primers based first on soybean expressed sequence tags (ESTs), then on ESTs from an alternate organism if soybean sequence was unavailable. Students designed a total of 50 gene-based primer pairs (37 soybean, 13 alternative) and tested these for polymorphism state and similarity between two soybean and two pea lines. Student assessment was based on acquisition of laboratory skills and successful project completion. This simple procedure illustrates conservation of genes and is not limited to soybean or pea. Cost-per-student estimates are included, along with a detailed protocol and flow diagram of the procedure.
Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

PubMed Central

Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

2016-01-01

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062
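For orientation, the "common approach" of normalizing by subsampling to equal library size, which the abstract contrasts with model-based estimates, can be sketched in Python. This does not use or reproduce the R packages named above. The taxon labels, counts and the `rarefy` helper are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a library without replacement down to `depth` reads.

    counts: {taxon: read count} for one sample.
    """
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("requested depth exceeds library size")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return Counter(rng.sample(pool, depth))


sample_counts = {"taxonA": 50, "taxonB": 30, "taxonC": 20}
rarefied = rarefy(sample_counts, depth=40)
print(sum(rarefied.values()))  # 40
```

Subsampling of this kind discards 60 of the 100 reads in the example, which is one intuition for why the authors argue that statistical models give more accurate abundance estimates.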

Characterizing visible and invisible cell wall mutant phenotypes.

PubMed

Carpita, Nicholas C; McCann, Maureen C

2015-07-01

About 10% of a plant's genome is devoted to generating the protein machinery to synthesize, remodel, and deconstruct the cell wall. High-throughput genome sequencing technologies have enabled a reasonably complete inventory of wall-related genes that can be assembled into families of common evolutionary origin. Assigning function to each gene family member has been aided immensely by identification of mutants with visible phenotypes or by chemical and spectroscopic analysis of mutants with 'invisible' phenotypes of modified cell wall composition and architecture that do not otherwise affect plant growth or development. This review connects the inference of gene function on the basis of deviation from the wild type in genetic functional analyses to insights provided by modern analytical techniques that have brought us ever closer to elucidating the sequence structures of the major polysaccharide components of the plant cell wall.
Later learning stages in procedural memory are impaired in children with Specific Language Impairment.

PubMed

Desmottes, Lise; Meulemans, Thierry; Maillart, Christelle

2016-01-01

According to the Procedural Deficit Hypothesis (PDH), difficulties in the procedural memory system may contribute to the language difficulties encountered by children with Specific Language Impairment (SLI). Most studies investigating the PDH have used the sequence learning paradigm; however, these studies have principally focused on initial sequence learning in a single practice session. The present study sought to extend these investigations by assessing the consolidation stage and longer-term retention of implicit sequence-specific knowledge in 42 children with or without SLI. Both groups of children completed a serial reaction time task and were tested 24 h and one week after practice. Results showed that children with SLI succeeded as well as children with typical development (TD) in the early acquisition stage of the sequence learning task. However, as training blocks progressed, only TD children improved their sequence knowledge, while children with SLI showed no further improvement. Moreover, children with SLI showed a lack of the consolidation gains in sequence knowledge displayed by the TD children. Overall, these results were in line with the predictions of the PDH and suggest that later learning stages in procedural memory are impaired in SLI.
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

PubMed

Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-09-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Self-organizing approach for meta-genomes.

PubMed

Zhu, Jianfeng; Zheng, Wei-Mou

2014-12-01

We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembly. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. Starting from an initialization, iterating these two updates converges to a phase assignment that gives an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome.
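The alternating update described in this abstract (codon usages determine phases, phases determine codon usages) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the random initialization, the add-one pseudocounts, the per-triplet mean log-likelihood score, and all function names are assumptions made for illustration, and the metagenome mixture model is not covered.

```python
import math
import random
from collections import Counter
from itertools import product

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def frame_triplets(segment, phase):
    """Triplets read from a segment under a phase.

    Phase 0 is noncoding, 1-3 are direct coding (codon offset 0-2),
    4-6 are reverse coding (offset 0-2 on the reverse complement).
    """
    if phase <= 3:
        seq, offset = segment, max(phase - 1, 0)
    else:
        seq, offset = revcomp(segment), phase - 4
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

def usage_table(triplet_lists):
    """Triplet frequency table with add-one pseudocounts (an assumption)."""
    counts = Counter(t for lst in triplet_lists for t in lst)
    total = sum(counts[t] + 1 for t in TRIPLETS)
    return {t: (counts[t] + 1) / total for t in TRIPLETS}

def mean_loglik(triplets, table):
    """Average log-probability per triplet; unknown triplets are skipped."""
    scored = [math.log(table[t]) for t in triplets if t in table]
    return sum(scored) / len(scored) if scored else float("-inf")

def assign_phases(segments, n_iter=20, seed=0):
    """Alternate between table updates and phase updates until stable."""
    rng = random.Random(seed)
    phases = [rng.randrange(7) for _ in segments]  # hypothetical initialization
    for _ in range(n_iter):
        pairs = list(zip(segments, phases))
        noncoding = usage_table([frame_triplets(s, 0) for s, p in pairs if p == 0])
        coding = usage_table([frame_triplets(s, p) for s, p in pairs if p != 0])
        updated = []
        for s in segments:
            scores = [mean_loglik(frame_triplets(s, p), noncoding if p == 0 else coding)
                      for p in range(7)]
            updated.append(max(range(7), key=scores.__getitem__))
        if updated == phases:
            break
        phases = updated
    return phases
```

Calling `assign_phases(list_of_equal_length_segments)` returns one phase label (0 to 6) per segment. The two tables correspond to the noncoding and coding codon usages mentioned in the abstract; the six coding phases share one table and differ only in strand and offset.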
Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

NASA Astrophysics Data System (ADS)

Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

2007-12-01

Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Lama guanicoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translational modification (Hyp 9), deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes.
Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.
Heteronuclear NMR assignments and secondary structure of the coiled coil trimerization domain from cartilage matrix protein in oxidized and reduced forms.

PubMed Central

Wiltscheck, R.; Kammerer, R. A.; Dames, S. A.; Schulthess, T.; Blommers, M. J.; Engel, J.; Alexandrescu, A. T.

1997-01-01

The C-terminal oligomerization domain of chicken cartilage matrix protein is a trimeric coiled coil composed of three identical 43-residue chains. NMR spectra of the protein show equivalent magnetic environments for each monomer, indicating a parallel coiled coil structure with complete threefold symmetry. Sequence-specific assignments for 1H-, 15N-, and 13C-NMR resonances have been obtained from 2D 1H NOESY and TOCSY spectra, and from 3D HNCA, 15N NOESY-HSQC, and HCCH-TOCSY spectra. A stretch of alpha-helix encompassing five heptad repeats (35 residues) has been identified from intra-chain HN-HN and HN-H alpha NOE connectivities, 3JHNH alpha coupling constants, and chemical shift indices. The alpha-helix begins immediately downstream of inter-chain disulfide bonds between residues Cys 5 and Cys 7, and extends to near the C-terminus of the molecule. The threefold symmetry of the molecule is maintained when the inter-chain disulfide bonds that flank the N-terminus of the coiled coil are reduced. Residues Ile 21 through Glu 36 show conserved chemical shifts and NOE connectivities, as well as strong protection from solvent exchange in the oxidized and reduced forms of the protein. By contrast, residues Ile 10 through Val 17 show pronounced chemical shift differences between the oxidized and reduced protein. Strong chemical exchange NOEs between HN resonances and water indicate solvent exchange on time scales faster than 10 s, and suggest a dynamic fraying of the N-terminus of the coiled coil upon reduction of the disulfide bonds. Possible roles for the disulfide crosslinks of the oligomerization domain in the function of cartilage matrix protein are proposed. PMID:9260286
Dictionary-driven protein annotation.

PubMed

Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

2002-09-01

Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes.
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.
Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation.

PubMed

Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias

2014-08-01

Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications.
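The gene context-based screen summarized above can be pictured with a small Python sketch. The abstract does not state how "proximity" was defined, so the neighbourhood measure (rank distance in gene order), the `window` parameter, the keyword matching on annotation strings, and the `Gene` record are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    gene_id: str
    order: int        # position of the gene along its contig (hypothetical field)
    annotation: str   # free-text functional annotation

def candidates_near_cellulases(genes, window=5):
    """Return IDs of hypothetical proteins within `window` genes of a known cellulase."""
    ordered = sorted(genes, key=lambda g: g.order)
    # Ranks of genes already annotated as cellulases act as anchors.
    anchors = [i for i, g in enumerate(ordered) if "cellulase" in g.annotation.lower()]
    hits = []
    for i, g in enumerate(ordered):
        if "hypothetical" not in g.annotation.lower():
            continue
        if any(abs(i - a) <= window for a in anchors):
            hits.append(g.gene_id)
    return hits
```

Candidates returned by such a filter would still need the expression and activity assays described in the abstract; the reported success rate is 11 of 17 tested enzymes, which rounds to 65%.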
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans

PubMed Central

Tully, Benjamin J.; Graham, Elaina D.; Heidelberg, John F.

2018-01-01

Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large. PMID:29337314
Arrays of probes for positional sequencing by hybridization

DOEpatents

Cantor, Charles R [Boston, MA; Prezetakiewiczr, Marek [East Boston, MA; Smith, Cassandra L [Boston, MA; Sano, Takeshi [Waltham, MA

2008-01-15

This invention is directed to methods and reagents useful for sequencing nucleic acid targets utilizing sequencing by hybridization technology comprising probes, arrays of probes and methods whereby sequence information is obtained rapidly and efficiently in discrete packages. That information can be used for the detection, identification, purification and complete or partial sequencing of a particular target nucleic acid. When coupled with a ligation step, these methods can be performed under a single set of hybridization conditions. The invention also relates to the replication of probe arrays and methods for making and replicating arrays of probes which are useful for the large scale manufacture of diagnostic aids used to screen biological samples for specific target sequences. Arrays created using PCR technology may comprise probes with 5'- and/or 3'-overhangs.
DNA methylation assessment from human slow- and fast-twitch skeletal muscle fibers

PubMed Central

Begue, Gwénaëlle; Raue, Ulrika; Jemiolo, Bozena

2017-01-01

A new application of the reduced representation bisulfite sequencing method was developed using low-DNA input to investigate the epigenetic profile of human slow- and fast-twitch skeletal muscle fibers. Successful library construction was completed with as little as 15 ng of DNA, and high-quality sequencing data were obtained with 32 ng of DNA. Analysis identified 143,160 differentially methylated CpG sites across 14,046 genes. In both fiber types, selected genes predominantly expressed in slow or fast fibers were hypomethylated, which was supported by the RNA-sequencing analysis. These are the first fiber type-specific methylation data from human skeletal muscle and provide a unique platform for future research. NEW & NOTEWORTHY This study validates a low-DNA input reduced representation bisulfite sequencing method for human muscle biopsy samples to investigate the methylation patterns at a fiber type-specific level. These are the first fiber type-specific methylation data reported from human skeletal muscle and thus provide initial insight into basal state differences in myosin heavy chain I and IIa muscle fibers among young, healthy men. PMID:28057818
Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.

PubMed

Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa

2014-02-03

Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.
Tissue-specific Proteogenomic Analysis of Plutella xylostella Larval Midgut Using a Multialgorithm Pipeline.

PubMed

Zhu, Xun; Xie, Shangbo; Armengaud, Jean; Xie, Wen; Guo, Zhaojiang; Kang, Shi; Wu, Qingjun; Wang, Shaoli; Xia, Jixing; He, Rongjun; Zhang, Youjun

2016-06-01

The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an independent experimental approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement genome annotation of P. xylostella larval midgut based on shotgun HPLC-ESI-MS/MS data by means of a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome-search-specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To get the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes, i.e., 235 of 439, were further validated after RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicing events. Finally, a total of 6764 proteins were identified, resulting in one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomics analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest.
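The "whole-genome six-frame translation database" mentioned in this abstract refers to translating a DNA sequence in all three reading frames on both strands, so that peptides can be matched even where no gene has been predicted. The following Python sketch shows the general technique with the standard genetic code; it is not the pipeline used in the study, and the function names and frame labels (`+1` to `-3`) are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons only; codons with other symbols become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

In a proteogenomic search, each frame would typically be split at stop codons (`*`) into candidate open reading frames before spectra are matched against them; that step is omitted here.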
Dissemination and genetic diversity of chlamydial agents in Polish wildfowl: Isolation and molecular characterisation of avian Chlamydia abortus strains.

PubMed

Szymańska-Czerwińska, Monika; Mitura, Agata; Niemczuk, Krzysztof; Zaręba, Kinga; Jodełko, Agnieszka; Pluta, Aneta; Scharf, Sabine; Vitek, Bailey; Aaziz, Rachid; Vorimore, Fabien; Laroucau, Karine; Schnee, Christiane

2017-01-01

Wild birds are considered a reservoir for avian chlamydiosis, posing a potential infectious threat to domestic poultry and humans. Analysis of 894 cloacal or fecal swabs from free-living birds in Poland revealed an overall Chlamydiaceae prevalence of 14.8% (n = 132) with the highest prevalence noted in Anatidae (19.7%) and Corvidae (13.4%). Further testing conducted with species-specific real-time PCR showed that 65 samples (49.2%) were positive for C. psittaci whereas only one was positive for C. avium. To classify the non-identified chlamydial agents and to genotype the C. psittaci and C. avium-positive samples, specimens were subjected to ompA-PCR and sequencing (n = 83). The ompA-based NJ dendrogram revealed that only 23 out of 83 sequences were assigned to C. psittaci, in particular to four clades representing the previously described C. psittaci genotypes B, C, Mat116 and 1V, whereas the 59 remaining sequences were assigned to two new clades named G1 and G2, each one including sequences recently obtained from chlamydiae detected in Swedish wetland birds. G1 (18 samples from Anatidae and Rallidae) grouped closely together with genotype 1V and in relative proximity to several C. abortus isolates, and G2 (41 samples from Anatidae and Corvidae) grouped closely to C. psittaci strains of the classical ABE cluster, Matt116 and M56. Finally, deep molecular analysis of four representative isolates of genotypes 1V, G1 and G2 based on 16S rRNA, IGS and partial 23S rRNA sequences as well as MLST clearly classified these isolates within the C. abortus species. Consequently, we propose an expansion of the C. abortus species to include not only the classical isolates of mammalian origin, but also avian isolates so far referred to as atypical C. psittaci or C. psittaci/C. abortus intermediates.
TANDEM: matching proteins with tandem mass spectra.

PubMed

Craig, Robertson; Beavis, Ronald C

2004-06-12

Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.
Genetic and Phylogenetic Characterization of Tataguine and Witwatersrand Viruses and Other Orthobunyaviruses of the Anopheles A, Capim, Guamá, Koongol, Mapputta, Tete, and Turlock Serogroups

PubMed Central

Shchetinin, Alexey M.; Lvov, Dmitry K.; Deriabin, Petr G.; Botikov, Andrey G.; Gitelman, Asya K.; Kuhn, Jens H.; Alkhovsky, Sergey V.

2015-01-01

The family Bunyaviridae has more than 530 members that are distributed among five genera or remain to be classified. The genus Orthobunyavirus is the most diverse bunyaviral genus with more than 220 viruses that have been assigned to more than 18 serogroups based on serological cross-reactions and limited molecular-biological characterization. Sequence information for all three orthobunyaviral genome segments is only available for viruses belonging to the Bunyamwera, Bwamba/Pongola, California encephalitis, Gamboa, Group C, Mapputta, Nyando, and Simbu serogroups. Here we present coding-complete sequences for all three genome segments of 15 orthobunyaviruses belonging to the Anopheles A, Capim, Guamá, Koongol, Tete, and Turlock serogroups, and of two unclassified bunyaviruses previously not known to be orthobunyaviruses (Tataguine and Witwatersrand viruses). Using those sequence data, we established the most comprehensive phylogeny of the Orthobunyavirus genus to date, now covering 15 serogroups. Our results emphasize the high genetic diversity of orthobunyaviruses and reveal that the presence of the small nonstructural protein (NSs)-encoding open reading frame is not as common in orthobunyavirus genomes as previously thought. PMID:26610546
The genome sequence of the plant pathogen Xylella fastidiosa. The Xylella fastidiosa Consortium of the Organization for Nucleotide Sequencing and Analysis.

PubMed

Simpson, A J; Reinach, F C; Arruda, P; Abreu, F A; Acencio, M; Alvarenga, R; Alves, L M; Araya, J E; Baia, G S; Baptista, C S; Barros, M H; Bonaccorsi, E D; Bordin, S; Bové, J M; Briones, M R; Bueno, M R; Camargo, A A; Camargo, L E; Carraro, D M; Carrer, H; Colauto, N B; Colombo, C; Costa, F F; Costa, M C; Costa-Neto, C M; Coutinho, L L; Cristofani, M; Dias-Neto, E; Docena, C; El-Dorry, H; Facincani, A P; Ferreira, A J; Ferreira, V C; Ferro, J A; Fraga, J S; França, S C; Franco, M C; Frohme, M; Furlan, L R; Garnier, M; Goldman, G H; Goldman, M H; Gomes, S L; Gruber, A; Ho, P L; Hoheisel, J D; Junqueira, M L; Kemper, E L; Kitajima, J P; Krieger, J E; Kuramae, E E; Laigret, F; Lambais, M R; Leite, L C; Lemos, E G; Lemos, M V; Lopes, S A; Lopes, C R; Machado, J A; Machado, M A; Madeira, A M; Madeira, H M; Marino, C L; Marques, M V; Martins, E A; Martins, E M; Matsukuma, A Y; Menck, C F; Miracca, E C; Miyaki, C Y; Monteriro-Vitorello, C B; Moon, D H; Nagai, M A; Nascimento, A L; Netto, L E; Nhani, A; Nobrega, F G; Nunes, L R; Oliveira, M A; de Oliveira, M C; de Oliveira, R C; Palmieri, D A; Paris, A; Peixoto, B R; Pereira, G A; Pereira, H A; Pesquero, J B; Quaggio, R B; Roberto, P G; Rodrigues, V; de M Rosa, A J; de Rosa, V E; de Sá, R G; Santelli, R V; Sawasaki, H E; da Silva, A C; da Silva, A M; da Silva, F R; da Silva, W A; da Silveira, J F; Silvestri, M L; Siqueira, W J; de Souza, A A; de Souza, A P; Terenzi, M F; Truffi, D; Tsai, S M; Tsuhako, M H; Vallada, H; Van Sluys, M A; Verjovski-Almeida, S; Vettore, A L; Zago, M A; Zatz, M; Meidanis, J; Setubal, J C

2000-07-13

Xylella fastidiosa is a fastidious, xylem-limited bacterium that causes a range of economically important plant diseases. Here we report the complete genome sequence of X. fastidiosa clone 9a5c, which causes citrus variegated chlorosis--a serious disease of orange trees. The genome comprises a 52.7% GC-rich 2,679,305-base-pair (bp) circular chromosome and two plasmids of 51,158 bp and 1,285 bp. We can assign putative functions to 47% of the 2,904 predicted coding regions. Efficient metabolic functions are predicted, with sugars as the principal energy and carbon source, supporting existence in the nutrient-poor xylem sap. The mechanisms associated with pathogenicity and virulence involve toxins, antibiotics and ion sequestration systems, as well as bacterium-bacterium and bacterium-host interactions mediated by a range of proteins. Orthologues of some of these proteins have only been identified in animal and human pathogens; their presence in X. fastidiosa indicates that the molecular basis for bacterial pathogenicity is both conserved and independent of host. At least 83 genes are bacteriophage-derived and include virulence-associated genes from other bacteria, providing direct evidence of phage-mediated horizontal gene transfer.
The Classification of Protein Domains.

PubMed

Dawson, Natalie; Sillitoe, Ian; Marsden, Russell L; Orengo, Christine A

2017-01-01

The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has led to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships. In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.