Sample records for acid sequence variations

  1. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based by charge.

  2. Variation of clinical expression in patients with Stargardt dystrophy and sequence variations in the ABCR gene.

    PubMed

    Fishman, G A; Stone, E M; Grover, S; Derlacki, D J; Haines, H L; Hockey, R R

    1999-04-01

    To report the spectrum of ophthalmic findings in patients with Stargardt dystrophy or fundus flavimaculatus who have a specific sequence variation in the ABCR gene. Twenty-nine patients with Stargardt dystrophy or fundus flavimaculatus from different pedigrees were identified with possible disease-causing sequence variations in the ABCR gene from a group of 66 patients who were screened for sequence variations in this gene. Patients underwent a routine ocular examination, including slitlamp biomicroscopy and a dilated fundus examination. Fluorescein angiography was performed on 22 patients, and electroretinographic measurements were obtained on 24 of 29 patients. Kinetic visual fields were measured with a Goldmann perimeter in 26 patients. Single-strand conformation polymorphism analysis and DNA sequencing were used to identify variations in coding sequences of the ABCR gene. Three clinical phenotypes were observed among these 29 patients. In phenotype I, 9 of 12 patients had a sequence change in exon 42 of the ABCR gene in which the amino acid glutamic acid was substituted for glycine (Gly1961Glu). In only 4 of these 9 patients was a second possible disease-causing mutation found on the other ABCR allele. In addition to an atrophic-appearing macular lesion, phenotype I was characterized by localized perifoveal yellowish white flecks, the absence of a dark choroid, and normal electroretinographic amplitudes. Phenotype II consisted of 10 patients who showed a dark choroid and more diffuse yellowish white flecks in the fundus. None exhibited the Gly1961Glu change. Phenotype III consisted of 7 patients who showed extensive atrophic-appearing changes of the retinal pigment epithelium. Electroretinographic cone and rod amplitudes were reduced. One patient showed the Gly1961Glu change. A wide variation in clinical phenotype can occur in patients with sequence changes in the ABCR gene. In individual patients, a certain phenotype seems to be associated with the presence of

  3. Full genome sequence of Rocio virus reveal substantial variations from the prototype Rocio virus SPH 34675 sequence.

    PubMed

    Setoh, Yin Xiang; Amarilla, Alberto A; Peng, Nias Y; Slonchak, Andrii; Periasamy, Parthiban; Figueiredo, Luiz T M; Aquino, Victor H; Khromykh, Alexander A

    2018-01-01

    Rocio virus (ROCV) is an arbovirus belonging to the genus Flavivirus, family Flaviviridae. We present an updated sequence of ROCV strain SPH 34675 (GenBank: AY632542.4), the only available full genome sequence prior to this study. Using next-generation sequencing of the entire genome, we reveal substantial sequence variation from the prototype sequence, with 30 nucleotide differences amounting to 14 amino acid changes, as well as significant changes to predicted 3'UTR RNA structures. Our results present an updated and corrected sequence of a potential emerging human-virulent flavivirus uniquely indigenous to Brazil (GenBank: MF461639).

  4. Variation of amino acid sequences of serum amyloid a (SAA) and immunohistochemical analysis of amyloid a (AA) in Japanese domestic cats.

    PubMed

    Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

    2018-02-02

    Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Although persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, not all individuals with chronic inflammation necessarily develop AA amyloidosis. Furthermore, many diseases in cats are associated with the elevated production of SAA, whereas only a small number actually develop AA amyloidosis. We hypothesized that a genetic mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and from five without amyloidosis was analyzed using polymerase chain reaction (PCR) amplification and direct sequencing. We identified the novel variation combination of 45R-51A in the deduced amino acid sequences of four JDCs with amyloidosis and five without. However, there was no relationship between amino acid variations and the distribution of AA amyloid deposits, indicating that differences in SAA sequences do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis using antisera against the three different parts of the feline SAA protein-i.e., the N-terminal, central, and C-terminal regions-revealed that feline AA contained the C-terminus, unlike human AA. These results indicate that the cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.

  5. Genomic Sequence Variation Markup Language (GSVML).

    PubMed

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. GSVML was developed as

  6. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  7. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  8. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

    PubMed

    Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

    2016-10-01

    Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  10. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  11. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  12. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  13. Using chaos to generate variations on movement sequences

    NASA Astrophysics Data System (ADS)

    Bradley, Elizabeth; Stuart, Joshua

    1998-12-01

    We describe a method for introducing variations into predefined motion sequences using a chaotic symbol-sequence reordering technique. A progression of symbols representing the body positions in a dance piece, martial arts form, or other motion sequence is mapped onto a chaotic trajectory, establishing a symbolic dynamics that links the movement sequence and the attractor structure. A variation on the original piece is created by generating a trajectory with slightly different initial conditions, inverting the mapping, and using special corpus-based graph-theoretic interpolation schemes to smooth any abrupt transitions. Sensitive dependence guarantees that the variation is different from the original; the attractor structure and the symbolic dynamics guarantee that the two resemble one another in both aesthetic and mathematical senses.

  14. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probe comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  15. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  16. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor L.; Brow, Mary Ann D.; Dahlberg, James E.

    2007-12-11

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  17. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow; Mary Ann D.; Dahlberg, James E.

    2010-11-09

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  18. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    2000-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  19. Nucleic acid detection assays

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann; Dahlberg, James E.

    2005-04-05

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  20. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  1. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  2. Invasive cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  3. Invasive cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    2002-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  4. Mining sequence variations in representative polyploid sugarcane germplasm accessions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiping; Song, Jian; You, Qian

    Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS), was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes.more » To use GBS for high throughput genotyping highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variants callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, non-reference based universal network enabled analysis kit and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA. The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related

  5. Mining sequence variations in representative polyploid sugarcane germplasm accessions

    DOE PAGES

    Yang, Xiping; Song, Jian; You, Qian; ...

    2017-08-09

    Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS), was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes.more » To use GBS for high throughput genotyping highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variants callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, non-reference based universal network enabled analysis kit and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA. The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related

  6. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  7. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...

  8. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  9. Terminal region sequence variations in variola virus DNA.

    PubMed

    Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J

    1996-07-15

    Genome DNA terminal region sequences were determined for a Brazilian alastrim variola minor virus strain Garcia-1966 that was associated with an 0.8% case-fatality rate and African smallpox strains Congo-1970 and Somalia-1977 associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of > or = 98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of > or = 65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more related to each other than to alastrim virus. Alastrim virus contained only 10 of 70 proteins that were 100% identical to homologs in Asian strains, and 7 alastrim-specific proteins were noted.

  10. In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

    PubMed Central

    Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

    2008-01-01

    Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319

  11. Correlation between fibroin amino acid sequence and physical silk properties.

    PubMed

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet.

  12. Major Breeding Plumage Color Differences of Male Ruffs (Philomachus pugnax) Are Not Associated With Coding Sequence Variation in the MC1R Gene

    PubMed Central

    Küpper, Clemens; Burke, Terry; Lank, David B.

    2015-01-01

    Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous amino acid substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species. PMID:25534935

  13. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  14. Evidence of Divergent Amino Acid Usage in Comparative Analyses of R5- and X4-Associated HIV-1 Vpr Sequences

    PubMed Central

    Antell, Gregory C.; Zhong, Wen; Kercher, Katherine; Passic, Shendra; Williams, Jean; Liu, Yucheng; James, Tony; Jacobson, Jeffrey M.; Szep, Zsofia

    2017-01-01

    Vpr is an HIV-1 accessory protein that plays numerous roles during viral replication, and some of which are cell type dependent. To test the hypothesis that HIV-1 tropism extends beyond the envelope into the vpr gene, studies were performed to identify the associations between coreceptor usage and Vpr variation in HIV-1-infected patients. Colinear HIV-1 Env-V3 and Vpr amino acid sequences were obtained from the LANL HIV-1 sequence database and from well-suppressed patients in the Drexel/Temple Medicine CNS AIDS Research and Eradication Study (CARES) Cohort. Genotypic classification of Env-V3 sequences as X4 (CXCR4-utilizing) or R5 (CCR5-utilizing) was used to group colinear Vpr sequences. To reveal the sequences associated with a specific coreceptor usage genotype, Vpr amino acid sequences were assessed for amino acid diversity and Jensen-Shannon divergence between the two groups. Five amino acid alphabets were used to comprehensively examine the impact of amino acid substitutions involving side chains with similar physiochemical properties. Positions 36, 37, 41, 89, and 96 of Vpr were characterized by statistically significant divergence across multiple alphabets when X4 and R5 sequence groups were compared. In addition, consensus amino acid switches were found at positions 37 and 41 in comparisons of the R5 and X4 sequence populations. These results suggest an evolutionary link between Vpr and gp120 in HIV-1-infected patients. PMID:28620613

  15. Analysis of microbial community variation during the mixed culture fermentation of agricultural peel wastes to produce lactic acid.

    PubMed

    Liang, Shaobo; Gliniewicz, Karol; Gerritsen, Alida T; McDonald, Armando G

    2016-05-01

    Mixed cultures fermentation can be used to convert organic wastes into various chemicals and fuels. This study examined the fermentation performance of four batch reactors fed with different agricultural (orange, banana, and potato (mechanical and steam)) peel wastes using mixed cultures, and monitored the interval variation of reactor microbial communities with 16S rRNA genes using Illumina sequencing. All four reactors produced similar chemical profile with lactic acid (LA) as dominant compound. Acetic acid and ethanol were also observed with small fractions. The Illumina sequencing results revealed the diversity of microbial community decreased during fermentation and a community of largely lactic acid producing bacteria dominated by species of Lactobacillus developed. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

    PubMed

    Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

    2010-02-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.

  17. Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer

    PubMed Central

    Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.

    2010-01-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640

  18. Genetic variation assessment of acid lime accessions collected from south of Iran using SSR and ISSR molecular markers.

    PubMed

    Sharafi, Ata Allah; Abkenar, Asad Asadi; Sharafi, Ali; Masaeli, Mohammad

    2016-01-01

    Iran has a long history of acid lime cultivation and propagation. In this study, genetic variation in 28 acid lime accessions from five regions of south of Iran, and their relatedness with other 19 citrus cultivars were analyzed using Simple Sequence Repeat (SSR) and Inter-Simple Sequence Repeat (ISSR) molecular markers. Nine primers for SSR and nine ISSR primers were used for allele scoring. In total, 49 SSR and 131 ISSR polymorphic alleles were detected. Cluster analysis of SSR and ISSR data showed that most of the acid lime accessions (19 genotypes) have hybrid origin and genetically distance with nucellar of Mexican lime (9 genotypes). As nucellar of Mexican lime are susceptible to phytoplasma, these acid lime genotypes can be used to evaluate their tolerance against biotic constricts like lime "witches' broom disease".

  19. Nucleic acid detection kits

    DOEpatents

    Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann; Kwiatkowski, Robert W.; Vavra, Stephanie H.

    2005-03-29

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of nucleic acid from various viruses in a sample.

  20. VarDetect: a nucleotide sequence variation exploratory tool

    PubMed Central

    Ngamphiw, Chumpol; Kulawonganunchai, Supasak; Assawamakin, Anunchai; Jenwitheesuk, Ekachai; Tongsima, Sissades

    2008-01-01

    Background Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. The discovery of such variation may help to identify causative gene mutations in monogenic diseases and SNPs associated with predisposing genes in complex diseases. Accurate detection of SNPs requires software that can correctly interpret chromatogram signals to nucleotides. Results We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation from fluorescence based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios, and is enhanced by rules which account for common sequence reading artifacts. The proposed software tool is benchmarked against four other well-known SNP discovery software tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence based chromatograms from 15 human genes. These chromatograms were obtained from sequencing 16 two-pooled DNA samples; a total of 32 individual DNA samples. In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency. Availability VarDetect is compatible with most major operating systems such as Microsoft Windows, Linux, and Mac OSX. The current version of VarDetect is freely available at . PMID:19091032

  1. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  2. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.

  3. Whole-Genome Sequence Variation among Multiple Isolates of Pseudomonas aeruginosa

    PubMed Central

    Spencer, David H.; Kas, Arnold; Smith, Eric E.; Raymond, Christopher K.; Sims, Elizabeth H.; Hastings, Michele; Burns, Jane L.; Kaul, Rajinder; Olson, Maynard V.

    2003-01-01

    Whole-genome shotgun sequencing was used to study the sequence variation of three Pseudomonas aeruginosa isolates, two from clonal infections of cystic fibrosis patients and one from an aquatic environment, relative to the genomic sequence of reference strain PAO1. The majority of the PAO1 genome is represented in these strains; however, at least three prominent islands of PAO1-specific sequence are apparent. Conversely, ∼10% of the sequencing reads derived from each isolate fail to align with the PAO1 backbone. While average sequence variation among all strains is roughly 0.5%, regions of pronounced differences were evident in whole-genome scans of nucleotide diversity. We analyzed two such divergent loci, the pyoverdine and O-antigen biosynthesis regions, by complete resequencing. A thorough analysis of isolates collected over time from one of the cystic fibrosis patients revealed independent mutations resulting in the loss of O-antigen synthesis alternating with a mucoid phenotype. Overall, we conclude that most of the PAO1 genome represents a core P. aeruginosa backbone sequence while the strains addressed in this study possess additional genetic material that accounts for at least 10% of their genomes. Approximately half of these additional sequences are novel. PMID:12562802

  4. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wike, C.M.; Daniels, M.R.; Furtado, M.

    1991-12-31

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants` isolates then the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between eachmore » linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.« less

  5. BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers.

    PubMed

    Abo, Ryan P; Ducar, Matthew; Garcia, Elizabeth P; Thorner, Aaron R; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M; Hahn, William C; Meyerson, Matthew; Lindeman, Neal I; Van Hummelen, Paul; MacConaill, Laura E

    2015-02-18

    Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for 'targeted' resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a 'kmer' strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publically available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  7. Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server.

    PubMed

    Abriata, Luciano A; Bovigny, Christophe; Dal Peraro, Matteo

    2016-06-17

    Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html ) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.

  8. Dynamic Variation and Reversion in the Signature Amino Acids of H7N9 Virus During Human Infection.

    PubMed

    Zou, Xiaohui; Guo, Qiang; Zhang, Wei; Chen, Hui; Bai, Wei; Lu, Binghuai; Zhang, Wang; Fan, Yanyan; Liu, Chao; Wang, Yeming; Zhou, Fei; Cao, Bin

    2018-04-24

    Signature amino acids of H7N9 influenza virus play critical roles in human adaption and pathogenesis, but their dynamic variation is unknown during disease development. We sequentially collected respiratory samples from H7N9 patients at different timepoints and applied next-generation sequencing (NGS) to the whole genome of the H7N9 virus to investigate the variation at signature sites. A total of 11 patients were involved and from whom 29 samples were successfully sequenced, including samples from multiple timepoints in 9 patients. NA R292K, PB2 E627K, and D701N were the three most dynamic mutations. The oseltamivir resistance-related NA R292K mutation was present in 9 samples from 5 patients, including one sample obtained before antiviral therapy. In all patients with the NA 292K mutation, the oseltamivir-sensitive 292R genotype persisted and was not eliminated by antiviral treatment. The PB2 E627K substitution was present in 18 samples from 8 patients, among which 12 samples demonstrated a mixture of E/K and the 627K frequency exhibited dynamic variation. Dual D701N and E627K mutations emerged but failed to achieve predominance in any of the samples. Signature amino acids in PB2 and NA demonstrated high polymorphism and dynamic variation within individual patients during H7N9 virus infection.

  9. A map of human genome variation from population-scale sequencing.

    PubMed

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

  10. Deep sequencing reveals cell-type-specific patterns of single-cell transcriptome variation.

    PubMed

    Dueck, Hannah; Khaladkar, Mugdha; Kim, Tae Kyung; Spaethling, Jennifer M; Francis, Chantal; Suresh, Sangita; Fisher, Stephen A; Seale, Patrick; Beck, Sheryl G; Bartfai, Tamas; Kuhn, Bernhard; Eberwine, James; Kim, Junhyong

    2015-06-09

    Differentiation of metazoan cells requires execution of different gene expression programs but recent single-cell transcriptome profiling has revealed considerable variation within cells of seeming identical phenotype. This brings into question the relationship between transcriptome states and cell phenotypes. Additionally, single-cell transcriptomics presents unique analysis challenges that need to be addressed to answer this question. We present high quality deep read-depth single-cell RNA sequencing for 91 cells from five mouse tissues and 18 cells from two rat tissues, along with 30 control samples of bulk RNA diluted to single-cell levels. We find that transcriptomes differ globally across tissues with regard to the number of genes expressed, the average expression patterns, and within-cell-type variation patterns. We develop methods to filter genes for reliable quantification and to calibrate biological variation. All cell types include genes with high variability in expression, in a tissue-specific manner. We also find evidence that single-cell variability of neuronal genes in mice is correlated with that in rats consistent with the hypothesis that levels of variation may be conserved. Single-cell RNA-sequencing data provide a unique view of transcriptome function; however, careful analysis is required in order to use single-cell RNA-sequencing measurements for this purpose. Technical variation must be considered in single-cell RNA-sequencing studies of expression variation. For a subset of genes, biological variability within each cell type appears to be regulated in order to perform dynamic functions, rather than solely molecular noise.

  11. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  12. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  13. Lactobacillus kefiri shows inter-strain variations in the amino acid sequence of the S-layer proteins.

    PubMed

    Malamud, Mariano; Carasi, Paula; Bronsoms, Sílvia; Trejo, Sebastián A; Serradell, María de Los Angeles

    2017-04-01

    The S-layer is a proteinaceous envelope constituted by subunits that self-assemble to form a two-dimensional lattice that covers the surface of different species of Bacteria and Archaea, and it could be involved in cell recognition of microbes among other several distinct functions. In this work, both proteomic and genomic approaches were used to gain knowledge about the sequences of the S-layer protein (SLPs) encoding genes expressed by six aggregative and sixteen non-aggregative strains of potentially probiotic Lactobacillus kefiri. Peptide mass fingerprint (PMF) analysis confirmed the identity of SLPs extracted from L. kefiri, and based on the homology with phylogenetically related species, primers located outside and inside the SLP-genes were employed to amplify genomic DNA. The O-glycosylation site SASSAS was found in all L. kefiri SLPs. Ten strains were selected for sequencing of the complete genes. The total length of the mature proteins varies from 492 to 576 amino acids, and all SLPs have a calculated pI between 9.37 and 9.60. The N-terminal region is relatively conserved and shows a high percentage of positively charged amino acids. Major differences among strains are found in the C-terminal region. Different groups could be distinguished regarding the mature SLPs and the similarities observed in the PMF spectra. Interestingly, SLPs of the aggregative strains are 100% homologous, although these strains were isolated from different kefir grains. This knowledge provides relevant data for better understanding of the mechanisms involved in SLPs functionality and could contribute to the development of products of biotechnological interest from potentially probiotic bacteria.

  14. Limited Variation in BK Virus T-Cell Epitopes Revealed by Next-Generation Sequencing

    PubMed Central

    Sahoo, Malaya K.; Tan, Susanna K.; Chen, Sharon F.; Kapusinszky, Beatrix; Concepcion, Katherine R.; Kjelson, Lynn; Mallempati, Kalyan; Farina, Heidi M.; Fernández-Viña, Marcelo; Tyan, Dolly; Grimm, Paul C.; Anderson, Matthew W.; Concepcion, Waldo

    2015-01-01

    BK virus (BKV) infection causing end-organ disease remains a formidable challenge to the hematopoietic cell transplant (HCT) and kidney transplant fields. As BKV-specific treatments are limited, immunologic-based therapies may be a promising and novel therapeutic option for transplant recipients with persistent BKV infection. Here, we describe a whole-genome, deep-sequencing methodology and bioinformatics pipeline that identify BKV variants across the genome and at BKV-specific HLA-A2-, HLA-B0702-, and HLA-B08-restricted CD8 T-cell epitopes. BKV whole genomes were amplified using long-range PCR with four inverse primer sets, and fragmentation libraries were sequenced on the Ion Torrent Personal Genome Machine (PGM). An error model and variant-calling algorithm were developed to accurately identify rare variants. A total of 65 samples from 18 pediatric HCT and kidney recipients with quantifiable BKV DNAemia underwent whole-genome sequencing. Limited genetic variation was observed. The median number of amino acid variants identified per sample was 8 (range, 2 to 37; interquartile range, 10), with the majority of variants (77%) detected at a frequency of <5%. When normalized for length, there was no statistical difference in the median number of variants across all genes. Similarly, the predominant virus population within samples harbored T-cell epitopes similar to the reference BKV strain that was matched for the BKV genotype. Despite the conservation of epitopes, low-level variants in T-cell epitopes were detected in 77.7% (14/18) of patients. Understanding epitope variation across the whole genome provides insight into the virus-immune interface and may help guide the development of protocols for novel immunologic-based therapies. PMID:26202116

  15. Amino acid sequence of the human fibronectin receptor

    PubMed Central

    1987-01-01

    The amino acid sequence deduced from cDNA of the human placental fibronectin receptor is reported. The receptor is composed of two subunits: an alpha subunit of 1,008 amino acids which is processed into two polypeptides disulfide bonded to one another, and a beta subunit of 778 amino acids. Each subunit has near its COOH terminus a hydrophobic segment. This and other sequence features suggest a structure for the receptor in which the hydrophobic segments serve as transmembrane domains anchoring each subunit to the membrane and dividing each into a large ectodomain and a short cytoplasmic domain. The alpha subunit ectodomain has five sequence elements homologous to consensus Ca2+- binding sites of several calcium-binding proteins, and the beta subunit contains a fourfold repeat strikingly rich in cysteine. The alpha subunit sequence is 46% homologous to the alpha subunit of the vitronectin receptor. The beta subunit is 44% homologous to the human platelet adhesion receptor subunit IIIa and 47% homologous to a leukocyte adhesion receptor beta subunit. The high degree of homology (85%) of the beta subunit with one of the polypeptides of a chicken adhesion receptor complex referred to as integrin complex strongly suggests that the latter polypeptide is the chicken homologue of the fibronectin receptor beta subunit. These receptor subunit homologies define a superfamily of adhesion receptors. The availability of the entire protein sequence for the fibronectin receptor will facilitate studies on the functions of these receptors. PMID:2958481

  16. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  17. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  18. Phylogenetically Structured Differences in rRNA Gene Sequence Variation among Species of Arbuscular Mycorrhizal Fungi and Their Implications for Sequence Clustering

    PubMed Central

    Ekanayake, Saliya; Ruan, Yang; Schütte, Ursel M. E.; Kaonongbua, Wittaya; Fox, Geoffrey; Ye, Yuzhen; Bever, James D.

    2016-01-01

    ABSTRACT Arbuscular mycorrhizal (AM) fungi form mutualisms with plant roots that increase plant growth and shape plant communities. Each AM fungal cell contains a large amount of genetic diversity, but it is unclear if this diversity varies across evolutionary lineages. We found that sequence variation in the nuclear large-subunit (LSU) rRNA gene from 29 isolates representing 21 AM fungal species generally assorted into genus- and species-level clades, with the exception of species of the genera Claroideoglomus and Entrophospora. However, there were significant differences in the levels of sequence variation across the phylogeny and between genera, indicating that it is an evolutionarily constrained trait in AM fungi. These consistent patterns of sequence variation across both phylogenetic and taxonomic groups pose challenges to interpreting operational taxonomic units (OTUs) as approximations of species-level groups of AM fungi. We demonstrate that the OTUs produced by five sequence clustering methods using 97% or equivalent sequence similarity thresholds failed to match the expected species of AM fungi, although OTUs from AbundantOTU, CD-HIT-OTU, and CROP corresponded better to species than did OTUs from mothur or UPARSE. This lack of OTU-to-species correspondence resulted both from sequences of one species being split into multiple OTUs and from sequences of multiple species being lumped into the same OTU. The OTU richness therefore will not reliably correspond to the AM fungal species richness in environmental samples. Conservatively, this error can overestimate species richness by 4-fold or underestimate richness by one-half, and the direction of this error will depend on the genera represented in the sample. IMPORTANCE Arbuscular mycorrhizal (AM) fungi form important mutualisms with the roots of most plant species. Individual AM fungi are genetically diverse, but it is unclear whether the level of this diversity differs among evolutionary lineages. We found

  19. Amino acid sequence analysis of the annexin super-gene family of proteins.

    PubMed

    Barton, G J; Newman, R H; Freemont, P S; Crumpton, M J

    1991-06-15

    The annexins are a widespread family of calcium-dependent membrane-binding proteins. No common function has been identified for the family and, until recently, no crystallographic data existed for an annexin. In this paper we draw together 22 available annexin sequences consisting of 88 similar repeat units, and apply the techniques of multiple sequence alignment, pattern matching, secondary structure prediction and conservation analysis to the characterisation of the molecules. The analysis clearly shows that the repeats cluster into four distinct families and that greatest variation occurs within the repeat 3 units. Multiple alignment of the 88 repeats shows amino acids with conserved physicochemical properties at 22 positions, with only Gly at position 23 being absolutely conserved in all repeats. Secondary structure prediction techniques identify five conserved helices in each repeat unit and patterns of conserved hydrophobic amino acids are consistent with one face of a helix packing against the protein core in predicted helices a, c, d, e. Helix b is generally hydrophobic in all repeats, but contains a striking pattern of repeat-specific residue conservation at position 31, with Arg in repeats 4 and Glu in repeats 2, but unconserved amino acids in repeats 1 and 3. This suggests repeats 2 and 4 may interact via a buried saltbridge. The loop between predicted helices a and b of repeat 3 shows features distinct from the equivalent loop in repeats 1, 2 and 4, suggesting an important structural and/or functional role for this region. No compelling evidence emerges from this study for uteroglobin and the annexins sharing similar tertiary structures, or for uteroglobin representing a derivative of a primordial one-repeat structure that underwent duplication to give the present day annexins. The analyses performed in this paper are re-evaluated in the Appendix, in the light of the recently published X-ray structure for human annexin V. The structure confirms most of

  20. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  1. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly

  2. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids

  3. Rapid Characterization of Insulin Modifications and Sequence Variations by Proteinase K Digestion and UHPLC-ESI-MS

    NASA Astrophysics Data System (ADS)

    Yang, Rong-Sheng; Tang, Weijuan; Sheng, Huaming; Meng, Fanyu

    2018-01-01

    Discovery of novel insulin analogs as therapeutics has remained an active area of research. Compared with native human insulin, insulin analog molecules normally incorporate either covalent modifications or amino acid sequence variations. From the drug discovery and development perspective, methods for efficient and detailed characterization of these primary structural changes are very important. In this report, we demonstrate that proteinase K digestion coupled with UPLC-ESI-MS analysis provides a simple and rapid approach to characterize the modifications and sequence variations of insulin molecules. A commercially available proteinase K digestion kit was used to process recombinant human insulin (RHI), insulin glargine, and fluorescein isothiocynate-labeled recombinant human insulin (FITC-RHI) samples. The LC-MS data clearly showed that RHI and insulin glargine samples can be differentiated, and the FITC modifications in all three amine sites of the RHI molecule are well characterized. The end-to-end experiment and data interpretation was achieved within 60 min. This approach is fast and simple, and can be easily implemented in early drug discovery laboratories to facilitate research on more advanced insulin therapeutics. [Figure not available: see fulltext.

  4. Rapid Characterization of Insulin Modifications and Sequence Variations by Proteinase K Digestion and UHPLC-ESI-MS

    NASA Astrophysics Data System (ADS)

    Yang, Rong-Sheng; Tang, Weijuan; Sheng, Huaming; Meng, Fanyu

    2018-05-01

    Discovery of novel insulin analogs as therapeutics has remained an active area of research. Compared with native human insulin, insulin analog molecules normally incorporate either covalent modifications or amino acid sequence variations. From the drug discovery and development perspective, methods for efficient and detailed characterization of these primary structural changes are very important. In this report, we demonstrate that proteinase K digestion coupled with UPLC-ESI-MS analysis provides a simple and rapid approach to characterize the modifications and sequence variations of insulin molecules. A commercially available proteinase K digestion kit was used to process recombinant human insulin (RHI), insulin glargine, and fluorescein isothiocynate-labeled recombinant human insulin (FITC-RHI) samples. The LC-MS data clearly showed that RHI and insulin glargine samples can be differentiated, and the FITC modifications in all three amine sites of the RHI molecule are well characterized. The end-to-end experiment and data interpretation was achieved within 60 min. This approach is fast and simple, and can be easily implemented in early drug discovery laboratories to facilitate research on more advanced insulin therapeutics. [Figure not available: see fulltext.

  5. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  6. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  7. Mapping and phasing of structural variation in patient genomes using nanopore sequencing.

    PubMed

    Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P

    2017-11-06

    Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline-NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.

  8. Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.

    PubMed

    Bashir, Ali; Bansal, Vikas; Bafna, Vineet

    2010-06-18

    Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments

  9. Complete amino acid sequence of bovine colostrum low-Mr cysteine proteinase inhibitor.

    PubMed

    Hirado, M; Tsunasawa, S; Sakiyama, F; Niinobe, M; Fujii, S

    1985-07-01

    The complete amino acid sequence of bovine colostrum cysteine proteinase inhibitor was determined by sequencing native inhibitor and peptides obtained by cyanogen bromide degradation, Achromobacter lysylendopeptidase digestion and partial acid hydrolysis of reduced and S-carboxymethylated protein. Achromobacter peptidase digestion was successfully used to isolate two disulfide-containing peptides. The inhibitor consists of 112 amino acids with an Mr of 12787. Two disulfide bonds were established between Cys 66 and Cys 77 and between Cys 90 and Cys 110. A high degree of homology in the sequence was found between the colostrum inhibitor and human gamma-trace, human salivary acidic protein and chicken egg-white cystatin.

  10. DYZ1 arrays show sequence variation between the monozygotic males

    PubMed Central

    2014-01-01

    Background Monozygotic twins (MZT) are an important resource for genetical studies in the context of normal and diseased genomes. In the present study we used DYZ1, a satellite fraction present in the form of tandem arrays on the long arm of the human Y chromosome, as a tool to uncover sequence variations between the monozygotic males. Results We detected copy number variation, frequent insertions and deletions within the sequences of DYZ1 arrays amongst all the three sets of twins used in the present study. MZT1b showed loss of 35 bp compared to that in 1a, whereas 2a showed loss of 31 bp compared to that in 2b. Similarly, 3b showed 10 bp insertion compared to that in 3a. MZT1a germline DNA showed loss of 5 bp and 1b blood DNA showed loss of 26 bp compared to that of 1a blood and 1b germline DNA, respectively. Of the 69 restriction sites detected in DYZ1 arrays, MboII, BsrI, TspEI and TaqI enzymes showed frequent loss and or gain amongst all the 3 pairs studied. MZT1 pair showed loss/gain of VspI, BsrDI, AgsI, PleI, TspDTI, TspEI, TfiI and TaqI restriction sites in both blood and germline DNA. All the three sets of MZT showed differences in the number of DYZ1 copies. FISH signals reflected somatic mosaicism of the DYZ1 copies across the cells. Conclusions DYZ1 showed both sequence and copy number variation between the MZT males. Sequence variation was also noticed between germline and blood DNA samples of the same individual as we observed at least in one set of sample. The result suggests that DYZ1 faithfully records all the genetical changes occurring after the twining which may be ascribed to the environmental factors. PMID:24495361

  11. Characterization of genetic sequence variation of 58 STR loci in four major population groups.

    PubMed

    Novroski, Nicole M M; King, Jonathan L; Churchill, Jennifer D; Seah, Lay Hong; Budowle, Bruce

    2016-11-01

    Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  12. Detection of nucleic acids by multiple sequential invasive cleavages

    DOEpatents

    Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.

  13. Detection of nucleic acids by multiple sequential invasive cleavages

    DOEpatents

    Hall, Jeff G; Lyamichev, Victor I; Mast, Andrea L; Brow, Mary Ann D

    2012-10-16

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.

  14. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P [Santa Fe, NM; White, P Scott [Los Alamos, NM

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  15. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583

  16. Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation.

    PubMed

    Simmons, Sheri L; Dibartolo, Genevieve; Denef, Vincent J; Goltsman, Daniela S Aliaga; Thelen, Michael P; Banfield, Jillian F

    2008-07-22

    Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20x). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types ( approximately 94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of

  17. Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation

    PubMed Central

    Denef, Vincent J; Goltsman, Daniela S. Aliaga; Thelen, Michael P; Banfield, Jillian F

    2008-01-01

    Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth ∼20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (∼94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is

  18. Dissecting the relationship between protein structure and sequence variation

    NASA Astrophysics Data System (ADS)

    Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

    2015-03-01

    Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0 . 1 to 0 . 8 . Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~ 10 - 30 % of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous back-bone hydrogen bond energies, large fractions of helical secondary structures and low fraction of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.

  19. BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations

    PubMed Central

    Wang, Junbai; Batmanov, Kirill

    2015-01-01

    Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972

  20. Detection of nucleic acids by multiple sequential invasive cleavages 02

    DOEpatents

    Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

    2002-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.

  1. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    NASA Astrophysics Data System (ADS)

    E Frenkel, F.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  2. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  3. Sequence variation of the feline immunodeficiency virus genome and its clinical relevance.

    PubMed

    Stickney, A L; Dunowska, M; Cave, N J

    2013-06-08

    The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regards to their mutation and replication rates, plasma viral loads, cell tropism and the ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would compliment vaccine field efficacy studies and provide the basis to make informed recommendations on its use.

  4. Genetic variation of six desaturase genes in flax and their impact on fatty acid composition.

    PubMed

    Thambugala, Dinushika; Duguid, Scott; Loewen, Evelyn; Rowland, Gordon; Booker, Helen; You, Frank M; Cloutier, Sylvie

    2013-10-01

    Flax (Linum usitatissimum L.) is one of the richest plant sources of omega-3 fatty acids praised for their health benefits. In this study, the extent of the genetic variability of genes encoding stearoyl-ACP desaturase (SAD), and fatty acid desaturase 2 (FAD2) and 3 (FAD3) was determined by sequencing the six paralogous genes from 120 flax accessions representing a broad range of germplasm including some EMS mutant lines. A total of 6 alleles for sad1 and sad2, 21 for fad2a, 5 for fad2b, 15 for fad3a and 18 for fad3b were identified. Deduced amino acid sequences of the alleles predicted 4, 2, 3, 4, 6 and 7 isoforms, respectively. Allele frequencies varied greatly across genes. Fad3a, with 110 SNPs and 19 indels, and fad3b, with 50 SNPs and 5 indels, showed the highest levels of genetic variations. While most of the SNPs and all the indels were silent mutations, both genes carried nonsense SNP mutations resulting in premature stop codons, a feature not observed in sad and fad2 genes. Some alleles and isoforms discovered in induced mutant lines were absent in the natural germplasm. Correlation of these genotypic data with fatty acid composition data of 120 flax accessions phenotyped in six field experiments revealed statistically significant effects of some of the SAD and FAD isoforms on fatty acid composition, oil content and iodine value. The novel allelic variants and isoforms identified for the six desaturases will be a resource for the development of oilseed flax with unique and useful fatty acid profiles.

  5. Chromospheric variations in main-sequence stars

    NASA Technical Reports Server (NTRS)

    Baliunas, S. L.; Donahue, R. A.; Soon, J. H.; Horne, J. H.; Frazer, J.; Woodard-Eklund, L.; Bradford, M.; Rao, L. M.; Wilson, O. C.; Zhang, Q.

    1995-01-01

    The fluxes in passbands 0.1 nm wide and centered on the Ca II H and K emission cores have been monitored in 111 stars of spectral type F2-M2 on or near the main sequence in a continuation of an observing program started by O. C. Wilson. Most of the measurements began in 1966, with observations scheduled monthly until 1980, when observations were schedueld sevral times per week. The records, with a long-term precision of about 1.5%, display fluctuations that can be idntified with variations on timescales similar to the 11 yr cycle of solar activity as well as axial rotation, and the growth and decay of emitting regions. We present the records of chromospheric emission and general conclusions about variations in surface magnetic activity on timescales greater than 1 yr but less than a few decades. The results for stars of spectral type G0-K5 V indicate a pattern of change in rotation and chromospheric activity on an evolutionary timescale, in which (1) young stars exhibit high average levels of activity, rapid rotation rates, no Maunder minimum phase and rarely display a smooth, cyclic variation; (2) stars of intermediate age (approximately 1-2 Gyr for 1 solar mass) have moderate levels of activity and rotation rates, and occasional smooth cycles; and (3) stars as old as the Sun and older have slower rotation rates, lower activity levels and smooth cycles with occasional Maunder minimum-phases.

  6. Soil amino acid composition across a boreal forest successional sequence

    Treesearch

    Nancy R. Werdin-Pfisterer; Knut Kielland; Richard D. Boone

    2009-01-01

    Soil amino acids are important sources of organic nitrogen for plant nutrition, yet few studies have examined which amino acids are most prevalent in the soil. In this study, we examined the composition, concentration, and seasonal patterns of soil amino acids across a primary successional sequence encompassing a natural gradient of plant productivity and soil...

  7. Thermal and acid tolerant beta-xylosidases, genes encoding, related organisms, and methods

    DOEpatents

    Thompson, David N [Idaho Falls, ID; Thompson, Vicki S [Idaho Falls, ID; Schaller, Kastli D [Ammon, ID; Apel, William A [Jackson, WY; Lacey, Jeffrey A [Idaho Falls, ID; Reed, David W [Idaho Falls, ID

    2011-04-12

    Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof are provided. Further provided are methods of at least partially degrading xylotriose and/or xylobiose using isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof.

  8. Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

    PubMed

    Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

    2014-06-01

    It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to

  9. Preferential amino acid sequences in alumina-catalyzed peptide bond formation.

    PubMed

    Bujdák, J; Rode, B M

    2002-05-21

    The catalytic effect of activated alumina on amino acid condensation was investigated. The readiness of amino acids to form peptide sequences was estimated on the basis of the yield of dipeptides and was found to decrease in the order glycine (Gly), alanine (Ala), leucine (Leu), valine (Val), proline (Pro). For example, approximately 15% Gly was converted to the dipeptide (Gly(2)), 5% to cyclic anhydride (cyc(Gly(2))) and small amounts of tri- (Gly(3)) and tetrapeptide (Gly(4)) were formed after 28 days. On the other hand, only trace amounts of Pro(2) were formed from proline under the same conditions. Preferential formation of certain sequences was observed in the mixed reaction systems containing two amino acids. For example, almost ten times more Gly-Val than Val-Gly was formed in the Gly+Val reaction system. The preferred sequences can be explained on the basis of an inductive effect that side groups have on the nucleophilicity and electrophilicity, respectively, of the amino and carboxyl groups. A comparison with published data of amino acid reactions in other reaction systems revealed that the main trends of preferential sequence formation were the same as those described for the salt-induced peptide formation (SIPF) reaction. The results of this work and other previously published papers show that alumina and related mineral surfaces might have played a crucial role in the prebiotic formation of the first peptides on the primitive earth.

  10. VWF mutations and new sequence variations identified in healthy controls are more frequent in the African-American population.

    PubMed

    Bellissimo, Daniel B; Christopherson, Pamela A; Flood, Veronica H; Gill, Joan Cox; Friedman, Kenneth D; Haberichter, Sandra L; Shapiro, Amy D; Abshire, Thomas C; Leissinger, Cindy; Hoots, W Keith; Lusher, Jeanne M; Ragni, Margaret V; Montgomery, Robert R

    2012-03-01

    Diagnosis and classification of VWD is aided by molecular analysis of the VWF gene. Because VWF polymorphisms have not been fully characterized, we performed VWF laboratory testing and gene sequencing of 184 healthy controls with a negative bleeding history. The controls included 66 (35.9%) African Americans (AAs). We identified 21 new sequence variations, 13 (62%) of which occurred exclusively in AAs and 2 (G967D, T2666M) that were found in 10%-15% of the AA samples, suggesting they are polymorphisms. We identified 14 sequence variations reported previously as VWF mutations, the majority of which were type 1 mutations. These controls had VWF Ag levels within the normal range, suggesting that these sequence variations might not always reduce plasma VWF levels. Eleven mutations were found in AAs, and the frequency of M740I, H817Q, and R2185Q was 15%-18%. Ten AA controls had the 2N mutation H817Q; 1 was homozygous. The average factor VIII level in this group was 99 IU/dL, suggesting that this variation may confer little or no clinical symptoms. This study emphasizes the importance of sequencing healthy controls to understand ethnic-specific sequence variations so that asymptomatic sequence variations are not misidentified as mutations in other ethnic or racial groups.

  11. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  12. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  13. The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase.

    PubMed Central

    Freemont, P S; Dunbar, B; Fothergill-Gilmore, L A

    1988-01-01

    The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase, comprising 363 residues, was determined. The sequence was deduced by automated sequencing of CNBr-cleavage, o-iodosobenzoic acid-cleavage, trypsin-digest and staphylococcal-proteinase-digest fragments. Comparison of the sequence with other class I aldolase sequences shows that the mammalian muscle isoenzyme is one of the most highly conserved enzymes known, with only about 2% of the residues changing per 100 million years. Non-mammalian aldolases appear to be evolving at the same rate as other glycolytic enzymes, with about 4% of the residues changing per 100 million years. Secondary-structure predictions are analysed in an accompanying paper [Sawyer, Fothergill-Gilmore & Freemont (1988) Biochem. J. 249, 789-793]. PMID:3355497

  14. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  15. LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-box" approach.

    PubMed

    Fokkema, Ivo F A C; den Dunnen, Johan T; Taschner, Peter E M

    2005-08-01

    The completion of the human genome project has initiated, as well as provided the basis for, the collection and study of all sequence variation between individuals. Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). We have developed the Leiden Open (source) Variation Database (LOVD) software approaching the "LSDB-in-a-Box" idea for the easy creation and maintenance of a fully web-based gene sequence variation database. LOVD is platform-independent and uses PHP and MySQL open source software only. The basic gene-centered and modular design of the database follows the recommendations of the Human Genome Variation Society (HGVS) and focuses on the collection and display of DNA sequence variations. With minimal effort, the LOVD platform is extendable with clinical data. The open set-up should both facilitate and promote functional extension with scripts written by the community. The LOVD software is freely available from the Leiden Muscular Dystrophy pages (www.DMD.nl/LOVD/). To promote the use of LOVD, we currently offer curators the possibility to set up an LSDB on our Leiden server. (c) 2005 Wiley-Liss, Inc.

  16. Analysis of human herpesvirus-6 IE1 sequence variation in clinical samples.

    PubMed

    Stanton, Richard; Wilkinson, Gavin W G; Fox, Julie D

    2003-12-01

    Herpesvirus immediate early (IE) proteins are known to play key roles in establishing productive infections, regulating reactivation from latency, and creating a cellular environment favourable to viral replication. Human herpesvirus-6 (HHV-6) IE genes have not been studied as intensively as their homologues in the prototype betaherpesvirus human cytomegalovirus (HCMV). Whilst the HCMV IE1 gene is relatively conserved, early studies indicated that HHV-6 IE1 exhibited a high level of sequence variation between HHV-6A and HHV-6B isolates, although the observation was based primarily on virus stocks that had been isolated and propagated in vitro. In this study, we investigated the level of HHV-6 IE1 sequence variation in vivo by direct sequencing of circulating virus in clinical samples without prior in vitro culture. Sequences exactly matching those reported for reference HHV-6 isolates were identified in clinical samples, thus the HHV-6 laboratory strains used in the majority of in vitro studies appear to be representative of virus circulating in vivo with respect to the IE1 gene. The HHV-6 IE1 sequence is also conserved in reference strains that had been passaged extensively in vitro. The high degree of divergence between variant A and B type IE1 sequences was confirmed, but interestingly HHV-6B IE1 sequences were observed to further segregate into two distinct subgroups, with the laboratory strains Z29 and HST representative of these two subgroups. Within each HHV-6B subgroup, a remarkably high level of homology was observed. Thus the HHV-6 IE1 sequence appears highly stable, underlining its potential importance to the viral life cycle. Copyright 2003 Wiley-Liss, Inc.

  17. CNV-seq, a new method to detect copy number variation using high-throughput sequencing.

    PubMed

    Xie, Chao; Tammi, Martti T

    2009-03-06

    DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amount of short reads. Simulation of various sequencing methods with coverage between 0.1x to 8x show overall specificity between 91.7 - 99.9%, and sensitivity between 72.2 - 96.5%. We also show the results for assessment of CNV between two individual human genomes.

  18. Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing.

    PubMed

    Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen

    2009-03-01

    We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.

  19. Thermal and acid tolerant beta xylosidases, arabinofuranosidases, genes encoding, related organisms, and methods

    DOEpatents

    Thompson, David N; Thompson, Vicki S; Schaller, Kastli D; Apel, William A; Reed, David W; Lacey, Jeffrey A

    2013-04-30

    Isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof are provided. Further provided are methods of at least partially degrading xylotriose, xylobiose, and/or arabinofuranose-substituted xylan using isolated and/or purified polypeptides and nucleic acid sequences encoding polypeptides from Alicyclobacillus acidocaldarius and variations thereof.

  20. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  1. RNA sequencing to study gene expression and single nucleotide polymorphism variation associated with citrate content in cow milk.

    PubMed

    Cánovas, A; Rincón, G; Islas-Trejo, A; Jimenez-Flores, R; Laubscher, A; Medrano, J F

    2013-04-01

    The technological properties of milk have significant importance for the dairy industry. Citrate, a normal constituent of milk, forms one of the main buffer systems that regulate the equilibrium between Ca(2+) and H(+) ions. Higher-than-normal citrate content is associated with poor coagulation properties of milk. To identify the genes responsible for the variation of citrate content in milk in dairy cattle, the metabolic steps involved in citrate and fatty acid synthesis pathways in ruminant mammary tissue using RNA sequencing were studied. Genetic markers that could influence milk citrate content in Holstein cows were used in a marker-trait association study to establish the relationship between 74 single nucleotide polymorphisms (SNP) in 20 candidate genes and citrate content in 250 Holstein cows. This analysis revealed 6 SNP in key metabolic pathway genes [isocitrate dehydrogenase 1 (NADP+), soluble (IDH1); pyruvate dehydrogenase (lipoamide) β (PDHB); pyruvate kinase (PKM2); and solute carrier family 25 (mitochondrial carrier; citrate transporter), member 1 (SLC25A1)] significantly associated with increased milk citrate content. The amount of the phenotypic variation explained by the 6 SNP ranged from 10.1 to 13.7%. Also, genotype-combination analysis revealed the highest phenotypic variation was explained combining IDH1_23211, PDHB_5562, and SLC25A1_4446 genotypes. This specific genotype combination explained 21.3% of the phenotypic variation. The largest citrate associated effect was in the 3' untranslated region of the SLC25A1 gene, which is responsible for the transport of citrate across the mitochondrial inner membrane. This study provides an approach using RNA sequencing, metabolic pathway analysis, and association studies to identify genetic variation in functional target genes determining complex trait phenotypes. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  2. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  3. Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).

    PubMed

    Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M

    2001-12-01

    A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding this protein were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues that are identical with BPTI's sequence, including 6 half-cystine and 5 aromatic amino acid residues. The active site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits a-chymotrypsin, but less inhibits elastase or subtilisin. SETI was expressed by E. coli containing a PelB vector carrying the SETI encoding cDNA; an expression yield of 0.68 mg/l was obtained. The phylogenetic relationship of SETI and the other BPTI-like trypsin inhibitors was analyzed using most likelihood inference methods.

  4. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  5. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  6. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  7. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  8. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  9. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, J.; Peters, M.; Lottspeich, F.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M/sub r/ estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%)more » of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.« less

  10. Comparative RNA sequencing reveals substantial genetic variation in endangered primates

    PubMed Central

    Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav

    2012-01-01

    Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615

  11. Copy number variation of individual cattle genomes using next-generation sequencing

    USDA-ARS?s Scientific Manuscript database

    Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...

  12. Identification of the sequence variations of 15 autosomal STR loci in a Chinese population.

    PubMed

    Chen, Wenjing; Cheng, Jianding; Ou, Xueling; Chen, Yong; Tong, Dayue; Sun, Hongyu

    2014-01-01

    DNA sequence variation including base(s) changes and insertion or deletion in the primer binding region may cause a null allele and, if this changes the length of the amplified fragment out of the allelic ladder, off-ladder (OL) alleles may be detected. In order to provide accurate and reliable DNA evidence for forensic DNA analysis, it is essential to clarify sequence variations in prevalently used STR loci. Suspected null alleles and OL alleles of PlowerPlex16® System from 21,934 unrelated Chinese individuals were verified by alternative systems and sequenced. A total of 17 cases with null alleles were identified, including 12 kinds of point mutations in 16 cases and a 19-base deletion in one case. The total frequency of null alleles was 7.751 × 10(-4). Eight hundred and forty-four OL alleles classified as being of 97 different kinds were observed at 15 STR loci of the PowerPlex®16 system except vWA. All the frequencies of OL alleles were under 0.01. Null alleles should be confirmed by alternative primers and OL alleles should be named appropriately. Particular attention should be paid to sequence variation, since incorrect designation could lead to false conclusions.

  13. Amino acid sequence of tyrosinase from Neurospora crassa.

    PubMed Central

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  14. Variation of unsaturated fatty acids in soybean sprout of high oleic acid accessions.

    PubMed

    Dhakal, Krishna Hari; Jung, Ki-Hwal; Chae, Jong-Hyun; Shannon, J Grover; Lee, Jeong-Dong

    2014-12-01

    Oleic acid and oleic acid rich foods may have beneficial health effects in humans. Soybeans with high oleic acid (around 80% in seed oil) have been developed. Soybean sprouts are an important vegetable in Korea, Japan and China. The objective of this study was to investigate the variation of unsaturated fatty acids, oleic, linoleic and α-linolenic acids, in sprouts from soybeans with normal and high oleic acid concentration. Twelve soybean accessions with six high oleic acid lines, three parents of high oleic acid lines, and three checks with normal and high oleic acid concentration were used in this study. The unsaturated fatty acid concentration in sprouts from each genotype was similar to the concentration in the ungerminated seed. The oleic acid concentration in the sprouts of high oleic acid lines (up to 80%) was still high (>70%) compared to the ungerminated seed. Thus, high oleic soybean varieties developed for sprout production could add valuable health benefits to sprouts and the individuals who consume this vegetable. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. An Overview of Genomic Sequence Variation Markup Language (GSVML)

    PubMed Central

    Nakaya, Jun; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Kimura, Michio

    2006-01-01

    Internationally accumulated genomic sequence variation data on human requires the interoperable data exchanging format. We developed the GSVML as the data exchanging format. The GSVML is human health oriented and has three categories. Analyses on the use case in human health domain and the investigation on the databases and markup languages were conducted. An interface ability to Health Level Seven Genotype Model was examined. GSVML provides a sharable platform for both clinical and research applications.

  16. Sequence variation of functional HTLV-II tax alleles among isolates from an endemic population: lack of evidence for oncogenic determinant in tax.

    PubMed

    Hjelle, B; Chaney, R

    1992-02-01

    Human T-cell leukemia-lymphoma virus type II (HTLV-II) has been isolated from patients with hairy cell leukemia (HCL). We previously described a population with longstanding endemic HTLV-II infection, and showed that there is no increased risk for HCL in the affected groups. We thus have direct evidence that the endemic form(s) of HTLV-II cause HCL infrequently, if at all. By comparison, there is reason to suspect that the viruses isolated from patients with HCL had an etiologic role in the disease in those patients. One way to reconcile these conflicting observations is to consider that isolates of HTLV-II might differ in oncogenic potential. To determine whether the structure of the putative oncogenic determinant of HTLV-II, tax2, might differ in the new isolates compared to the tax of the prototype HCL isolate, MO, four new functional tax cDNAs were cloned from new isolates. Sequence analysis showed only minor (0.9-2.0%) amino acid variation compared to the published sequence of MO tax2. Some codons were consistently different from published sequences of the MO virus, but in most cases, such variations were also found in each of two tax2 clones we isolated from the MO T-cell line. These variations rendered the new clones more similar to the tax1 of the pathogenic virus HTLV-I. Thus we find no evidence that pathologic determinants of HTLV-II can be assigned to the tax gene.

  17. Variation in amino acid and lipid composition of latent fingerprints.

    PubMed

    Croxton, Ruth S; Baron, Mark G; Butler, David; Kent, Terry; Sears, Vaughn G

    2010-06-15

    The enhancement of latent fingerprints, both at the crime scene and in the laboratory using an array of chemical, physical and optical techniques, permits their use for identification. Despite the plethora of techniques available, there are occasions when latent fingerprints are not successfully enhanced. An understanding of latent fingerprint chemistry and behaviour will aid the improvement of current techniques and the development of novel ones. In this study the amino acid and fatty acid content of 'real' latent fingerprints collected on a non-porous surface was analysed by gas chromatography-mass spectrometry. Squalene was also quantified in addition. Hexadecanoic acid, octadecanoic acid and cis-9-octadecenoic acid were the most abundant fatty acids in all samples. There was, however, wide variation in the relative amounts of each fatty acid in each sample. It was clearly demonstrated that touching sebum-rich areas of the face immediately prior to fingerprint deposition resulted in a significant increase in the amount of fatty acids and squalene deposited in the resulting 'groomed' fingerprints. Serine was the most abundant amino acid identified followed by glycine, alanine and aspartic acid. The significant quantitative differences between the 'natural' and 'groomed' fingerprint samples seen for fatty acids were not observed in the case of the amino acids. This study demonstrates the variation in latent fingerprint composition between individuals and the impact of the sampling protocol on the quantitative analysis of fingerprints. (c) 2010 Elsevier Ireland Ltd. All rights reserved.

  18. Copy number variation of individual cattle genomes using next-generation sequencing

    USDA-ARS?s Scientific Manuscript database

    Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often difficult to track. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angu...

  19. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    PubMed Central

    2013-01-01

    Background Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitution but none of them were associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680

  20. Somatic Genetic Variation in Solid Pseudopapillary Tumor of the Pancreas by Whole Exome Sequencing

    PubMed Central

    Guo, Meng; Luo, Guopei; Jin, Kaizhou; Long, Jiang; Cheng, He; Lu, Yu; Wang, Zhengshi; Yang, Chao; Xu, Jin; Ni, Quanxing; Yu, Xianjun; Liu, Chen

    2017-01-01

    Solid pseudopapillary tumor of the pancreas (SPT) is a rare pancreatic disease with a unique clinical manifestation. Although CTNNB1 gene mutations had been universally reported, genetic variation profiles of SPT are largely unidentified. We conducted whole exome sequencing in nine SPT patients to probe the SPT-specific insertions and deletions (indels) and single nucleotide polymorphisms (SNPs). In total, 54 SNPs and 41 indels of prominent variations were demonstrated through parallel exome sequencing. We detected that CTNNB1 mutations presented throughout all patients studied (100%), and a higher count of SNPs was particularly detected in patients with older age, larger tumor, and metastatic disease. By aggregating 95 detected variation events and viewing the interconnections among each of the genes with variations, CTNNB1 was identified as the core portion in the network, which might collaborate with other events such as variations of USP9X, EP400, HTT, MED12, and PKD1 to regulate tumorigenesis. Pathway analysis showed that the events involved in other cancers had the potential to influence the progression of the SNPs count. Our study revealed an insight into the variation of the gene encoding region underlying solid-pseudopapillary neoplasm tumorigenesis. The detection of these variations might partly reflect the potential molecular mechanism. PMID:28054945

  1. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  2. PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

    PubMed

    García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

    2010-11-01

    PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder

  3. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

    USDA-ARS?s Scientific Manuscript database

    Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...

  5. A statistical approach to detection of copy number variations in PCR-enriched targeted sequencing data.

    PubMed

    Demidov, German; Simakova, Tamara; Vnuchkova, Julia; Bragin, Anton

    2016-10-22

    Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as the fast and accurate tool for the detection of short genetic variations. However, identification of larger variations such as structure variants and copy number variations (CNV) is still being a challenge for targeted MPS. Some approaches and tools for structural variants detection were proposed, but they have limitations and often require datasets of certain type, size and expected number of amplicons affected by CNVs. In the paper, we describe novel algorithm for high-resolution germinal CNV detection in the PCR-enriched targeted sequencing data and present accompanying tool. We have developed a machine learning algorithm for the detection of large duplications and deletions in the targeted sequencing data generated with PCR-based enrichment step. We have performed verification studies and established the algorithm's sensitivity and specificity. We have compared developed tool with other available methods applicable for the described data and revealed its higher performance. We showed that our method has high specificity and sensitivity for high-resolution copy number detection in targeted sequencing data using large cohort of samples.

  6. Complete complementary DNA-derived amino acid sequence of canine cardiac phospholamban.

    PubMed Central

    Fujii, J; Ueno, A; Kitano, K; Tanaka, S; Kadoma, M; Tada, M

    1987-01-01

    Complementary DNA (cDNA) clones specific for phospholamban of sarcoplasmic reticulum membranes have been isolated from a canine cardiac cDNA library. The amino acid sequence deduced from the cDNA sequence indicates that phospholamban consists of 52 amino acid residues and lacks an amino-terminal signal sequence. The protein has an inferred mol wt 6,080 that is in agreement with its apparent monomeric mol wt 6,000, estimated previously by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Phospholamban contains two distinct domains, a hydrophilic region at the amino terminus (domain I) and a hydrophobic region at the carboxy terminus (domain II). We propose that domain I is localized at the cytoplasmic surface and offers phosphorylatable sites whereas domain II is anchored into the sarcoplasmic reticulum membrane. PMID:3793929

  7. Study of FibroTest and hyaluronic acid biological variation in healthy volunteers and comparison of serum hyaluronic acid biological variation between chronic liver diseases of different etiology and fibrotic stage using confidence intervals.

    PubMed

    Istaces, Nicolas; Gulbis, Béatrice

    2015-07-01

    Personalized ranges of liver fibrosis serum biomarkers such as FibroTest or hyaluronic acid could be used for early detection of fibrotic changes in patients with progressive chronic liver disease. Our aim was to generate reliable biological variation estimates for these two biomarkers with confidence intervals for within-subject biological variation and reference change value. Nine fasting healthy volunteers and 66 chronic liver disease patients were included. Biological variation estimates were calculated for FibroTest in healthy volunteers, and for hyaluronic acid in healthy volunteers and chronic liver disease patients stratified by etiology and liver fibrosis stage. In healthy volunteers, within-subject biological coefficient of variation (with 95% confidence intervals) and index of individuality were 20% (16%-28%) and 0.6 for FibroTest and 34% (27%-47%) and 0.79 for hyaluronic acid, respectively. Overall hyaluronic acid within-subject biological coefficient of variation was similar among non-alcoholic fatty liver disease and chronic hepatitis C with 41% (34%-52%) and 45% (39%-55%), respectively, in contrast to chronic hepatitis B with 170% (140%-215%). Hyaluronic acid within-subject biological coefficients of variation were similar between F0-F1, F2 and F3 liver fibrosis stages in non-alcoholic fatty liver disease with 34% (25%-49%), 41% (31%-59%) and 34% (23%-62%), respectively, and in chronic hepatitis C with 34% (27%-47%), 33% (26%-45%) and 38% (27%-65%), respectively. However, corresponding hyaluronic acid indexes of individuality were lower in the higher fibrosis stages. Non-overlapping confidence intervals of biological variation estimates allowed us to detect significant differences regarding hyaluronic acid biological variation between chronic liver disease subgroups. Copyright © 2015 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  8. Sequence, distribution and chromosomal context of class I and class II pilin genes of Neisseria meningitidis identified in whole genome sequences

    PubMed Central

    2014-01-01

    Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publically available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE, however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these

  9. The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

    PubMed Central

    Haggarty, N W; Dunbar, B; Fothergill, L A

    1983-01-01

    The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important for the activity of the glycolytic mutase are conserved in the erythrocyte diphosphoglycerate mutase. PMID:6313356

  10. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dips sticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such at TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  11. Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.

    PubMed

    Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng

    2009-10-01

    The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty eight C-stretch haplotypes were identified, and some novel and population specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of individuals studied. There was a significant correlation (r=-0.961, P<0.01) between the expansion of the cytosine sequence length in the C-stretch of HVS-I and a reduction in the number of upstream adenines. These results indicate that the C-stretch could be a useful genetic maker in forensic identification of Chinese populations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that C-stretch could be used as a reliable genetic marker in population genetics.

  12. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy has extensively been used in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecule of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique ``electronic fingerprints'' for all nucleotides on DNA and RNA using Q-Seq along their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  13. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in human and animal associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that the polymorphisms within the open reading frame (ORF) of PrP are associated with the susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threoine, respectively. The tPrnp amino acid sequence is similar to house cat (Felis catus ) and sheep, but differs significantly from other two cat Prnp sequences that were previously deposited in GenBank.

  14. Length Variation, Heteroplasmy and Sequence Divergence in the Mitochondrial DNA of Four Species of Sturgeon (Acipenser)

    PubMed Central

    Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.

    1996-01-01

    The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhnychus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species rather than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850

  15. Influence of the Amino Acid Sequence on Protein-Mineral Interactions in Soil

    NASA Astrophysics Data System (ADS)

    Chacon, S. S.; Reardon, P. N.; Purvine, S.; Lipton, M. S.; Washton, N.; Kleber, M.

    2017-12-01

    The intimate associations between protein and mineral surfaces have profound impacts on nutrient cycling in soil. Proteins are an important source of organic C and N, and a subset of proteins, extracellular enzymes (EE), can catalyze the depolymerization of soil organic matter (SOM). Our goal was to determine how variation in the amino acid sequence could influence a protein's susceptibility to become chemically altered by mineral surfaces to infer the fate of adsorbed EE function in soil. We hypothesized that (1) addition of charged amino acids would enhance the adsorption onto oppositely charged mineral surfaces (2) addition of aromatic amino acids would increase adsorption onto zero charged surfaces (3) Increase adsorption of modified proteins would enhance their susceptibility to alterations by redox active minerals. To test these hypotheses, we generated three engineered proxies of a model protein Gb1 (IEP 4.0, 6.2 kDA) by inserting either negatively charged, positively charged or aromatic amino acids in the second loop. These modified proteins were allowed to interact with functionally different mineral surfaces (goethite, montmorillonite, kaolinite and birnessite) at pH 5 and 7. We used LC-MS/MS and solution-state Heteronuclear Single Quantum Coherence Spectroscopy NMR to observe modifications on engineered proteins as a consequence to mineral interactions. Preliminary results indicate that addition of any amino acids to a protein increase its susceptibility to fragmentation and oxidation by redox active mineral surfaces, and alter adsorption to the other mineral surfaces. This suggest that not all mineral surfaces in soil may act as sorbents for EEs and chemical modification of their structure should also be considered as an explanation for decrease in EE activity. Fragmentation of proteins by minerals can bypass the need to produce proteases, but microbial acquisition of other nutrients that require enzymes such as cellulases, ligninases or phosphatases

  16. Ovine mitochondrial DNA sequence variation and its association with production and reproduction traits within an Afec-Assaf flock.

    PubMed

    Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E

    2012-07-01

    Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lamb (n = 1286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lambs perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d that averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lamb born/ewe lambing for the HA, HB, and the HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.

  17. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  18. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  19. Modular probes for enriching and detecting complex nucleic acid sequences

    NASA Astrophysics Data System (ADS)

    Wang, Juexiao Sherry; Yan, Yan Helen; Zhang, David Yu

    2017-12-01

    Complex DNA sequences are difficult to detect and profile, but are important contributors to human health and disease. Existing hybridization probes lack the capability to selectively bind and enrich hypervariable, long or repetitive sequences. Here, we present a generalized strategy for constructing modular hybridization probes (M-Probes) that overcomes these challenges. We demonstrate that M-Probes can tolerate sequence variations of up to 7 nt at prescribed positions while maintaining single nucleotide sensitivity at other positions. M-Probes are also shown to be capable of sequence-selectively binding a continuous DNA sequence of more than 500 nt. Furthermore, we show that M-Probes can detect genes with triplet repeats exceeding a programmed threshold. As a demonstration of this technology, we have developed a hybrid capture method to determine the exact triplet repeat expansion number in the Huntington's gene of genomic DNA using quantitative PCR.

  20. Activation of c-jun N-terminal kinase upon influenza A virus (IAV) infection is independent of pathogen-related receptors but dependent on amino acid sequence variations of IAV NS1.

    PubMed

    Nacken, Wolfgang; Anhlan, Darisuren; Hrincius, Eike R; Mostafa, Ahmed; Wolff, Thorsten; Sadewasser, Anne; Pleschka, Stephan; Ehrhardt, Christina; Ludwig, Stephan

    2014-08-01

    A hallmark cell response to influenza A virus (IAV) infections is the phosphorylation and activation of c-jun N-terminal kinase (JNK). However, so far it is not fully clear which molecules are involved in the activation of JNK upon IAV infection. Here, we report that the transfection of influenza viral-RNA induces JNK in a retinoic acid-inducible gene I (RIG-I)-dependent manner. However, neither RIG-I-like receptors nor MyD88-dependent Toll-like receptors were found to be involved in the activation of JNK upon IAV infection. Viral JNK activation may be blocked by addition of cycloheximide and heat shock protein inhibitors during infection, suggesting that the expression of an IAV-encoded protein is responsible for JNK activation. Indeed, the overexpression of nonstructural protein 1 (NS1) of certain IAV subtypes activated JNK, whereas those of some other subtypes failed to activate JNK. Site-directed mutagenesis experiments using NS1 of the IAV H7N7, H5N1, and H3N2 subtypes identified the amino acid residue phenylalanine (F) at position 103 to be decisive for JNK activation. Cleavage- and polyadenylation-specific factor 30 (CPSF30), whose binding to NS1 is stabilized by the amino acids F103 and M106, is not involved in JNK activation. Conclusively, subtype-specific sequence variations in the IAV NS1 protein result in subtype-specific differences in JNK signaling upon IAV infection. Influenza A virus (IAV) infection leads to the activation or modulation of multiple signaling pathways. Here, we demonstrate for the first time that the c-jun N-terminal kinase (JNK), a long-known stress-activated mitogen-activated protein (MAP) kinase, is activated by RIG-I when cells are treated with IAV RNA. However, at the same time, nonstructural protein 1 (NS1) of IAV has an intrinsic JNK-activating property that is dependent on IAV subtype-specific amino acid variations around position 103. Our findings identify two different and independent pathways that result in the activation

  1. Activation of c-jun N-Terminal Kinase upon Influenza A Virus (IAV) Infection Is Independent of Pathogen-Related Receptors but Dependent on Amino Acid Sequence Variations of IAV NS1

    PubMed Central

    Nacken, Wolfgang; Anhlan, Darisuren; Hrincius, Eike R.; Mostafa, Ahmed; Wolff, Thorsten; Sadewasser, Anne; Pleschka, Stephan; Ehrhardt, Christina

    2014-01-01

    ABSTRACT A hallmark cell response to influenza A virus (IAV) infections is the phosphorylation and activation of c-jun N-terminal kinase (JNK). However, so far it is not fully clear which molecules are involved in the activation of JNK upon IAV infection. Here, we report that the transfection of influenza viral-RNA induces JNK in a retinoic acid-inducible gene I (RIG-I)-dependent manner. However, neither RIG-I-like receptors nor MyD88-dependent Toll-like receptors were found to be involved in the activation of JNK upon IAV infection. Viral JNK activation may be blocked by addition of cycloheximide and heat shock protein inhibitors during infection, suggesting that the expression of an IAV-encoded protein is responsible for JNK activation. Indeed, the overexpression of nonstructural protein 1 (NS1) of certain IAV subtypes activated JNK, whereas those of some other subtypes failed to activate JNK. Site-directed mutagenesis experiments using NS1 of the IAV H7N7, H5N1, and H3N2 subtypes identified the amino acid residue phenylalanine (F) at position 103 to be decisive for JNK activation. Cleavage- and polyadenylation-specific factor 30 (CPSF30), whose binding to NS1 is stabilized by the amino acids F103 and M106, is not involved in JNK activation. Conclusively, subtype-specific sequence variations in the IAV NS1 protein result in subtype-specific differences in JNK signaling upon IAV infection. IMPORTANCE Influenza A virus (IAV) infection leads to the activation or modulation of multiple signaling pathways. Here, we demonstrate for the first time that the c-jun N-terminal kinase (JNK), a long-known stress-activated mitogen-activated protein (MAP) kinase, is activated by RIG-I when cells are treated with IAV RNA. However, at the same time, nonstructural protein 1 (NS1) of IAV has an intrinsic JNK-activating property that is dependent on IAV subtype-specific amino acid variations around position 103. Our findings identify two different and independent pathways that

  2. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.

  3. Magnetic susceptibility variations in Loess sequences and their relationship to astronomical forcing

    NASA Technical Reports Server (NTRS)

    Verosub, Kenneth L.; Singer, Michael J.

    1992-01-01

    The long, well-exposed and often continuous sequences of loess found throughout the world are generally thought to provide an excellent opportunity for studying long-term, large-scale environmental change during the last few million years. In recent years, the most fruitful loess studies have been those involving the deposits of the loess in China. One of the most intriguing results of that work has been the discovery of an apparent correlation between variations in the magnetic susceptibility of the loess sequence and the oxygen isotope record of the deep sea. This correlation implies that magnetic susceptibility variations are being driven by astronomical parameters. However, the basic data have been interpreted in various ways by different authors, most of whom assumed that the magnetic minerals in the loess have not been affected by post-depositional processes. Using a chemical extraction procedure that allows us to separate the contribution of secondary pedogenic magnetic minerals from primary inherited magnetic minerals, we have found that the magnetic susceptibility of the Chinese paleosols is largely due to a pedogenic component which is present to a lesser degree in the loess. We have also found that the smaller inherited component of the magnetic susceptibility is about the same in the paleosols and the loess. These results demonstrate the need for additional study of the processes that create magnetic susceptibility variations in order to interpret properly the role of astronomical forcing in producing these variations.

  4. Overdispersion of the Molecular Clock: Temporal Variation of Gene-Specific Substitution Rates in Drosophila

    PubMed Central

    Hartl, Daniel L.

    2008-01-01

    Simple models of molecular evolution assume that sequences evolve by a Poisson process in which nucleotide or amino acid substitutions occur as rare independent events. In these models, the expected ratio of the variance to the mean of substitution counts equals 1, and substitution processes with a ratio greater than 1 are called overdispersed. Comparing the genomes of 10 closely related species of Drosophila, we extend earlier evidence for overdispersion in amino acid replacements as well as in four-fold synonymous substitutions. The observed deviation from the Poisson expectation can be described as a linear function of the rate at which substitutions occur on a phylogeny, which implies that deviations from the Poisson expectation arise from gene-specific temporal variation in substitution rates. Amino acid sequences show greater temporal variation in substitution rates than do four-fold synonymous sequences. Our findings provide a general phenomenological framework for understanding overdispersion in the molecular clock. Also, the presence of substantial variation in gene-specific substitution rates has broad implications for work in phylogeny reconstruction and evolutionary rate estimation. PMID:18480070

  5. Mitochondrial DNA variation and phylogenetic relationships among five tuna species based on sequencing of D-loop region.

    PubMed

    Kumar, Girish; Kocour, Martin; Kunal, Swaraj Priyaranjan

    2016-05-01

    In order to assess the DNA sequence variation and phylogenetic relationship among five tuna species (Auxis thazard, Euthynnus affinis, Katsuwonus pelamis, Thunnus tonggol, and T. albacares) out of all four tuna genera, partial sequences of the mitochondrial DNA (mtDNA) D-loop region were analyzed. The estimate of intra-specific sequence variation in studied species was low, ranging from 0.027 to 0.080 [Kimura's two parameter distance (K2P)], whereas values of inter-specific variation ranged from 0.049 to 0.491. The longtail tuna (T. tonggol) and yellowfin tuna (T. albacares) were found to share a close relationship (K2P = 0.049) while skipjack tuna (K. pelamis) was most divergent studied species. Phylogenetic analysis using Maximum-Likelihood (ML) and Neighbor-Joining (NJ) methods supported the monophyletic origin of Thunnus species. Similarly, phylogeny of Auxis and Euthynnus species substantiate the monophyly. However, results showed a distinct origin of K. pelamis from genus Thunnus as well as Auxis and Euthynnus. Thus, the mtDNA D-loop region sequence data supports the polyphyletic origin of tuna species.

  6. Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

    PubMed

    Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

    2017-04-01

    Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties, however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower compared to the enzyme from a Z. officinale relative Curcuma aromatica.

  7. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition is effectivelymore » used with, multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.« less

  8. Major histocompatibility complex variation in the endangered Przewalski's horse.

    PubMed Central

    Hedrick, P W; Parker, K M; Miller, E L; Miller, P S

    1999-01-01

    The major histocompatibility complex (MHC) is a fundamental part of the vertebrate immune system, and the high variability in many MHC genes is thought to play an essential role in recognition of parasites. The Przewalski's horse is extinct in the wild and all the living individuals descend from 13 founders, most of whom were captured around the turn of the century. One of the primary genetic concerns in endangered species is whether they have ample adaptive variation to respond to novel selective factors. In examining 14 Przewalski's horses that are broadly representative of the living animals, we found six different class II DRB major histocompatibility sequences. The sequences showed extensive nonsynonymous variation, concentrated in the putative antigen-binding sites, and little synonymous variation. Individuals had from two to four sequences as determined by single-stranded conformation polymorphism (SSCP) analysis. On the basis of the SSCP data, phylogenetic analysis of the nucleotide sequences, and segregation in a family group, we conclude that four of these sequences are from one gene (although one sequence codes for a nonfunctional allele because it contains a stop codon) and two other sequences are from another gene. The position of the stop codon is at the same amino-acid position as in a closely related sequence from the domestic horse. Because other organisms have extensive variation at homologous loci, the Przewalski's horse may have quite low variation in this important adaptive region. PMID:10430594

  9. Genetic variation in Dip5, an amino acid permease, and Pdr5, a multiple drug transporter, regulates glyphosate resistance in S. cerevisiae.

    PubMed

    Rong-Mullins, Xiaoqing; Ravishankar, Apoorva; McNeal, Kirsten A; Lonergan, Zachery R; Biega, Audrey C; Creamer, J Philip; Gallagher, Jennifer E G

    2017-01-01

    S. cerevisiae from different environments are subject to a wide range of selective pressures, whether intentional or by happenstance. Chemicals classified by their application, such as herbicides, fungicides and antibiotics, can affect non-target organisms. First marketed as RoundUp™, glyphosate is the most widely used herbicide. In plants, glyphosate inhibits EPSPS, of the shikimate pathway, which is present in many organisms but lacking in mammals. The shikimate pathway produces chorismate which is the precursor to all the aromatic amino acids, para-aminobenzoic acid, and Coenzyme Q10. Crops engineered to be resistant to glyphosate contain a homolog of EPSPS that is not bound by glyphosate. Here, we show that S. cerevisiae has a wide-range of glyphosate resistance. Sequence comparison between the target proteins, i.e., the plant EPSPS and the yeast orthologous protein Aro1, predicted that yeast would be resistant to glyphosate. However, the growth variation seen in the subset of yeast tested was not due to polymorphisms within Aro1, instead, it was caused by genetic variation in an ABC multiple drug transporter, Pdr5, and an amino acid permease, Dip5. Using genetic variation as a probe into glyphosate response, we uncovered mechanisms that contribute to the transportation of glyphosate in and out of the cell. Taking advantage of the natural genetic variation within yeast and measuring growth under different conditions that would change the use of the shikimate pathway, we uncovered a general transport mechanism of glyphosate into eukaryotic cells.

  10. Mapping copy number variation by population-scale genome sequencing.

    PubMed

    Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

    2011-02-03

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.

  11. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Myers, G.; Foley, B.; Korber, B.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nuclear Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived.more » Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.« less

  12. Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deutscher, J.; Pevec, B.; Beyreuther, K.

    1986-10-21

    The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolyptic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system,more » HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S faecalis has now been determined. (/sup 32/P)P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 out of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.« less

  13. Mitochondrial DNA Sequence Variation in North Atlantic Long-Finned Pilot Whales, Globicephala melas

    DTIC Science & Technology

    1994-06-01

    Delphinapterus leucas : mitochondrial DNA sequence variation within and among North American populations. M.Sc. thesis. McMaster University. Brown, G.G...Delphinapteras leucas ) (Brennin 1992), minke whales {Balaenoptera acutorostratd) (Wada et al. 1991), bottlenose dolphins {Tursiops truncatus) (Dowling & Brown

  14. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G´ with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  15. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

    PubMed Central

    Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

    2015-01-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance

  16. Sequence variations of the alpha-globin genes: scanning of high CG content genes with DHPLC and DG-DGGE.

    PubMed

    Lacerra, Giuseppina; Fiorito, Mirella; Musollino, Gennaro; Di Noce, Francesca; Esposito, Maria; Nigro, Vincenzo; Gaudiano, Carlo; Carestia, Clementina

    2004-10-01

    The alpha-globin chains are encoded by two duplicated genes (HBA2 and HBA1, 5'-3') showing overall sequence homology >96% and average CG content >60%. alpha-Thalassemia, the most prevalent worldwide autosomal recessive disorder, is a hereditary anemia caused by sequence variations of these genes in about 25% of carriers. We evaluated the overall sensitivity and suitability of DHPLC and DG-DGGE in scanning both the alpha-globin genes by carrying out a retrospective analysis of 19 variant alleles in 29 genotypes. The HBA2 alleles c.1A>G, c.79G>A, and c.281T>G, and the HBA1 allele c.475C>A were new. Three pathogenic sequence variations were associated in cis with nonpathogenic variations in all families studied; they were the HBA2 variation c.2T>C associated with c.-24C>G, and the HBA2 variations c.391G>C and c.427T>C, both associated with c.565G>A. We set up original experimental conditions for DHPLC and DG-DGGE and analyzed 10 normal subjects, 46 heterozygotes, seven homozygotes, seven compound heterozygotes, and six compound heterozygotes for a hybrid gene. Both the methodologies gave reproducible results and no false-positive was detected. DHPLC showed 100% sensitivity and DG-DGGE nearly 90%. About 100% of the sequence from the cap site to the polyA addition site could be scanned by DHPLC, about 87% by DG-DGGE. It is noteworthy that the three most common pathogenic sequence variations (HBA2 alleles c.2T>C, c.95+2_95+6del, and c.523A>G) were unambiguously detected by both the methodologies. Genotype diagnosis must be confirmed with PCR sequencing of single amplicons or with an allele-specific method. This study can be helpful for scanning genes with high CG content and offers a model suitable for duplicated genes with high homology. Copyright 2004 Wiley-Liss, Inc.

  17. Re-sequencing and genetic variation identification of a rice line with ideal plant architecture.

    PubMed

    Li, Shuangcheng; Xie, Kailong; Li, Wenbo; Zou, Ting; Ren, Yun; Wang, Shiquan; Deng, Qiming; Zheng, Aiping; Zhu, Jun; Liu, Huainian; Wang, Lingxia; Ai, Peng; Gao, Fengyan; Huang, Bin; Cao, Xuemei; Li, Ping

    2012-12-01

    The ideal plant architecture (IPA) includes several important characteristics such as low tiller numbers, few or no unproductive tillers, more grains per panicle, and thick and sturdy stems. We have developed an indica restorer line 7302R that displays the IPA phenotype in terms of tiller number, grain number, and stem strength. However, its mechanism had to be clarified. We performed re-sequencing and genome-wide variation analysis of 7302R using the Solexa sequencing technology. With the genomic sequence of the indica cultivar 9311 as reference, 307 627 SNPs, 57 372 InDels, and 3 096 SVs were identified in the 7302R genome. The 7302R-specific variations were investigated via the synteny analysis of all the SNPs of 7302R with those of the previous sequenced none-IPA-type lines IR24, MH63, and SH527. Moreover, we found 178 168 7302R-specific SNPs across the whole genome and 30 239 SNPs in the predicted mRNA regions, among which 8 517 were Non-syn CDS. In addition, 263 large-effect SNPs that were expected to affect the integrity of encoded proteins were identified from the 7302R-specific SNPs. SNPs of several important previously cloned rice genes were also identified by aligning the 7302R sequence with other sequence lines. Our results provided several candidates account for the IPA phenotype of 7302R. These results therefore lay the groundwork for long-term efforts to uncover important genes and alleles for rice plant architecture construction, also offer useful data resources for future genetic and genomic studies in rice.

  18. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    PubMed

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  19. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.

  20. LPA and PLG sequence variation and kringle IV-2 copy number in two populations.

    PubMed

    Crawford, Dana C; Peng, Ze; Cheng, Jan-Fang; Boffelli, Dario; Ahearn, Magdalena; Nguyen, Dan; Shaffer, Tristan; Yi, Qian; Livingston, Robert J; Rieder, Mark J; Nickerson, Deborah A

    2008-01-01

    Lp(a) levels have long been recognized as a potential risk factor for coronary heart disease that is almost completely under genetic control. Much of the genetics impacting Lp(a) levels has been attributed to the highly polymorphic LPA kringle IV-2 copy number variant, and most of the variance in Lp(a) levels in populations of European-descent is inversely correlated with kringle IV copy number. However, less of the variance is explained in African-descent populations for the same structural variation. African-descent populations have, on average, higher levels of Lp(a), suggesting other genetic factors contribute to Lp(a) level variability across populations. To identify potential cis-acting factors, we re-sequenced the gene LPA for single nucleotide polymorphism (SNP) discovery in 23 European-Americans and 24 African-Americans. We also re- sequenced the neighboring gene plasminogen (PLG) and genotyped the kringle IV copy number variant in the same reference samples. These data are the most comprehensive description of sequence variation in LPA and its relationship with the kringle IV copy number variant. With these data, we demonstrate that only a fraction of LPA sequence diversity has been previously documented. Also, we identify several high frequency SNPs present in the African-American sample but absent in the European-American sample. Finally, we show that SNPs within PLG are not in linkage disequilibrium with SNPs in LPA, and we show that kringle IV copy number variation is not in linkage disequilibrium with either LPA or PLG SNPs. Together, these data suggest that LPA SNPs could independently contribute to Lp(a) levels in the general population. Copyright (c) 2008 S. Karger AG, Basel.

  1. Hydroxamic acids as weak base indicators: protonation in strong acid media.

    PubMed

    García, B; Ibeas, S; Hoyuelos, F J; Leal, J M; Secco, F; Venturini, M

    2001-11-30

    The protonation equilibria of N-phenylbenzohydroxamic, benzohydroxamic, salicylhydroxamic, and N-p-tolylcinnamohydroxamic acids have been studied at 25 degrees C in concentrated sulfuric, hydrochloric, and perchloric acid media; the UV-vis spectral measurements were analyzed using the Hammett equation and the Bunnett-Olsen and excess acidity methods. The medium effects observed in the UV spectral curves were corrected with the Cox-Yates and vector analysis methods. The H(A) acidity function based on benzamides provided the best results. The range of variation of the solvation coefficient m is similar to that of amides, this indicating similar solvation requirements for amides and hydroxamic acids. For the same substrate, the observed variations of pK(BH)(+) with the mineral acid used was justified by formation of solvent-separated ion pairs; for the same mineral acid, the observed changes in pK(BH)(+) can be explained by the solvation of BH(+). The change of the pK(BH)(+) values was in reasonably good agreement with the sequence of the catalytic efficiency of the mineral acids used, HCl > H(2)SO(4) > HClO(4).

  2. Constitutional sequence variation in the Fanconi anaemia group C (FANCC) gene in childhood acute myeloid leukaemia.

    PubMed

    Barber, Lisa M; McGrath, Helen E N; Meyer, Stefan; Will, Andrew M; Birch, Jillian M; Eden, Osborn B; Taylor, G Malcolm

    2003-04-01

    The extent to which genetic susceptibility contributes to the causation of childhood acute myeloid leukaemia (AML) is not known. The inherited bone marrow failure disorder Fanconi anaemia (FA) carries a substantially increased risk of AML, raising the possibility that constitutional variation in the FA (FANC) genes is involved in the aetiology of childhood AML. We have screened genomic DNA extracted from remission blood samples of 97 children with sporadic AML and 91 children with sporadic acute lymphoblastic leukaemia (ALL), together with 104 cord blood DNA samples from newborn children, for variations in the Fanconi anaemia group C (FANCC) gene. We found no evidence of known FANCC pathogenic mutations in children with AML, ALL or in the cord blood samples. However, we detected 12 different FANCC sequence variants, of which five were novel to this study. Among six FANCC variants leading to amino-acid substitutions, one (S26F) was present at a fourfold greater frequency in children with AML than in the cord blood samples (odds ratio: 4.09, P = 0.047; 95% confidence interval 1.08-15.54). Our results thus do not exclude the possibility that this polymorphic variant contributes to the risk of a small proportion of childhood AML.

  3. Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

    PubMed

    Pietrowski, D; Förster, M

    2000-01-01

    The complete cDNA sequence of a ribonuclease k6 gene of Bos Taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acids exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).

  4. Computational analysis of sequence selection mechanisms.

    PubMed

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  5. Genetic variation in Dip5, an amino acid permease, and Pdr5, a multiple drug transporter, regulates glyphosate resistance in S. cerevisiae

    PubMed Central

    McNeal, Kirsten A.; Lonergan, Zachery R.; Biega, Audrey C.; Creamer, J. Philip

    2017-01-01

    S. cerevisiae from different environments are subject to a wide range of selective pressures, whether intentional or by happenstance. Chemicals classified by their application, such as herbicides, fungicides and antibiotics, can affect non-target organisms. First marketed as RoundUp™, glyphosate is the most widely used herbicide. In plants, glyphosate inhibits EPSPS, of the shikimate pathway, which is present in many organisms but lacking in mammals. The shikimate pathway produces chorismate which is the precursor to all the aromatic amino acids, para-aminobenzoic acid, and Coenzyme Q10. Crops engineered to be resistant to glyphosate contain a homolog of EPSPS that is not bound by glyphosate. Here, we show that S. cerevisiae has a wide-range of glyphosate resistance. Sequence comparison between the target proteins, i.e., the plant EPSPS and the yeast orthologous protein Aro1, predicted that yeast would be resistant to glyphosate. However, the growth variation seen in the subset of yeast tested was not due to polymorphisms within Aro1, instead, it was caused by genetic variation in an ABC multiple drug transporter, Pdr5, and an amino acid permease, Dip5. Using genetic variation as a probe into glyphosate response, we uncovered mechanisms that contribute to the transportation of glyphosate in and out of the cell. Taking advantage of the natural genetic variation within yeast and measuring growth under different conditions that would change the use of the shikimate pathway, we uncovered a general transport mechanism of glyphosate into eukaryotic cells. PMID:29155836

  6. Modeling bias and variation in the stochastic processes of small RNA sequencing

    PubMed Central

    Etheridge, Alton; Sakhanenko, Nikita; Galas, David

    2017-01-01

    Abstract The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can use the generalized additive models for location, scale and shape (GAMLSS) distributional regression framework to calculate and apply empirical correction factors for ligase bias. Bias correction could remove more than 40% of the bias for miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition. Using synthetic mixes of known composition, we show that the GAMLSS approach can analyze differential expression with greater accuracy, higher sensitivity and specificity than six existing algorithms (DESeq2, edgeR, EBSeq, limma, DSS, voom) for the analysis of small RNA-seq data. PMID:28369495

  7. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    NASA Astrophysics Data System (ADS)

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  8. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides.

    PubMed

    McMillen, Chelsea L; Wright, Patience M; Cassady, Carolyn J

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  9. Spatio-temporal Variations of Characteristic Repeating Earthquake Sequences along the Middle America Trench in Mexico

    NASA Astrophysics Data System (ADS)

    Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.

    2015-12-01

    Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus provide nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms exceed cc≥95% and coh≥95%. Spatial clustering along the trench shows large variations in repeating earthquakes activity. Particularly, the rupture zone of the M8.1, 1985 earthquake shows an almost absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border shows a significantly larger number of repeating earthquakes sequences. Furthermore, temporal variations associated to stress changes due to major shows episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor to pinpoint areas where large megathrust earthquakes may nucleate and consequently to improve the seismic hazard assessment.

  10. Nucleotide and amino acid variations of tannase gene from different Aspergillus strains.

    PubMed

    Borrego-Terrazas, J A; Lara-Victoriano, F; Flores-Gallegos, A C; Veana, F; Aguilar, C N; Rodríguez-Herrera, R

    2014-08-01

    Tannase is an enzyme that catalyses the hydrolysis of ester bonds present in tannins. Most of the scientific reports about this biocatalysis focus on aspects related to tannase production and its recovery; on the other hand, reports assessing the molecular aspects of the tannase gene or protein are scarce. In the present study, a tannase gene fragment from several Aspergillus strains isolated from the Mexican semidesert was sequenced and compared with tannase amino acid sequences reported in NCBI database using bioinformatics tools. The genetic relationship among the different tannase sequences was also determined. A conserved region of 7 amino acids was found with the conserved motif GXSXG common to esterases, in which the active-site serine residue is located. In addition, in Aspergillus niger strains GH1 and PSH, we found an extra codon in the tannase sequences encoding glycine. The tannase gene belonging to semidesert fungal strains followed a neutral evolution path with the formation of 10 haplotypes, of which A. niger GH1 and PSH haplotypes are the oldest.

  11. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  12. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  13. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  14. Sequence Polymorphisms and Structural Variations among Four Grapevine (Vitis vinifera L.) Cultivars Representing Sardinian Agriculture

    PubMed Central

    Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore

    2017-01-01

    The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732

  15. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  16. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  17. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS?s Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  18. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.

  19. Nucleotide sequence variation at two genes of the phenylpropanoid pathway, the FAH1 and F3H genes, in Arabidopsis thaliana.

    PubMed

    Aguadé, M

    2001-01-01

    The FAH1 and F3H genes encode ferulate-5-hydroxylase and flavanone-3-hydroxylase, which are enzymes in the pathways leading to the synthesis of sinapic acid esters and flavonoids, respectively. Nucleotide variation at these genes was surveyed by sequencing a sample of 20 worldwide Arabidopsis thaliana ecotypes and one Arabidopsis lyrata spp. petraea stock. In contrast with most previously studied genes, the percentage of singletons was rather low in both the FAH1 and the F3H gene regions. There was, therefore, no footprint of a recent species expansion in the pattern of nucleotide variation in these regions. In both FAH1 and F3H, nucleotide variation was structured into two major highly differentiated haplotypes. In both genes, there was a peak of silent polymorphism in the 5' part of the coding region without a parallel increase in silent divergence. In FAH1, the peak was centered at the beginning of the second exon. In F3H, nucleotide diversity was highest at the beginning of the gene. The observed pattern of variation in both FAH1 and F3H, although suggestive of balancing selection, was compatible with a neutral model with no recombination.

  20. [Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

    PubMed

    Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

    2017-01-04

    To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide genetic background for metabolic engineering. Complete genome of A. pullulans CCTCC M2012223 was sequenced by Illumina HiSeq high throughput sequencing platform. Then, fragment assembly, gene prediction, functional annotation, and GO/COG cluster were analyzed in comparison with those of other five A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30756831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the biggest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathway were highly conservative in all of six A. pullulans varieties. Although both A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum have a close affinity, some point mutation and inserts were occurred in protein sequences involved in melanin biosynthesis. Genome information of A. pullulans CCTCC M2012223 was annotated and genes involved in melanin, pullulan and polymalic acid pathway were compared, which would provide a theoretical basis for genetic modification of metabolic pathway in A. pullulans.

  1. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    PubMed

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. Copyright © 2016 Teng et al.

  2. Complete amino acid sequence of ananain and a comparison with stem bromelain and other plant cysteine proteases.

    PubMed Central

    Lee, K L; Albee, K L; Bernasconi, R J; Edmunds, T

    1997-01-01

    The amino acid sequences of ananain (EC3.4.22.31) and stem bromelain (3.4.22.32), two cysteine proteases from pineapple stem, are similar yet ananain and stem bromelain possess distinct specificities towards synthetic peptide substrates and different reactivities towards the cysteine protease inhibitors E-64 and chicken egg white cystatin. We present here the complete amino acid sequence of ananain and compare it with the reported sequences of pineapple stem bromelain, papain and chymopapain from papaya and actinidin from kiwifruit. Ananain is comprised of 216 residues with a theoretical mass of 23464 Da. This primary structure includes a sequence insert between residues 170 and 174 not present in stem bromelain or papain and a hydrophobic series of amino acids adjacent to His-157. It is possible that these sequence differences contribute to the different substrate and inhibitor specificities exhibited by ananain and stem bromelain. PMID:9355753

  3. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives.

    PubMed

    Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.

  4. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

    PubMed Central

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169

  5. Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases

    PubMed Central

    Schadt, Eric E.; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H.; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A.; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

    2013-01-01

    Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720

  6. Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases.

    PubMed

    Schadt, Eric E; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

    2013-01-01

    Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.

  7. Variability of the protein sequences of lcrV between epidemic and atypical rhamnose-positive strains of Yersinia pestis.

    PubMed

    Anisimov, Andrey P; Panfertsev, Evgeniy A; Svetoch, Tat'yana E; Dentovskaya, Svetlana V

    2007-01-01

    Sequencing of lcrV genes and comparison of the deduced amino acid sequences from ten Y. pestis strains belonging mostly to the group of atypical rhamnose-positive isolates (non-pestis subspecies or pestoides group) showed that the LcrV proteins analyzed could be classified into five sequence types. This classification was based on major amino acid polymorphisms among LcrV proteins in the four "hot points" of the protein sequences. Some additional minor polymorphisms were found throughout these sequence types. The "hot points" corresponded to amino acids 18 (Lys --> Asn), 72 (Lys --> Arg), 273 (Cys --> Ser), and 324-326 (Ser-Gly-Lys --> Arg) in the LcrV sequence of the reference Y. pestis strain CO92. One possible explanation for polymorphism in amino acid sequences of LcrV among different strains is that strain-specific variation resulted from adaptation of the plague pathogen to different rodent and lagomorph hosts.

  8. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

    PubMed Central

    Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

    2008-01-01

    Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465

  9. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    PubMed

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  10. Genetic validation of whole-transcriptome sequencing for mapping expression affected by cis-regulatory variation.

    PubMed

    Babak, Tomas; Garrett-Engele, Philip; Armour, Christopher D; Raymond, Christopher K; Keller, Mark P; Chen, Ronghua; Rohl, Carol A; Johnson, Jason M; Attie, Alan D; Fraser, Hunter B; Schadt, Eric E

    2010-08-13

    Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing.

  11. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  12. Integrating mRNA and Protein Sequencing Enables the Detection and Quantitative Profiling of Natural Protein Sequence Variants of Populus trichocarpa.

    PubMed

    Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L

    2015-12-04

    Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.

  13. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.

  14. Molecular spectrum of somaclonal variation in regenerated rice revealed by whole-genome sequencing.

    PubMed

    Miyao, Akio; Nakagome, Mariko; Ohnuma, Takako; Yamagata, Harumi; Kanamori, Hiroyuki; Katayose, Yuichi; Takahashi, Akira; Matsumoto, Takashi; Hirochika, Hirohiko

    2012-01-01

    Somaclonal variation is a phenomenon that results in the phenotypic variation of plants regenerated from cell culture. One of the causes of somaclonal variation in rice is the transposition of retrotransposons. However, many aspects of the mechanisms that result in somaclonal variation remain undefined. To detect genome-wide changes in regenerated rice, we analyzed the whole-genome sequences of three plants independently regenerated from cultured cells originating from a single seed stock. Many single-nucleotide polymorphisms (SNPs) and insertions and deletions (indels) were detected in the genomes of the regenerated plants. The transposition of only Tos17 among 43 transposons examined was detected in the regenerated plants. Therefore, the SNPs and indels contribute to the somaclonal variation in regenerated rice in addition to the transposition of Tos17. The observed molecular spectrum was similar to that of the spontaneous mutations in Arabidopsis thaliana. However, the base change ratio was estimated to be 1.74 × 10(-6) base substitutions per site per regeneration, which is 248-fold greater than the spontaneous mutation rate of A. thaliana.

  15. Sequence dependent aggregation of peptides and fibril formation

    NASA Astrophysics Data System (ADS)

    Hung, Nguyen Ba; Le, Duy-Manh; Hoang, Trinh X.

    2017-09-01

    Deciphering the links between amino acid sequence and amyloid fibril formation is key for understanding protein misfolding diseases. Here we use Monte Carlo simulations to study the aggregation of short peptides in a coarse-grained model with hydrophobic-polar (HP) amino acid sequences and correlated side chain orientations for hydrophobic contacts. A significant heterogeneity is observed in the aggregate structures and in the thermodynamics of aggregation for systems of different HP sequences and different numbers of peptides. Fibril-like ordered aggregates are found for several sequences that contain the common HPH pattern, while other sequences may form helix bundles or disordered aggregates. A wide variation of the aggregation transition temperatures among sequences, even among those of the same hydrophobic fraction, indicates that not all sequences undergo aggregation at a presumable physiological temperature. The transition is found to be the most cooperative for sequences forming fibril-like structures. For a fibril-prone sequence, it is shown that fibril formation follows the nucleation and growth mechanism. Interestingly, a binary mixture of peptides of an aggregation-prone and a non-aggregation-prone sequence shows the association and conversion of the latter to the fibrillar structure. Our study highlights the role of a sequence in selecting fibril-like aggregates and also the impact of a structural template on fibril formation by peptides of unrelated sequences.

  16. Genetics of Variation in Serum Uric Acid and Cardiovascular Risk Factors in Mexican Americans

    PubMed Central

    Voruganti, V. Saroja; Nath, Subrata D.; Cole, Shelley A.; Thameem, Farook; Jowett, Jeremy B.; Bauer, Richard; MacCluer, Jean W.; Blangero, John; Comuzzie, Anthony G.; Abboud, Hanna E.; Arar, Nedal H.

    2009-01-01

    Background: Elevated serum uric acid is associated with several cardiovascular disease (CVD) risk factors such as hypertension, inflammation, endothelial dysfunction, insulin resistance, dyslipidemia, and obesity. However, the role of uric acid as an independent risk factor for CVD is not yet clear. Objective: The aim of the study was to localize quantitative trait loci regulating variation in serum uric acid and also establish the relationship between serum uric acid and other CVD risk factors in Mexican Americans (n = 848; men = 310, women = 538) participating in the San Antonio Family Heart Study. Methods: Quantitative genetic analysis was conducted using variance components decomposition method, implemented in the software program SOLAR. Results: Mean ± sd of serum uric acid was 5.35 ± 1.38 mg/dl. Univariate genetic analysis showed serum uric acid and other CVD risk markers to be significantly heritable (P < 0.005). Bivariate analysis showed significant correlation of serum uric acid with body mass index, waist circumference, waist/hip ratio, total body fat, plasma insulin, serum triglycerides, high-density lipoprotein cholesterol, C-reactive protein, and granulocyte macrophage-colony stimulating factor (P < 0.05). A genome-wide scan for detecting quantitative trait loci regulating serum uric acid variation showed a significant logarithm of odds (LOD) score of 4.72 (empirical LOD score = 4.62; P < 0.00001) on chromosome 3p26. One LOD support interval contains 25 genes, of which an interesting candidate gene is chemokine receptor 2. Summary: There is a significant genetic component in the variation in serum uric acid and evidence of pleiotropy between serum uric acid and other cardiovascular risk factors. PMID:19001525

  17. Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

    PubMed Central

    Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

    2007-01-01

    We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688

  18. Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

    PubMed

    Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

    2017-08-16

    High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.

  19. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Myers, G.; Korber, B.; Wain-Hobson, S.

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  20. Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.

    PubMed

    Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P

    1997-11-01

    A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561 nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The most variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400 nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.

  1. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    NASA Astrophysics Data System (ADS)

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-09-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

  2. Genetic validation of whole-transcriptome sequencing for mapping expression affected by cis-regulatory variation

    PubMed Central

    2010-01-01

    Background Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Results Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Conclusion Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing. PMID:20707912

  3. Single-strand conformation polymorphism (SSCP)-based mutation scanning approaches to fingerprint sequence variation in ribosomal DNA of ascaridoid nematodes.

    PubMed

    Zhu, X Q; Gasser, R B

    1998-06-01

    In this study, we assessed single-strand conformation polymorphism (SSCP)-based approaches for their capacity to fingerprint sequence variation in ribosomal DNA (rDNA) of ascaridoid nematodes of veterinary and/or human health significance. The second internal transcribed spacer region (ITS-2) of rDNA was utilised as the target region because it is known to provide species-specific markers for this group of parasites. ITS-2 was amplified by PCR from genomic DNA derived from individual parasites and subjected to analysis. Direct SSCP analysis of amplicons from seven taxa (Toxocara vitulorum, Toxocara cati, Toxocara canis, Toxascaris leonina, Baylisascaris procyonis, Ascaris suum and Parascaris equorum) showed that the single-strand (ss) ITS-2 patterns produced allowed their unequivocal identification to species. While no variation in SSCP patterns was detected in the ITS-2 within four species for which multiple samples were available, the method allowed the direct display of four distinct sequence types of ITS-2 among individual worms of T. cati. Comparison of SSCP/sequencing with the methods of dideoxy fingerprinting (ddF) and restriction endonuclease fingerprinting (REF) revealed that also ddF allowed the definition of the four sequence types, whereas REF displayed three of four. The findings indicate the usefulness of the SSCP-based approaches for the identification of ascaridoid nematodes to species, the direct display of sequence variation in rDNA and the detection of population variation. The ability to fingerprint microheterogeneity in ITS-2 rDNA using such approaches also has implications for studying fundamental aspects relating to mutational change in rDNA.

  4. GCPred: a web tool for guanylyl cyclase functional centre prediction from amino acid sequence.

    PubMed

    Xu, Nuo; Fu, Dongfang; Li, Shiang; Wang, Yuxuan; Wong, Aloysius

    2018-06-15

    GCPred is a webserver for the prediction of guanylyl cyclase (GC) functional centres from amino acid sequence. GCs are enzymes that generate the signalling molecule cyclic guanosine 3', 5'-monophosphate from guanosine-5'-triphosphate. A novel class of GC centres (GCCs) has been identified in complex plant proteins. Using currently available experimental data, GCPred is created to automate and facilitate the identification of similar GCCs. The server features GCC values that consider in its calculation, the physicochemical properties of amino acids constituting the GCC and the conserved amino acids within the centre. From user input amino acid sequence, the server returns a table of GCC values and graphs depicting deviations from mean values. The utility of this server is demonstrated using plant proteins and the human interleukin-1 receptor-associated kinase family of proteins as example. The GCPred server is available at http://gcpred.com. Supplementary data are available at Bioinformatics online.

  5. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    PubMed

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    MYF6 gene codes for the bHLH transcription factor belonging to MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embriogenesis and continues on a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified and sequenced and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein shows a high conservation among species of 99 and 97% identity when comparing pig with cow and human, respectively, and of 93% when comparing pig with mouse and rat. The single nucleotide polymorphism (SNP) was revealed within the promoter region, which appeared to be T --> C transition recognized by a MspI restriction enzyme.

  6. Sequence variation in mitochondrial cox1 and nad1 genes of ascaridoid nematodes in cats and dogs from Iran.

    PubMed

    Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B

    2015-07-01

    The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.

  7. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    PubMed Central

    2013-01-01

    Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low

  8. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing.

    PubMed

    Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D

    2013-03-07

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li's D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. This initial

  9. Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

    PubMed

    Casals, Ferran; Anglada, Roger; Bonet, Núria; Rasal, Raquel; van der Gaag, Kristiaan J; Hoogenboom, Jerry; Solé-Morata, Neus; Comas, David; Calafell, Francesc

    2017-09-01

    We have genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs in Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based in massive parallel sequencing, we have used simple R scripts to uncover the sequence variation in the repeat region. Thus, we have found, across 58 STRs, 541 length-based alleles, which, after considering repeat-sequence variation, became 804 different alleles. All loci in both populations were in Hardy-Weinberg equilibrium. F ST between both populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics showed quite large; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect become 1-(2.3×10 -70 ) and 1-(5.9×10 -73 ), for Roma and Catalans respectively, and the chances of excluding a false father in a trio are 1-(2.6×10 -20 ) and 1-(2.0×10 -21 ). Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. Images Fig. 1. PMID:3052437

  11. Length variation and sequence divergence in mitochondrial control region of Schizothoracine (Teleostei: Cyperinidae) species.

    PubMed

    Syed, Mudasir Ahmad; Bhat, Farooz Ahmad; Balkhi, Masood-ul Hassan; Bhat, Bilal Ahmad

    2016-01-01

    Schizothoracine fish commonly called snow trouts inhibit the entire network of snow and spring fed cool waters of Kashmir, India. Over 10 species reported earlier, only five species have been found, these include Schizothorax niger, Schizothorax esocinus, Schizothorax plagiostomus, Schizothorax curvifrons and Schizothorax labiatus. The relationship between these species is contradicting. To understand the evolutionary relation of these species, we examined the sequence information of mitochondrial D-loop of 25 individuals representing five species. Sequence alignment showed D-loop region highly variable and length variation was observed in di-nucleotide (TA)n microsatellite between and within species. Interestingly, all these species have (TA)n microsatellite not associated with longer tandem repeats at the 3' end of the mitochondrial control region and do not show heteroplasmy. Our analysis also indicates the presence of four conserved sequence blocks (CSB), CSB-D, CSB-1, CSB-II and CSB-III, four (Termination Associated Sequence) TAS motifs and 15bp pyrimidine block within the mitochondrial control region, that are highly conserved within genus Schizothorax when compared with other species. The phylogenetic analysis carried by Maximum likelihood (ML), Neighbor Joining (NJ) and Bayesian inference (BI) generated almost identical results. The resultant BI tree showed a close genetic relationship of all the five species and supports two distinct grouping of S. esocinus species. Besides the species relation, the presence of length variation in tandem repeats is attributed to differences in predicting the stability of secondary structures. The role of CSBs and TASs, reported so far as main regulatory signals, would explain the conservation of these elements in evolution.

  12. LineageSpecificSeqgen: generating sequence data with lineage-specific variation in the proportion of variable sites

    PubMed Central

    Grievink, Liat Shavit; Penny, David; Hendy, Mike D; Holland, Barbara R

    2009-01-01

    Correction to Shavit Grievink L, Penny D, Hendy MD, Holland BR: LineageSpecificSeqgen: generating sequence data with lineage-specific variation in the proportion of variable sites. BMC Evol Biol 2008, 8(1):317.

  13. Sequence variations in RepMP2/3 and RepMP4 elements reveal intragenomic homologous DNA recombination events in Mycoplasma pneumoniae.

    PubMed

    Spuesens, Emiel B M; Oduber, Minoushka; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis

    2009-07-01

    The gene encoding major adhesin protein P1 of Mycoplasma pneumoniae, MPN141, contains two DNA sequence stretches, designated RepMP2/3 and RepMP4, which display variation among strains. This variation allows strains to be differentiated into two major P1 genotypes (1 and 2) and several variants. Interestingly, multiple versions of the RepMP2/3 and RepMP4 elements exist at other sites within the bacterial genome. Because these versions are closely related in sequence, but not identical, it has been hypothesized that they have the capacity to recombine with their counterparts within MPN141, and thereby serve as a source of sequence variation of the P1 protein. In order to determine the variation within the RepMP2/3 and RepMP4 elements, both within the bacterial genome and among strains, we analysed the DNA sequences of all RepMP2/3 and RepMP4 elements within the genomes of 23 M. pneumoniae strains. Our data demonstrate that: (i) recombination is likely to have occurred between two RepMP2/3 elements in four of the strains, and (ii) all previously described P1 genotypes can be explained by inter-RepMP recombination events. Moreover, the difference between the two major P1 genotypes was reflected in all RepMP elements, such that subtype 1 and 2 strains can be differentiated on the basis of sequence variation in each RepMP element. This implies that subtype 1 and subtype 2 strains represent evolutionarily diverged strain lineages. Finally, a classification scheme is proposed in which the P1 genotype of M. pneumoniae isolates can be described in a sequence-based, universal fashion.

  14. Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: mitochondrial DNA sequence variation and allele frequencies of several nuclear genes.

    PubMed

    Ginther, C; Corach, D; Penacino, G A; Rey, J A; Carnese, F R; Hutz, M H; Anderson, A; Just, J; Salzano, F M; King, M C

    1993-01-01

    DNA samples from 60 Mapuche Indians, representing 39 maternal lineages, were genetically characterized for (1) nucleotide sequences of the mtDNA control region; (2) presence or absence of a nine base duplication in mtDNA region V; (3) HLA loci DRB1 and DQA1; (4) variation at three nuclear genes with short tandem repeats; and (5) variation at the polymorphic marker D2S44. The genetic profile of the Mapuche population was compared to other Amerinds and to worldwide populations. Two highly polymorphic portions of the mtDNA control region, comprising 650 nucleotides, were amplified by the polymerase chain reaction (PCR) and directly sequenced. The 39 maternal lineages were defined by two or three generation families identified by the Mapuches. These 39 lineages included 19 different mtDNA sequences that could be grouped into four classes. The same classes of sequences appear in other Amerinds from North, Central, and South American populations separated by thousands of miles, suggesting that the origin of the mtDNA patterns predates the migration to the Americas. The mtDNA sequence similarity between Amerind populations suggests that the migration throughout the Americas occurred rapidly relative to the mtDNA mutation rate. HLA DRB1 alleles 1602 and 1402 were frequent among the Mapuches. These alleles also occur at high frequency among other Amerinds in North and South America, but not among Spanish, Chinese or African-American populations. The high frequency of these alleles throughout the Americas, and their specificity to the Americas, supports the hypothesis that Mapuches and other Amerind groups are closely related.(ABSTRACT TRUNCATED AT 250 WORDS)

  15. Understanding human DNA sequence variation.

    PubMed

    Kidd, K K; Pakstis, A J; Speed, W C; Kidd, J R

    2004-01-01

    Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics. We present here examples of many of the ways these new data can be analyzed from a population perspective using results from our laboratory on multiple individual DNA-based polymorphisms, many clustered in haplotypes, studied in multiple populations representing all major geographic regions of the world. These data support an "out of Africa" hypothesis for human dispersal around the world and begin to refine the understanding of population structures and genetic relationships. We are also developing baseline information against which we can compare findings at different loci to aid in the identification of loci subject, now and in the past, to selection (directional or balancing). We do not yet have a comprehensive understanding of the extensive variation in the human genome, but some of that understanding is coming from population genetics.

  16. Amazonian phylogeography: mtDNA sequence variation in arboreal echimyid rodents (Caviomorpha).

    PubMed

    da Silva, M N; Patton, J L

    1993-09-01

    Patterns of evolutionary relationships among haplotype clades of sequences of the mitochondrial cytochrome b DNA gene are examined for five genera of arboreal rodents of the Caviomorph family Echimyidae from the Amazon Basin. Data are available for 798 bp of sequence from a total of 24 separate localities in Peru, Venezuela, Bolivia, and Brazil for Mesomys, Isothrix, Makalata, Dactylomys, and Echimys. Sequence divergence, corrected for multiple hits, is extensive, ranging from less than 1% for comparisons within populations of over 20% among geographic units within genera. Both the degree of differentiation and the geographic patterning of the variation suggest that more than one species composes the Amazonian distribution of the currently recognized Mesomys hispidus, Isothrix bistriata, Makalata didelphoides, and Dactylomys dactylinus. There is general concordance in the geographic range of haplotype clades for each of these taxa, and the overall level of differentiation within them is largely equivalent. These observations suggest that a common vicariant history underlies the respective diversification of each genus. However, estimated times of divergence based on the rate of third position transversion substitutions for the major clades within each genus typically range above 1 million years. Thus, allopatric isolation precipitating divergence must have been considerably earlier than the late Pleistocene forest fragmentation events commonly invoked for Amazonian biota.

  17. Control of artefactual variation in reported inter-sample relatedness during clinical use of a Mycobacterium tuberculosis sequencing pipeline.

    PubMed

    Wyllie, David H; Sanderson, Nicholas; Myers, Richard; Peto, Tim; Robinson, Esther; Crook, Derrick W; Smith, E Grace; Walker, A Sarah

    2018-06-06

    Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp, we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics. Copyright © 2018 Wyllie et al.

  18. Sequence diversity of hepatitis C virus 6a within the extended interferon sensitivity-determining region correlates with interferon-alpha/ribavirin treatment outcomes.

    PubMed

    Zhou, Daniel X M; Chan, Paul K S; Zhang, Tiejun; Tully, Damien C; Tam, John S

    2010-10-01

    Studies on the association between sequence variability of the interferon sensitivity-determining region (ISDR) of hepatitis C virus and the outcome of treatment have reached conflicting results. In this study, 25 patients infected with HCV 6a who had received interferon-alpha/ribavirin combination treatment were analyzed for the sequence variations. 14 of them had the full genome sequences obtained from a previous study, whereas the other 11 samples were sequenced for the extended ISDR (eISDR). This eISDR fragment covers 192 bp (64 amino acids) upstream and 201 bp (67 amino acids) downstream from the ISDR previously defined for HCV 1b. The comparison between interferon-alpha resistance and response groups for the amino acid mutations located in the full genome (6 and 8 patients respectively) as well as the mutations located in the eISDR (10 and 15 patients respectively) showed that the mutations I2160V, I2256V, V2292I (P<0.05) within eISDR were significantly associated with resistance to treatment. However, the extent of amino acid variations within previously defined ISDR was not associated with resistance to treatment as previously reported. Four amino acid variations I248V (P=0.03-0.06) within E1, R445K (P=0.02-0.05) and S747T (P=0.03) within E2, I861V (P=0.01) within NS2 which located outside the eISDR may also associate with treatment outcome as identified by a prescreening of variations within 14 HCV 6a full genomes. (c) 2010 Elsevier B.V. All rights reserved.

  19. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  20. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  1. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  2. Sequence Variation in the Small-Subunit rRNA Gene of Plasmodium malariae and Prevalence of Isolates with the Variant Sequence in Sichuan, China

    PubMed Central

    Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko

    1998-01-01

    By two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last 2 decades. In addition, a variant form of P. malariae which has a deletion of 19 bp and seven substitutions of base pairs in the target sequence of the small-subunit (SSU) rRNA gene was detected with high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600

  3. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids.

  4. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  5. Simple Sequence Repeats Provide a Substrate for Phenotypic Variation in the Neurospora crassa Circadian Clock

    PubMed Central

    Michael, Todd P.; Park, Sohyun; Kim, Tae-Sung; Booth, Jim; Byer, Amanda; Sun, Qi; Chory, Joanne; Lee, Kwangwon

    2007-01-01

    Background WHITE COLLAR-1 (WC-1) mediates interactions between the circadian clock and the environment by acting as both a core clock component and as a blue light photoreceptor in Neurospora crassa. Loss of the amino-terminal polyglutamine (NpolyQ) domain in WC-1 results in an arrhythmic circadian clock; this data is consistent with this simple sequence repeat (SSR) being essential for clock function. Methodology/Principal Findings Since SSRs are often polymorphic in length across natural populations, we reasoned that investigating natural variation of the WC-1 NpolyQ may provide insight into its role in the circadian clock. We observed significant phenotypic variation in the period, phase and temperature compensation of circadian regulated asexual conidiation across 143 N. crassa accessions. In addition to the NpolyQ, we identified two other simple sequence repeats in WC-1. The sizes of all three WC-1 SSRs correlated with polymorphisms in other clock genes, latitude and circadian period length. Furthermore, in a cross between two N. crassa accessions, the WC-1 NpolyQ co-segregated with period length. Conclusions/Significance Natural variation of the WC-1 NpolyQ suggests a mechanism by which period length can be varied and selected for by the local environment that does not deleteriously affect WC-1 activity. Understanding natural variation in the N. crassa circadian clock will facilitate an understanding of how fungi exploit their environments. PMID:17726525

  6. A diurnal component to the variation in sieve tube amino acid content in wheat.

    PubMed

    Gattolin, Stefano; Newbury, H John; Bale, Jeffrey S; Tseng, Hua-Ming; Barrett, David A; Pritchard, Jeremy

    2008-06-01

    We have used high-sensitivity capillary electrophoresis coupled to a laser-induced fluorescence detection method to quantify 16 amino acids in wheat (Triticum aestivum) sieve tube (ST) samples as small as 2 nL collected by severing the stylets of feeding aphids. The sensitivity of the method was sufficient to determine a quantitative amino acid profile of individual STs without the need to bulk samples to produce larger volumes for analysis. This allowed the observation of the full range of variation that exists in individual STs. Some of the total concentrations of amino acids recorded are higher than those reported previously. The results obtained show variation in the concentrations of phenylalanine (Phe), histidine/valine (His/Val), leucine/isoleucine (Leu/Ile), arginine, asparagine, glutamine, tyrosine (Tyr), and lysine (Lys) across the ST samples. These could not be explained by plant-to-plant variation. Statistical analyses revealed five analytes (Tyr, Lys, Phe, His/Val, and Leu/Ile) that showed striking covariation in their concentrations across ST samples. A regression analysis revealed a significant relationship between the concentrations of Tyr, Lys, Phe, Leu/Ile, His/Val, asparagine, arginine, and proline and the time of collection of ST samples, with these amino acids increasing in concentration during the afternoon. This increase was confirmed to occur in individual STs by analyzing samples obtained from stylet bundles exuding for many hours. Finally, an apparent relationship between the exudation rate of ST sap and its total amino acid concentration was observed: samples containing higher total amino acid concentrations were observed to exude from the severed stylet bundles more slowly.

  7. Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

    PubMed Central

    2010-01-01

    Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441

  8. Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery

    PubMed Central

    Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin

    2018-01-01

    Abstract Much of the within species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-Seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139

  9. Amino acid selective unlabeling for sequence specific resonance assignments in proteins

    PubMed Central

    Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha

    2010-01-01

    Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective ‘unlabeling’ or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly 13C/15N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named, {12COi–15Ni+1}-filtered HSQC, which aids in linking the 1HN/15N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i − 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to 2H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of 14N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies. Electronic supplementary material The online version of this article (doi:10.1007/s10858-010-9459-z) contains supplementary material, which is available to authorized users. PMID:21153044

  10. Construction of a large collection of small genome variations in French dairy and beef breeds using whole-genome sequences.

    PubMed

    Boussaha, Mekki; Michot, Pauline; Letaief, Rabia; Hozé, Chris; Fritz, Sébastien; Grohs, Cécile; Esquerré, Diane; Duchesne, Amandine; Philippe, Romain; Blanquet, Véronique; Phocas, Florence; Floriot, Sandrine; Rocha, Dominique; Klopp, Christophe; Capitan, Aurélien; Boichard, Didier

    2016-11-15

    In recent years, several bovine genome sequencing projects were carried out with the aim of developing genomic tools to improve dairy and beef production efficiency and sustainability. In this study, we describe the first French cattle genome variation dataset obtained by sequencing 274 whole genomes representing several major dairy and beef breeds. This dataset contains over 28 million single nucleotide polymorphisms (SNPs) and small insertions and deletions. Comparisons between sequencing results and SNP array genotypes revealed a very high genotype concordance rate, which indicates the good quality of our data. To our knowledge, this is the first large-scale catalog of small genomic variations in French dairy and beef cattle. This resource will contribute to the study of gene functions and population structure and also help to improve traits through genotype-guided selection.

  11. Analysis of simian immunodeficiency virus sequence variation in tissues of rhesus macaques with simian AIDS.

    PubMed Central

    Kodama, T; Mori, K; Kawahara, T; Ringler, D J; Desrosiers, R C

    1993-01-01

    One rhesus macaque displayed severe encephalomyelitis and another displayed severe enterocolitis following infection with molecularly cloned simian immunodeficiency virus (SIV) strain SIVmac239. Little or no free anti-SIV antibody developed in these two macaques, and they died relatively quickly (4 to 6 months) after infection. Manifestation of the tissue-specific disease in these macaques was associated with the emergence of variants with high replicative capacity for macrophages and primary infection of tissue macrophages. The nature of sequence variation in the central region (vif, vpr, and vpx), the env gene, and the nef long terminal repeat (LTR) region in brain, colon, and other tissues was examined to see whether specific genetic changes were associated with SIV replication in brain or gut. Sequence analysis revealed strong conservation of the intergenic central region, nef, and the LTR. However, analysis of env sequences in these two macaques and one other revealed significant, interesting patterns of sequence variation. (i) Changes in env that were found previously to contribute to the replicative ability of SIVmac for macrophages in culture were present in the tissues of these animals. (ii) The greatest variability was located in the regions between V1 and V2 and from "V3" through C3 in gp120, which are different in location from the variable regions observed previously in animals with strong antibody responses and long-term persistent infection. (iii) The predominant sequence change of D-->N at position 385 in C3 is most surprising, since this change in both SIV and human immunodeficiency virus type 1 has been associated with dramatically diminished affinity for CD4 and replication in vitro. (iv) The nature of sequence changes at some positions (146, 178, 345, 385, and "V3") suggests that viral replication in brain and gut may be facilitated by specific sequence changes in env in addition to those that impart a general ability to replicate well in

  12. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Form and format for... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... Code for Information Interchange (ASCII) text. No other formats shall be allowed. (3) The computer...

  13. Variation in Leaf Respiration Rates at Night Correlates with Carbohydrate and Amino Acid Supply1[OPEN

    PubMed Central

    Lee, Chun Pong; Cheng, Riyan

    2017-01-01

    Plant respiration can theoretically be fueled by and dependent upon an array of central metabolism components; however, which ones are responsible for the quantitative variation found in respiratory rates is unknown. Here, large-scale screens revealed 2-fold variation in nighttime leaf respiration rate (RN) among mature leaves from an Arabidopsis (Arabidopsis thaliana) natural accession collection grown under common favorable conditions. RN variation was mostly maintained in the absence of genetic variation, which emphasized the low heritability of RN and its plasticity toward relatively small environmental differences within the sampling regime. To pursue metabolic explanations for leaf RN variation, parallel metabolite level profiling and assays of total protein and starch were performed. Within an accession, RN correlated strongly with stored carbon substrates, including starch and dicarboxylic acids, as well as sucrose, major amino acids, shikimate, and salicylic acid. Among different accessions, metabolite-RN correlations were maintained with protein, sucrose, and major amino acids but not stored carbon substrates. A complementary screen of the effect of exogenous metabolites and effectors on leaf RN revealed that (1) RN is stimulated by the uncoupler FCCP and high levels of substrates, demonstrating that both adenylate turnover and substrate supply can limit leaf RN, and (2) inorganic nitrogen did not stimulate RN, consistent with limited nighttime nitrogen assimilation. Simultaneous measurements of RN and protein synthesis revealed that these processes were largely uncorrelated in mature leaves. These results indicate that differences in preceding daytime metabolic activities are the major source of variation in mature leaf RN under favorable controlled conditions. PMID:28615345

  14. Sequencing, bioinformatic characterization and expression pattern of a putative amino acid transporter from the parasitic cestode Echinococcus granulosus.

    PubMed

    Camicia, Federico; Paredes, Rodolfo; Chalar, Cora; Galanti, Norbel; Kamenetzky, Laura; Gutierrez, Ariana; Rosenzvit, Mara C

    2008-03-31

    We have sequenced and partially characterized an Echinococcus granulosus cDNA, termed egat1, from a protoscolex signal sequence trap (SST) cDNA library. The isolated 1627 bp long cDNA contains an ORF of 489 amino acids and shows an amino acid identity of 30% with neutral and excitatory amino acid transporters members of the Dicarboxylate/Amino Acid Na+ and/or H+ Cation Symporter family (DAACS) (TC 2.A.23). Additional bioinformatics analysis of EgAT1, confirmed the results obtained by similarity searches and showed the presence of 9 to 10 transmembrane domains, consensus sequences for N-glycosylation between the third and fourth transmembrane domain, a highly similar hydropathy profile with ASCT1 (a known member of DAACS family), high score with SDF (Sodium Dicarboxilate Family) and similar motifs with EDTRANSPORT, a fingerprint of excitatory amino acid transporters. The localization of the putative amino acid transporter was analyzed by in situ hybridization and immunofluorescence in protoscoleces and associated germinal layer. The in situ hybridization labelling indicates the distribution of egat1 mRNA throughout the tegument. EgAT1 protein, which showed in Western blots a molecular mass of approximately 60 kD, is localized in the subtegumental region of the metacestode, particularly around suckers and rostellum of protoscoleces and layers from brood capsules. The sequence and expression analyses of EgAT1 pave the way for functional analysis of amino acids transporters of E. granulosus and its evaluation as new drug targets against cystic echinococcosis.

  15. Amino- and carboxyl-terminal amino acid sequences of proteins coded by gag gene of murine leukemia virus

    PubMed Central

    Oroszlan, Stephen; Henderson, Louis E.; Stephenson, John R.; Copeland, Terry D.; Long, Cedric W.; Ihle, James N.; Gilden, Raymond V.

    1978-01-01

    The amino- and carboxyl-terminal amino acid sequences of proteins (p10, p12, p15, and p30) coded by the gag gene of Rauscher and AKR murine leukemia viruses were determined. Among these proteins, p15 from both viruses appears to have a blocked amino end. Proline was found to be the common NH2 terminus of both p30s and both p12s, and alanine of both p10s. The amino-terminal sequences of p30s are identical, as are those of p10s, while the p12 sequences are clearly distinctive but also show substantial homology. The carboxyl-terminal amino acids of both viral p30s and p12s are leucine and phenylalanine, respectively. Rauscher leukemia virus p15 has tyrosine as the carboxyl terminus while AKR virus p15 has phenylalanine in this position. The compositional and sequence data provide definite chemical criteria for the identification of analogous gag gene products and for the comparison of viral proteins isolated in different laboratories. On the basis of amino acid sequences and the previously proposed H-p15-p12-p30-p10-COOH peptide sequence in the precursor polyprotein, a model for cleavage sites involved in the post-translational processing of the precursor coded for by the gag gene is proposed. PMID:206897

  16. Exploiting sequence similarity to validate the sensitivity of SNP arrays in detecting fine-scaled copy number variations.

    PubMed

    Wong, Gerard; Leckie, Christopher; Gorringe, Kylie L; Haviv, Izhak; Campbell, Ian G; Kowalczyk, Adam

    2010-04-15

    High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. For the purpose of detecting narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed. We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing sequence similarity results, we identified a collection of fine-scaled putative CNVs between gender from autosomal probesets whose sequence matches various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior and more stable than the t-test in detecting CNVs. Through the application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them. The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/ approximately gwong/DRECS/index.html.

  17. alpha-Lactalbumin species variation, HAMLET formation, and tumor cell death.

    PubMed

    Pettersson, Jenny; Mossberg, Ann-Kristin; Svanborg, Catharina

    2006-06-23

    HAMLET (human alpha-lactalbumin made lethal to tumor cells) is a tumoricidal complex of apo alpha-lactalbumin and oleic acid, formed in casein after low pH treatment of human milk. This study examined if HAMLET-like complexes are present in casein from different species and if isolated alpha-lactalbumin from those species can form such complexes with oleic acid. Casein from human, bovine, equine, and porcine milk was separated by ion exchange chromatography and active complexes were only found in human casein. This was not explained by alpha-lactalbumin sequence variation, as purified bovine, equine, porcine, and caprine alpha-lactalbumins formed complexes with oleic acid with biological activity similar to HAMLET. We conclude that structural variation of alpha-lactalbumins does not preclude the formation of HAMLET-like complexes and that natural HAMLET formation in casein was unique to human milk, which also showed the highest oleic acid content.

  18. [Genetic variation analysis of canine parvovirus VP2 gene in China].

    PubMed

    Yi, Li; Cheng, Shi-Peng; Yan, Xi-Jun; Wang, Jian-Ke; Luo, Bin

    2009-11-01

    To recognize the molecular biology character, phylogenetic relationship and the state quo prevalent of Canine parvovirus (CPV), Faecal samnples from pet dogs with acute enteritis in the cities of Beijing, Wuhan, and Nanjing were collected and tested for CPV by PCR and other assay between 2006 and 2008. There was no CPV to FPV (MEV) variation by PCR-RFLP analysis in all samples. The complete ORFs of VP2 genes were obtained by PCR from 15 clinical CPVs and 2 CPV vaccine strains. All amplicons were cloned and sequenced. Analysis of the VP2 sequences showed that clinical CPVs both belong to CPV-2a subtype, and could be classified into a new cluster by amino acids contrasting which contains Tyr-->Ile (324) mutation. Besides the 2 CPV vaccine strains belong to CPV-2 subtype, and both of them have scattered variation in amino acids residues of VP2 protein. Construction of the phylogenetic tree based on CPV VP2 sequence showed these 15 CPV clinical strains were in close relationship with Korea strain K001 than CPV-2a isolates in other countries at early time, It is indicated that the canine parvovirus genetic variation was associated with location and time in some degree. The survey of CPV capsid protein VP2 gene provided the useful information for the identification of CPV types and understanding of their genetic relationship.

  19. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm

    PubMed Central

    Öhrmalm, Christina; Jobs, Magnus; Eriksson, Ronnie; Golbob, Sultan; Elfaitouri, Amal; Benachenhou, Farid; Strømme, Maria; Blomberg, Jonas

    2010-01-01

    One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, it had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes. PMID:20864443

  20. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  1. Analyses of natural variation indicates that the absence of RPS4/RRS1 and amino acid change in RPS4 cause loss of their functions and resistance to pathogens.

    PubMed

    Narusaka, Mari; Iuchi, Satoshi; Narusaka, Yoshihiro

    2017-03-04

    A pair of Arabidopsis thaliana resistance proteins, RPS4 and RRS1, recognizes the cognate Avr effector from the bacterial pathogens Pseudomonas syringae pv. tomato expressing avrRps4 (Pst-avrRps4), Ralstonia solanacearum, and the fungal pathogen Colletotrichum higginsianum and leads to defense signaling activation against the pathogens. In the present study, we analyzed 14 A. thaliana accessions for natural variation in Pst-avrRps4 and C. higginsianum susceptibility, and found new compatible and incompatible Arabidopsis-pathogen interactions. We first found that A. thaliana accession Cvi-0 is susceptible to Pst-avrRps4. Interestingly, the genome sequence assembly indicated that Cvi-0 lost both RPS4 and RRS1, but not RPS4B and RRS1B, compared to the reference genome sequence from A. thaliana accession Col-0. On the other hand, the natural variation analysis of RPS4 alleles from various Arabidopsis accessions revealed that one amino-acid change, Y950H, is responsible for the loss of resistance to Pst-avrRps4 and C. higginsianum in RLD-0. Our data indicate that the amino acid change, Y950H, in RPS4 resulted in the loss of both RPS4 and RRS1 functions and resistance to pathogens.

  2. BIOLOGICAL RESPONSE TO VARIATION OF ACID-VOLATILE SULFIDES AND METALS IN FIELD-EXPOSED SPIKED SEDIMENTS

    EPA Science Inventory

    Vertical and temporal variations of acid-volatile sulfides (AVS) and simultaneously extracted metals (SEM) in sediment can control biological impacts of metals. To assess the significance of these variations in field sediments, sediments spiked with cadmium, copper, lead, nickel ...

  3. Laying-sequence-specific variation in yolk oestrogen levels, and relationship to plasma oestrogen in female zebra finches (Taeniopygia guttata)

    PubMed Central

    Williams, Tony D.; Ames, Caroline E.; Kiparissis, Yiannis; Wynne-Edwards, Katherine E.

    2005-01-01

    We investigated the relationship between plasma and yolk oestrogens in laying female zebra finches (Taeniopygia guttata) by manipulating plasma oestradiol (E2) levels, via injection of oestradiol-17β, in a sequence-specific manner to maintain chronically high plasma levels for later-developing eggs (contrasting with the endogenous pattern of decreasing plasma E2 concentrations during laying). We report systematic variation in yolk oestrogen concentrations, in relation to laying sequence, similar to that widely reported for androgenic steroids. In sham-manipulated females, yolk E2 concentrations decreased with laying sequence. However, in E2-treated females plasma E2 levels were higher during the period of rapid yolk development of later-laid eggs, compared with control females. As a consequence, we reversed the laying-sequence-specific pattern of yolk E2: in E2-treated females, yolk E2 concentrations increased with laying-sequence. In general therefore, yolk E2 levels were a direct reflection of plasma E2 levels. However, in control females there was some inter-individual variability in the endogenous pattern of plasma E2 levels through the laying cycle which could generate variation in sequence-specific patterns of yolk hormone levels even if these primarily reflect circulating steroid levels. PMID:15695208

  4. Comparative venom gland transcriptomics of Naja kaouthia (monocled cobra) from Malaysia and Thailand: elucidating geographical venom variation and insights into sequence novelty

    PubMed Central

    Chanhome, Lawan; Tan, Nget Hong

    2017-01-01

    exclusively and abundantly expressed cytotoxin CTX-3 in NK-T. The findings suggested correlation with the geographical variation in proteome and toxicity of the venom, and support the call for optimising antivenom production and use in the region. Besides, the current study uncovered full and partial sequences of numerous toxin genes from N. kaouthia which have not been reported hitherto; these include N. kaouthia-specific l-amino acid oxidase (LAAO), snake venom serine protease (SVSP), cystatin, acetylcholinesterase (AChE), hyaluronidase (HYA), waprin, phospholipase B (PLB), aminopeptidase (AP), neprilysin, etc. Taken together, the findings further enrich the snake toxin database and provide deeper insights into the genetic diversity of cobra venom toxins. PMID:28392982

  5. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Le Coq, Johanne; Ghosh, Partho

    2012-06-19

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein,more » TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd ({approx}16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10{sup 20} potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.« less

  6. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    PubMed Central

    Le Coq, Johanne; Ghosh, Partho

    2011-01-01

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 1020 potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation. PMID:21873231

  7. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement.

    PubMed

    Le Coq, Johanne; Ghosh, Partho

    2011-08-30

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10(20) potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.

  8. CNV-TV: a robust method to discover copy number variation from short sequencing reads.

    PubMed

    Duan, Junbo; Zhang, Ji-Gang; Deng, Hong-Wen; Wang, Yu-Ping

    2013-05-02

    Copy number variation (CNV) is an important structural variation (SV) in human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performances of these methods are not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach are evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengthes, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods.

  9. Amino acid sequence of the smaller basic protein from rat brain myelin

    PubMed Central

    Dunkley, Peter R.; Carnegie, Patrick R.

    1974-01-01

    1. The complete amino acid sequence of the smaller basic protein from rat brain myelin was determined. This protein differs from myelin basic proteins of other species in having a deletion of a polypeptide of 40 amino acid residues from the centre of the molecule. 2. A detailed comparison is made of the constant and variable regions in a group of myelin basic proteins from six species. 3. An arginine residue in the rat protein was found to be partially methylated. The ratio of methylated to unmethylated arginine at this position differed from that found for the human basic protein. 4. Three tryptic peptides were isolated in more than one form. The differences between the two forms of each peptide are discussed in relation to the electrophoretic heterogeneity of myelin basic proteins, which is known to occur at alkaline pH values. 5. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50029 at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1973) 131, 5. PMID:4141893

  10. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Soo-Ik; Hammes, G.G.

    1989-11-01

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homologies exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chickenmore » and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The DADPH-binding dinucleotide folds of the {beta}-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the DADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.« less

  11. HLA DNA Sequence Variation among Human Populations: Molecular Signatures of Demographic and Selective Events

    PubMed Central

    Buhler, Stéphane; Sanchez-Mazas, Alicia

    2011-01-01

    Molecular differences between HLA alleles vary up to 57 nucleotides within the peptide binding coding region of human Major Histocompatibility Complex (MHC) genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals of about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies. Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated to geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model). However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used to explore the

  12. Molecular interaction networks in the analyses of sequence variation and proteomics data.

    PubMed

    Stelzl, Ulrich

    2013-12-01

    Protein-protein interaction networks are typically generated in standard cell lines or model organisms as it is prohibitively difficult to record large interaction datasets from specific tissues or disease models at a reasonable pace. Although the interaction data are of high confidence, they thus do not reflect in vivo relationships as such. A wealth of physiologically relevant protein information, obtained under different conditions and from different systems, is available including information on genetic variation, protein levels, and PTMs. However, these data are difficult to assess comprehensively because the relationships between the entities remain elusive from the measurements. Here, we exemplarily highlight recent studies that gained deeper insight from genetic variation, protein, and PTM measurements using interaction information pointing toward the importance and potential of interaction networks for the interpretation of sequencing and proteomics data. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles; Koster, Hubert

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  14. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods.

  15. Extension of the COG and arCOG databases by amino acid and nucleotide sequences

    PubMed Central

    Meereis, Florian; Kaufmann, Michael

    2008-01-01

    Background The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535

  16. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in amore » natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.« less

  17. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE PAGES

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; ...

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in amore » natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.« less

  18. Population subdivision and molecular sequence variation: theory and analysis of Drosophila ananassae data.

    PubMed

    Vogl, Claus; Das, Aparup; Beaumont, Mark; Mohanty, Sujata; Stephan, Wolfgang

    2003-11-01

    Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Theta to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Theta, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.

  19. A global analysis of CNVs in swine using whole genome sequence data and association analysis with fatty acid composition and growth traits.

    PubMed

    Revilla, Manuel; Puig-Oliveras, Anna; Castelló, Anna; Crespo-Piazuelo, Daniel; Paludo, Ediane; Fernández, Ana I; Ballester, Maria; Folch, Josep M

    2017-01-01

    Copy number variations (CNVs) are important genetic variants complementary to SNPs, and can be considered as biomarkers for some economically important traits in domestic animals. In the present study, a genomic analysis of porcine CNVs based on next-generation sequencing data was carried out to identify CNVs segregating in an Iberian x Landrace backcross population and study their association with fatty acid composition and growth-related traits. A total of 1,279 CNVs, including duplications and deletions, were detected, ranging from 106 to 235 CNVs across samples, with an average of 183 CNVs per sample. Moreover, we detected 540 CNV regions (CNVRs) containing 245 genes. Functional annotation suggested that these genes possess a great variety of molecular functions and may play a role in production traits in commercial breeds. Some of the identified CNVRs contained relevant functional genes (e.g., CLCA4, CYP4X1, GPAT2, MOGAT2, PLA2G2A and PRKG1, among others). The variation in copy number of four of them (CLCA4, GPAT2, MOGAT2 and PRKG1) was validated in 150 BC1_LD (25% Iberian and 75% Landrace) animals by qPCR. Additionally, their contribution regarding backfat and intramuscular fatty acid composition and growth-related traits was analyzed. Statistically significant associations were obtained for CNVR112 (GPAT2) for the C18:2(n-6)/C18:3(n-3) ratio in backfat and carcass length, among others. Notably, GPATs are enzymes that catalyze the first step in the biosynthesis of both triglycerides and glycerophospholipids, suggesting that this CNVR may contribute to genetic variation in fatty acid composition and growth traits. These findings provide useful genomic information to facilitate the further identification of trait-related CNVRs affecting economically important traits in pigs.

  20. The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF

    PubMed Central

    Banerjee, Jayashree; Fischer, Christopher C.; Wedegaertner, Philip B.

    2009-01-01

    PDZ-RhoGEF is a member of the regulator of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs) that link activated heterotrimeric G protein α subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561–585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as necessary for binding to actin and for co-localization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and, as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate a motif of LIxxFE, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure independent of its ability to activate RhoA. PMID:19618964

  1. Genetic variation in food choice behaviour of amino acid-deprived Drosophila.

    PubMed

    Toshima, Naoko; Hara, Chieko; Scholz, Claus-Jürgen; Tanimura, Teiichi

    2014-10-01

    To understand homeostatic regulation in insects, we need to understand the mechanisms by which they respond to external stimuli to maintain the internal milieu. Our previous study showed that Drosophila melanogaster exhibit specific amino acid preferences. Here, we used the D.melanogaster Genetic Reference Panel (DGRP), which is comprised of multiple inbred lines derived from a natural population, to examine how amino acid preference changes depending on the internal nutritional state in different lines. We performed a two-choice preference test and observed genetic variations in the response to amino acid deprivation. For example, a high-responding line showed an enhanced preference for amino acids even after only 1day of deprivation and responded to a fairly low concentration of amino acids. Conversely, a low-responding line showed no increased preference for amino acids after deprivation. We compared the gene expression profiles between selected high- and the low-responding lines and performed SNP analyses. We found several groups of genes putatively involved in altering amino acid preference. These results will contribute to future studies designed to explore how the genetic architecture of an organism evolves to adapt to different nutritional environments. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    PubMed Central

    Long, C M; Virolle, M J; Chang, S Y; Chang, S; Bibb, M J

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzyme to an inhibitor of mammalian alpha-amylases. The amino-terminal sequence of the extracellular enzyme was determined, revealing the presence of a typical signal peptide preceding the mature form of the alpha-amylase. Images PMID:3500166

  3. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

    PubMed

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

    2015-01-15

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. Identification of single amino acid substitutions (SAAS) in neuraminidase from influenza a virus (H1N1) via mass spectrometry analysis coupled with de novo peptide sequencing.

    PubMed

    Peng, Qisheng; Wang, Zijian; Wu, Donglin; Li, Xiaoou; Liu, Xiaofeng; Sun, Wanchun; Liu, Ning

    2016-08-01

    Amino acid substitutions in the neuraminidase of the influenza virus are the main cause of the emergence of resistance to zanamivir or oseltamivir during seasonal influenza treatment; they are the result of non-synonymous mutations in the viral genome that can be successfully detected by polymer chain reaction (PCR)-based approaches. There is always an urgent need to detect variation in amino acid sequences directly at the protein level. Mass spectrometry coupled with de novo sequencing has been explored as an alternative and straightforward strategy for detecting amino acid substitutions, as well - this approach is the primary focus of the present study. Influenza virus (A/Puerto Rico/8/1934 H1N1) propagated in embryonated chicken eggs was purified by ultracentrifugation, followed by PNGase F treatment. The deglycosylated virion was lysed and separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The gel band corresponding to neuraminidase was picked up and subjected to liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis. LC-MS/MS analyses, coupled with manual de novo sequencing, allowed the determination of three amino acid substitutions: R346K, S349 N, and S370I/L, in the neuraminidase from the influenza virus (A/Puerto Rico/8/1934 H1N1), which were located in three mutated peptides of the neuraminidase: YGNGVWIGK, TKNHSSR, and PNGWTETDI/LK, respectively. We found that the amino acid substitutions in the proteins of RNA viruses (including influenza A virus) resulting from non-synonymous gene mutations can indeed be directly analyzed via mass spectrometry, and that manual interpretation of the MS/MS data may be beneficial. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  5. A genome-wide analysis of the lysophosphatidate acyltransferase (LPAAT) gene family in cotton: organization, expression, sequence variation, and association with seed oil content and fiber quality.

    PubMed

    Wang, Nuohan; Ma, Jianjiang; Pei, Wenfeng; Wu, Man; Li, Haijing; Li, Xingli; Yu, Shuxun; Zhang, Jinfa; Yu, Jiwen

    2017-03-01

    Lysophosphatidic acid acyltransferase (LPAAT) encoded by a multigene family is a rate-limiting enzyme in the Kennedy pathway in higher plants. Cotton is the most important natural fiber crop and one of the most important oilseed crops. However, little is known on genes coding for LPAATs involved in oil biosynthesis with regard to its genome organization, diversity, expression, natural genetic variation, and association with fiber development and oil content in cotton. In this study, a comprehensive genome-wide analysis in four Gossypium species with genome sequences, i.e., tetraploid G. hirsutum- AD 1  and G. barbadense- AD 2 and its possible ancestral diploids G. raimondii- D 5 and G. arboreum- A 2 , identified 13, 10, 8, and 9 LPAAT genes, respectively, that were divided into four subfamilies. RNA-seq analyses of the LPAAT genes in the widely grown G. hirsutum suggest their differential expression at the transcriptional level in developing cottonseeds and fibers. Although 10 LPAAT genes were co-localised with quantitative trait loci (QTL) for cottonseed oil or protein content within a 25-cM region, only one single strand conformation polymorphic (SSCP) marker developed from a synonymous single nucleotide polymorphism (SNP) of the At-Gh13LPAAT5 gene was significantly correlated with cottonseed oil and protein contents in one of the three field tests. Moreover, transformed yeasts using the At-Gh13LPAAT5 gene with the two sequences for the SNP led to similar results, i.e., a 25-31% increase in palmitic acid and oleic acid, and a 16-29% increase in total triacylglycerol (TAG). The results in this study demonstrated that the natural variation in the LPAAT genes to improving cottonseed oil content and fiber quality is limited; therefore, traditional cross breeding should not expect much progress in improving cottonseed oil content or fiber quality through a marker-assisted selection for the LPAAT genes. However, enhancing the expression of one of the LPAAT genes such

  6. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D.

    2018-01-22

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  7. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patel, Kamlesh D.

    2012-06-01

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  8. Genome sequences of five Lactobacillus sp. isolates from traditional Turkish sourdough

    USDA-ARS?s Scientific Manuscript database

    A high level of variation in microflora can be observed in lactic acid bacteria (LAB) profiles of sourdoughs. Here, we present draft genome sequences of Lactobacillus reuteri E81, L. reuteri LR5A, L. rhamnosus LR2, L. plantarum PFC-311 and a novel Lactobacillus sp. PFC-70 isolated from traditional T...

  9. Evolutionary Pattern of the FAE1 Gene in Brassicaceae and Its Correlation with the Erucic Acid Trait

    PubMed Central

    Li, Mimi; Peng, Bin; Guo, Haisong; Yan, Qinqin; Hang, Yueyu

    2013-01-01

    The fatty acid elongase 1 (FAE1) gene catalyzes the initial condensation step in the elongation pathway of VLCFA (very long chain fatty acid) biosynthesis and is thus a key gene in erucic acid biosynthesis. Based on a worldwide collection of 62 accessions representing 14 tribes, 31 genera, 51 species, 4 subspecies and 7 varieties, we conducted a phylogenetic reconstruction and correlation analysis between genetic variations in the FAE1 gene and the erucic acid trait, attempting to gain insight into the evolutionary patterns and the correlations between genetic variations in FAE1 and trait variations. The five clear, deeply diverged clades detected in the phylogenetic reconstruction are largely congruent with a previous multiple gene-derived phylogeny. The Ka/Ks ratio (<1) and overall low level of nucleotide diversity in the FAE1 gene suggest that purifying selection is the major evolutionary force acting on this gene. Sequence variations in FAE1 show a strong correlation with the content of erucic acid in seeds, suggesting a causal link between the two. Furthermore, we detected 16 mutations that were fixed between the low and high phenotypes of the FAE1 gene, which constitute candidate active sites in this gene for altering the content of erucic acid in seeds. Our findings begin to shed light on the evolutionary pattern of this important gene and represent the first step in elucidating how the sequence variations impact the production of erucic acid in plants. PMID:24358289

  10. Ultra-deep sequencing reveals high prevalence and broad structural diversity of hepatitis B surface antigen mutations in a global population.

    PubMed

    Gencay, Mikael; Hübner, Kirsten; Gohl, Peter; Seffner, Anja; Weizenegger, Michael; Neofytos, Dionysios; Batrla, Richard; Woeste, Andreas; Kim, Hyon-Suk; Westergaard, Gaston; Reinsch, Christine; Brill, Eva; Thu Thuy, Pham Thi; Hoang, Bui Huu; Sonderup, Mark; Spearman, C Wendy; Pabinger, Stephan; Gautier, Jérémie; Brancaccio, Giuseppina; Fasano, Massimo; Santantonio, Teresa; Gaeta, Giovanni B; Nauck, Markus; Kaminski, Wolfgang E

    2017-01-01

    The diversity of the hepatitis B surface antigen (HBsAg) has a significant impact on the performance of diagnostic screening tests and the clinical outcome of hepatitis B infection. Neutralizing or diagnostic antibodies against the HBsAg are directed towards its highly conserved major hydrophilic region (MHR), in particular towards its "a" determinant subdomain. Here, we explored, on a global scale, the genetic diversity of the HBsAg MHR in a large, multi-ethnic cohort of randomly selected subjects with HBV infection from four continents. A total of 1553 HBsAg positive blood samples of subjects originating from 20 different countries across Africa, America, Asia and central Europe were characterized for amino acid variation in the MHR. Using highly sensitive ultra-deep sequencing, we found 72.8% of the successfully sequenced subjects (n = 1391) demonstrated amino acid sequence variation in the HBsAg MHR. This indicates that the global variation frequency in the HBsAg MHR is threefold higher than previously reported. The majority of the amino acid mutations were found in the HBV genotypes B (28.9%) and C (25.4%). Collectively, we identified 345 distinct amino acid mutations in the MHR. Among these, we report 62 previously unknown mutations, which extends the worldwide pool of currently known HBsAg MHR mutations by 22%. Importantly, topological analysis identified the "a" determinant upstream flanking region as the structurally most diverse subdomain of the HBsAg MHR. The highest prevalence of "a" determinant region mutations was observed in subjects from Asia, followed by the African, American and European cohorts, respectively. Finally, we found that more than half (59.3%) of all HBV subjects investigated carried multiple MHR mutations. Together, this worldwide ultra-deep sequencing based genotyping study reveals that the global prevalence and structural complexity of variation in the hepatitis B surface antigen have, to date, been significantly underappreciated.

  11. The mitochondrial C16069T polymorphism, not mitochondrial D310 (D-loop) mononucleotide sequence variations, is associated with bladder cancer.

    PubMed

    Shakhssalim, Nasser; Houshmand, Massoud; Kamalidehghan, Behnam; Faraji, Abolfazl; Sarhangnejad, Reza; Dadgar, Sepideh; Mobaraki, Maryam; Rosli, Rozita; Sanati, Mohammad Hossein

    2013-12-05

    Bladder cancer is a relatively common and potentially life-threatening neoplasm that ranks ninth in terms of worldwide cancer incidence. The aim of this study was to determine deletions and sequence variations in the mitochondrial displacement loop (D-loop) region from the blood specimens and tumoral tissues of patients with bladder cancer, compared to adjacent non-tumoral tissues. The DNA from blood, tumoral tissues and adjacent non-tumoral tissues of twenty-six patients with bladder cancer and DNA from blood of 504 healthy controls from different ethnicities were investigated to determine sequence variation in the mitochondrial D-loop region using multiplex polymerase chain reaction (PCR), DNA sequencing and southern blotting analysis. From a total of 110 variations, 48 were reported as new mutations. No deletions were detected in tumoral tissues, adjacent non-tumoral tissues and blood samples from patients. Although the polymorphisms at loci 16189, 16261 and 16311 were not significantly correlated with bladder cancer, the C16069T variation was significantly present in patient samples compared to control samples (p < 0.05). Interestingly, there was no significant difference (p > 0.05) of C variations, including C7TC6, C8TC6, C9TC6 and C10TC6, in D310 mitochondrial DNA between patients and control samples. Our study suggests that 16069 mitochondrial DNA D-Loop mutations may play a significant role in the etiology of bladder cancer and facilitate the definition of carcinogenesis-related mutations in human cancer.

  12. Partial amino acid sequence of the branched chain amino acid aminotransferase (TmB) of E. coli JA199 pDU11

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feild, M.J.; Armstrong, F.B.

    1987-05-01

    E. coli JA199 pDU11 harbors a multicopy plasmid containing the ilv GEDAY gene cluster of S. typhimurium. TmB, gene product of ilv E, was purified, crystallized, and subjected to Edman degradation using a gas phase sequencer. The intact protein yielded an amino terminal 31 residue sequence. Both carboxymethylated apoenzyme and (/sup 3/H)-NaBH-reduced holoenzyme were then subjected to digestion by trypsin. The digests were fractionated using reversed phase HPLC, and the peptides isolated were sequenced. The borohydride-treated holoenzyme was used to isolate the cofactor-binding peptide. The peptide is 27 residues long and a comparison with known sequences of other aminotransferases revealedmore » limited homology. Peptides accounting for 211 of 288 predicted residues have been sequenced, including 9 residues of the carboxyl terminus. Comparison of peptides with the inferred amino acid sequence of the E. coli K-12 enzyme has helped determine the sequence of the amino terminal 59 residues; only two differences between the sequences are noted in this region.« less

  13. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those listed... the Feature section. Otherwise, each occurrence of a base or amino acid not appearing in WIPO Standard...

  14. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  15. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    PubMed

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  16. Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles.

    PubMed

    Robinson, James; Guethlein, Lisbeth A; Cereb, Nezih; Yang, Soo Young; Norman, Paul J; Marsh, Steven G E; Parham, Peter

    2017-06-01

    HLA class I glycoproteins contain the functional sites that bind peptide antigens and engage lymphocyte receptors. Recently, clinical application of sequence-based HLA typing has uncovered an unprecedented number of novel HLA class I alleles. Here we define the nature and extent of the variation in 3,489 HLA-A, 4,356 HLA-B and 3,111 HLA-C alleles. This analysis required development of suites of methods, having general applicability, for comparing and analyzing large numbers of homologous sequences. At least three amino-acid substitutions are present at every position in the polymorphic α1 and α2 domains of HLA-A, -B and -C. A minority of positions have an incidence >1% for the 'second' most frequent nucleotide, comprising 70 positions in HLA-A, 85 in HLA-B and 54 in HLA-C. The majority of these positions have three or four alternative nucleotides. These positions were subject to positive selection and correspond to binding sites for peptides and receptors. Most alleles of HLA class I (>80%) are very rare, often identified in one person or family, and they differ by point mutation from older, more common alleles. These alleles with single nucleotide polymorphisms reflect the germ-line mutation rate. Their frequency predicts the human population harbors 8-9 million HLA class I variants. The common alleles of human populations comprise 42 core alleles, which represent all selected polymorphism, and recombinants that have assorted this polymorphism.

  17. Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles

    PubMed Central

    Cereb, Nezih; Yang, Soo Young; Marsh, Steven G. E.; Parham, Peter

    2017-01-01

    HLA class I glycoproteins contain the functional sites that bind peptide antigens and engage lymphocyte receptors. Recently, clinical application of sequence-based HLA typing has uncovered an unprecedented number of novel HLA class I alleles. Here we define the nature and extent of the variation in 3,489 HLA-A, 4,356 HLA-B and 3,111 HLA-C alleles. This analysis required development of suites of methods, having general applicability, for comparing and analyzing large numbers of homologous sequences. At least three amino-acid substitutions are present at every position in the polymorphic α1 and α2 domains of HLA-A, -B and -C. A minority of positions have an incidence >1% for the ‘second’ most frequent nucleotide, comprising 70 positions in HLA-A, 85 in HLA-B and 54 in HLA-C. The majority of these positions have three or four alternative nucleotides. These positions were subject to positive selection and correspond to binding sites for peptides and receptors. Most alleles of HLA class I (>80%) are very rare, often identified in one person or family, and they differ by point mutation from older, more common alleles. These alleles with single nucleotide polymorphisms reflect the germ-line mutation rate. Their frequency predicts the human population harbors 8–9 million HLA class I variants. The common alleles of human populations comprise 42 core alleles, which represent all selected polymorphism, and recombinants that have assorted this polymorphism. PMID:28650991

  18. Statistical distribution of amino acid sequences: a proof of Darwinian evolution.

    PubMed

    Eitner, Krystian; Koch, Uwe; Gaweda, Tomasz; Marciniak, Jedrzej

    2010-12-01

    The article presents results of the listing of the quantity of amino acids, dipeptides and tripeptides for all proteins available in the UNIPROT-TREMBL database and the listing for selected species and enzymes. UNIPROT-TREMBL contains protein sequences associated with computationally generated annotations and large-scale functional characterization. Due to the distinct metabolic pathways of amino acid syntheses and their physicochemical properties, the quantities of subpeptides in proteins vary. We have proved that the distribution of amino acids, dipeptides and tripeptides is statistical which confirms that the evolutionary biodiversity development model is subject to the theory of independent events. It seems interesting that certain short peptide combinations occur relatively rarely or even not at all. First, it confirms the Darwinian theory of evolution and second, it opens up opportunities for designing pharmaceuticals among rarely represented short peptide combinations. Furthermore, an innovative approach to the mass analysis of bioinformatic data is presented. eitner@amu.edu.pl Supplementary data are available at Bioinformatics online.

  19. Comparison of ZP3 protein sequences among vertebrate species: to obtain a consensus sequence for immunocontraception.

    PubMed

    Zhu, X; Naz, R K

    1999-03-01

    The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences obtained from 13 vertebrate species indicated the overall evolutionarily conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. More variations of ZP3 polypeptide sequences were seen in the alignments of carp and frog from the 11 mammalian species making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all the ZP3 polypeptide sequences except of carp and frog. In the central region, the ZP3 deduced aa sequences of all the 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species including carp and frog, indicating that these residues have longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between the members of the same order within the class mammalia, and also (95.4%) between pig (ungulata) and rabbit (lagomorpha). The deduced ZP3 aa sequences per se may not be enough to build a phylogenetic tree.

  20. Sequencing of a Patient with Balanced Chromosome Abnormalities and Neurodevelopmental Disease Identifies Disruption of Multiple High Risk Loci by Structural Variation

    PubMed Central

    Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko

    2014-01-01

    Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kB, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750

  1. Purification, characterization, gene cloning and nucleotide sequencing of D: -stereospecific amino acid amidase from soil bacterium: Delftia acidovorans.

    PubMed

    Hongpattarakere, Tipparat; Komeda, Hidenobu; Asano, Yasuhisa

    2005-12-01

    The D-amino acid amidase-producing bacterium was isolated from soil samples using an enrichment culture technique in medium broth containing D-phenylalanine amide as a sole source of nitrogen. The strain exhibiting the strongest activity was identified as Delftia acidovorans strain 16. This strain produced intracellular D-amino acid amidase constitutively. The enzyme was purified about 380-fold to homogeneity and its molecular mass was estimated to be about 50 kDa, on sodium dodecyl sulfate polyacrylamide gel electrophoresis. The enzyme was active preferentially toward D-amino acid amides rather than their L-counterparts. It exhibited strong amino acid amidase activity toward aromatic amino acid amides including D-phenylalanine amide, D-tryptophan amide and D-tyrosine amide, yet it was not specifically active toward low-molecular-weight D-amino acid amides such as D-alanine amide, L-alanine amide and L-serine amide. Moreover, it was not specifically active toward oligopeptides. The enzyme showed maximum activity at 40 degrees C and pH 8.5 and appeared to be very stable, with 92.5% remaining activity after the reaction was performed at 45 degrees C for 30 min. However, it was mostly inactivated in the presence of phenylmethanesulfonyl fluoride or Cd2+, Ag+, Zn2+, Hg2+ and As3+ . The NH2 terminal and internal amino acid sequences of the enzyme were determined; and the gene was cloned and sequenced. The enzyme gene damA encodes a 466-amino-acid protein (molecular mass 49,860.46 Da); and the deduced amino acid sequence exhibits homology to the D-amino acid amidase from Variovorax paradoxus (67.9% identity), the amidotransferase A subunit from Burkholderia fungorum (50% identity) and other enantioselective amidases.

  2. A Novel Phytase with Sequence Similarity to Purple Acid Phosphatases Is Expressed in Cotyledons of Germinating Soybean Seedlings 1

    PubMed Central

    Hegeman, Carla E.; Grabau, Elizabeth A.

    2001-01-01

    Phytic acid (myo-inositol hexakisphosphate) is the major storage form of phosphorus in plant seeds. During germination, stored reserves are used as a source of nutrients by the plant seedling. Phytic acid is degraded by the activity of phytases to yield inositol and free phosphate. Due to the lack of phytases in the non-ruminant digestive tract, monogastric animals cannot utilize dietary phytic acid and it is excreted into manure. High phytic acid content in manure results in elevated phosphorus levels in soil and water and accompanying environmental concerns. The use of phytases to degrade seed phytic acid has potential for reducing the negative environmental impact of livestock production. A phytase was purified to electrophoretic homogeneity from cotyledons of germinated soybeans (Glycine max L. Merr.). Peptide sequence data generated from the purified enzyme facilitated the cloning of the phytase sequence (GmPhy) employing a polymerase chain reaction strategy. The introduction of GmPhy into soybean tissue culture resulted in increased phytase activity in transformed cells, which confirmed the identity of the phytase gene. It is surprising that the soybean phytase was unrelated to previously characterized microbial or maize (Zea mays) phytases, which were classified as histidine acid phosphatases. The soybean phytase sequence exhibited a high degree of similarity to purple acid phosphatases, a class of metallophosphoesterases. PMID:11500558

  3. The isolation, purification and amino-acid sequence of insulin from the teleost fish Cottus scorpius (daddy sculpin).

    PubMed

    Cutfield, J F; Cutfield, S M; Carne, A; Emdin, S O; Falkmer, S

    1986-07-01

    Insulin from the principal islets of the teleost fish, Cottus scorpius (daddy sculpin), has been isolated and sequenced. Purification involved acid/alcohol extraction, gel filtration, and reverse-phase high-performance liquid chromatography to yield nearly 1 mg pure insulin/g wet weight islet tissue. Biological potency was estimated as 40% compared to porcine insulin. The sculpin insulin crystallised in the absence of zinc ions although zinc is known to be present in the islets in significant amounts. Two other hormones, glucagon and pancreatic polypeptide, were copurified with the insulin, and an N-terminal sequence for pancreatic polypeptide was determined. The primary structure of sculpin insulin shows a number of sequence changes unique so far amongst teleost fish. These changes occur at A14 (Arg), A15 (Val), and B2 (Asp). The B chain contains 29 amino acids and there is no N-terminal extension as seen with several other fish. Presumably as a result of the amino acid substitutions, sculpin insulin does not readily form crystals containing zinc-insulin hexamers, despite the presence of the coordinating B10 His.

  4. Design of nucleic acid sequences for DNA computing based on a thermodynamic approach

    PubMed Central

    Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

    2005-01-01

    We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated well with ΔGmin (|R| = 0.90) than with the hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphic user interface-based program written in Java. PMID:15701762

  5. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    PubMed

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  6. cis-β-Bromostyrene derivatives from cinnamic acids via a tandem substitutive bromination-decarboxylation sequence.

    PubMed

    Tang, Khanh G; Kent, Greggory T; Erden, Ihsan; Wu, Weiming

    2017-10-04

    cis -β-Bromostyrene derivatives were synthesized stereospecifically from cinnamic acids through β-lactone intermediates. The synthetic sequence did not require the purification of the β-lactone intermediates although they were found to be stable and readily purified in most cases.

  7. BLAZAR ANTI-SEQUENCE OF SPECTRAL VARIATION WITHIN INDIVIDUAL BLAZARS: CASES FOR MRK 501 AND 3C 279

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jin; Zhang, Shuang-Nan; Liang, En-Wei, E-mail: zhang.jin@hotmail.com

    2013-04-10

    The jet properties of Mrk 501 and 3C 279 are derived by fitting broadband spectral energy distributions (SEDs) with lepton models. The derived {gamma}{sub b} (the break Lorenz factor of the electron distribution) is 10{sup 4}-10{sup 6} for Mrk 501 and 200 {approx} 600 for 3C 279. The magnetic field strength (B) of Mrk 501 is usually one order of magnitude lower than that of 3C 279, but their Doppler factors ({delta}) are comparable. A spectral variation feature where the peak luminosity is correlated with the peak frequency, which is opposite from the blazar sequence, is observed in the twomore » sources. We find that (1) the peak luminosities of the two bumps in the SEDs for Mrk 501 depend on {gamma}{sub b} in both the observer and co-moving frames, but they are not correlated with B and {delta} and (2) the luminosity variation of 3C 279 is dominated by the external Compton (EC) peak and its peak luminosity is correlated with {gamma}{sub b} and {delta}, but anti-correlated with B. These results suggest that {gamma}{sub b} may govern the spectral variation of Mrk 501 and {delta} and B would be responsible for the spectral variation of 3C 279. The narrow distribution of {gamma}{sub b} and the correlation of {gamma}{sub b} and B in 3C 279 would be due to the cooling from the EC process and the strong magnetic field. Based on our brief discussion, we propose that this spectral variation feature may originate from the instability of the corona but not from the variation of the accretion rate as does the blazar sequence.« less

  8. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.

  9. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers ard probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  10. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β -lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.

  11. Hydroquinone: O-glucosyltransferase from cultivated Rauvolfia cells: enrichment and partial amino acid sequences.

    PubMed

    Arend, J; Warzecha, H; Stöckigt, J

    2000-01-01

    Plant cell suspension cultures of Rauvolfia are able to produce a high amount of arbutin by glucosylation of exogenously added hydroquinone. A four step purification procedure using anion exchange, hydrophobic interaction, hydroxyapatite-chromatography and chromatofocusing delivered in a yield of 0.5%, an approximately 390 fold enrichment of the involved glucosyltransferase. SDS-PAGE showed a M(r) for the enzyme of 52 kDa. Proteolysis of the pure enzyme with endoproteinase LysC revealed six peptide fragments with 9-23 amino acids which were sequenced. Sequence alignment of the six peptides showed high homologies to glycosyltransferases from other higher plants.

  12. Variation in Lake Michigan alewife (Alosa pseudoharengus) thiaminase and fatty acids composition

    USGS Publications Warehouse

    Honeyfield, D.C.; Tillitt, D.E.; Fitzsimons, J.D.; Brown, S.B.

    2010-01-01

    Thiaminase activity of alewife (Alosa pseudoharengus) is variable across Lake Michigan, yet factors that contribute to the variability in alewife thiaminase activity are unknown. The fatty acid content of Lake Michigan alewife has not been previously reported. Analysis of 53 Lake Michigan alewives found a positive correlation between thiaminase activity and the following fatty acid: C22:ln9, sum of omega-6 fatty acids (Sw6), and sum of the polyunsaturated fatty acids. Thiaminase activity was negatively correlated with C15:0, C16:0, C17:0, C18:0, C20:0, C22:0, C24:0, C18:ln9t, C20:3n3, C22:2, and the sum of all saturated fatty acids (SAFA). Multi-variant regression analysis resulted in three variables (C18:ln9t, Sw6, SAFA) that explained 71% (R2=0.71, P<0.0001) of the variation in thiaminase activity. Because the fatty acid content of an organism is related is food source, diet may be an important factor modulating alewife thiaminase activity. These data suggest there is an association between fatty acids and thiaminase activity in Lake Michigan alewife.

  13. Partial amino-acid sequence of the precursor of an immunoglobulin light chain containing NH2-terminal pyroglutamic acid.

    PubMed Central

    Burstein, Y; Kantour, F; Schechter, I

    1976-01-01

    Analyses of amino-acid sequences of the total cell-free products programmed by the mRNA of MOPC-104E gamma light (L)-chain show that over 95% of the products have sequences of a distinct protein that correspond to the L-chain precursor. In this precursor an extra piece is coupled to the NH2-terminus of the mature L-chain. Analyses of products labeled with [3H]alanine, [3H]leucine, and [3H]proline demonstrate that the extra piece is composed of at least 18 residues. Analyses of [35S]methione-labeled product indicate that the extra piece may contain an additional NH2-terminal methionine, which is detected in about 10% of the molecules. Partial recovery of the NJ2-terminal methionine (alanine, leucine, and proline are recovered in yields close to theoretical, greater than 95%) suggests that it is the initiator methionine, which is known to be short lived in eukaryotes due to rapid hydrolysis. Thus, the extra piece seems to be 19 residues in length, and it contains one methionine at the NH2-terminus, three alanines at positions 2, 12, and 17, and five leucines at positions 6, 8, 10, 11, and 13. The close gathering of leucine residues, as well as their abundance (26%), suggest that the extra piece would be quite hydrophobic. Hydrophobicity seems to be a general property of the extra piece, since similar clusters of leucine were found in the precursors of 3 KL-chains (Burstein, Y. & Schechter, I. (1976) Biochem. J. 157, 145-151). The NH2-terminus of the mature MOPC-104E gamma L-chain is blocked by pyroglutamic acid. The fact that in the precursor a peptide segment precedes this NH2-terminus establishes that pyroglutamic acid is not the initiator residue for synthesis of the L-chain. Apparently, the pyroglutamic acid is formed by cyclization of glutamic acid or glutamine during cleavage of the extra piece to yield the mature L-chain. Images PMID:822420

  14. Evolution of sequence-defined highly functionalized nucleic acid polymers

    NASA Astrophysics Data System (ADS)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  15. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

    PubMed Central

    Li, Yushuang; Yang, Jiasheng; Zhang, Yi

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method into the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of other 5 popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector. PMID:27918587

  16. Amino acid sequence of human cholinesterase. Annual report, 30 September 1984-30 September 1985

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lockridge, O.

    1985-10-01

    The active-site serine residue is located 198 amino acids from the N-terminal. The active-site peptide was isolated from three different genetic types of human serum cholinesterase: from usual, atypical, and atypical-silent genotypes. It was found that the amino acid sequence of the active-site peptide was identical in all three genotypes. Comparison of the complete sequences of cholinesterase from human serum and acetylcholinesterase from the electric organ of Torpedo californica shows an identity of 53%. Cholinesterase is of interest to the Department of Defense because cholinesterase protects against organophosphate poisons of the type used in chemical warfare. The structural results presentedmore » here will serve as the basis for cloning the gene for cholinesterase. The potential uses of large amounts of cholinesterase would be for cleaning up spills of organophosphates and possibly for detoxifying exposed personnel.« less

  17. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  18. Two-level QSAR network (2L-QSAR) for peptide inhibitor design based on amino acid properties and sequence positions.

    PubMed

    Du, Q S; Ma, Y; Xie, N Z; Huang, R B

    2014-01-01

    In the design of peptide inhibitors the huge possible variety of the peptide sequences is of high concern. In collaboration with the fast accumulation of the peptide experimental data and database, a statistical method is suggested for peptide inhibitor design. In the two-level peptide prediction network (2L-QSAR) one level is the physicochemical properties of amino acids and the other level is the peptide sequence position. The activity contributions of amino acids are the functions of physicochemical properties and the sequence positions. In the prediction equation two weight coefficient sets {ak} and {bl} are assigned to the physicochemical properties and to the sequence positions, respectively. After the two coefficient sets are optimized based on the experimental data of known peptide inhibitors using the iterative double least square (IDLS) procedure, the coefficients are used to evaluate the bioactivities of new designed peptide inhibitors. The two-level prediction network can be applied to the peptide inhibitor design that may aim for different target proteins, or different positions of a protein. A notable advantage of the two-level statistical algorithm is that there is no need for host protein structural information. It may also provide useful insight into the amino acid properties and the roles of sequence positions.

  19. Amino-acid sequence and predicted three-dimensional structure of pea seed (Pisum sativum) ferritin.

    PubMed Central

    Lobreaux, S; Yewdall, S J; Briat, J F; Harrison, P M

    1992-01-01

    The iron storage protein, ferritin, is widely distributed in the living kingdom. Here the complete cDNA and derived amino-acid sequence of pea seed ferritin are described, together with its predicted secondary structure, namely a four-helix-bundle fold similar to those of mammalian ferritins, with a fifth short helix at the C-terminus. An N-terminal extension of 71 residues contains a transit peptide (first 47 residues) responsible for plastid targetting as in other plant ferritins, and this is cleaved before assembly. The second part of the extension (24 residues) belongs to the mature subunit; it is cleaved during germination. The amino-acid sequence of pea seed ferritin is aligned with those of other ferritins (49% amino-acid identity with H-chains and 40% with L-chains of human liver ferritin in the aligned region). A three-dimensional model has been constructed by fitting the aligned sequence to the coordinates of human H-chains, with appropriate modifications. A folded conformation with an 11-residue helix is predicted for the N-terminal extension. As in mammalian ferritins, 24 subunits assemble into a hollow shell. In pea seed ferritin, its N-terminal extension is exposed on the outside surface of the shell. Within each pea subunit is a ferroxidase centre resembling those of human ferritin H-chains except for a replacement of Glu-62 by His. The channel at the 4-fold-symmetry axes defined by E-helices, is predicted to be hydrophilic in plant ferritins, whereas it is hydrophobic in mammalian ferritins. Images Fig. 3. Fig. 5. Fig. 6. PMID:1472006

  20. Variations in endothelin receptor B subtype 2 (EDNRB2) coding sequences and mRNA expression levels in 4 Muscovy duck plumage colour phenotypes.

    PubMed

    Wu, N; Qin, H; Wang, M; Bian, Y; Dong, B; Sun, G; Zhao, W; Chang, G; Xu, Q; Chen, G

    2017-04-01

    1. Endothelin receptor B subtype 2 (EDNRB2) is a paralog of EDNRB, which encodes a 7-transmembrane G-protein coupled receptor. Previous studies reported that EDNRB was essential for melanoblast migration in mammals and ducks. 2. Muscovy ducks have different plumage colour phenotypes. Variations in EDNRB2 coding sequences (CDSs) and mRNA expression levels were investigated in 4 different Muscovy duck plumage colour phenotypes, including black, black mutant, silver and white head. 3. The EDNRB2 gene from Muscovy duck was cloned; it had a length of 6435 bp and encoded 437 amino acids. The coding region was screened and potential single nucleotide polymorphisms were identified. Eight mutations were obtained, including one missense variant (c.64C > T) and 7 synonymous substitutions. The substitutions were associated with plumage colour phenotypes. 4. The EDNRB2 mRNA expression levels were compared between feather pulp from black birds and black mutant birds. The results indicated that EDNRB2 transcripts in feather pulp were significantly higher in black feathers than in white feathers. 5. The results determined the variation of EDNRB2 CDS and mRNA expression in Muscovy ducks of various plumage colours.

  1. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    PubMed

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  2. MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples.

    PubMed

    Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi

    2018-03-05

    Currently a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for the CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied on simulated data and also on real data of six HapMap individuals with high-coverage sequencing, in 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in 1000 Genomes Project and six samples form the Simons Genome Diversity Project.

  3. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    PubMed

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

  4. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

    PubMed Central

    Nishizawa, Manami; Nishizawa, Kazuhisa

    2000-01-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273

  5. Ultra-deep sequencing reveals high prevalence and broad structural diversity of hepatitis B surface antigen mutations in a global population

    PubMed Central

    Gencay, Mikael; Hübner, Kirsten; Gohl, Peter; Seffner, Anja; Weizenegger, Michael; Neofytos, Dionysios; Batrla, Richard; Woeste, Andreas; Kim, Hyon-suk; Westergaard, Gaston; Reinsch, Christine; Brill, Eva; Thu Thuy, Pham Thi; Hoang, Bui Huu; Sonderup, Mark; Spearman, C. Wendy; Pabinger, Stephan; Gautier, Jérémie; Brancaccio, Giuseppina; Fasano, Massimo; Santantonio, Teresa; Gaeta, Giovanni B.; Nauck, Markus; Kaminski, Wolfgang E.

    2017-01-01

    The diversity of the hepatitis B surface antigen (HBsAg) has a significant impact on the performance of diagnostic screening tests and the clinical outcome of hepatitis B infection. Neutralizing or diagnostic antibodies against the HBsAg are directed towards its highly conserved major hydrophilic region (MHR), in particular towards its “a” determinant subdomain. Here, we explored, on a global scale, the genetic diversity of the HBsAg MHR in a large, multi-ethnic cohort of randomly selected subjects with HBV infection from four continents. A total of 1553 HBsAg positive blood samples of subjects originating from 20 different countries across Africa, America, Asia and central Europe were characterized for amino acid variation in the MHR. Using highly sensitive ultra-deep sequencing, we found 72.8% of the successfully sequenced subjects (n = 1391) demonstrated amino acid sequence variation in the HBsAg MHR. This indicates that the global variation frequency in the HBsAg MHR is threefold higher than previously reported. The majority of the amino acid mutations were found in the HBV genotypes B (28.9%) and C (25.4%). Collectively, we identified 345 distinct amino acid mutations in the MHR. Among these, we report 62 previously unknown mutations, which extends the worldwide pool of currently known HBsAg MHR mutations by 22%. Importantly, topological analysis identified the “a” determinant upstream flanking region as the structurally most diverse subdomain of the HBsAg MHR. The highest prevalence of “a” determinant region mutations was observed in subjects from Asia, followed by the African, American and European cohorts, respectively. Finally, we found that more than half (59.3%) of all HBV subjects investigated carried multiple MHR mutations. Together, this worldwide ultra-deep sequencing based genotyping study reveals that the global prevalence and structural complexity of variation in the hepatitis B surface antigen have, to date, been significantly

  6. Amino acid sequence of bovine muzzle epithelial desmocollin derived from cloned cDNA: a novel subtype of desmosomal cadherins.

    PubMed

    Koch, P J; Goldschmidt, M D; Walsh, M J; Zimbelmann, R; Schmelz, M; Franke, W W

    1991-05-01

    Desmosomes are cell-type-specific intercellular junctions found in epithelium, myocardium and certain other tissues. They consist of assemblies of molecules involved in the adhesion of specific cell types and in the anchorage of cell-type-specific cytoskeletal elements, the intermediate-size filaments, to the plasma membrane. To explore the individual desmosomal components and their functions we have isolated DNA clones encoding the desmosomal glycoprotein, desmocollin, using antibodies and a cDNA expression library from bovine muzzle epithelium. The cDNA-deduced amino-acid sequence of desmocollin (presently we cannot decide to which of the two desmocollins, DC I or DC II, this clone relates) defines a polypeptide with a calculated molecular weight of 85,000, with a single candidate sequence of 24 amino acids sufficiently long for a transmembrane arrangement, and an extracellular aminoterminal portion of 561 amino acid residues, compared to a cytoplasmic part of only 176 amino acids. Amino acid sequence comparisons have revealed that desmocollin is highly homologous to members of the cadherin family of cell adhesion molecules, including the previously sequenced desmoglein, another desmosome-specific cadherin. Using riboprobes derived from cDNAs for Northern-blot analyses, we have identified an mRNA of approximately 6 kb in stratified epithelia such as muzzle epithelium and tongue mucosa but not in two epithelial cell culture lines containing desmosomes and desmoplakins. The difference may indicate drastic differences in mRNA concentration or the existence of cell-type-specific desmocollin subforms. The molecular topology of desmocollin(s) is discussed in relation to possible functions of the individual molecular domains.

  7. The amino acid sequence around the active-site cysteine and histidine residues of stem bromelain

    PubMed Central

    Husain, S. S.; Lowe, G.

    1970-01-01

    Stem bromelain that had been irreversibly inhibited with 1,3-dibromo[2-14C]-acetone was reduced with sodium borohydride and carboxymethylated with iodoacetic acid. After digestion with trypsin and α-chymotrypsin three radioactive peptides were isolated chromatographically. The amino acid sequences around the cross-linked cysteine and histidine residues were determined and showed a high degree of homology with those around the active-site cysteine and histidine residues of papain and ficin. PMID:5420046

  8. Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

    PubMed Central

    Schaeffer, E; Sninsky, J J

    1984-01-01

    Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835

  9. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  10. GibbsCluster: unsupervised clustering and alignment of peptide sequences.

    PubMed

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-07-03

    Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertion and deletions accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Human Y chromosome copy number variation in the next generation sequencing era and beyond.

    PubMed

    Massaia, Andrea; Xue, Yali

    2017-05-01

    The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.

  12. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides.

    PubMed

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Pérez-Pérez, Martín; Pérez-Rodríguez, Gael; Catalán-García, Sandra; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2016-06-01

    Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptide (1691 registers), which was further filtered searching terms such as "antiproliferative," "antitumoral," or "apoptosis" among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanism of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application is discussed. © 2016 The Protein Society.

  13. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides

    PubMed Central

    Blanco‐Míguez, Aitor; Gutiérrez‐Jácome, Alberto; Pérez‐Pérez, Martín; Pérez‐Rodríguez, Gael; Catalán‐García, Sandra; Fdez‐Riverola, Florentino; Lourenço, Anália

    2016-01-01

    Abstract Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptide (1691 registers), which was further filtered searching terms such as “antiproliferative,” “antitumoral,” or “apoptosis” among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanism of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application is discussed. PMID:27010507

  14. Complete nucleotide and derived amino acid sequence of cDNA encoding the mitochondrial uncoupling protein of rat brown adipose tissue: lack of a mitochondrial targeting presequence.

    PubMed Central

    Ridley, R G; Patel, H V; Gerber, G E; Morton, R C; Freeman, K B

    1986-01-01

    A cDNA clone spanning the entire amino acid sequence of the nuclear-encoded uncoupling protein of rat brown adipose tissue mitochondria has been isolated and sequenced. With the exception of the N-terminal methionine the deduced N-terminus of the newly synthesized uncoupling protein is identical to the N-terminal 30 amino acids of the native uncoupling protein as determined by protein sequencing. This proves that the protein contains no N-terminal mitochondrial targeting prepiece and that a targeting region must reside within the amino acid sequence of the mature protein. Images PMID:3012461

  15. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions are known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRCs cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10−7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10−7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10−7 and OR = 1.09, P = 7.4 × 10−8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10−9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10−6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10−4) and DNA mismatch repair genes (P = 6.1 × 10−4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  16. Fatty Acid Profile and Unigene-Derived Simple Sequence Repeat Markers in Tung Tree (Vernicia fordii)

    PubMed Central

    Zhang, Lin; Jia, Baoguang; Tan, Xiaofeng; Thammina, Chandra S.; Long, Hongxu; Liu, Min; Wen, Shanna; Song, Xianliang; Cao, Heping

    2014-01-01

    Tung tree (Vernicia fordii) provides the sole source of tung oil widely used in industry. Lack of fatty acid composition and molecular markers hinders biochemical, genetic and breeding research. The objectives of this study were to determine fatty acid profiles and develop unigene-derived simple sequence repeat (SSR) markers in tung tree. Fatty acid profiles of 41 accessions showed that the ratio of α-eleostearic acid was increasing continuously with a parallel trend to the amount of tung oil accumulation while the ratios of other fatty acids were decreasing in different stages of the seeds and that α-eleostearic acid (18∶3) consisted of 77% of the total fatty acids in tung oil. Transcriptome sequencing identified 81,805 unigenes from tung cDNA library constructed using seed mRNA and discovered 6,366 SSRs in 5,404 unigenes. The di- and tri-nucleotide microsatellites accounted for 92% of the SSRs with AG/CT and AAG/CTT being the most abundant SSR motifs. Fifteen polymorphic genic-SSR markers were developed from 98 unigene loci tested in 41 cultivated tung accessions by agarose gel and capillary electrophoresis. Genbank database search identified 10 of them putatively coding for functional proteins. Quantitative PCR demonstrated that all 15 polymorphic SSR-associated unigenes were expressed in tung seeds and some of them were highly correlated with oil composition in the seeds. Dendrogram revealed that most of the 41 accessions were clustered according to the geographic region. These new polymorphic genic-SSR markers will facilitate future studies on genetic diversity, molecular fingerprinting, comparative genomics and genetic mapping in tung tree. The lipid profiles in the seeds of 41 tung accessions will be valuable for biochemical and breeding studies. PMID:25167054

  17. Genome Sequence of Lactobacillus rhamnosus Strain CASL, an Efficient l-Lactic Acid Producer from Cheap Substrate Cassava

    PubMed Central

    Yu, Bo; Su, Fei; Wang, Limin; Zhao, Bo; Qin, Jiayang; Ma, Cuiqing; Xu, Ping; Ma, Yanhe

    2011-01-01

    Lactobacillus rhamnosus is a type of probiotic bacteria with industrial potential for l-lactic acid production. We announce the draft genome sequence of L. rhamnosus CASL (2,855,156 bp with a G+C content of 46.6%), which is an efficient producer of l-lactic acid from cheap, nonfood substrate cassava with a high production titer. PMID:22123765

  18. Autozygome Sequencing Expands the Horizon of Human Knockout Research and Provides Novel Insights into Human Phenotypic Variation

    PubMed Central

    Anazi, Shamsa; Alshamekh, Shomoukh; Alkuraya, Fowzan S.

    2013-01-01

    The use of autozygosity as a mapping tool in the search for autosomal recessive disease genes is well established. We hypothesized that autozygosity not only unmasks the recessiveness of disease causing variants, but can also reveal natural knockouts of genes with less obvious phenotypic consequences. To test this hypothesis, we exome sequenced 77 well phenotyped individuals born to first cousin parents in search of genes that are biallelically inactivated. Using a very conservative estimate, we show that each of these individuals carries biallelic inactivation of 22.8 genes on average. For many of the 169 genes that appear to be biallelically inactivated, available data support involvement in modulating metabolism, immunity, perception, external appearance and other phenotypic aspects, and appear therefore to contribute to human phenotypic variation. Other genes with biallelic inactivation may contribute in yet unknown mechanisms or may be on their way to conversion into pseudogenes due to true recent dispensability. We conclude that sequencing the autozygome is an efficient way to map the contribution of genes to human phenotypic variation that goes beyond the classical definition of disease. PMID:24367280

  19. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    PubMed

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Fluorescence energy transfer as a probe for nucleic acid structures and sequences.

    PubMed Central

    Mergny, J L; Boutorine, A S; Garestier, T; Belloc, F; Rougée, M; Bulychev, N V; Koshkin, A A; Bourson, J; Lebedev, A V; Valeur, B

    1994-01-01

    The primary or secondary structure of single-stranded nucleic acids has been investigated with fluorescent oligonucleotides, i.e., oligonucleotides covalently linked to a fluorescent dye. Five different chromophores were used: 2-methoxy-6-chloro-9-amino-acridine, coumarin 500, fluorescein, rhodamine and ethidium. The chemical synthesis of derivatized oligonucleotides is described. Hybridization of two fluorescent oligonucleotides to adjacent nucleic acid sequences led to fluorescence excitation energy transfer between the donor and the acceptor dyes. This phenomenon was used to probe primary and secondary structures of DNA fragments and the orientation of oligodeoxynucleotides synthesized with the alpha-anomers of nucleoside units. Fluorescence energy transfer can be used to reveal the formation of hairpin structures and the translocation of genes between two chromosomes. PMID:8152922

  1. Concentration variations of amino acids in mammalian fossils: effects of diagenesis and the implications for amino acid racemization analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blackwell, B.; Rutter, N.W.

    Detailed amino acid analysis of bones, teeth, and antler from several mammal species have shown that concentrations of several amino acids can be related to three factors: type of material analyzed, diagenetic alteration of the material, and relative age of the fossil. Concentrations of several amino acids are significantly different in enamel compared to those of dentine or cement. This can be used to check that no contamination of one material by another has occurred, which is critical for using the data for amino acid dating, since all three materials have different racemization rates for some acids. With increased inmore » growth of secondary minerals, generally reduced amino acid concentrations are observed. Interacid ratios and concentrations vary significantly the norms expected for the type of material with increasing degrees of alteration. These effects can be linked to abnormal racemization ratios observed in the same samples. Therefore, abnormal concentrations and/or interacid ratios can be used to detect samples in which the D/L amino acid ratios otherwise appear normal, thereby insuring better accuracy of amino acid racemization analysis. For unaltered fossils, with increasing sample age regardless the type of material, some amino acids steadily degrade, while others actually increase in concentration initially due to their generation as by-products of decay. Preliminary studies indicate that this progressive alteration can used to complement racemization data for determining relative stratigraphic sequences.« less

  2. Variation in plasma amino acid concentrations during a cycling competition.

    PubMed

    Medelli, J; Lounana, J; Hill, D

    2003-06-01

    The variations in plasma concentrations of 24 amino acids (AAs) were measured, taking into account modifications in plasma volume, in 7 male subjects, professional cyclists, during the first 2 stages (EI and EII) of the competition "4 Days of Dunkirk 1999". Blood samples were taken before the start and at the end of each stage. At the end of EI a significant reduction (p<0.02) in alpha-aminobutyric acid was observed (-38%) and a significant increase (Wilcoxon 0.01acid (33%), arginine (-21%), cystine (-15%), glycocholic acid (26%), hydroxyproline (-28%), leucine (-17%), lysine (-26%), methionine (-19%), threonine (-15%) and valine (-16%). The recovery period was characterised by increases in previously reduced AAs and vice versa. These results show that cellular use of plasma AAs in the endurance cyclist could differ between the first hours of competition and the subsequent stages, possibly in relation to the gradual increase in stress level as the stages progress; they also suggest that appropriate consideration should be given to the quality of protein inputs in the nutrition of cyclists involved in a high-level stage competition.

  3. Sequence variations of the human MPDZ gene and association with alcoholism in subjects with European ancestry.

    PubMed

    Karpyak, Victor M; Kim, Jeong-Hyun; Biernacka, Joanna M; Wieben, Eric D; Mrazek, David A; Black, John L; Choi, Doo-Sup

    2009-04-01

    Mpdz gene variations are known contributors of acute alcohol withdrawal severity and seizures in mice. To investigate the relevance of these findings for human alcoholism, we resequenced 46 exons, exon-intron boundaries, and 2 kilobases in the 5' region of the human MPDZ gene in 61 subjects with a history of alcohol withdrawal seizures (AWS), 59 subjects with a history of alcohol withdrawal without AWS, and 64 Coriell samples from self-reported nonalcoholic subjects [all European American (EA) ancestry] and compared with the Mpdz sequences of 3 mouse strains with different propensity to AWS. To explore potential associations of the human MPDZ gene with alcoholism and AWS, single SNP and haplotype analyses were performed using 13 common variants. Sixty-seven new, mostly rare variants were discovered in the human MPDZ gene. Sequence comparison revealed that the human gene does not have variations identical to those comprising Mpdz gene haplotype associated with AWS in mice. We also found no significant association between MPDZ haplotypes and AWS in humans. However, a global test of haplotype association revealed a significant difference in haplotype frequencies between alcohol-dependent subjects without AWS and Coriell controls (p = 0.015), suggesting a potential role of MPDZ in alcoholism and/or related phenotypes other than AWS. Haplotype-specific tests for the most common haplotypes (frequency > 0.05), revealed a specific high-risk haplotype (p = 0.006, maximum statistic p = 0.051), containing rs13297480G allele also found to be significantly more prevalent in alcoholics without AWS compared with nonalcoholic Coriell subjects (p = 0.019). Sequencing of MPDZ gene in individuals with EA ancestry revealed no variations in the sites identical to those associated with AWS in mice. Exploratory haplotype and single SNP association analyses suggest a possible association between the MPDZ gene and alcohol dependence but not AWS. Further functional genomic analysis of MPDZ

  4. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification

    PubMed Central

    Schouten, Jan P.; McElgunn, Cathal J.; Waaijer, Raymond; Zwijnenburg, Danny; Diepvens, Filip; Pals, Gerard

    2002-01-01

    We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down’s syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50–70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences. PMID:12060695

  5. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification.

    PubMed

    Schouten, Jan P; McElgunn, Cathal J; Waaijer, Raymond; Zwijnenburg, Danny; Diepvens, Filip; Pals, Gerard

    2002-06-15

    We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down's syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50-70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences.

  6. Development of a genotype-by-sequencing immunogenetic assay as exemplified by screening for variation in red fox with and without endemic rabies exposure.

    PubMed

    Donaldson, Michael E; Rico, Yessica; Hueffer, Karsten; Rando, Halie M; Kukekova, Anna V; Kyle, Christopher J

    2018-01-01

    Pathogens are recognized as major drivers of local adaptation in wildlife systems. By determining which gene variants are favored in local interactions among populations with and without disease, spatially explicit adaptive responses to pathogens can be elucidated. Much of our current understanding of host responses to disease comes from a small number of genes associated with an immune response. High-throughput sequencing (HTS) technologies, such as genotype-by-sequencing (GBS), facilitate expanded explorations of genomic variation among populations. Hybridization-based GBS techniques can be leveraged in systems not well characterized for specific variants associated with disease outcome to "capture" specific genes and regulatory regions known to influence expression and disease outcome. We developed a multiplexed, sequence capture assay for red foxes to simultaneously assess ~300-kbp of genomic sequence from 116 adaptive, intrinsic, and innate immunity genes of predicted adaptive significance and their putative upstream regulatory regions along with 23 neutral microsatellite regions to control for demographic effects. The assay was applied to 45 fox DNA samples from Alaska, where three arctic rabies strains are geographically restricted and endemic to coastal tundra regions, yet absent from the boreal interior. The assay provided 61.5% on-target enrichment with relatively even sequence coverage across all targeted loci and samples (mean = 50×), which allowed us to elucidate genetic variation across introns, exons, and potential regulatory regions (4,819 SNPs). Challenges remained in accurately describing microsatellite variation using this technique; however, longer-read HTS technologies should overcome these issues. We used these data to conduct preliminary analyses and detected genetic structure in a subset of red fox immune-related genes between regions with and without endemic arctic rabies. This assay provides a template to assess immunogenetic variation

  7. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

    USDA-ARS?s Scientific Manuscript database

    Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low coverage next-generation sequence data....

  8. Evolutionary Distance of Amino Acid Sequence Orthologs across Macaque Subspecies: Identifying Candidate Genes for SIV Resistance in Chinese Rhesus Macaques

    PubMed Central

    Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn

    2015-01-01

    We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674

  9. Individual Variation in Lipidomic Profiles of Healthy Subjects in Response to Omega-3 Fatty Acids

    PubMed Central

    Nording, Malin L.; Yang, Jun; Georgi, Katrin; Hegedus Karbowski, Christine; German, J. Bruce; Weiss, Robert H.; Hogg, Ronald J.; Trygg, Johan; Hammock, Bruce D.; Zivkovic, Angela M.

    2013-01-01

    Introduction Conflicting findings in both interventional and observational studies have resulted in a lack of consensus on the benefits of ω3 fatty acids in reducing disease risk. This may be due to individual variability in response. We used a multi-platform lipidomic approach to investigate both the consistent and inconsistent responses of individuals comprehensively to a defined ω3 intervention. Methods The lipidomic profile including fatty acids, lipid classes, lipoprotein distribution, and oxylipins was examined multi- and uni-variately in 12 healthy subjects pre vs. post six weeks of ω3 fatty acids (1.9 g/d eicosapentaenoic acid [EPA] and 1.5 g/d docosahexaenoic acid [DHA]). Results Total lipidomic and oxylipin profiles were significantly different pre vs. post treatment across all subjects (p=0.00007 and p=0.00002 respectively). There was a strong correlation between oxylipin profiles and EPA and DHA incorporated into different lipid classes (r2=0.93). However, strikingly divergent responses among individuals were also observed. Both ω3 and ω6 fatty acid metabolites displayed a large degree of variation among the subjects. For example, in half of the subjects, two arachidonic acid cyclooxygenase products, prostaglandin E2 (PGE2) and thromboxane B2 (TXB2), and a lipoxygenase product, 12-hydroxyeicosatetraenoic acid (12-HETE) significantly decreased post intervention, whereas in the other half they either did not change or increased. The EPA lipoxygenase metabolite 12-hydroxyeicosapentaenoic acid (12-HEPE) varied among subjects from an 82% decrease to a 5,000% increase. Conclusions Our results show that certain defined responses to ω3 fatty acid intervention were consistent across all subjects. However, there was also a high degree of inter-individual variability in certain aspects of lipid metabolism. This lipidomic based phenotyping approach demonstrated that individual responsiveness to ω3 fatty acids is highly variable and measurable, and could be

  10. Seasonal variations, molecular distributions, and stable carbon isotopic compositions of dicarboxylic acids, ketocarboxylic acids, and α-dicarbonyls in PM2.5 from Beijing, China

    NASA Astrophysics Data System (ADS)

    Zhao, W.; Kawamura, K.; Fu, P.

    2016-12-01

    Low molecular weight (LMW) dicarboxylic acids and related polar compounds comprise a significant fraction of atmospheric aerosols. Seasonal variations, molecular distributions, and stable carbon isotopic compositions of dicarboxylic acids, ketocarboxylic acids, and α-dicarbonyls, as well as organic carbon (OC), elemental carbon (EC), water soluble organic carbon (WSOC) and inorganic ionic species, were determined to better understand the sources and photochemical aging processes of carbonaceous aerosols in urban Beijing from Sept. 2013 to Jul. 2014 (n=65). Concentrations of total diacids ranged from 110-2580 ng m-3, while ketoacids (9.5-353 ng m-3) and dicarbonyls (1.5-85.9 ng m-3) were less abundant. Higher ambient concentrations of phthalic (Ph) (37.9±27.3 ng m-3), terephthalic (tPh) (48.7±51.1 ng m-3), and glyoxylic (ωC2) (44.3±69 ng m-3) acids were found in winter than other seasons. The temporal variations of malonic acid to succinic acid (C3/C4) ratios were relatively low throughout the whole year, most of which were less than or equal to unity, even in summer, implying more contributions of dicarboxylic acids from primary emissions, rather than aging processes during long-range atmospheric transport. The δ13C mean values of malonic acid (-18.7% to -17.3%) and succinic acid (-28.6% to -17.1%) were larger than those of oxalic acid (-22.9% to -20.1%) in both seasons, except for δ13C of succinic acid in summer. Lower δ13C values of these compounds in Beijing than those in marine areas may be mainly associated with primary emissions, such as biomass burning, vehicular exhaust, incomplete fossil fuel combustion and plastic wastes.

  11. Conserved Sequence Preferences Contribute to Substrate Recognition by the Proteasome*

    PubMed Central

    Yu, Houqing; Singh Gautam, Amit K.; Wilmington, Shameika R.; Wylie, Dennis; Martinez-Fonts, Kirby; Kago, Grace; Warburton, Marie; Chavali, Sreenivas; Inobe, Tomonao; Finkelstein, Ilya J.; Babu, M. Madan

    2016-01-01

    The proteasome has pronounced preferences for the amino acid sequence of its substrates at the site where it initiates degradation. Here, we report that modulating these sequences can tune the steady-state abundance of proteins over 2 orders of magnitude in cells. This is the same dynamic range as seen for inducing ubiquitination through a classic N-end rule degron. The stability and abundance of His3 constructs dictated by the initiation site affect survival of yeast cells and show that variation in proteasomal initiation can affect fitness. The proteasome's sequence preferences are linked directly to the affinity of the initiation sites to their receptor on the proteasome and are conserved between Saccharomyces cerevisiae, Schizosaccharomyces pombe, and human cells. These findings establish that the sequence composition of unstructured initiation sites influences protein abundance in vivo in an evolutionarily conserved manner and can affect phenotype and fitness. PMID:27226608

  12. Deep sequencing in library selection projects: what insight does it bring?

    PubMed Central

    Glanville, J; D’Angelo, S; Khan, T.A.; Reddy, S. T.; Naranjo, L.; Ferrara, F.; Bradbury, A.R.M.

    2015-01-01

    High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. PMID:26451649

  13. Serum uric acid concentrations and SLC2A9 genetic variation in Hispanic children: the Viva La Familia Study1234

    PubMed Central

    Voruganti, V Saroja; Laston, Sandra; Haack, Karin; Mehta, Nitesh R; Cole, Shelley A; Butte, Nancy F; Comuzzie, Anthony G

    2015-01-01

    Background: Elevated concentrations of serum uric acid are associated with increased risk of gout and renal and cardiovascular diseases. Genetic studies in adults have consistently identified associations of solute carrier family 2, member 9 (SLC2A9), polymorphisms with variation in serum uric acid. However, it is not known whether the association of serum uric acid with SLC2A9 polymorphisms manifests in children. Objective: The aim was to investigate whether variation in serum uric acid is under genetic influence and whether the association with SLC2A9 polymorphisms generalizes to Hispanic children of the Viva La Familia Study. Design: We conducted a genomewide association study with 1.1 million genetic markers in 815 children. Results: We found serum uric acid to be significantly heritable [h2 ± SD = 0.45 ± 0.08, P = 5.8 × 10−11] and associated with SLC2A9 variants (P values between 10−16 and 10−7). Several of the significantly associated polymorphisms were previously identified in studies in adults. We also found positive genetic correlations between serum uric acid and BMI z score (ρG = 0.45, P = 0.002), percentage of body fat (ρG = 0.28, P = 0.04), fat mass (ρG = 0.34, P = 0.02), waist circumference (ρG = 0.42, P = 0.003), and waist-to-height ratio (ρG = 0.46, P = 0.001). Conclusions: Our results show that variation in serum uric acid in Hispanic children is under considerable genetic influence and is associated with obesity-related phenotypes. As in adults, genetic variation in SLC2A9 is associated with serum uric acid concentrations, an important biomarker of renal and cardiovascular disease risk, in Hispanic children. PMID:25833971

  14. Identification of a novel bovine enterovirus possessing highly divergent amino acid sequences in capsid protein.

    PubMed

    Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya

    2017-01-17

    Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus and family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of novel kobu-like virus in calf feces, by metagenomics analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing in feces. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of BEV were genetically distant from other EVs and the VP1 coding region contained novel and unique amino acid sequences. We named this strain as BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome length of this virus is 7414 nucleotides excluding the poly (A) tail and its genome consists of a 5'UTR, open reading frame encoding a single polyprotein, and 3'UTR. The results of secondary RNA structure analysis showed that in the 5'UTR, BEV-AN12 had an additional clover leaf structure and small stem loop structure, similarly to other BEVs. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed significantly low aa identity to other viruses in genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome region. We identified novel BEV possessing highly divergent aa sequences in the VP1 coding

  15. Purification, amino acid sequence and characterisation of kangaroo IGF-I.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1998-01-01

    Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum, thus this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-Terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.

  16. Rare coding variation in paraoxonase-1 is associated with ischemic stroke in the NHLBI Exome Sequencing Project.

    PubMed

    Kim, Daniel Seung; Crosslin, David R; Auer, Paul L; Suzuki, Stephanie M; Marsillach, Judit; Burt, Amber A; Gordon, Adam S; Meschia, James F; Nalls, Mike A; Worrall, Bradford B; Longstreth, W T; Gottesman, Rebecca F; Furlong, Clement E; Peters, Ulrike; Rich, Stephen S; Nickerson, Deborah A; Jarvik, Gail P

    2014-06-01

    HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10(-3)). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10(-3)). Ethnic differences in the association between PON1 variants with stroke could be due to the effects of PON1Val109Ile (overall P = 7.88 × 10(-3); AA P = 6.52 × 10(-4)), found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted.

  17. Spatial and Temporal Stress Drop Variations of the 2011 Tohoku Earthquake Sequence

    NASA Astrophysics Data System (ADS)

    Miyake, H.

    2013-12-01

    The 2011 Tohoku earthquake sequence consists of foreshocks, mainshock, aftershocks, and repeating earthquakes. To quantify spatial and temporal stress drop variations is important for understanding M9-class megathrust earthquakes. Variability and spatial and temporal pattern of stress drop is a basic information for rupture dynamics as well as useful to source modeling. As pointed in the ground motion prediction equations by Campbell and Bozorgnia [2008, Earthquake Spectra], mainshock-aftershock pairs often provide significant decrease of stress drop. We here focus strong motion records before and after the Tohoku earthquake, and analyze source spectral ratios considering azimuth- and distance dependency [Miyake et al., 2001, GRL]. Due to the limitation of station locations on land, spatial and temporal stress drop variations are estimated by adjusting shifts from the omega-squared source spectral model. The adjustment is based on the stochastic Green's function simulations of source spectra considering azimuth- and distance dependency. We assumed the same Green's functions for event pairs for each station, both the propagation path and site amplification effects are cancelled out. Precise studies of spatial and temporal stress drop variations have been performed [e.g., Allmann and Shearer, 2007, JGR], this study targets the relations between stress drop vs. progression of slow slip prior to the Tohoku earthquake by Kato et al. [2012, Science] and plate structures. Acknowledgement: This study is partly supported by ERI Joint Research (2013-B-05). We used the JMA unified earthquake catalogue and K-NET, KiK-net, and F-net data provided by NIED.

  18. Construction Strategy for an Internal Amplification Control for Real-Time Diagnostic Assays Using Nucleic Acid Sequence-Based Amplification: Development and Clinical Application

    PubMed Central

    Rodríguez-Lázaro, David; D'Agostino, Martin; Pla, Maria; Cook, Nigel

    2004-01-01

    An important analytical control in molecular amplification-based methods is an internal amplification control (IAC), which should be included in each reaction mixture. An IAC is a nontarget nucleic acid sequence which is coamplified simultaneously with the target sequence. With negative results for the target nucleic acid, the absence of an IAC signal indicates that amplification has failed. A general strategy for the construction of an IAC for inclusion in molecular beacon-based real-time nucleic acid sequence-based amplification (NASBA) assays is presented. Construction proceeds in two phases. In the first phase, a double-stranded DNA molecule that contains nontarget sequences flanked by target sequences complementary to the NASBA primers is produced. At the 5′ end of this DNA molecule is a T7 RNA polymerase binding sequence. In the second phase of construction, RNA transcripts are produced from the DNA by T7 RNA polymerase. This RNA is the IAC; it is amplified by the target NASBA primers and is detected by a molecular beacon probe complementary to the internal nontarget sequences. As a practical example, an IAC for use in an assay for the detection of Mycobacterium avium subsp. paratuberculosis is described, its incorporation and optimization within the assay are detailed, and its application to spiked and natural clinical samples is shown to illustrate the correct interpretation of the diagnostic results. PMID:15583319

  19. Amino acid sequences of peptides from a chymotryptic digest of a urea-soluble protein fraction (U.S.3) from oxidized wool

    PubMed Central

    Corfield, M. C.; Fletcher, J. C.

    1969-01-01

    1. A chymotryptic digest of the protein fraction U.S.3. from oxidized wool was separated into 51 peptide fractions by chromatography on a column of cation-exchange resin. 2. The less acidic fractions were separated into their component peptides by a combination of cation-exchange-resin chromatography, paper chromatography and paper electrophoresis. 3. The amino acid sequences of 34 of these peptides were elucidated, and those of 14 others partially determined. 4. Overlaps between the tryptic and chymotryptic peptides from fraction U.S.3 have enabled ten extended amino acid sequences to be deduced, the longest containing 20 amino acid residues. 5. The relevance of the results to the structures of the helical and non-helical regions of wool is discussed. PMID:5395876

  20. Glycosylation Focuses Sequence Variation in the Influenza A Virus H1 Hemagglutinin Globular Domain

    PubMed Central

    Hensley, Scott E.; Hurt, Darrell E.; Bennink, Jack R.; Yewdell, Jonathan W.

    2010-01-01

    Antigenic drift in the influenza A virus hemagglutinin (HA) is responsible for seasonal reformulation of influenza vaccines. Here, we address an important and largely overlooked issue in antigenic drift: how does the number and location of glycosylation sites affect HA evolution in man? We analyzed the glycosylation status of all full-length H1 subtype HA sequences available in the NCBI influenza database. We devised the “flow index” (FI), a simple algorithm that calculates the tendency for viruses to gain or lose consensus glycosylation sites. The FI predicts the predominance of glycosylation states among existing strains. Our analyses show that while the number of glycosylation sites in the HA globular domain does not influence the overall magnitude of variation in defined antigenic regions, variation focuses on those regions unshielded by glycosylation. This supports the conclusion that glycosylation generally shields HA from antibody-mediated neutralization, and implies that fitness costs in accommodating oligosaccharides limit virus escape via HA hyperglycosylation. PMID:21124818

  1. [Susceptibility HLA alleles and amino acids to Takayasu arteritis].

    PubMed

    Terao, Chikashi; Yoshifuji, Hajime; Mimori, Tsuneyo; Matsuda, Fumihiko

    2014-01-01

    Takayasu arteritis (TAK) is a systemic vasculitis affecting aorta and its large branches which were firstly reported from Japan. TAK develops mainly in young females and the number of patients with TAK in Japan is estimated about 6,000 to 10,000. This low prevalence has made genetic studies of TAK difficult to elucidate its genetic background. The HLA region, especially HLA-B locus, is the strongest susceptibility locus to TAK. The association between TAK and HLA-B*52:01 has been established beyond ethnicity. Recently, two different Japanese research groups identified HLA-B67:01, a relatively rare allele in East Asian population, as a novel susceptibility allele. At the same time, two amino acid variations, namely, histidine at position 171 and phenylalanine at position 67 were reported as susceptibility and protective variations, respectively. Since these positions of amino acid are in the peptide binding grooves of HLA-B protein, changes of peptide-binding in MHC class I seem to play a critical role on susceptibility to TAK. Furthermore, the importance of these two amino acid variations would explain the lack of susceptibility effect of HLA-B*51:01 to TAK, which shares most of amino acid sequences with HLA-B*52:01 except for two amino acids including the position 67.

  2. Low incidence of DNA sequence variation in human induced pluripotent stem cells generated by non-integrating plasmid expression

    PubMed Central

    Cheng, Linzhao; Hansen, Nancy F.; Zhao, Ling; Du, Yutao; Zou, Chunlin; Donovan, Frank X.; Chou, Bin-Kuan; Zhou, Guangyu; Li, Shijie; Dowey, Sarah N.; Ye, Zhaohui; Chandrasekharappa, Settara C.; Yang, Huanming; Mullikin, James C.; Liu, P. Paul

    2012-01-01

    Summary The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in the iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1058–1808 heterozygous single nucleotide variants (SNVs), but no copy number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ~50% of them are synonymous changes and the remaining are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction. PMID:22385660

  3. Sequence variation identified in the 18S rRNA gene of Theileria mutans and Theileria velifera from the African buffalo (Syncerus caffer).

    PubMed

    Chaisi, Mamohale E; Collins, Nicola E; Potgieter, Fred T; Oosthuizen, Marinda C

    2013-01-16

    The African buffalo (Syncerus caffer) is a natural reservoir host for both pathogenic and non-pathogenic Theileria species. These often occur naturally as mixed infections in buffalo. Although the benign and mildly pathogenic forms do not have any significant economic importance, their presence could complicate the interpretation of diagnostic test results aimed at the specific diagnosis of the pathogenic Theileria parva in cattle and buffalo in South Africa. The 18S rRNA gene has been used as the target in a quantitative real-time PCR (qPCR) assay for the detection of T. parva infections. However, the extent of sequence variation within this gene in the non-pathogenic Theileria spp. of the Africa buffalo is not well known. The aim of this study was, therefore, to characterise the full-length 18S rRNA genes of Theileria mutans, Theileria sp. (strain MSD) and T. velifera and to determine the possible influence of any sequence variation on the specific detection of T. parva using the 18S rRNA qPCR. The reverse line blot (RLB) hybridization assay was used to select samples which either tested positive for several different Theileria spp., or which hybridised only with the Babesia/Theileria genus-specific probe and not with any of the Babesia or Theileria species-specific probes. The full-length 18S rRNA genes from 14 samples, originating from 13 buffalo and one bovine from different localities in South Africa, were amplified, cloned and the resulting recombinants sequenced. Variations in the 18S rRNA gene sequences were identified in T. mutans, Theileria sp. (strain MSD) and T. velifera, with the greatest diversity observed amongst the T. mutans variants. This variation possibly explained why the RLB hybridization assay failed to detect T. mutans and T. velifera in some of the analysed samples. Copyright © 2012 Elsevier B.V. All rights reserved.

  4. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

    PubMed

    Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

    2015-01-01

    Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.

  5. Melanocortin 1 receptor (MC1R) gene sequence variation and melanism in the gray (Sciurus carolinensis), fox (Sciurus niger), and red (Sciurus vulgaris) squirrel.

    PubMed

    McRobie, Helen R; King, Linda M; Fanutti, Cristina; Coussons, Peter J; Moncrief, Nancy D; Thomas, Alison P M

    2014-01-01

    Sequence variations in the melanocortin 1 receptor (MC1R) gene are associated with melanism in many different species of mammals, birds, and reptiles. The gray squirrel (Sciurus carolinensis), found in the British Isles, was introduced from North America in the late 19th century. Melanism in the British gray squirrel is associated with a 24-bp deletion in the MC1R. To investigate the origin of this mutation, we sequenced the MC1R of 95 individuals including 44 melanic gray squirrels from both the British Isles and North America. Melanic gray squirrels of both populations had the same 24-bp deletion associated with melanism. Given the significant deletion associated with melanism in the gray squirrel, we sequenced the MC1R of both wild-type and melanic fox squirrels (Sciurus niger) (9 individuals) and red squirrels (Sciurus vulgaris) (39 individuals). Unlike the gray squirrel, no association between sequence variation in the MC1R and melanism was found in these 2 species. We conclude that the melanic gray squirrel found in the British Isles originated from one or more introductions of melanic gray squirrels from North America. We also conclude that variations in the MC1R are not associated with melanism in the fox and red squirrels.

  6. Variation, Repetition, And Choice

    PubMed Central

    Abreu-Rodrigues, Josele; Lattal, Kennon A; dos Santos, Cristiano V; Matos, Ricardo A

    2005-01-01

    Experiment 1 investigated the controlling properties of variability contingencies on choice between repeated and variable responding. Pigeons were exposed to concurrent-chains schedules with two alternatives. In the REPEAT alternative, reinforcers in the terminal link depended on a single sequence of four responses. In the VARY alternative, a response sequence in the terminal link was reinforced only if it differed from the n previous sequences (lag criterion). The REPEAT contingency generated low, constant levels of sequence variation whereas the VARY contingency produced levels of sequence variation that increased with the lag criterion. Preference for the REPEAT alternative tended to increase directly with the degree of variation required for reinforcement. Experiment 2 examined the potential confounding effects in Experiment 1 of immediacy of reinforcement by yoking the interreinforcer intervals in the REPEAT alternative to those in the VARY alternative. Again, preference for REPEAT was a function of the lag criterion. Choice between varying and repeating behavior is discussed with respect to obtained behavioral variability, probability of reinforcement, delay of reinforcement, and switching within a sequence. PMID:15828592

  7. Comparative Chloroplast Genomics of Gossypium Species: Insights Into Repeat Sequence Variations and Phylogeny

    PubMed Central

    Wu, Ying; Liu, Fang; Yang, Dai-Gang; Li, Wei; Zhou, Xiao-Jian; Pei, Xiao-Yu; Liu, Yan-Gai; He, Kun-Lun; Zhang, Wen-Sheng; Ren, Zhong-Ying; Zhou, Ke-Hai; Ma, Xiong-Feng; Li, Zhong-Hu

    2018-01-01

    Cotton is one of the most economically important fiber crop plants worldwide. The genus Gossypium contains a single allotetraploid group (AD) and eight diploid genome groups (A–G and K). However, the evolution of repeat sequences in the chloroplast genomes and the phylogenetic relationships of Gossypium species are unclear. Thus, we determined the variations in the repeat sequences and the evolutionary relationships of 40 cotton chloroplast genomes, which represented the most diverse in the genus, including five newly sequenced diploid species, i.e., G. nandewarense (C1-n), G. armourianum (D2-1), G. lobatum (D7), G. trilobum (D8), and G. schwendimanii (D11), and an important semi-wild race of upland cotton, G. hirsutum race latifolium (AD1). The genome structure, gene order, and GC content of cotton species were similar to those of other higher plant plastid genomes. In total, 2860 long sequence repeats (>10 bp in length) were identified, where the F-genome species had the largest number of repeats (G. longicalyx F1: 108) and E-genome species had the lowest (G. stocksii E1: 53). Large-scale repeat sequences possibly enrich the genetic information and maintain genome stability in cotton species. We also identified 10 divergence hotspot regions, i.e., rpl33-rps18, psbZ-trnG (GCC), rps4-trnT (UGU), trnL (UAG)-rpl32, trnE (UUC)-trnT (GGU), atpE, ndhI, rps2, ycf1, and ndhF, which could be useful molecular genetic markers for future population genetics and phylogenetic studies. Site-specific selection analysis showed that some of the coding sites of 10 chloroplast genes (atpB, atpE, rps2, rps3, petB, petD, ccsA, cemA, ycf1, and rbcL) were under protein sequence evolution. Phylogenetic analysis based on the whole plastomes suggested that the Gossypium species grouped into six previously identified genetic clades. Interestingly, all 13 D-genome species clustered into a strong monophyletic clade. Unexpectedly, the cotton species with C, G, and K-genomes were admixed and

  8. Sequence variations of the partially dominant DELLA gene Rht-B1c in wheat and their functional impacts

    PubMed Central

    Ma, Zhengqiang

    2013-01-01

    Rht-B1c, allelic to the DELLA protein-encoding gene Rht-B1a, is a natural mutation documented in common wheat (Triticum aestivum). It confers variation to a number of traits related to cell and plant morphology, seed dormancy, and photosynthesis. The present study was conducted to examine the sequence variations of Rht-B1c and their functional impacts. The results showed that Rht-B1c was partially dominant or co-dominant for plant height, and exhibited an increased dwarfing effect. At the sequence level, Rht-B1c differed from Rht-B1a by one 2kb Veju retrotransposon insertion, three coding region single nucleotide polymorphisms (SNPs), one 197bp insertion, and four SNPs in the 1kb upstream sequence. Haplotype investigations, association analyses, transient expression assays, and expression profiling showed that the Veju insertion was primarily responsible for the extreme dwarfing effect. It was found that the Veju insertion changed processing of the Rht-B1c transcripts and resulted in DELLA motif primary structure disruption. Expression assays showed that Rht-B1c caused reduction of total Rht-1 transcript levels, and up-regulation of GATA-like transcription factors and genes positively regulated by these factors, suggesting that one way in which Rht-1 proteins affect plant growth and development is through GATA-like transcription factor regulation. PMID:23918966

  9. Capturing sequence variation among flowering-time regulatory gene homologs in the allopolyploid crop species Brassica napus

    PubMed Central

    Schiessl, Sarah; Samans, Birgit; Hüttel, Bruno; Reinhard, Richard; Snowdon, Rod J.

    2014-01-01

    Flowering, the transition from the vegetative to the generative phase, is a decisive time point in the lifecycle of a plant. Flowering is controlled by a complex network of transcription factors, photoreceptors, enzymes and miRNAs. In recent years, several studies gave rise to the hypothesis that this network is also strongly involved in the regulation of other important lifecycle processes ranging from germination and seed development through to fundamental developmental and yield-related traits. In the allopolyploid crop species Brassica napus, (genome AACC), homoeologous copies of flowering time regulatory genes are implicated in major phenological variation within the species, however the extent and control of intraspecific and intergenomic variation among flowering-time regulators is still unclear. To investigate differences among B. napus morphotypes in relation to flowering-time gene variation, we performed targeted deep sequencing of 29 regulatory flowering-time genes in four genetically and phenologically diverse B. napus accessions. The genotype panel included a winter-type oilseed rape, a winter fodder rape, a spring-type oilseed rape (all B. napus ssp. napus) and a swede (B. napus ssp. napobrassica), which show extreme differences in winter-hardiness, vernalization requirement and flowering behavior. A broad range of genetic variation was detected in the targeted genes for the different morphotypes, including non-synonymous SNPs, copy number variation and presence-absence variation. The results suggest that this broad variation in vernalization, clock and signaling genes could be a key driver of morphological differentiation for flowering-related traits in this recent allopolyploid crop species. PMID:25202314

  10. Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing.

    PubMed

    Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning

    2014-11-07

    Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.

  11. Sequence variation and structural conservation allows development of novel function and immune evasion in parasite surface protein families

    PubMed Central

    Higgins, Matthew K; Carrington, Mark

    2014-01-01

    Trypanosoma and Plasmodium species are unicellular, eukaryotic pathogens that have evolved the capacity to survive and proliferate within a human host, causing sleeping sickness and malaria, respectively. They have very different survival strategies. African trypanosomes divide in blood and extracellular spaces, whereas Plasmodium species invade and proliferate within host cells. Interaction with host macromolecules is central to establishment and maintenance of an infection by both parasites. Proteins that mediate these interactions are under selection pressure to bind host ligands without compromising immune avoidance strategies. In both parasites, the expansion of genes encoding a small number of protein folds has established large protein families. This has permitted both diversification to form novel ligand binding sites and variation in sequence that contributes to avoidance of immune recognition. In this review we consider two such parasite surface protein families, one from each species. In each case, known structures demonstrate how extensive sequence variation around a conserved molecular architecture provides an adaptable protein scaffold that the parasites can mobilise to mediate interactions with their hosts. PMID:24442723

  12. An evaluation of copy number variation detection tools for cancer using whole exome sequencing data.

    PubMed

    Zare, Fatima; Dow, Michelle; Monteleone, Nicholas; Hosny, Abdelrahman; Nabavi, Sheida

    2017-05-31

    Recently copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity for detecting CNVs more accurately. Recently whole exome sequencing (WES) has become primary strategy for sequencing patient samples and study their genomics aberrations. However, compared to whole genome sequencing, WES introduces more biases and noise that make CNV detection very challenging. Additionally, tumors' complexity makes the detection of cancer specific CNVs even more difficult. Although many CNV detection tools have been developed since introducing NGS data, there are few tools for somatic CNV detection for WES data in cancer. In this study, we evaluated the performance of the most recent and commonly used CNV detection tools for WES data in cancer to address their limitations and provide guidelines for developing new ones. We focused on the tools that have been designed or have the ability to detect cancer somatic aberrations. We compared the performance of the tools in terms of sensitivity and false discovery rate (FDR) using real data and simulated data. Comparative analysis of the results of the tools showed that there is a low consensus among the tools in calling CNVs. Using real data, tools show moderate sensitivity (~50% - ~80%), fair specificity (~70% - ~94%) and poor FDRs (~27% - ~60%). Also, using simulated data we observed that increasing the coverage more than 10× in exonic regions does not improve the detection power of the tools significantly. The limited performance of the current CNV detection tools for WES data in cancer indicates the need for developing more efficient and precise CNV detection methods. Due to the complexity of tumors and high level of noise and biases in WES data, employing advanced novel segmentation, normalization and de-noising techniques that are designed specifically for cancer data is

  13. The nucleotide sequence and genome organization of Plasmopara halstedii virus.

    PubMed

    Heller-Dohmen, Marion; Göpfert, Jens C; Pfannstiel, Jens; Spring, Otmar

    2011-03-17

    Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The N-terminal sequence of the viral coat protein was determined using Top-Down MALDI-TOF analysis. The complete nucleotide sequences of both single-stranded RNA segments (RNA1 and RNA2) were established. RNA1 consisted of 2793 nucleotides (nt) exclusive its 3' poly(A) tract and a single open-reading frame (ORF1) of 2745 nt. ORF1 was framed by a 5' untranslated region (5' UTR) of 18 nt and a 3' untranslated region (3' UTR) of 30 nt. ORF1 contained motifs of RNA-dependent RNA polymerases (RdRp) and showed similarities to RdRp of Scleropthora macrospora virus A (SmV A) and viruses within the Nodaviridae family. RNA2 consisted of 1526 nt exclusive its 3' poly(A) tract and a second ORF (ORF2) of 1128 nt. ORF2 coded for the single viral coat protein (CP) and was framed by a 5' UTR of 164 nt and a 3' UTR of 234 nt. The deduced amino acid sequence of ORF2 was verified by nano-LC-ESI-MS/MS experiments. Top-Down MALDI-TOF analysis revealed the N-terminal sequence of the CP. The N-terminal sequence represented a region within ORF2 suggesting a proteolytic processing of the CP in vivo. The CP showed similarities to CP of SmV A and viruses within the Tombusviridae family. Fragments of RNA1 (ca. 1.9 kb) and RNA2 (ca. 1.4 kb) were used to analyze the nucleotide sequence variation of virions in different P. halstedii isolates. Viral sequence variation was 0.3% or less regardless of their host's pathotypes, the geographical origin and the sensitivity towards the fungicide metalaxyl. The results showed the presence of a single and new virus type in different P. halstedii isolates

  14. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.

    PubMed

    Kang, Hyun Min; Subramaniam, Meena; Targ, Sasha; Nguyen, Michelle; Maliskova, Lenka; McCarthy, Elizabeth; Wan, Eunice; Wong, Simon; Byrnes, Lauren; Lanata, Cristina M; Gate, Rachel E; Mostafavi, Sara; Marson, Alexander; Zaitlen, Noah; Criswell, Lindsey A; Ye, Chun Jimmie

    2018-01-01

    Droplet single-cell RNA-sequencing (dscRNA-seq) has enabled rapid, massively parallel profiling of transcriptomes. However, assessing differential expression across multiple individuals has been hampered by inefficient sample processing and technical batch effects. Here we describe a computational tool, demuxlet, that harnesses natural genetic variation to determine the sample identity of each droplet containing a single cell (singlet) and detect droplets containing two cells (doublets). These capabilities enable multiplexed dscRNA-seq experiments in which cells from unrelated individuals are pooled and captured at higher throughput than in standard workflows. Using simulated data, we show that 50 single-nucleotide polymorphisms (SNPs) per cell are sufficient to assign 97% of singlets and identify 92% of doublets in pools of up to 64 individuals. Given genotyping data for each of eight pooled samples, demuxlet correctly recovers the sample identity of >99% of singlets and identifies doublets at rates consistent with previous estimates. We apply demuxlet to assess cell-type-specific changes in gene expression in 8 pooled lupus patient samples treated with interferon (IFN)-β and perform eQTL analysis on 23 pooled samples.

  15. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets.

    PubMed

    Madrigal, Pedro

    2017-03-01

    Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both to evaluate reproducibility of biological or technical replicates, and to compare different datasets to identify their potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers. An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/ . pmb59@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  16. Variation in the Nucleotide Sequence of Cottontail Rabbit Papillomavirus a and b Subtypes Affects Wart Regression and Malignant Transformation and Level of Viral Replication in Domestic Rabbits

    PubMed Central

    Salmon, Jérôme; Nonnenmacher, Mathieu; Cazé, Sandrine; Flamant, Patricia; Croissant, Odile; Orth, Gérard; Breitburd, Françoise

    2000-01-01

    We previously reported the partial characterization of two cottontail rabbit papillomavirus (CRPV) subtypes with strikingly divergent E6 and E7 oncoproteins. We report now the complete nucleotide sequences of these subtypes, referred to as CRPVa4 (7,868 nucleotides) and CRPVb (7,867 nucleotides). The CRPVa4 and CRPVb genomes differed at 238 (3%) nucleotide positions, whereas CRPVa4 and the prototype CRPV differed by only 5 nucleotides. The most variable region (7% nucleotide divergence) included the long regulatory region (LRR) and the E6 and E7 genes. A mutation in the stop codon resulted in an 8-amino-acid-longer CRPVb E4 protein, and a nucleotide deletion reduced the coding capacity of the E5 gene from 101 to 25 amino acids. In domestic rabbits homozygous for a specific haplotype of the DRA and DQA genes of the major histocompatibility complex, warts induced by CRPVb DNA or a chimeric genome containing the CRPVb LRR/E6/E7 region showed an early regression, whereas warts induced by CRPVa4 or a chimeric genome containing the CRPVa4 LRR/E6/E7 region persisted and evolved into carcinomas. In contrast, most CRPVa, CRPVb, and chimeric CRPV DNA-induced warts showed no early regression in rabbits homozygous for another DRA-DQA haplotype. Little, if any, viral replication is usually observed in domestic rabbit warts. When warts induced by CRPVa and CRPVb virions and DNA were compared, the number of cells positive for viral DNA or capsid antigens was found to be greater by 1 order of magnitude for specimens induced by CRPVb. Thus, both sequence variation in the LRR/E6/E7 region and the genetic constitution of the host influence the expression of the oncogenic potential of CRPV. Furthermore, intratype variation may overcome to some extent the host restriction of CRPV replication in domestic rabbits. PMID:11044121

  17. Prediction of beta-turns from amino acid sequences using the residue-coupled model.

    PubMed

    Guruprasad, K; Shukla, S

    2003-04-01

    We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Our results show that the probability values derived from a data set comprising 425 protein chains yielded an overall beta-turn prediction accuracy 68.74%, compared with 94.7% reported earlier on a data set of 30 proteins using the same method. However, we noted that the overall beta-turn prediction accuracy using probability values derived from the 30-protein data set reduces to 40.74% when tested on the data set comprising 425 protein chains. In contrast, using probability values derived from the 425 data set used in this analysis, the overall beta-turn prediction accuracy yielded consistent results when tested on either the 30-protein data set (64.62%) used earlier or a more recent representative data set comprising 619 protein chains (64.66%) or on a jackknife data set comprising 476 representative protein chains (63.38%). We therefore recommend the use of probability values derived from the 425 representative protein chains data set reported here, which gives more realistic and consistent predictions of beta-turns from amino acid sequences.

  18. The shikimate pathway: review of amino acid sequence, function and three-dimensional structures of the enzymes.

    PubMed

    Mir, Rafia; Jallu, Shais; Singh, T P

    2015-06-01

    The aromatic compounds such as aromatic amino acids, vitamin K and ubiquinone are important prerequisites for the metabolism of an organism. All organisms can synthesize these aromatic metabolites through shikimate pathway, except for mammals which are dependent on their diet for these compounds. The pathway converts phosphoenolpyruvate and erythrose 4-phosphate to chorismate through seven enzymatically catalyzed steps and chorismate serves as a precursor for the synthesis of variety of aromatic compounds. These enzymes have shown to play a vital role for the viability of microorganisms and thus are suggested to present attractive molecular targets for the design of novel antimicrobial drugs. This review focuses on the seven enzymes of the shikimate pathway, highlighting their primary sequences, functions and three-dimensional structures. The understanding of their active site amino acid maps, functions and three-dimensional structures will provide a framework on which the rational design of antimicrobial drugs would be based. Comparing the full length amino acid sequences and the X-ray crystal structures of these enzymes from bacteria, fungi and plant sources would contribute in designing a specific drug and/or in developing broad-spectrum compounds with efficacy against a variety of pathogens.

  19. Molecular composition and seasonal variation of amino acids in urban aerosols from Beijing, China

    NASA Astrophysics Data System (ADS)

    Ren, Lujie; Bai, Huahua; Yu, Xi; Wu, Fengchang; Yue, Siyao; Ren, Hong; Li, Linjie; Lai, Senchao; Sun, Yele; Wang, Zifa; Fu, Pingqing

    2018-05-01

    Fifteen hydrolyzed amino acids (THAA) were quantified in urban aerosols (TSP samples) collected during April 2012 to May 2013 in Beijing, China using high-performance liquid chromatography (HPLC) after their derivatization with o-phthalaldehyde (OPA), to investigate their molecular distributions and seasonal variation. Total concentrations of amino acids ranged from 1.73-25.7 nmol m- 3 with a peak in spring (13.7 nmol m- 3), followed by winter (11.5 nmol m- 3), fall (9.51 nmol m- 3) and summer (7.45 nmol m- 3). Glycine (Gly), alanine (Ala) and valine (Val) are found to be the most abundant species, which account for 46% of the total THAA. Compared with those recorded in previous studies, the atmospheric levels of amino acids in Beijing were higher than those from other regions. Enhanced amounts of methionine, tyrosine, histidine, aspartic acid and glutamic acid were found during the rainfall events. The factor analysis further suggests that amino acids in urban Beijing originated from multiple sources including biological emission, biomass burning, as well as anthropogenic activities.

  20. Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

    PubMed

    Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

    2018-05-01

    STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since they can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance were evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies at the 34 STRs, indicating MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.

  1. Ultra high-throughput nucleic acid sequencing as a tool for virus discovery in the turkey gut.

    USDA-ARS?s Scientific Manuscript database

    Recently, the use of the next generation of nucleic acid sequencing technology (i.e., 454 pyrosequencing, as developed by Roche/454 Life Sciences) has allowed an in-depth look at the uncultivated microorganisms present in complex environmental samples, including samples with agricultural importance....

  2. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  3. Deep sequencing in library selection projects: what insight does it bring?

    PubMed

    Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M

    2015-08-01

    High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Complete amino acid sequence of the myoglobin from the Pacific sei whale, Balaenoptera borealis.

    PubMed

    Jones, B N; Rothgeb, T M; England, R D; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from Pacific sei whale, Balaenoptera borealis, was determined by specific cleavage of the protein to obtain large peptides which are readily degraded by the automatic sequencer. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. From the sequence analysis of four of these peptides and the apomyoglobin, over 75% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by the sequence analysis of peptides that resulted from further digestion of the amino-terminal and central cyanogen bromide fragments. The amino-terminal fragment was specifically cleaved at its two tryptophanyl residues with N-chlorosuccinimide and the central cyanogen bromide fragment was cleaved at its glutamyl residues with staphylococcal protease and at its single tyrosyl residue with N-bromosuccinimide. The primary structure of this myoglobin proved identical with that from the gray whale but differs from that of the finback whale at four positions, from that of the minke whale at three positions and from the myoglobin of the humpback whale at one position. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea.

  5. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications, which plays an important role in physiological and pathological processes. Identifying amidation sites can help us understanding the amidation and recognizing the original reason of many kinds of diseases. But the traditional experimental methods for predicting amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector enabling to capture not only the physicochemical properties but also position related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features to remove redundancy and dependence among components of the feature vector by a supervised fashion. Finally the support vector machine classifier is used to label the amidation sites. When tested on an independent data set, it shows that the proposed method performs better than all the previous ones with the prediction accuracy of 0.962 at the Matthew's correlation coefficient of 0.89 and area under curve of 0.964.

  6. Effective normalization for copy number variation detection from whole genome sequencing.

    PubMed

    Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka

    2012-01-01

    Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms FREEC and CNV-seq using whole genome sequencing data from 8 individuals spanning four populations. We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07 - 0.3 range, whereas using a control genome normalization yields Jaccard index values around 0.4 with normalization based on GC content. The most critical impact of using mappability as a normalization factor is substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable gene and CNV region calls

  7. Differences in acid tolerance between Bifidobacterium breve BB8 and its acid-resistant derivative B. breve BB8dpH, revealed by RNA-sequencing and physiological analysis.

    PubMed

    Yang, Xu; Hang, Xiaomin; Tan, Jing; Yang, Hong

    2015-06-01

    Bifidobacteria are common inhabitants of the human gastrointestinal tract, and their application has increased dramatically in recent years due to their health-promoting effects. The ability of bifidobacteria to tolerate acidic environments is particularly important for their function as probiotics because they encounter such environments in food products and during passage through the gastrointestinal tract. In this study, we generated a derivative, Bifidobacterium breve BB8dpH, which displayed a stable, acid-resistant phenotype. To investigate the possible reasons for the higher acid tolerance of B. breve BB8dpH, as compared with its parental strain B. breve BB8, a combined transcriptome and physiological approach was used to characterize differences between the two strains. An analysis of the transcriptome by RNA-sequencing indicated that the expression of 121 genes was increased by more than 2-fold, while the expression of 146 genes was reduced more than 2-fold, in B. breve BB8dpH. Validation of the RNA-sequencing data using real-time quantitative PCR analysis demonstrated that the RNA-sequencing results were highly reliable. The comparison analysis, based on differentially expressed genes, suggested that the acid tolerance of B. breve BB8dpH was enhanced by regulating the expression of genes involved in carbohydrate transport and metabolism, energy production, synthesis of cell envelope components (peptidoglycan and exopolysaccharide), synthesis and transport of glutamate and glutamine, and histidine synthesis. Furthermore, an analysis of physiological data showed that B. breve BB8dpH displayed higher production of exopolysaccharide and lower H(+)-ATPase activity than B. breve BB8. The results presented here will improve our understanding of acid tolerance in bifidobacteria, and they will lead to the development of new strategies to enhance the acid tolerance of bifidobacterial strains. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Genetic Variation in Cardiomyopathy and Cardiovascular Disorders.

    PubMed

    McNally, Elizabeth M; Puckelwartz, Megan J

    2015-01-01

    With the wider deployment of massively-parallel, next-generation sequencing, it is now possible to survey human genome data for research and clinical purposes. The reduced cost of producing short-read sequencing has now shifted the burden to data analysis. Analysis of genome sequencing remains challenged by the complexity of the human genome, including redundancy and the repetitive nature of genome elements and the large amount of variation in individual genomes. Public databases of human genome sequences greatly facilitate interpretation of common and rare genetic variation, although linking database sequence information to detailed clinical information is limited by privacy and practical issues. Genetic variation is a rich source of knowledge for cardiovascular disease because many, if not all, cardiovascular disorders are highly heritable. The role of rare genetic variation in predicting risk and complications of cardiovascular diseases has been well established for hypertrophic and dilated cardiomyopathy, where the number of genes that are linked to these disorders is growing. Bolstered by family data, where genetic variants segregate with disease, rare variation can be linked to specific genetic variation that offers profound diagnostic information. Understanding genetic variation in cardiomyopathy is likely to help stratify forms of heart failure and guide therapy. Ultimately, genetic variation may be amenable to gene correction and gene editing strategies.

  9. Null alleles and sequence variations at primer binding sites of STR loci within multiplex typing systems.

    PubMed

    Yao, Yining; Yang, Qinrui; Shao, Chengchen; Liu, Baonian; Zhou, Yuxiang; Xu, Hongmei; Zhou, Yueqin; Tang, Qiqun; Xie, Jianhui

    2018-01-01

    Rare variants are widely observed in human genome and sequence variations at primer binding sites might impair the process of PCR amplification resulting in dropouts of alleles, named as null alleles. In this study, 5 cases from routine paternity testing using PowerPlex ® 21 System for STR genotyping were considered to harbor null alleles at TH01, FGA, D5S818, D8S1179, and D16S539, respectively. The dropout of alleles was confirmed by using alternative commercial kits AGCU Expressmarker 22 PCR amplification kit and AmpFℓSTR ® . Identifiler ® Plus Kit, and sequencing results revealed a single base variation at the primer binding site of each STR locus. Results from the collection of previous reports show that null alleles at D5S818 were frequently observed in population detected by two PowerPlex ® typing systems and null alleles at D19S433 were mostly observed in Japanese population detected by two AmpFℓSTR™ typing systems. Furthermore, the most popular mutation type appeared the transition from C to T with G to A, which might have a potential relationship with DNA methylation. Altogether, these results can provide helpful information in forensic practice to the elimination of genotyping discrepancy and the development of primer sets. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Sequence variations in the osteoprotegerin gene promoter in patients with postmenopausal osteoporosis.

    PubMed

    Arko, B; Prezelj, J; Komel, R; Kocijancic, A; Hudler, P; Marc, J

    2002-09-01

    Osteoprotegerin (OPG) is a recently discovered member of the TNF receptor superfamily that acts as an important paracrine regulator of bone remodeling. OPG knockout mice develop severe osteoporosis, whereas administration of OPG can prevent ovariectomy-induced bone loss. These findings implicate a role for OPG in the development of osteoporosis. In the present study, we screened the OPG gene promoter for sequence variations and examined their association with bone mineral density (BMD) in 103 osteoporotic postmenopausal women. Single-strand conformation polymorphism analysis followed by DNA sequencing revealed a presence of four nucleotide substitutions: 209 G-->A, 245 T-->G, 889 C-->T, and 950 T-->C. The frequencies of genotypes were as follows: GG (89.3%), GA (10.7%) for 209 G-->A polymorphism; TT (89.3%), TG (10.7%) for 245 T-->G polymorphism; and TT (25.2%), TC (53.4%), CC (21.4%) for 950 T-->C polymorphism. Substitution 889 C-->T was found in only two patients. Statistically significant association of genotypes with BMD at the lumbar spine (P = 0.005) was observed for 209 G-->A and 245 T-->G polymorphisms. Haplotype GATG was associated with lower BMD as compared with GGTT haplotype. Our results suggest that 209 G-->A and 245 T-->G polymorphisms in the OPG gene promoter may contribute to the genetic regulation of BMD.

  11. Synthesis and evaluations of an acid-cleavable, fluorescently labeled nucleotide as a reversible terminator for DNA sequencing.

    PubMed

    Tan, Lianjiang; Liu, Yazhi; Li, Xiaowei; Wu, Xin-Yan; Gong, Bing; Shen, Yu-Mei; Shao, Zhifeng

    2016-02-11

    An acid-cleavable linker based on a dimethylketal moiety was synthesized and used to connect a nucleotide with a fluorophore to produce a 3'-OH unblocked nucleotide analogue as an excellent reversible terminator for DNA sequencing by synthesis.

  12. SNP in Chalcone Synthase gene is associated with variation of 6-gingerol content in contrasting landraces of Zingiber officinale.Roscoe.

    PubMed

    Ghosh, Subhabrata; Mandi, Swati Sen

    2015-07-25

    Zingiber officinale, medicinally the most important species within Zingiber genus, contains 6-gingerol as the active principle. This compound obtained from rhizomes of Z.officinale, has immense medicinal importance and is used in various herbal drug formulations. Our record of variation in content of this active principle, viz. 6-gingerol, in land races of this drug plant collected from different locations correlated with our Gene expression studies exhibiting high Chalcone Synthase gene (Chalcone Synthase is the rate limiting enzyme of 6-gingerol biosynthesis pathway) expression in high 6-gingerol containing landraces than in the low 6-gingerol containing landraces. Sequencing of Chalcone Synthase cDNA and subsequent multiple sequence alignment revealed seven SNPs between these contrasting genotypes. Converting this nucleotide sequence to amino acid sequence, alteration of two amino acids becomes evident; one amino acid change (asparagine to serine at position 336) is associated with base change (A→G) and another change (serine to leucine at position 142) is associated with the base change (C→T). Since asparagine at position 336 is one of the critical amino acids of the catalytic triad of Chalcone Synthase enzyme, responsible for substrate binding, our study suggests that landraces with a specific amino acid change viz. Asparagine (found in high 6-gingerol containing landraces) to serine causes low 6-gingerol content. This is probably due to a weak enzyme substrate association caused by the absence of asparagine in the catalytic triad. Detailed study of this finding could also help to understand molecular mechanism associated with variation in 6-gingerol content in Z.officinale genotypes and thereby strategies for developing elite genotypes containing high 6-gingerol content. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. A gene variation of 14-3-3 zeta isoform in rat hippocampus.

    PubMed

    Murakami, K; Situ, S Y; Eshete, F

    1996-11-14

    A variant form of 14-3-3 zeta was isolated from the rat hippocampal cDNA library. The cloned cDNA is 1687 bp in length and it contains an entire ORF (nt = 63-797) with 245 amino acids that is characteristic to 14-3-3 zeta subtype. By comparing with reported sequences of 14-3-3 zeta, we found three nucleotide substitutions within the coding sequence in our clone; C<-->T transition at nt = 325 and G<-->C transversions at nt = 387 and 388. Both are missense mutations, leading ACG (Thr) to ATG (Met) and CGT (Arg) to GCT (Ala) conversions at residue 88 and 109, respectively. Our results show that at least three different genetic variants of 14-3-3 zeta are present in rat species which results in protein variations. Such mutation in the amino acid sequence is an important indication of the diverse functions of this protein and may also contribute to the recent contradictory observations regarding the role of the 14-3-3 zeta subtype.

  14. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    PubMed

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  15. Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

    PubMed

    Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

    1993-02-01

    A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence characteristics in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234 and also by the presence of other specific amino acid substitutions which may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions including an additional potential glycosylation site at position (315-317). The Vs virus strain shared closest nucleotide and amino acid homology (84.5% and 95.5% respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and enlarge upon previous data obtained from molecular and serological analysis.

  16. Sequence diversity of the leukotoxin (lktA) gene in caprine and ovine strains of Mannheimia haemolytica.

    PubMed

    Vougidou, C; Sandalakis, V; Psaroulaki, A; Petridou, E; Ekateriniadou, L

    2013-04-20

    Mannheimia haemolytica is the aetiological agent of pneumonic pasteurellosis in small ruminants. The primary virulence factor of the bacterium is a leukotoxin (LktA), which induces apoptosis in susceptible cells via mitochondrial targeting. It has been previously shown that certain lktA alleles are associated either with cattle or sheep. The objective of the present study was to investigate lktA sequence variation among ovine and caprine M haemolytica strains isolated from pneumonic lungs, revealing any potential adaptation for the caprine host, for which there is no available data. Furthermore, we investigated amino acid variation in the N-terminal part of the sequences and its effect on targeting mitochondria. Data analysis showed that the prevalent caprine genotype differed at a single non-synonymous site from a previously described uncommon bovine allele, whereas the ovine sequences represented new, distinct alleles. N-terminal sequence differences did not affect the mitochondrial targeting ability of the isolates; interestingly enough in one case, mitochondrial matrix targeting was indicated rather than membrane association, suggesting an alternative LktA trafficking pattern.

  17. Sequence variations and two levels of MCT1 and CD147 expression in red blood cells and gluteus muscle of horses.

    PubMed

    Koho, N M; Mykkänen, A K; Reeben, M; Raekallio, M R; Ilves, M; Pösö, A R

    2012-01-01

    MCT1-CD147 complex is the prime lactate transporter in mammalian plasma membranes. In equine red blood cells (RBCs), activity of the complex and expression of MCT1 and CD147 is bimodal; high in 70% and low in 30%. We studied whether sequence variations contribute to the bimodal expression of MCT1 and CD147. Samples of blood and cremaster muscle were collected in connection of castration from 24 horses. Additional gluteus muscle samples were collected from 15 Standardbreds of which seven were known to express low amounts of CD147 in RBCs. The cDNA of MCT1 and CD147 together with a promoter region of CD147 was sequenced. The amounts of MCT1 and CD147 expressed in RBC and muscle membranes were measured by Western blot and mRNA levels in muscles by qPCR. MCT1 and CD147 were expressed in 20 castrates, and in four only were traces found. Sequence variations found in MCT1 were not linked to MCT1 expression. In CD147 linked heterozygous single nucleotide polymorphisms (SNPs) 389A>G (Met(125)Val) and 990C>T (3'-UTR) were associated to low expression of CD147. Also a mutation 168A>G (Ile(51)Val) in CD147 was associated to low MCT1 and CD147 expression. Low MCT1 and CD147 mRNA levels in gluteus were found in Standardbreds with low CD147 expression in RBCs. The results suggest that sequence variations affect the expression level of CD147, but do not explain its bimodality. The levels of MCT1 and CD147 mRNA correlated with the expression of CD147 and suggest that bimodality of their expression is regulated at transcriptional level. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L (+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered as a potential probiotic. Complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  19. Modeling the plant-soil interaction in presence of heavy metal pollution and acidity variations.

    PubMed

    Guala, Sebastián; Vega, Flora A; Covelo, Emma F

    2013-01-01

    On a mathematical interaction model, developed to model metal uptake by plants and the effects on their growth, we introduce a modification which considers also effects on variations of acidity in soil. The model relates the dynamics of the uptake of metals from soil to plants and also variations of uptake according to the acidity level. Two types of relationships are considered: total and available metal content. We suppose simple mathematical assumptions in order to get as simple as possible expressions with the aim of being easily tested in experimental problems. This work introduces modifications to two versions of the model: on the one hand, the expression of the relationship between the metal in soil and the concentration of the metal in plants and, on the other hand, the relationship between the metal in the soil and total amount of the metal in plants. The fine difference of both versions is fundamental at the moment to consider the tolerance and capacity of accumulation of pollutants in the biomass from the soil.

  20. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering.

    PubMed

    Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier

    2015-02-22

    Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently

  1. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues., including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein

  2. The hypervariable region 1 protein of hepatitis C virus broadly reactive with sera of patients with chronic hepatitis C has a similar amino acid sequence with the consensus sequence.

    PubMed

    Watanabe, K; Yoshioka, K; Ito, H; Ishigami, M; Takagi, K; Utsunomiya, S; Kobayashi, M; Kishimoto, H; Yano, M; Kakumu, S

    1999-11-10

    Hypervariable region 1 (HVR1) proteins of hepatitis C virus (HCV) have been reported to react broadly with sera of patients with HCV infection. However, the variability of the broad reactivity of individual HVR1 proteins has not been elucidated. We assessed the reactivity of 25 different HVR1 proteins (genotype 1b) with sera of 81 patients with HCV infection (genotype 1b) by Western blot. HVR1 proteins reacted with 2-60 sera. The number of sera reactive with each HVR1 protein significantly correlated with the number of amino acid residues identical to the consensus sequence defined by Puntoriero et al. (G. Puntoriero, A. Lahm, S. Zucchelli, B. B. Ercole, R. Tafi, M. Penzzanera, M. U. Mondelli, R. Cortese, A. Tramontano, G. Galfre', and A. Nicosia. 1998. EMBO J. 17, 3521-3533. ) (r = 0.561, P < 0.005). The most widely reactive HVR1 protein, 12-22, had a sequence similar to the consensus sequence. The peptide with C-terminal 13-amino-acids sequence of HVR1 protein 12-22 (NH2-CSFTSLFTPGPSQK) was injected into rabbits as an immunogen. The rabbit immune sera reacted with 9 of 25 HVR1 proteins of genotype 1b including HVR1 protein 12-22 and with 3 of 12 proteins of genotype 2a. These results indicate that the HVR1 protein broadly reactive with patients' sera has a sequence similar to the consensus sequence, can induce broadly reactive sera, and could be one of the candidate immunogens in a prophylactic vaccine against HCV. Copyright 1999 Academic Press.

  3. Variation of ascorbic acid concentration in fruits of cultivated and wild apples.

    PubMed

    Fang, Ting; Zhen, Qiaoling; Liao, Liao; Owiti, Albert; Zhao, Li; Korban, Schuyler S; Han, Yuepeng

    2017-06-15

    Ascorbic acid (AsA) content in mature fruits of 457 apple accessions were measured, and a great variation in AsA concentration was detected. Wild fruits showed significantly higher level of AsA than cultivated fruits. Fruit AsA content was positively correlated with malic acid content, but negatively correlated with fruit weight and soluble solid content. Thus, the difference in AsA content between the wild and cultivated fruits could be attributed to an indirect consequence of human selection for larger fruit size, less acidity, and increased sweetness during apple domestication. Additionally, AsA concentration was extremely high in fruit at the juvenile stage, but dramatically decreased at the expanding and mature stages. The expression levels of three genes controlling AsA accumulation, MdGGP1, MdDHAR3-3, and MdNAT7-2, were significantly negatively correlated with AsA contents in fruits, suggesting a feedback regulation mechanism in AsA-related gene expression. Our results could be helpful for future apple breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Variation in a surface-exposed region of the Mycoplasma pneumoniae P40 protein as a consequence of homologous DNA recombination between RepMP5 elements.

    PubMed

    Spuesens, Emiel B M; van de Kreeke, Nick; Estevão, Silvia; Hoogenboezem, Theo; Sluijter, Marcel; Hartwig, Nico G; van Rossum, Annemarie M C; Vink, Cornelis

    2011-02-01

    Mycoplasma pneumoniae is a human pathogen that causes a range of respiratory tract infections. The first step in infection is adherence of the bacteria to the respiratory epithelium. This step is mediated by a specialized organelle, which contains several proteins (cytadhesins) that have an important function in adherence. Two of these cytadhesins, P40 and P90, represent the proteolytic products from a single 130 kDa protein precursor, which is encoded by the MPN142 gene. Interestingly, MPN142 contains a repetitive DNA element, termed RepMP5, of which homologues are found at seven other loci within the M. pneumoniae genome. It has been hypothesized that these RepMP5 elements, which are similar but not identical in sequence, recombine with their counterpart within MPN142 and thereby provide a source of sequence variation for this gene. As this variation may give rise to amino acid changes within P40 and P90, the recombination between RepMP5 elements may constitute the basis of antigenic variation and, possibly, immune evasion by M. pneumoniae. To investigate the sequence variation of MPN142 in relation to inter-RepMP5 recombination, we determined the sequences of all RepMP5 elements in a collection of 25 strains. The results indicate that: (i) inter-RepMP5 recombination events have occurred in seven of the strains, and (ii) putative RepMP5 recombination events involving MPN142 have induced amino acid changes in a surface-exposed part of the P40 protein in two of the strains. We conclude that recombination between RepMP5 elements is a common phenomenon that may lead to sequence variation of MPN142-encoded proteins.

  5. Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola

    PubMed Central

    Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Lee, Siu Fai

    2017-01-01

    Posttranslational modifications (PTMs) occur in all essential proteins taking command of their functions. There are many domains inside proteins where modifications take place on side-chains of amino acids through various enzymes to generate different species of proteins. In this manuscript we have, for the first time, predicted posttranslational modifications of frequency clock and mating type a-1 proteins in Sordaria fimicola collected from different sites to see the effect of environment on proteins or various amino acids pickings and their ultimate impact on consensus sequences present in mating type proteins using bioinformatics tools. Furthermore, we have also measured and walked through genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from contrasting environments of two opposing slopes (harsh and xeric south facing slope and mild north facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora, we targeted 20 genomic regions in S. fimicola which contain short sequence repeats (SSRs). Our data revealed genetic variations in strains from south facing slope and these findings assist in the hypothesis that genetic variations caused by stressful environments lead to evolution. PMID:28717646

  6. Genetic Variation and Its Reflection on Posttranslational Modifications in Frequency Clock and Mating Type a-1 Proteins in Sordaria fimicola.

    PubMed

    Arif, Rabia; Akram, Faiza; Jamil, Tazeen; Mukhtar, Hamid; Lee, Siu Fai; Saleem, Muhammad

    2017-01-01

    Posttranslational modifications (PTMs) occur in all essential proteins taking command of their functions. There are many domains inside proteins where modifications take place on side-chains of amino acids through various enzymes to generate different species of proteins. In this manuscript we have, for the first time, predicted posttranslational modifications of frequency clock and mating type a-1 proteins in Sordaria fimicola collected from different sites to see the effect of environment on proteins or various amino acids pickings and their ultimate impact on consensus sequences present in mating type proteins using bioinformatics tools. Furthermore, we have also measured and walked through genomic DNA of various Sordaria strains to determine genetic diversity by genotyping the short sequence repeats (SSRs) of wild strains of S. fimicola collected from contrasting environments of two opposing slopes (harsh and xeric south facing slope and mild north facing slope) of Evolution Canyon (EC), Israel. Based on the whole genome sequence of S. macrospora , we targeted 20 genomic regions in S. fimicola which contain short sequence repeats (SSRs). Our data revealed genetic variations in strains from south facing slope and these findings assist in the hypothesis that genetic variations caused by stressful environments lead to evolution.

  7. Scop3D: three-dimensional visualization of sequence conservation.

    PubMed

    Vermeire, Tessa; Vermaere, Stijn; Schepens, Bert; Saelens, Xavier; Van Gucht, Steven; Martens, Lennart; Vandermarliere, Elien

    2015-04-01

    The integration of a protein's structure with its known sequence variation provides insight on how that protein evolves, for instance in terms of (changing) function or immunogenicity. Yet, collating the corresponding sequence variants into a multiple sequence alignment, calculating each position's conservation, and mapping this information back onto a relevant structure is not straightforward. We therefore built the Sequence Conservation on Protein 3D structure (scop3D) tool to perform these tasks automatically. The output consists of two modified PDB files in which the B-values for each position are replaced by the percentage sequence conservation, or the information entropy for each position, respectively. Furthermore, text files with absolute and relative amino acid occurrences for each position are also provided, along with snapshots of the protein from six distinct directions in space. The visualization provided by scop3D can for instance be used as an aid in vaccine development or to identify antigenic hotspots, which we here demonstrate based on an analysis of the fusion proteins of human respiratory syncytial virus and mumps virus. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Serum uric acid concentrations and SLC2A9 genetic variation in Hispanic children: The Viva La Familia Study

    USDA-ARS?s Scientific Manuscript database

    Elevated concentrations of serum uric acid are associated with increased risk of gout and renal and cardiovascular diseases. Genetic studies in adults have consistently identified associations of solute carrier family 2, member 9 (SLC2A9), polymorphisms with variation in serum uric acid. However, it...

  9. Position-dependent effects of locked nucleic acid (LNA) on DNA sequencing and PCR primers

    PubMed Central

    Levin, Joshua D.; Fiala, Dean; Samala, Meinrado F.; Kahn, Jason D.; Peterson, Raymond J.

    2006-01-01

    Genomes are becoming heavily annotated with important features. Analysis of these features often employs oligonucleotides that hybridize at defined locations. When the defined location lies in a poor sequence context, traditional design strategies may fail. Locked Nucleic Acid (LNA) can enhance oligonucleotide affinity and specificity. Though LNA has been used in many applications, formal design rules are still being defined. To further this effort we have investigated the effect of LNA on the performance of sequencing and PCR primers in AT-rich regions, where short primers yield poor sequencing reads or PCR yields. LNA was used in three positional patterns: near the 5′ end (LNA-5′), near the 3′ end (LNA-3′) and distributed throughout (LNA-Even). Quantitative measures of sequencing read length (Phred Q30 count) and real-time PCR signal (cycle threshold, CT) were characterized using two-way ANOVA. LNA-5′ increased the average Phred Q30 score by 60% and it was never observed to decrease performance. LNA-5′ generated cycle thresholds in quantitative PCR that were comparable to high-yielding conventional primers. In contrast, LNA-3′ and LNA-Even did not improve read lengths or CT. ANOVA demonstrated the statistical significance of these results and identified significant interaction between the positional design rule and primer sequence. PMID:17071964

  10. Seasonal Variations of Low Molecular Weight Dicarboxylic Acids in Atmospheric Aerosols at Okinawa Islands, Japan

    NASA Astrophysics Data System (ADS)

    Nakaema, F.; Handa, D.; Tanahara, A.; Arakaki, T.

    2009-04-01

    Low molecular weight dicarboxylic acids are major fraction of water soluble organic compounds in the atmospheric aerosols. Recently, economy of East Asia grows up remarkably, and atmospheric aerosols discharged from this area have been transported to Japan. In this study, we collected aerosol at Cape Hedo (CH) and University of the Ryukyus(UR), and studied the distribution and origin of low molecule dicarboxylic acid. Aerosols were collected on a quartz filter with a high volume air sampler. Low molecular weight dicarboxylic acids extracted by pure water were derivatized to dibutyl esters by reactions with BF3/butanol and were measured by GC-FID. In many samples, oxalic acid showed the highest concentration. Concentration of oxalic acid, malonic acid, succinic acid and malic acid were strongly correlated between the two sampling sites. Oxalic acid occupied on the average 83% and 76% of all the dicarboxylic acid measured for CH samples and UR samples. It is suggested that the aerosols in Okinawa were affected by secondary photochemical reactions, not by the primary emissions from local sources. The seasonal variation of the dicarboxylic acids concentrations in CH and UR showed higher in spring and fall, and a lower in summer. From the back trajectory analysis, dicarboxylic acids concentrations showed higher when an air mass came from East Asia area, and showed lower when it came from Pacific Ocean.

  11. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.

  12. Geochemical variations during the 2012 Emilia seismic sequence

    NASA Astrophysics Data System (ADS)

    Sciarra, Alessandra; Cantucci, Barbara; Galli, Gianfranco; Cinti, Daniele; Pizzino, Luca

    2015-04-01

    Several geochemical surveys (soil gas and shallow water) were performed in the Modena province (Massa Finalese, Finale Emilia, Medolla and S. Felice sul Panaro), during 2006-2014 period. In May-June 2012, a seismic sequence (main shocks of ML 5.9 and 5.8) was occurred closely to the investigated area. In this area 300 CO2 and CH4 fluxes measurements, 150 soil gas concentrations (He, H2, CO2, CH4 and C2H6), 30 shallow waters and their isotopic analyses (δ13C- CH4, δD- CH4 and δ13C- CO2) were performed in April-May 2006, October and December 2008, repeated in May and September 2012, June 2013 and July 2014 afterwards the 2012 Emilia seismic sequences. Chemical composition of soil gas are dominated by CH4 in the southern part by CO2 in the northern part. Very anomalous fluxes and concentrations are recorded in spot areas; elsewhere CO2 and CH4 values are very low, within the typical range of vegetative and of organic exhalation of the cultivated soil. After the seismic sequence the CH4 and CO2 fluxes are increased of one order of magnitude in the spotty areas, whereas in the surrounding area the values are within the background. On the contrary, CH4 concentration decrease (40%v/v in the 2012 surveys) and CO2 concentration increase until to 12.7%v/v (2013 survey). Isotopic gas analysis were carried out only on samples with anomalous values. Pre-seismic data hint a thermogenic origin of CH4 probably linked to leakage from a deep source in the Medolla area. Conversely, 2012/2013 isotopic data indicate a typical biogenic origin (i.e. microbial hydrocarbon production) of the CH4, as recognized elsewhere in the Po Plain and surroundings. The δ13C-CO2 value suggests a prevalent shallow origin of CO2 (i.e. organic and/or soil-derived) probably related to anaerobic oxidation of heavy hydrocarbons. Water samples, collected from domestic, industrial and hydrocarbons exploration wells, allowed us to recognize different families of waters. Waters are meteoric in origin and

  13. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923).

    PubMed

    Wasels, François; Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-03-03

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. Copyright © 2016 Wasels et al.

  14. Detection with synthetic oligonucleotide probes of nucleotide sequence variations in the genes encoding enterotoxins of Escherichia coli.

    PubMed Central

    Nishibuchi, M; Murakami, A; Arita, M; Jikuya, H; Takano, J; Honda, T; Miwatani, T

    1989-01-01

    We examined variations in the genes encoding heat-stable enterotoxin (ST) and heat-labile enterotoxin (LT) in 88 strains of Escherichia coli isolated from individuals with traveler's diarrhea to find suitable sequences for use as oligonucleotide probes. Four oligonucleotide probes of the gene encoding ST of human origin (STIb or STh), one oligonucleotide probe of the gene encoding ST of porcine origin (STIa or STp), and three oligonucleotide probes of the gene encoding LT of human origin (LTIh) were used in DNA colony hybridization tests. In 15 of 22 strains possessing the STh gene and 28 of 42 strains producing LT, the sequences of all regions tested were identical to the published sequences. One region in the STh gene examined with a 18-mer probe was relatively well conserved and was shown to be closely associated with the enterotoxicity of the E. coli strains in suckling mice. This oligonucleotide, however, hybridized with strains of Vibrio cholerae O1, V. parahaemolyticus, and Yersinia enterocolitica that gave negative results in the suckling mouse assay. PMID:2685027

  15. Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing

    PubMed Central

    Aokic, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori

    2018-01-01

    Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397

  16. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  17. Ensembl variation resources

    PubMed Central

    2010-01-01

    Background The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org. PMID:20459805

  18. Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins

    PubMed Central

    Turnbaugh, Peter J.; Quince, Christopher; Faith, Jeremiah J.; McHardy, Alice C.; Yatsunenko, Tanya; Niazi, Faheem; Affourtit, Jason; Egholm, Michael; Henrissat, Bernard; Knight, Rob; Gordon, Jeffrey I.

    2010-01-01

    We deeply sampled the organismal, genetic, and transcriptional diversity in fecal samples collected from a monozygotic (MZ) twin pair and compared the results to 1,095 communities from the gut and other body habitats of related and unrelated individuals. Using a new scheme for noise reduction in pyrosequencing data, we estimated the total diversity of species-level bacterial phylotypes in the 1.2-1.5 million bacterial 16S rRNA reads obtained from each deeply sampled cotwin to be ~800 (35.9%, 49.1% detected in both). A combined 1.1 million read 16S rRNA dataset representing 281 shallowly sequenced fecal samples from 54 twin pairs and their mothers contained an estimated 4,018 species-level phylotypes, with each sample having a unique species assemblage (53.4 ± 0.6% and 50.3 ± 0.5% overlap with the deeply sampled cotwins). Of the 134 phylotypes with a relative abundance of >0.1% in the combined dataset, only 37 appeared in >50% of the samples, with one phylotype in the Lachnospiraceae family present in 99%. Nongut communities had significantly reduced overlap with the deeply sequenced twins’ fecal microbiota (18.3 ± 0.3%, 15.3 ± 0.3%). The MZ cotwins’ fecal DNA was deeply sequenced (3.8-6.3 Gbp/sample) and assembled reads were assigned to 25 genus-level phylogenetic bins. Only 17% of the genes in these bins were shared between the cotwins. Bins exhibited differences in their degree of sequence variation, gene content including the repertoire of carbohydrate active enzymes present within and between twins (e.g., predicted cellulases, dockerins), and transcriptional activities. These results provide an expanded perspective about features that make each of us unique life forms and directions for future characterization of our gut ecosystems. PMID:20363958

  19. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

    PubMed

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-02-26

    The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.

  20. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    PubMed Central

    2013-01-01

    Background The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. PMID:23442253

  1. Rare coding variation in paraoxonase-1 is associated with ischemic stroke in the NHLBI Exome Sequencing Project[S

    PubMed Central

    Kim, Daniel Seung; Crosslin, David R.; Auer, Paul L.; Suzuki, Stephanie M.; Marsillach, Judit; Burt, Amber A.; Gordon, Adam S.; Meschia, James F.; Nalls, Mike A.; Worrall, Bradford B.; Longstreth, W. T.; Gottesman, Rebecca F.; Furlong, Clement E.; Peters, Ulrike; Rich, Stephen S.; Nickerson, Deborah A.; Jarvik, Gail P.

    2014-01-01

    HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10−3). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10−3). Ethnic differences in the association between PON1 variants with stroke could be due to the effects of PON1Val109Ile (overall P = 7.88 × 10−3; AA P = 6.52 × 10−4), found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted. PMID:24711634

  2. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome

    PubMed Central

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are often unidentifiable in some patients; the failure to detect mutations is presumably because of mutations occurred in other causative genes or outside of analyzed regions of PTCH1, or copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions. PMID:26544948

  3. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    PubMed

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are often unidentifiable in some patients; the failure to detect mutations is presumably because of mutations occurred in other causative genes or outside of analyzed regions of PTCH1, or copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through a targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  4. Genome Sequence of Lactobacillus sakei LK-145 Isolated from a Japanese Sake Cellar as a High Producer of d-Amino Acids

    PubMed Central

    Kato, Shiro

    2017-01-01

    ABSTRACT This announcement reports the complete genome sequence of strain LK-145 of Lactobacillus sakei isolated from a Japanese sake cellar as a potent strain for the production of large amounts of d-amino acids. Three putative genes encoding an amino acid racemase were identified. PMID:28818888

  5. [Complete genome sequencing and analyses of rabies viruses isolated from wild animals (Chinese Ferret-Badger) in Zhejiang province].

    PubMed

    Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing

    2009-08-01

    Based on sequencing the full-length genomes of two Chinese Ferret-Badger, we analyzed the properties of rabies viruses genetic variation in molecular level to get information on prevalence and variation of rabies viruses in Zhejiang, and to enrich the genome database of rabies viruses street strains isolated from Chinese wildlife. Overlapped fragments were amplified by RT-PCR and full-length genomes were assembled to analyze the nucleotide and deduced protein similarities and phylogenetic analyses of the N genes from Chinese Ferret-Badger, sika deer, vole, dog. Vaccine strains were then determined. The two full-length genomes were completely sequenced to find out that they had the same genetic structure with 11 923 nts including 58 nts-Leader, 1353 nts-NP, 894 nts-PP, 609 nts-MP, 1575 nts-GP, 6386 nts-LP, and 2, 5, 5 nts- intergenic regions (IGRs), 423 nts-Pseudogene-like sequence (Psi), 70 nts-Trailer. The two full-length genomes were in accordance with the properties of Rhabdoviridae Lyssa virus by blast and multi-sequence alignment. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. Of the two full-length genomes, the similarity in amino acid level was dramatically higher than that in nucleotide level, so that the nucleotide mutations happened in these two genomes were most probably as synonymous mutations. Compared to the referenced rabies viruses, the lengths of the five protein coding regions did not show any changes or recombination, but only with a few-point mutations. It was evident that the five proteins appeared to be stable. The variation sites and types of the two ferret badgers genomes were similar to the referenced vaccine or street strains. The two strains were genotype 1 according to the multi-sequence and phylogenetic analyses, which possessing the distinct geographyphic characteristics of China. All the evidence suggested a cue that these two ferret badgers

  6. The effect of acidity on gill variations in the aquatic air-breathing fish, Trichogaster lalius.

    PubMed

    Huang, Chun-Yen; Lin, Hui-Chen

    2011-01-01

    Climate change affects organisms that inhabit not only in aerial but also in aquatic environments by making water more hypoxic and acidic. In the past, we evaluated morphological and functional variations in the gills of 12 species of aquatic air-breathing fishes. The aim of the present study is to examine the degree of gill modification in the aquatic air-breathing fish, Trichogaster lalius, in response to acidic stress. This provides a link between the ecological and physiological studies. We evaluated the changes in morphology and function of the gills, labyrinth organ, and kidney when the fish were subjected to acidic water and deionized water (DW). In the first experiment, fish were sampled at 1, 2, 4, and 7 days after acidic treatment. Apparent morphological modification was observed on day 4 and recovery was noted on day 7. Protein expression and enzyme activity of vacuolar-type H+-ATPase (VHA) and the protein expression of the proliferating cell nuclear antigen (PCNA) of the 1st and 4th gill arches both increased in the 4-day and 7-day acidic groups while the enzyme activity of Na+/K+-ATPase (NKA) decreased. In the second experiment, fish were tested for changes in the 1st and 4th gill arches and kidney after exposure to DW and acidic water for 4days. The gill structure of the fish in the DW was not different from that of the control group (fresh water). The protein expression and enzyme activity of the VHA of the 1st and 4th gill arches increased in both the DW and acidic groups for 4 days. We found a decrease in the protein expression of NKA in the kidney and in the enzyme activity of NKA in the 1st and 4th gill arches in the DW and acidic groups. From these results, we suggest that T. lalius exhibited significantly different ionic regulation and acid-base regulatory abilities in the DW and acidic groups in the 1st and 4th gill arches and kidney. The responses of the gills in T. lalius were different from those fish that show apparent morphological

  7. Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates.

    PubMed

    Heunis, Tiaan; Dippenaar, Anzaan; Warren, Robin M; van Helden, Paul D; van der Merwe, Ruben G; Gey van Pittius, Nicolaas C; Pain, Arnab; Sampson, Samantha L; Tabb, David L

    2017-10-06

    Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of the utmost importance to fully understand M. tuberculosis biology and pathogenicity. In this study, we integrated whole-genome sequencing and mass spectrometry (GeLC-MS/MS) to reveal strain-specific characteristics in the proteomes of two clinical M. tuberculosis Latin American-Mediterranean isolates. Using this approach, we identified 59 peptides containing single amino acid variants, which covered ∼9% of all coding nonsynonymous single nucleotide variants detected by whole-genome sequencing. Furthermore, we identified 29 distinct peptides that mapped to a hypothetical protein not present in the M. tuberculosis H37Rv reference proteome. Here, we provide evidence for the expression of this protein in the clinical M. tuberculosis SAWC3651 isolate. The strain-specific databases enabled confirmation of genomic differences (i.e., large genomic regions of difference and nonsynonymous single nucleotide variants) in these two clinical M. tuberculosis isolates and allowed strain differentiation at the proteome level. Our results contribute to the growing field of clinical microbial proteogenomics and can improve our understanding of phenotypic variation in clinical M. tuberculosis isolates.

  8. Cloning and characterization of acid invertase genes in the roots of the metallophyte Kummerowia stipulacea (Maxim.) Makino from two populations: Differential expression under copper stress.

    PubMed

    Zhang, Luan; Xiong, Zhi-ting; Xu, Zhong-rui; Liu, Chen; Cai, Shen-wen

    2014-06-01

    The roots of metallophytes serve as the key interface between plants and heavy metal-contaminated underground environments. It is known that the roots of metallicolous plants show a higher activity of acid invertase enzymes than those of non-metallicolous plants when under copper stress. To test whether the higher activity of acid invertases is the result of increased expression of acid invertase genes or variations in the amino acid sequences between the two population types, we isolated full cDNAs for acid invertases from two populations of Kummerowia stipulacea (from metalliferous and non-metalliferous soils), determined their nucleotide sequences, expressed them in Pichia pastoris, and conducted real-time PCR to determine differences in transcript levels during Cu stress. Heterologous expression of acid invertase cDNAs in P. pastoris indicated that variations in the amino acid sequences of acid invertases between the two populations played no significant role in determining enzyme characteristics. Seedlings of K. stipulacea were exposed to 0.3µM Cu(2+) (control) and 10µM Cu(2+) for 7 days under hydroponics׳ conditions. The transcript levels of acid invertases in metallicolous plants were significantly higher than in non-metallicolous plants when under copper stress. The results suggest that the expression of acid invertase genes in metallicolous plants of K. stipulacea differed from those in non-metallicolous plants under such conditions. In addition, the sugars may play an important role in regulating the transcript level of acid invertase genes and acid invertase genes may also be involved in root/shoot biomass allocation. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Analysis of Sequence Variation and Risk Association of Human Papillomavirus 52 Variants Circulating in Korea

    PubMed Central

    Choi, Youn Jin; Ki, Eun Young; Zhang, Chuqing; Ho, Wendy C. S.; Lee, Sung-Jong; Jeong, Min Jin

    2016-01-01

    Introduction Human papillomavirus (HPV) 52 is a carcinogenic, high-risk genotype frequently detected in cervical cancer cases from East Asia, including Korea. Materials and Methods Sequences of HPV52 detected in 91 cervical samples collected from women attending Seoul St. Mary’s Hospital were analyzed. HPV52 genomic sequences were obtained by polymerase chain reaction (PCR)-based sequencing and analyzed using Seq-Scape software, and phylogenetic trees were constructed using MEGA6 software. Results Of the 91 cervical samples, 40 were normal, 22 were low-grade lesions, 21 were high-grade lesions and 7 were squamous cell carcinomas. Four HPV52 variant lineages (A, B, C and D) were identified. Lineage B was the most frequently detected lineage, followed by lineage C. By analyzing the two most frequently detected lineages (B and C), we found that distinct variations existed in each lineage. We also found that a lineage B-specific mutation K93R (A379G) was associated with an increased risk of cervical neoplasia. Conclusions To our knowledge, we are the first to reveal the predominance of the HPV52 lineages, B and C, in Korea. We also found these lineages harbored distinct genetic alterations that may affect oncogenicity. Our findings increase our understanding on the heterogeneity of HPV52 variants, and may be useful for the development of new diagnostic assays and therapeutic vaccines. PMID:27977741

  10. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.

    PubMed

    Wallberg, Andreas; Han, Fan; Wellhagen, Gustaf; Dahle, Bjørn; Kawata, Masakado; Haddad, Nizar; Simões, Zilá Luz Paulino; Allsopp, Mike H; Kandemir, Irfan; De la Rúa, Pilar; Pirk, Christian W; Webster, Matthew T

    2014-10-01

    The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system- and sperm motility-related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees.

  11. Genome sequence of the thermophilic strain Bacillus coagulans 2-6, an efficient producer of high-optical-purity L-lactic acid.

    PubMed

    Su, Fei; Yu, Bo; Sun, Jibin; Ou, Hong-Yu; Zhao, Bo; Wang, Limin; Qin, Jiayang; Tang, Hongzhi; Tao, Fei; Jarek, Michael; Scharfe, Maren; Ma, Cuiqing; Ma, Yanhe; Xu, Ping

    2011-09-01

    Bacillus coagulans 2-6 is an efficient producer of lactic acid. The genome of B. coagulans 2-6 has the smallest genome among the members of the genus Bacillus known to date. The frameshift mutation at the start of the d-lactate dehydrogenase sequence might be responsible for the production of high-optical-purity l-lactic acid.

  12. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823

  13. Taste, umami-enhance effect and amino acid sequence of peptides separated from silkworm pupa hydrolysate.

    PubMed

    Yu, Zilin; Jiang, Hongrui; Guo, Rongcan; Yang, Bo; You, Gang; Zhao, Mouming; Liu, Xiaoling

    2018-06-01

    Four umami peptides were separated and purified by ultrafiltration, gel filtration chromatography and identified by ultra-performance liquid chromatography tandem mass-spectrometry (UPLC-MS/MS), the amino acid sequences of four peptides are Val-Pro-Tyr (VPY), Thr-Ala-Tyr (TAY), Ala-Ala-Pro-Tyr (AAPY) and Gly-Phe-Pro (GFP). The result illustrates that the umami amino acids are not the content of umami peptides, but bitter amino acids are included. The threshold of VPY, TAY, AAPY and GFP were 1.65 mmol/L, 1.76 mmol/L, 2.97 mmol/L and 6.26 mmol/L, respectively. The peptide TAY, VPY and AAPY had an umami-enhancement effect on the monosodium glutamate (MSG) + sodium chloride (NaCl) solution, their concentrations were 2.5 g/L, 5 g/L and 5 g/L, respectively, while GFP has no significant umami-enhancement effect in solution. In addition, the peptides have better taste than its composing amino acids, which indicates that the taste of peptide does not depend on its composing amino acids. Copyright © 2018. Published by Elsevier Ltd.

  14. The nucleotide sequence of HLA-B{sup *}2704 reveals a new amino acid substitution in exon 4 which is also present in HLA-B{sup *}2706

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rudwaleit, M.; Bowness, P.; Wordsworth, P.

    1996-12-31

    The HLA-B27 subtype HLA-B{sup *}2704 is virtually absent in Caucasians but common in Orientals, where it is associated with ankylosing spondylitis. The amino acid sequence of HLA-B{sup *}2704 has been established by peptide mapping and was shown to differ by two amino acids from HLA-B{sup *}2705, HLA-B{sup *}2704 is characterized by a serine for aspartic acid substitution at position 77 and glutamic acid for valine at position 152. To date, however, no nucleotide sequence confirming these changes at the DNA level has been published. 13 refs., 2 figs.

  15. Nucleic acid arrays and methods of synthesis

    DOEpatents

    Sabanayagam, Chandran R.; Sano, Takeshi; Misasi, John; Hatch, Anson; Cantor, Charles

    2001-01-01

    The present invention generally relates to high density nucleic acid arrays and methods of synthesizing nucleic acid sequences on a solid surface. Specifically, the present invention contemplates the use of stabilized nucleic acid primer sequences immobilized on solid surfaces, and circular nucleic acid sequence templates combined with the use of isothermal rolling circle amplification to thereby increase nucleic acid sequence concentrations in a sample or on an array of nucleic acid sequences.

  16. Nucleic Acid Amplification Testing and Sequencing Combined with Acid-Fast Staining in Needle Biopsy Lung Tissues for the Diagnosis of Smear-Negative Pulmonary Tuberculosis.

    PubMed

    Jiang, Faming; Huang, Weiwei; Wang, Ye; Tian, Panwen; Chen, Xuerong; Liang, Zongan

    2016-01-01

    Smear-negative pulmonary tuberculosis (PTB) is common and difficult to diagnose. In this study, we investigated the diagnostic value of nucleic acid amplification testing and sequencing combined with acid-fast bacteria (AFB) staining of needle biopsy lung tissues for patients with suspected smear-negative PTB. Patients with suspected smear-negative PTB who underwent percutaneous transthoracic needle biopsy between May 1, 2012, and June 30, 2015, were enrolled in this retrospective study. Patients with AFB in sputum smears were excluded. All lung biopsy specimens were fixed in formalin, embedded in paraffin, and subjected to acid-fast staining and tuberculous polymerase chain reaction (TB-PCR). For patients with positive AFB and negative TB-PCR results in lung tissues, probe assays and 16S rRNA sequencing were used for identification of nontuberculous mycobacteria (NTM). The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy of PCR and AFB staining were calculated separately and in combination. Among the 220 eligible patients, 133 were diagnosed with TB (men/women: 76/57; age range: 17-80 years, confirmed TB: 9, probable TB: 124). Forty-eight patients who were diagnosed with other specific diseases were assigned as negative controls, and 39 patients with indeterminate final diagnosis were excluded from statistical analysis. The sensitivity, specificity, PPV, NPV, and accuracy of histological AFB (HAFB) for the diagnosis of smear-negative were 61.7% (82/133), 100% (48/48), 100% (82/82), 48.5% (48/181), and 71.8% (130/181), respectively. The sensitivity, specificity, PPV, and NPV of histological PCR were 89.5% (119/133), 95.8% (46/48), 98.3% (119/121), and 76.7% (46/60), respectively, demonstrating that histological PCR had significantly higher accuracy (91.2% [165/181]) than histological acid-fast staining (71.8% [130/181]), P < 0.001. Parallel testing of histological AFB staining and PCR showed the

  17. Striking structural dynamism and nucleotide sequence variation of the transposon Galileo in the genome of Drosophila mojavensis

    PubMed Central

    2013-01-01

    Background Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. Results In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies that point out the relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Conclusions Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome. PMID:23374229

  18. Striking structural dynamism and nucleotide sequence variation of the transposon Galileo in the genome of Drosophila mojavensis.

    PubMed

    Marzo, Mar; Bello, Xabier; Puig, Marta; Maside, Xulio; Ruiz, Alfredo

    2013-02-04

    Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies that point out the relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome.

  19. Genomic Landscape of Intrahost Variation in Group A Streptococcus: Repeated and Abundant Mutational Inactivation of the fabT Gene Encoding a Regulator of Fatty Acid Synthesis

    PubMed Central

    Eraso, Jesus M.; Olsen, Randall J.; Beres, Stephen B.; Kachroo, Priyanka; Porter, Adeline R.; Nasser, Waleed; Bernard, Paul E.; DeLeo, Frank R.

    2016-01-01

    To obtain new information about Streptococcus pyogenes intrahost genetic variation during invasive infection, we sequenced the genomes of 2,954 serotype M1 strains recovered from a nonhuman primate experimental model of necrotizing fasciitis. A total of 644 strains (21.8%) acquired polymorphisms relative to the input parental strain. The fabT gene, encoding a transcriptional regulator of fatty acid biosynthesis genes, contained 54.5% of these changes. The great majority of polymorphisms were predicted to deleteriously alter FabT function. Transcriptome-sequencing (RNA-seq) analysis of a wild-type strain and an isogenic fabT deletion mutant strain found that between 3.7 and 28.5% of the S. pyogenes transcripts were differentially expressed, depending on the growth temperature (35°C or 40°C) and growth phase (mid-exponential or stationary phase). Genes implicated in fatty acid synthesis and lipid metabolism were significantly upregulated in the fabT deletion mutant strain. FabT also directly or indirectly regulated central carbon metabolism genes, including pyruvate hub enzymes and fermentation pathways and virulence genes. Deletion of fabT decreased virulence in a nonhuman primate model of necrotizing fasciitis. In addition, the fabT deletion strain had significantly decreased survival in human whole blood and during phagocytic interaction with polymorphonuclear leukocytes ex vivo. We conclude that FabT mutant progeny arise during infection, constitute a metabolically distinct subpopulation, and are less virulent in the experimental models used here. PMID:27600505

  20. Genome Sequence of Sphingomonas wittichii DP58, the First Reported Phenazine-1-Carboxylic Acid-Degrading Strain

    PubMed Central

    Ma, Zhiwei; Shen, Xuemei; Wang, Wei; Peng, Huasong; Xu, Ping; Zhang, Xuehong

    2012-01-01

    Sphingomonas wittichii DP58 (CCTCC M 2012027), the first reported phenazine-1-carboxylic acid (PCA)-degrading strain, was isolated from pimiento rhizosphere soils. Here we present a 5.6-Mb assembly of its genome. This sequence would contribute to the elucidation of the molecular mechanism of PCA degradation to improve the antifungal's effectiveness or remove superfluous PCA. PMID:22689229

  1. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  2. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  3. Sequence heterogeneity of cannabidiolic- and tetrahydrocannabinolic acid-synthase in Cannabis sativa L. and its relationship with chemical phenotype.

    PubMed

    Onofri, Chiara; de Meijer, Etienne P M; Mandolino, Giuseppe

    2015-08-01

    Sequence variants of THCA- and CBDA-synthases were isolated from different Cannabis sativa L. strains expressing various wild-type and mutant chemical phenotypes (chemotypes). Expressed and complete sequences were obtained from mature inflorescences. Each strain was shown to have a different specificity and/or ability to convert the precursor CBGA into CBDA and/or THCA type products. The comparison of the expressed sequences led to the identification of different mutations, all of them due to SNPs. These SNPs were found to relate to the cannabinoid composition of the inflorescence at maturity and are therefore proposed to have a functional significance. The amount of variation was found to be higher within the CBDAS sequence family than in the THCAS family, suggesting a more recent evolution of THCA-forming enzymes from the CBDAS group. We therefore consider CBDAS as the ancestral type of these synthases. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. [Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

    PubMed

    Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

    2010-01-01

    Based on sequencing the full-length genomes of four Chinese Ferret-Badger and dog, we analyze the properties of rabies viruses genetic variation in molecular level, get the information about rabies viruses prevalence and variation in Zhejiang, and enrich the genome database of rabies viruses street strains isolated from China. Rabies viruses in suckling mice were isolated, overlapped fragments were amplified by RT-PCR and full-length genomes were assembled to analyze the nucleotide and deduced protein similarities and phylogenetic analyses from Chinese Ferret-Badger, dog, sika deer, vole, used vaccine strain were determined. The four full-length genomes were sequenced completely and had the same genetic structure with the length of 11, 923 nts or 11, 925 nts including 58 nts-Leader, 1353 nts-NP, 894 nts-PP, 609 nts-MP, 1575 nts-GP, 6386 nts-LP, and 2, 5, 5 nts- intergenic regions(IGRs), 423 nts-Pseudogene-like sequence (psi), 70 nts-Trailer. The four full-length genomes were in accordance with the properties of Rhabdoviridae Lyssa virus by BLAST and multi-sequence alignment. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. Of the four full-length genomes, the similarity in amino acid level was dramatically higher than that in nucleotide level, so the nucleotide mutations happened in these four genomes were most synonymous mutations. Compared with the reference rabies viruses, the lengths of the five protein coding regions had no change, no recombination, only with a few point mutations. It was evident that the five proteins appeared to be stable. The variation sites and types of the four genomes were similar to the reference vaccine or street strains. And the four strains were genotype 1 according to the multi-sequence and phylogenetic analyses, which possessed the distinct district characteristics of China. Therefore, these four rabies viruses are likely to be street viruses

  5. Sequence variation in SORL1 and Dementia risk in Swedes

    PubMed Central

    Reynolds, Chandra A.; Hong, Mun-Gwan; Eriksson, Ulrika K.; Blennow, Kaj; Johansson, Boo; Malmberg, Bo; Berg, Stig; Gatz, Margaret; Pedersen, Nancy L.; Bennet, Anna M.; Prince, Jonathan A.

    2010-01-01

    The gene encoding the neuronal sortilin-related receptor SORL1 has been claimed to be associated with Alzheimer Disease by independent groups and across various human populations. We evaluated six genetic markers in SORL1 in a sample of 1558 Swedish dementia cases (including 1270 Alzheimer disease cases) and 2179 controls. For both single marker and haplotype-based analyses we found no strong support for SORL1 as a dementia- or AD-risk modifying gene in our sample in isolation, nor did we observe association with AD/dementia-related traits, including CSF β-amyloid1–42, tau levels, or age-at-onset. However, meta-analyses of markers in this study together with previously published studies on SORL1 encompassing in excess of 13,000 individuals does suggest significant association with AD (best OR 1.097; 95% CI 1.038–1.158, p = 0.001). All six markers were significant in meta-analyses and it is notable that they occur in two distinct LD blocks. These data are consistent with either allelic heterogeneity or the existence of as yet untested functional variants and these will be important considerations in further attempts to evaluate the importance of sequence variation in SORL1 with AD risk. PMID:19653016

  6. Understanding variations in secondary findings reporting practices across U.S. genome sequencing laboratories.

    PubMed

    Ackerman, Sara L; Koenig, Barbara A

    2018-01-01

    Increasingly used for clinical purposes, genome and exome sequencing can generate clinically relevant information that is not directly related to the reason for testing (incidental or secondary findings). Debates about the ethical implications of secondary findings were sparked by the American College of Medical Genetics (ACMG) 2013 policy statement, which recommended that laboratories report pathogenic alterations in 56 genes. Although wide variation in laboratories' secondary findings policies has been reported, little is known about its causes. We interviewed 18 laboratory directors and genetic counselors at 10 U.S. laboratories to investigate the motivations and interests shaping secondary findings reporting policies for clinical exome sequencing. Analysis of interview transcripts and laboratory documents was informed by sociological theories of standardization. Laboratories varied widely in terms of the types of secondary findings reported, consent-form language, and choices offered to patients. In explaining their adaptation of the ACMG report, our participants weighed genetic information's clinical, moral, professional, and commercial value in an attempt to maximize benefits for patients and families, minimize the costs of sequencing and analysis, adhere to professional norms, attract customers, and contend with the uncertain clinical implications of much of the genetic information generated. Nearly all laboratories in our study voluntarily adopted ACMG's recommendations, but their actual practices varied considerably and were informed by laboratory-specific judgments about clinical utility and patient benefit. Our findings offer a compelling example of standardization as a complex process that rarely leads simply to uniformity of practice. As laboratories take on a more prominent role in decisions about the return of genetic information, strategies are needed to inform patients, families, and clinicians about the differences between laboratories' practices

  7. Variation of Select Flavonols and Chlorogenic Acid Content of Elderberry Collected Throughout the Eastern United States

    PubMed Central

    Mudge, Elizabeth; Applequist, Wendy L.; Finley, Jamie; Lister, Patience; Townesmith, Andrew K.; Walker, Karen M.; Brown, Paula N.

    2016-01-01

    American elderberries are commonly collected from wild plants for use as food and medicinal products. The degree of phytochemical variation amongst wild populations has not been established and might affect the overall quality of elderberry dietary supplements. The three major flavonols identified in elderberries are rutin, quercetin and isoquercetin. Variation in the flavonols and chlorogenic acid was determined for 107 collections of elderberries from throughout the eastern United States using an optimized high performance liquid chromatography with ultraviolet detection method. The mean content was 71.9 mg per 100g fresh weight with variation ranging from 7.0 to 209.7 mg per 100 g fresh weight within the collected population. Elderberries collected from southeastern regions had significantly higher contents in comparison with those in more northern regions. The variability of the individual flavonol and chlorogenic acid profiles of the berries was complex and likely influenced by multiple factors. Several outliers were identified based on unique phytochemical profiles in comparison with average populations. This is the first study to determine the inherent variability of American elderberries from wild collections and can be used to identify potential new cultivars that may produce fruits of unique or high-quality phytochemical content for the food and dietary supplement industries. PMID:26877585

  8. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of arid-land tree-Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina based sequence analysis reveals an approximate genome size of 31.4Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, total 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  10. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  11. Relationship between ureB Sequence Diversity, Urease Activity and Genotypic Variations of Different Helicobacter pylori Strains in Patients with Gastric Disorders.

    PubMed

    Ghalehnoei, Hossein; Ahmadzadeh, Alireza; Farzi, Nastaran; Alebouyeh, Masoud; Aghdaei, Hamid Asadzadeh; Azimzadeh, Pendram; Molaei, Mahsa; Zali, Mohammad Reza

    2016-01-01

    Association of the severity of Helicobacter pylori induced diseases with virulence entity of the colonized strains was proven in some studies. Urease has been demonstrated as a potent virulence factor for H. pylori. The main aim of this study was investigation of the relationships of ureB sequence diversity, urease activity and virulence genotypes of different H. pylori strains with histopathological changes of gastric tissue in infected patients suffering from different gastric disorders. Analysis of the virulence genotypes in the isolated strains indicated significant associations between the presence of severe active gastritis and cagA+ (P = 0.039) or cagA/iceA1 genotypes (P = 0.026), and intestinal metaplasia and vacA m1 (P = 0.008) or vacA s1/m2 (P = 0.001) genotypes. Our results showed a 2.4-fold increased risk of peptic ulcer (95% CI: 0.483-11.93), compared with gastritis, in the infected patients who had dupA positive strains; however this association was not statistically significant. The results of urease activity showed a significant mean difference between the isolated strains from patients with PUD and NUD (P = 0.034). This activity was relatively higher among patients with intestinal metaplasia. Also a significant association was found between the lack of cagA and increased urease activity among the isolated strains (P = 0.036). While the greatest sequence variation of ureB was detected in a strain from a patient with intestinal metaplasia, the sole determined amino acid change in UreB sequence (Ala201Thr, 30%), showed no influence on urease activity. In conclusion, the supposed role of H. pylori urease to form peptic ulcer and advancing of intestinal metaplasia was postulated in this study. Higher urease activity in the colonizing H. pylori strains that present specific virulence factors was indicated as a risk factor for promotion of histopathological changes of gastric tissue that advance gastric malignancy.

  12. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  13. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    PubMed

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).

  14. In Vitro Modeling of Bile Acid Processing by the Human Fecal Microbiota.

    PubMed

    Martin, Glynn; Kolida, Sofia; Marchesi, Julian R; Want, Elizabeth; Sidaway, James E; Swann, Jonathan R

    2018-01-01

    Bile acids, the products of concerted host and gut bacterial metabolism, have important signaling functions within the mammalian metabolic system and a key role in digestion. Given the complexity of the mega-variate bacterial community residing in the gastrointestinal tract, studying associations between individual bacterial genera and bile acid processing remains a challenge. Here, we present a novel in vitro approach to determine the bacterial genera associated with the metabolism of different primary bile acids and their potential to contribute to inter-individual variation in this processing. Anaerobic, pH-controlled batch cultures were inoculated with human fecal microbiota and treated with individual conjugated primary bile acids (500 μg/ml) to serve as the sole substrate for 24 h. Samples were collected throughout the experiment (0, 5, 10, and 24 h) and the bacterial composition was determined by 16S rRNA gene sequencing and the bile acid signatures were characterized using a targeted ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) approach. Data fusion techniques were used to identify statistical bacterial-metabolic linkages. An increase in gut bacteria associated bile acids was observed over 24 h with variation in the rate of bile acid metabolism across the volunteers ( n = 7). Correlation analysis identified a significant association between the Gemmiger genus and the deconjugation of glycine conjugated bile acids while the deconjugation of taurocholic acid was associated with bacteria from the Eubacterium and Ruminococcus genera. A positive correlation between Dorea and deoxycholic acid production suggest a potential role for this genus in cholic acid dehydroxylation. A slower deconjugation of taurocholic acid was observed in individuals with a greater abundance of Parasutterella and Akkermansia . This work demonstrates the utility of integrating compositional (metataxonomics) and functional (metabonomics) systems biology approaches

  15. Significant seasonal variations of microbial community in an acid mine drainage lake in Anhui Province, China.

    PubMed

    Hao, Chunbo; Wei, Pengfei; Pei, Lixin; Du, Zerui; Zhang, Yi; Lu, Yanchun; Dong, Hailiang

    2017-04-01

    Acid mine drainage (AMD),characterized by strong acidity and high metal concentrations, generates from the oxidative dissolution of metal sulfides, and acidophiles can accelerate the process significantly. Despite extensive research in microbial diversity and community composition, little is known about seasonal variations of microbial community structure (especially micro eukaryotes) in response to environmental conditions in AMD ecosystem. To this end, AMD samples were collected from Nanshan AMD lake, Anhui Province, China, over a full seasonal cycle from 2013 to 2014, and water chemistry and microbial composition were studied. pH of lake water was stable (∼3.0) across the sampling period, while the concentrations of ions varied dramatically. The highest metal concentrations in the lake were found for Mg and Al, not commonly found Fe. Unexpectedly, ultrahigh concentration of chlorophyll a was measured in the extremely acidic lake, reaching 226.43-280.95 μg/L in winter, even higher than those in most eutrophic freshwater lakes. Both prokaryotic and eukaryotic communities showed a strong seasonal variation. Among the prokaryotes, "Ferrovum", a chemolithotrophic iron-oxidizing bacterium was predominant in most sampling seasons, although it was a minor member prior to September, 2012. Fe 2+ was the initial geochemical factor that drove the variation of the prokaryotic community. The eukaryotic community was simple but varied more drastically than the prokaryotic community. Photoautotrophic algae (primary producers) formed a food web with protozoa or flagellate (top consumers) across all four seasons, and temperature appeared to be responsible for the observed seasonal variation. Ochromonas and Chlamydomonas (responsible for high algal bloom in winter) occurred in autumn/summer and winter/spring seasons, respectively, because of their distinct growth temperatures. The closest phylogenetic relationship between Chlamydomonas species in the lake and those in Arctic

  16. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association.

    PubMed

    Strachan, Norval J C; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J; Hanson, Mary F; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H A M; French, Nigel P; George, Tessy; Biggs, Patrick J; Forbes, Ken J

    2015-10-07

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei's genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei's genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world.

  17. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  18. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  19. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  20. Contribution of human growth hormone-releasing hormone receptor (GHRHR) gene sequence variation to isolated severe growth hormone deficiency (ISGHD) and normal adult height.

    PubMed

    Camats, Núria; Fernández-Cancio, Mónica; Carrascosa, Antonio; Andaluz, Pilar; Albisu, M Ángeles; Clemente, María; Gussinyé, Miquel; Yeste, Diego; Audí, Laura

    2012-10-01

    Molecular causes of isolated severe growth hormone deficiency (ISGHD) in several genes have been established. The aim of this study was to analyse the contribution of growth hormone-releasing hormone receptor (GHRHR) gene sequence variation to GH deficiency in a series of prepubertal ISGHD patients and to normal adult height. A systematic GHRHR gene sequence analysis was performed in 69 ISGHD patients and 60 normal adult height controls (NAHC). Four GHRHR single-nucleotide polymorphisms (SNPs) were genotyped in 248 additional NAHC. An analysis was performed on individual SNPs and combined genotype associations with diagnosis in ISGHD patients and with height-SDS in NAHC. Twenty-one SNPs were found. P3, P13, P15 and P20 had not been previously described. Patients and controls shared 12 SNPs (P1, P2, P4-P11, P16 and P21). Significantly different frequencies of the heterozygous genotype and alternate allele were detected in P9 (exon 4, rs4988498) and P12 (intron 6, rs35609199); P9 heterozygous genotype frequencies were similar in patients and the shortest control group (heights between -2 and -1 SDS) and significantly different in controls (heights between -1 and +2 SDS). GHRHR P9 together with 4 GH1 SNP genotypes contributed to 6·2% of height-SDS variation in the entire 308 NAHC. This study established the GHRHR gene sequence variation map in ISGHD patients and NAHC. No evidence of GHRHR mutation contribution to ISGHD was found in this population, although P9 and P12 SNP frequencies were significantly different between ISGHD and NAHC. Thus, the gene sequence may contribute to normal adult height, as demonstrated in NAHC. © 2012 Blackwell Publishing Ltd.

  1. Regional variations in the presentation and surgical management of Pierre Robin sequence.

    PubMed

    Scott, Andrew R; Mader, Nicholas S

    2014-12-01

    To estimate the current birth prevalence of isolated and syndromic Pierre Robin sequence (iPRS and sPRS), including demographic variations. To assess for regional variations in surgical airway interventions for PRS, and to determine the mean length of stay (LOS), cost of admission, complication rate, and rate of associated procedures related to tongue-lip adhesion (TLA), neonatal mandibular distraction osteogenesis (MDO), and tracheotomy. Retrospective cross-sectional study. The 2006 and 2009 Kids Inpatient Databases were used to identify newborns and infants with PRS; analysis using cross tabulations and linear regression modeling was performed. In 2006 and 2009, the estimated birth prevalence of iPRS was 1.8:10,000 live births and sPRS 1.4:10,000 live births. The highest rate was in whites and the lowest in non-Hispanic blacks. There were 145 TLAs (36%), 176 MDOs (43%), and 85 tracheotomies (21%). The Northeast favored a TLA strategy; the Midwest favored MDO. The mean LOS for TLA was 24.5 days, MDO 36.7 days, tracheotomy (iPRS) 44.9 days, and tracheotomy (sPRS) 53.0 days. The birth prevalence of PRS may be higher than previously described, especially in whites. Surgical management strategies vary between regions. The overall cost of a TLA admission is lower than an MDO or tracheotomy admission, owing primarily to shorter LOS. This study was limited by not taking into account outpatient expenses (nursing care, monitoring) or need for further airway/feeding intervention over subsequent admissions. 2c. © 2014 The American Laryngological, Rhinological and Otological Society, Inc.

  2. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  3. Comparative sequence analysis of acid sensitive/resistance proteins in Escherichia coli and Shigella flexneri

    PubMed Central

    Manikandan, Selvaraj; Balaji, Seetharaaman; Kumar, Anil; Kumar, Rita

    2007-01-01

    The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria, responsible for extreme acid sensitivity especially in Escherichia coli and Shigella flexneri. Here we found the molecular evidence that proved the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC coded acid sensitive antiporter showed many conserved residue patterns at regular intervals at the N-terminal region. It was observed that as the alignment approaches towards the C-terminal, the number of conserved residues decreases, indicating that the N-terminal region of this protein has much active role when compared to the carboxyl terminal. The motif, FHLVFFLLLGG, is well conserved within the entire gadC coded protein at the amino terminal. The motif is also partially conserved among other antiporters (which are not coded by gadC) but involved in acid sensitive/resistance mechanism. Phylogenetic cluster analysis proves the relationship of Escherichia coli and Shigella flexneri. The gadC coded proteins are converged as a clade and diverged from other antiporters belongs to the amino acid-polyamine-organocation (APC) superfamily. PMID:21670792

  4. High-Throughput Sequencing and Copy Number Variation Detection Using Formalin Fixed Embedded Tissue in Metastatic Gastric Cancer

    PubMed Central

    Hong, Min Eui; Do, In-Gu; Kang, So Young; Ha, Sang Yun; Kim, Seung Tae; Park, Se Hoon; Kang, Won Ki; Choi, Min-Gew; Lee, Jun Ho; Sohn, Tae Sung; Bae, Jae Moon; Kim, Sung; Kim, Duk-Hwan; Kim, Kyoung-Mee

    2014-01-01

    In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes. PMID:25372287

  5. Exploring variation-aware contig graphs for (comparative) metagenomics using MaryGold

    PubMed Central

    Nijkamp, Jurgen F.; Pop, Mihai; Reinders, Marcel J. T.; de Ridder, Dick

    2013-01-01

    Motivation: Although many tools are available to study variation and its impact in single genomes, there is a lack of algorithms for finding such variation in metagenomes. This hampers the interpretation of metagenomics sequencing datasets, which are increasingly acquired in research on the (human) microbiome, in environmental studies and in the study of processes in the production of foods and beverages. Existing algorithms often depend on the use of reference genomes, which pose a problem when a metagenome of a priori unknown strain composition is studied. In this article, we develop a method to perform reference-free detection and visual exploration of genomic variation, both within a single metagenome and between metagenomes. Results: We present the MaryGold algorithm and its implementation, which efficiently detects bubble structures in contig graphs using graph decomposition. These bubbles represent variable genomic regions in closely related strains in metagenomic samples. The variation found is presented in a condensed Circos-based visualization, which allows for easy exploration and interpretation of the found variation. We validated the algorithm on two simulated datasets containing three respectively seven Escherichia coli genomes and showed that finding allelic variation in these genomes improves assemblies. Additionally, we applied MaryGold to publicly available real metagenomic datasets, enabling us to find within-sample genomic variation in the metagenomes of a kimchi fermentation process, the microbiome of a premature infant and in microbial communities living on acid mine drainage. Moreover, we used MaryGold for between-sample variation detection and exploration by comparing sequencing data sampled at different time points for both of these datasets. Availability: MaryGold has been written in C++ and Python and can be downloaded from http://bioinformatics.tudelft.nl/software Contact: d.deridder@tudelft.nl PMID:24058058

  6. Method of increasing conversion of a fatty acid to its corresponding dicarboxylic acid

    DOEpatents

    Craft, David L.; Wilson, C. Ron; Eirich, Dudley; Zhang, Yeyan

    2004-09-14

    A nucleic acid sequence including a CYP promoter operably linked to nucleic acid encoding a heterologous protein is provided to increase transcription of the nucleic acid. Expression vectors and host cells containing the nucleic acid sequence are also provided. The methods and compositions described herein are especially useful in the production of polycarboxylic acids by yeast cells.

  7. Identification of eight mutations and three sequence variations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghanem, N.; Costes, B.; Girodon, E.

    1994-05-15

    To determine cystic fibrosis (CF) defects in a sample of 224 non-[Delta]F508 CF chromosomes, the authors used denaturing gradient gel multiplex analysis of CF transmembrane conductance regulator gene segments, a strategy based on blind exhaustive analysis rather than a search for known mutations. This process allowed detection of 11 novel variations comprising two nonsense mutations (Q890X and W1204X), a splice defect (405 + 4 A [yields] G), a frameshift (3293delA), four presumed missense mutations (S912L, H949Y, L1065P, Q1071P), and three sequence polymorphisms (R31C or 223 C/T, 3471 T/C, and T1220I or 3791 C/T). The authors describe these variations, together withmore » the associated phenotype when defects on both CF chromosomes were identified. 8 refs., 1 fig., 1 tab.« less

  8. Sequence variations of the bovine prion protein gene (PRNP) in native Korean Hanwoo cattle

    PubMed Central

    Choi, Sangho

    2012-01-01

    Bovine spongiform encephalopathy (BSE) is one of the fatal neurodegenerative diseases known as transmissible spongiform encephalopathies (TSEs) caused by infectious prion proteins. Genetic variations correlated with susceptibility or resistance to TSE in humans and sheep have not been reported for bovine strains including those from Holstein, Jersey, and Japanese Black cattle. Here, we investigated bovine prion protein gene (PRNP) variations in Hanwoo cattle [Bos (B.) taurus coreanae], a native breed in Korea. We identified mutations and polymorphisms in the coding region of PRNP, determined their frequency, and evaluated their significance. We identified four synonymous polymorphisms and two non-synonymous mutations in PRNP, but found no novel polymorphisms. The sequence and number of octapeptide repeats were completely conserved, and the haplotype frequency of the coding region was similar to that of other B. taurus strains. When we examined the 23-bp and 12-bp insertion/deletion (indel) polymorphisms in the non-coding region of PRNP, Hanwoo cattle had a lower deletion allele and 23-bp del/12-bp del haplotype frequency than healthy and BSE-affected animals of other strains. Thus, Hanwoo are seemingly less susceptible to BSE than other strains due to the 23-bp and 12-bp indel polymorphisms. PMID:22705734

  9. Genetic variation and dynamics of infections of equid herpesvirus 5 in individual horses.

    PubMed

    Back, Helena; Ullman, Karin; Leijon, Mikael; Söderlund, Robert; Penell, Johanna; Ståhl, Karl; Pringle, John; Valarcher, Jean-François

    2016-01-01

    Equid herpesvirus 5 (EHV-5) is related to the human Epstein-Barr virus (human herpesvirus 4) and has frequently been observed in equine populations worldwide. EHV-5 was previously assumed to be low to non-pathogenic; however, studies have also related the virus to the severe lung disease equine multinodular pulmonary fibrosis (EMPF). Genetic information of EHV-5 is scanty: the whole genome was recently described and only limited nucleotide sequences are available. In this study, samples were taken twice 1 year apart from eight healthy horses at the same professional training yard and samples from a ninth horse that was diagnosed with EMPF with samples taken pre- and post-mortem to analyse partial glycoprotein B (gB) gene of EHV-5 by using next-generation sequencing. The analysis resulted in 27 partial gB gene sequences, 11 unique sequence types and five amino acid sequences. These sequences could be classified within four genotypes (I-IV) of the EHV-5 gB gene based on the degree of similarity of the nucleotide and amino acid sequences, and in this work horses were shown to be identified with up to three different genotypes simultaneously. The observations showed a range of interactions between EHV-5 and the host over time, where the same virus persists in some horses, whereas others have a more dynamic infection pattern including strains from different genotypes. This study provides insight into the genetic variation and dynamics of EHV-5, and highlights that further work is needed to understand the EHV-5 interaction with its host.

  10. Sequence and expression variations suggest an adaptive role for the DA1-like gene family in the evolution of soybeans.

    PubMed

    Zhao, Man; Gu, Yongzhe; He, Lingli; Chen, Qingshan; He, Chaoying

    2015-05-15

    The DA1 gene family is plant-specific and Arabidopsis DA1 regulates seed and organ size, but the functions in soybeans are unknown. The cultivated soybean (Glycine max) is believed to be domesticated from the annual wild soybeans (Glycine soja). To evaluate whether DA1-like genes were involved in the evolution of soybeans, we compared variation at both sequence and expression levels of DA1-like genes from G. max (GmaDA1) and G. soja (GsoDA1). Sequence identities were extremely high between the orthologous pairs between soybeans, while the paralogous copies in a soybean species showed a relatively high divergence. Moreover, the expression variation of DA1-like paralogous genes in soybean was much greater than the orthologous gene pairs between the wild and cultivated soybeans during development and challenging abiotic stresses such as salinity. We further found that overexpressing GsoDA1 genes did not affect seed size. Nevertheless, overexpressing them reduced transgenic Arabidopsis seed germination sensitivity to salt stress. Moreover, most of these genes could improve salt tolerance of the transgenic Arabidopsis plants, corroborated by a detection of expression variation of several key genes in the salt-tolerance pathways. Our work suggested that expression diversification of DA1-like genes is functionally associated with adaptive radiation of soybeans, reinforcing that the plant-specific DA1 gene family might have contributed to the successful adaption to complex environments and radiation of the plants.

  11. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  12. Amino acid sequences of peptides from a tryptic digest of a urea-soluble protein fraction (U.S.3) from oxidized wool

    PubMed Central

    Corfield, M. C.; Fletcher, J. C.; Robson, A.

    1967-01-01

    1. A tryptic digest of the protein fraction U.S.3 from oxidized wool has been separated into 32 peptide fractions by cation-exchange resin chromatography. 2. Most of these fractions have been resolved into their component peptides by a combination of the techniques of cation-exchange resin chromatography, paper chromatography and paper electrophoresis. 3. The amino acid compositions of 58 of the peptides in the digest present in the largest amounts have been determined. 4. The amino acid sequences of 38 of these have been completely elucidated and those of six others partially derived. 5. These findings indicate that the parent protein in wool from which the protein fraction U.S.3 is derived has a minimum molecular weight of 74000. 6. The structures of wool proteins are discussed in the light of the peptide sequences determined, and, in particular, of those sequences in fraction U.S.3 that could not be elucidated. PMID:16742497

  13. Identification of copy number variations associated with congenital heart disease by chromosomal microarray analysis and next-generation sequencing.

    PubMed

    Zhu, Xiangyu; Li, Jie; Ru, Tong; Wang, Yaping; Xu, Yan; Yang, Ying; Wu, Xing; Cram, David S; Hu, Yali

    2016-04-01

    To determine the type and frequency of pathogenic chromosomal abnormalities in fetuses diagnosed with congenital heart disease (CHD) using chromosomal microarray analysis (CMA) and validate next-generation sequencing as an alternative diagnostic method. Chromosomal aneuploidies and submicroscopic copy number variations (CNVs) were identified in amniocytes DNA samples from CHD fetuses using high-resolution CMA and copy number variation sequencing (CNV-Seq). Overall, 21 of 115 CHD fetuses (18.3%) referred for CMA had a pathogenic chromosomal anomaly. In six of 73 fetuses (8.2%) with an isolated CHD, CMA identified two cases of DiGeorge syndrome, and one case each of 1q21.1 microdeletion, 16p11.2 microdeletion and Angelman/Prader Willi syndromes, and 22q11.21 microduplication syndrome. In 12 of 42 fetuses (28.6%) with CHD and additional structural abnormalities, CMA identified eight whole or partial trisomies (19.0%), five CNVs (11.9%) associated with DiGeorge, Wolf-Hirschhorn, Miller-Dieker, Cri du Chat and Blepharophimosis, Ptosis, and Epicanthus Inversus syndromes and four other rare pathogenic CNVs (9.5%). Overall, there was a 100% diagnostic concordance between CMA and CNV-Seq for detecting all 21 pathogenic chromosomal abnormalities associated with CHD. CMA and CNV-Seq are reliable and accurate prenatal techniques for identifying pathogenic fetal chromosomal abnormalities associated with cardiac defects. © 2016 John Wiley & Sons, Ltd. © 2016 John Wiley & Sons, Ltd.

  14. Isolation, sequencing and expression of RED, a novel human gene encoding an acidic-basic dipeptide repeat.

    PubMed

    Assier, E; Bouzinba-Segard, H; Stolzenberg, M C; Stephens, R; Bardos, J; Freemont, P; Charron, D; Trowsdale, J; Rich, T

    1999-04-16

    A novel human gene RED, and the murine homologue, MuRED, were cloned. These genes were named after the extensive stretch of alternating arginine (R) and glutamic acid (E) or aspartic acid (D) residues that they contain. We term this the 'RED' repeat. The genes of both species were expressed in a wide range of tissues and we have mapped the human gene to chromosome 5q22-24. MuRED and RED shared 98% sequence identity at the amino acid level. The open reading frame of both genes encodes a 557 amino acid protein. RED fused to a fluorescent tag was expressed in nuclei of transfected cells and localised to nuclear dots. Co-localisation studies showed that these nuclear dots did not contain either PML or Coilin, which are commonly found in the POD or coiled body nuclear compartments. Deletion of the amino terminal 265 amino acids resulted in a failure to sort efficiently to the nucleus, though nuclear dots were formed. Deletion of a further 50 amino acids from the amino terminus generates a protein that can sort to the nucleus but is unable to generate nuclear dots. Neither construct localised to the nucleolus. The characteristics of RED and its nuclear localisation implicate it as a regulatory protein, possibly involved in transcription.

  15. CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data

    PubMed Central

    De, Rajat K.

    2015-01-01

    Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence, which are associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides us a new dimension towards detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two dimensional geometrical point. A key aspect of detecting the regions with CNVs, is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but here in our approach, we have considered the read count data to follow two different distribution models independently, which adds to the robustness of detection of CNVs. In addition, our algorithm calls CNVs based on the multiple sample analysis approach resulting in a low false discovery rate with high precision. PMID:26291322

  16. CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data.

    PubMed

    Sinha, Rituparna; Samaddar, Sandip; De, Rajat K

    2015-01-01

    Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence, which are associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides us a new dimension towards detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two dimensional geometrical point. A key aspect of detecting the regions with CNVs, is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but here in our approach, we have considered the read count data to follow two different distribution models independently, which adds to the robustness of detection of CNVs. In addition, our algorithm calls CNVs based on the multiple sample analysis approach resulting in a low false discovery rate with high precision.

  17. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

    PubMed Central

    2012-01-01

    Background Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis. PMID:23020678

  18. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Jinghua; Eng, J.; Yalow, R.S.

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled park insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulinmore » and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.« less

  19. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general predicting method by combining techniques for sequence alignment and feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracies and average MCC were 0.9096 and 0.8655 respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combination of alignment and amino acid frequency is novel and general so that it may be applied to other problems in bioinformatics. Our method for plant is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  20. Amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Hatakeyama, T

    1990-07-06

    The complete amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui were determined. Protein HL30 was found to be acetylated at its N-terminal amino acid and shows homology to the eukaryotic ribosomal proteins YL34 from yeast and RL31 from rat. Protein HmaL5 was homologous to the protein L5 from Escherichia coli and Bacillus stearothermophilus as well as to YL16 from yeast. HmaL5 shows more similarities to its eukaryotic counterpart than to eubacterial ones.

  1. Whole-genome sequencing of a Plasmodium vivax clinical isolate exhibits geographical characteristics and high genetic variation in China-Myanmar border area.

    PubMed

    Chen, Shen-Bo; Wang, Yue; Kassegne, Kokouvi; Xu, Bin; Shen, Hai-Mo; Chen, Jun-Hu

    2017-02-06

    Currently in China, the trend of Plasmodium vivax cases imported from Southeast Asia was increased especially in the China-Myanmar border area. Driven by the increase in P. vivax cases and stronger need for vaccine and drug development, several P. vivax isolates genome sequencing projects are underway. However, little is known about the genetic variability in this area until now. The sequencing of the first P. vivax isolate from China-Myanmar border area (CMB-1) generated 120 million paired-end reads. A percentage of 10.6 of the quality-evaluated reads were aligned onto 99.9% of the reference strain Sal I genome in 62-fold coverage with an average of 4.8 SNPs per kb. We present a 539-SNP marker data set for P. vivax that can identify different parasites from different geographic origins with high sensitivity. We also identified exceptionally high levels of genetic variability in members of multigene families such as RBP, SERA, vir, MSP3 and AP2. The de-novo assembly yielded a database composed of 8,409 contigs with N50 lengths of 6.6 kb and revealed 661 novel predicted genes including 78 vir genes, suggesting a greater functional variation in P. vivax from this area. Our result contributes to a better understanding of P. vivax genetic variation, and provides a fundamental basis for the geographic differentiation of vivax malaria from China-Myanmar border area using a direct sequencing approach without leukocyte depletion. This novel sequencing method can be used as an essential tool for the genomic research of P. vivax in the near future.

  2. Mammoth and Mastodon collagen sequences; survival and utility

    NASA Astrophysics Data System (ADS)

    Buckley, M.; Larkin, N.; Collins, M.

    2011-04-01

    Near-complete collagen (I) sequences are proposed for elephantid and mammutid taxa, based upon available African elephant genomic data and supported with LC-MALDI-MS/MS and LC-ESI-MS/MS analyses of collagen digests from proboscidean bone. Collagen sequence coverage was investigated from several specimens of two extinct mammoths ( Mammuthus trogontherii and Mammuthus primigenius), the extinct American mastodon ( Mammut americanum), the extinct straight-tusked elephant ( Elephas ( Palaeoloxodon) antiquus) and extant Asian ( Elephas maximus) and African ( Loxodonta africana) elephants and compared between the two ionization techniques used. Two suspected mammoth fossils from the British Middle Pleistocene (Cromerian) deposits of the West Runton Forest Bed were analysed to investigate the potential use of peptide mass spectrometry for fossil identification. Despite the age of the fossils, sufficient peptides were obtained to identify these as elephantid, and sufficient sequence variation to discriminate elephantid and mammutid collagen (I). In-depth LC-MS analyses further failed to identify a peptide that could be used to reliably distinguish between the three genera of elephantids ( Elephas, Loxodonta and Mammuthus), an observation consistent with predicted amino acid substitution rates between these species.

  3. Characterization of durum wheat high molecular weight glutenin subunits Bx20 and By20 sequences by a molecular and proteomic approach.

    PubMed

    Santagati, Vito Davide; Sestili, Francesco; Lafiandra, Domenico; D'Ovidio, Renato; Rogniaux, Helene; Masci, Stefania

    2016-07-01

    Wheat high molecular weight glutenin subunit variation is important because of its great influence on glutenin polymer structure, that is related to dough technological properties. Among the different subunits, the pair Bx20 and By20 is known to have a negative effect on quality, but the reasons are not clear: Bx20 has two cysteines, which theoretically make this subunit a chain extender of the glutenin polymer, just like the other Bx subunits, showing four cysteines, two of which should be involved in intra-molecular disulfide bonds. By20 has never been characterized so far at molecular level. Here we report the nucleotide sequences of Bx20 and By20 genes isolated from the durum wheat cultivar 'Lira 45' and the validation of the corresponding deduced amino acid sequences by using MALDI-TOF and LC-MS/MS. Four nucleotide differences were identified in the Bx20 gene with respect to the deduced sequence present in NCBI, causing two amino acid substitutions. For the By20 subunit, nucleotide and amino acid sequences revealed a great similarity to By15, both at gene and protein levels, showing five nucleotide changes generating two amino acid differences. No evidence of post-translational modifications has been found. Hypotheses are formulated in regard to relationships with technological quality. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  4. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    PubMed

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).

  5. Sequence analysis of Jembrana disease virus strains reveals a genetically stable lentivirus.

    PubMed

    Desport, Moira; Stewart, Meredith E; Mikosza, Andrew S; Sheridan, Carol A; Peterson, Shane E; Chavand, Olivier; Hartaningsih, Nining; Wilcox, Graham E

    2007-06-01

    Jembrana disease virus (JDV) is a lentivirus associated with an acute disease syndrome with a 20% case fatality rate in Bos javanicus (Bali cattle) in Indonesia, occurring after a short incubation period and with no recurrence of the disease after recovery. Partial regions of gag and pol and the entire env were examined for sequence variation in DNA samples from cases of Jembrana disease obtained from Bali, Sumatra and South Kalimantan in Indonesian Borneo. A high level of nucleotide conservation (97-100%) was observed in gag sequences from samples taken in Bali and Sumatra, indicating that the source of JDV in Sumatra was most likely to have originated from Bali. The pol sequences and, unexpectedly, the env sequences from Bali samples were also well conserved with low nucleotide (96-99%) and amino acid substitutions (95-99%). However, the sample from South Kalimantan (JDV(KAL/01)) contained more divergent sequences, particularly in env (88% identity). Phylogenetic analysis revealed that the JDV(KAL/01)env sequences clustered with the sequence from the Pulukan sample (Bali) from 2001. JDV appears to be remarkably stable genetically and has undergone minor genetic changes over a period of nearly 20 years in Bali despite becoming endemic in the cattle population of the island.

  6. DNA Sequence Variation at the Period Locus within and among Species of the Drosophila Melanogaster Complex

    PubMed Central

    Kliman, R. M.; Hey, J.

    1993-01-01

    A 1.9-kilobase region of the period locus was sequenced in six individuals of Drosophila melanogaster and from six individuals of each of three sibling species: Drosophila simulans, Drosophila sechellia and Drosophila mauritiana. Extensive genealogical analysis of 174 polymorphic sites reveals a complex history. It appears that D. simulans, as a large population still segregating very old lineages, gave rise to the island species D. mauritiana and D. sechellia. Rather than considering these speciation events as having produced ``sister'' taxa, it seems more appropriate to consider D. simulans a parent species to D. sechellia and D. mauritiana. The order, in time, of these two phylogenetic events remains unclear. D. mauritiana supports a large number of polymorphisms, many of which are shared with D. simulans, and so appears to have begun and persisted as a large population. In contrast, D. sechellia has very little variation and seems to have experienced a severe population bottleneck. Alternatively, the low variation in D. sechellia could be due to recent directional selection and genetic hitchhiking at or near the per locus. PMID:8436278

  7. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    PubMed

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if with >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.

  8. Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.

    PubMed

    Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric

    2016-08-30

    Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10 % of fern diversity and includes the enigmatic vittarioid ferns-an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.

  9. Variation block-based genomics method for crop plants.

    PubMed

    Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

    2014-06-15

    In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.

  10. Variation in gymnemic acid content and non-destructive harvesting of Gymnema sylvestre (Gudmar)

    PubMed Central

    Pandey, Ashok Kumar; Yadav, Swati

    2010-01-01

    Background: Madhunashini (Gymnema sylvestre R. Br.) commonly known as ‘Gudmar’ in Hindi is an important medicinal climber and extensively used in almost all Indian System of Medicine as a remedy for diabetes, rheumatism, cough, ulcer, jaundice, dyspepsia, constipation, eyes pain and also in snakebite. In India, it is found growing in Andhra Pradesh, Bihar, Chhattisgarh, Karnataka, Kerala, Madhya Pradesh, Maharastra, Orissa, Tamil Nadu, Uttar Pradesh and West Bengal. The major phytoconstituents are gymnemic acids, gudmarin and saponins. Methods: In the present study, Gymnema germplasm collected from various regions of Madhya Pradesh was evaluated on the basis of their morphological characteristics and gymnemic acid content. Gymnemic acid content in the leaves was estimated by HPLC. We have also standardized the non-destructive harvesting practices of Gudmar. Selective harvesting was done without harming the main plant. Only mature leaves (60%) were hand plucked in the month of October. Second harvest was done in the month of June. Results: Data revealed that gymnemic acid content varied between 0.96% ± 0.03 (Seoni) to 1.58% ±0.03 (Amarkantak). It was also observed that the leaves left at the time of 1st harvest during October matured in June at the time of 2nd harvest. Conclusion: Non destructive harvesting practice did not have any negative impact on overall development of the plant. It is evident that there is wide variation in the morphological characteristics and gymnemic acid content in G. sylvestre collected from various locations, which can be exploited for further crop improvement programmes. PMID:21589758

  11. Variation in gymnemic acid content and non-destructive harvesting of Gymnema sylvestre (Gudmar).

    PubMed

    Pandey, Ashok Kumar; Yadav, Swati

    2010-09-01

    Madhunashini (Gymnema sylvestre R. Br.) commonly known as 'Gudmar' in Hindi is an important medicinal climber and extensively used in almost all Indian System of Medicine as a remedy for diabetes, rheumatism, cough, ulcer, jaundice, dyspepsia, constipation, eyes pain and also in snakebite. In India, it is found growing in Andhra Pradesh, Bihar, Chhattisgarh, Karnataka, Kerala, Madhya Pradesh, Maharastra, Orissa, Tamil Nadu, Uttar Pradesh and West Bengal. The major phytoconstituents are gymnemic acids, gudmarin and saponins. In the present study, Gymnema germplasm collected from various regions of Madhya Pradesh was evaluated on the basis of their morphological characteristics and gymnemic acid content. Gymnemic acid content in the leaves was estimated by HPLC. We have also standardized the non-destructive harvesting practices of Gudmar. Selective harvesting was done without harming the main plant. Only mature leaves (60%) were hand plucked in the month of October. Second harvest was done in the month of June. Data revealed that gymnemic acid content varied between 0.96% ± 0.03 (Seoni) to 1.58% ±0.03 (Amarkantak). It was also observed that the leaves left at the time of 1(st) harvest during October matured in June at the time of 2(nd) harvest. Non destructive harvesting practice did not have any negative impact on overall development of the plant. It is evident that there is wide variation in the morphological characteristics and gymnemic acid content in G. sylvestre collected from various locations, which can be exploited for further crop improvement programmes.

  12. Sequences for Student Investigation

    ERIC Educational Resources Information Center

    Barton, Jeffrey; Feil, David; Lartigue, David; Mullins, Bernadette

    2004-01-01

    We describe two classes of sequences that give rise to accessible problems for undergraduate research. These problems may be understood with virtually no prerequisites and are well suited for computer-aided investigation. The first sequence is a variation of one introduced by Stephen Wolfram in connection with his study of cellular automata. The…

  13. Microfluidic droplet enrichment for targeted sequencing

    PubMed Central

    Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

    2015-01-01

    Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629

  14. Identification of structural variation in mouse genomes.

    PubMed

    Keane, Thomas M; Wong, Kim; Adams, David J; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz

    2014-01-01

    Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.

  15. Dna Sequencing

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps off: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions in favoring primer extension to form nucleic acid fragments complementory to the DNA to be sequenced; labelling the nucleic and fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  16. Molecular mechanisms of epigenetic variation in plants.

    PubMed

    Fujimoto, Ryo; Sasaki, Taku; Ishikawa, Ryo; Osabe, Kenji; Kawanabe, Takahiro; Dennis, Elizabeth S

    2012-01-01

    Natural variation is defined as the phenotypic variation caused by spontaneous mutations. In general, mutations are associated with changes of nucleotide sequence, and many mutations in genes that can cause changes in plant development have been identified. Epigenetic change, which does not involve alteration to the nucleotide sequence, can also cause changes in gene activity by changing the structure of chromatin through DNA methylation or histone modifications. Now there is evidence based on induced or spontaneous mutants that epigenetic changes can cause altering plant phenotypes. Epigenetic changes have occurred frequently in plants, and some are heritable or metastable causing variation in epigenetic status within or between species. Therefore, heritable epigenetic variation as well as genetic variation has the potential to drive natural variation.

  17. Sequence of a cDNA encoding pancreatic preprosomatostatin-22.

    PubMed Central

    Magazin, M; Minth, C D; Funckes, C L; Deschenes, R; Tavianini, M A; Dixon, J E

    1982-01-01

    We report the nucleotide sequence of a precursor to somatostatin that upon proteolytic processing may give rise to a hormone of 22 amino acids. The nucleotide sequence of a cDNA from the channel catfish (Ictalurus punctatus) encodes a precursor to somatostatin that is 105 amino acids (Mr, 11,500). The cDNA coding for somatostatin-22 consists of 36 nucleotides in the 5' untranslated region, 315 nucleotides that code for the precursor to somatostatin-22, 269 nucleotides at the 3' untranslated region, and a variable length of poly(A). The putative preprohormone contains a sequence of hydrophobic amino acids at the amino terminus that has the properties of a "signal" peptide. A connecting sequence of approximately 57 amino acids is followed by a single Arg-Arg sequence, which immediately precedes the hormone. Somatostatin-22 is homologous to somatostatin-14 in 7 of the 14 amino acids, including the Phe-Trp-Lys sequence. Hybridization selection of mRNA, followed by its translation in a wheat germ cell-free system, resulted in the synthesis of a single polypeptide having a molecular weight of approximately 10,000 as estimated on Na-DodSO4/polyacrylamide gels. Images PMID:6127673

  18. Variation in conserved non-coding sequences on chromosome 5q andsusceptibility to asthma and atopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng

    2005-09-10

    Background: Evolutionarily conserved sequences likely havebiological function. Methods: To determine whether variation in conservedsequences in non-coding DNA contributes to risk for human disease, westudied six conserved non-coding elements in the Th2 cytokine cluster onhuman chromosome 5q31 in a large Hutterite pedigree and in samples ofoutbred European American and African American asthma cases and controls.Results: Among six conserved non-coding elements (>100 bp,>70percent identity; human-mouse comparison), we identified one singlenucleotide polymorphism (SNP) in each of two conserved elements and sixSNPs in the flanking regions of three conserved elements. We genotypedour samples for four of these SNPs and an additional three SNPs eachmore » inthe IL13 and IL4 genes. While there was only modest evidence forassociation with single SNPs in the Hutterite and European Americansamples (P<0.05), there were highly significant associations inEuropean Americans between asthma and haplotypes comprised of SNPs in theIL4 gene (P<0.001), including a SNP in a conserved non-codingelement. Furthermore, variation in the IL13 gene was strongly associatedwith total IgE (P = 0.00022) and allergic sensitization to mold allergens(P = 0.00076) in the Hutterites, and more modestly associated withsensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overalllittle variation in the conserved non-coding elements on 5q31, butvariation in IL4 and IL13, including possibly one SNP in a conservedelement, influence asthma and atopic phenotypes in diversepopulations.« less

  19. Persistent Sub-Yearly Chromospheric Variations in Lower Main-Sequence Stars: Tau Booe and alpha Com

    NASA Technical Reports Server (NTRS)

    Maulik, Davesh; Donahue, Robert A.; Baliunas, Sallie L.

    1997-01-01

    The recent discoveries of extrasolar planetary systems around lower main-sequence stars such as tau Booe (HD 120136) has prompted further investigation into their stellar activity. A cursory analysis of tau Booe for cyclic chromospheric activity, based on its 30-yr record of Ca 2 H and K fluxes obtained as part of the HK Project from Mount Wilson Observatory, finds an intermediate, sub-yearly period (approximately 117 d) in chromospheric activity in addition to, and separate from, both its rotation (3.3 d) and long-term variability. As a persistent subyearly period in surface magnetic activity is unprecedented, we investigate this apparent anomaly further by examining chromospheric activity levels of other stars with similar mass, searching for variability in chromospheric activity with periods of less than one year, but longer than measured or predicted rotation. An examination of the time series of 40 mid-to-late F dwarfs yielded one other star for further analysis: alpha Com (HD 114378, P approximately 132 d). The variations for these two stars were checked for persistence and coherence. Based on these determinations, we eliminate the possibilities of rotation, long-term activity cycle, and the evolution of active regions as the cause of this variation in both stars. In particular, for tau Booe we infer that the phenomenon may be chromospheric in origin; however, beyond this, it is difficult to identify anything further regarding the cause of the activity variations, or even whether the observed modulation in the two stars have the same origin.

  20. Teratogenicity of isotretinoin revisited: species variation and the role of all-trans-retinoic acid.

    PubMed

    Nau, H

    2001-11-01

    This paper reviews the teratogenicity of isotretinoin in regard to aspects of species variation, toxicokinetics, and metabolism. Particular emphasis is given to the hypothesis that most effects of isotretinoin (13-cis-retinoic acid) are mediated by isomerization to the all-trans-retinoic acid. This mechanism of action would provide a basis for the understanding of species differences and the extrapolation of experimental results to the human situation and thus improve drug development. The insensitive species (rat, mouse) eliminate the drug rapidly through detoxification to the beta-glucuronide; also, placental transfer is limited in these species. On the other hand, in sensitive species (primates), the drug is predominantly metabolized to the active 13-cis-4-oxo-retinoic acid; placental transfer is more extensive here. The beta-glucuronides showed limited placental transfer in all species examined; these metabolites exhibited very low, if any, measurable concentrations in the human. The 13-cis-retinoic acid is not appreciably bound to cellular retinoid-binding proteins or nuclear receptors and exhibits low tissue distribution and placental transfer. Its access to the nucleus may be extensive. Because of the long half life of 13-cis-retinoic acid, continuous isomerization results in significant area under the concentration-time curve levels of all-trans-retinoic acid in the mouse, monkey and the human; the all-trans-retinoic acid formed is extensively distributed across the placenta and may be an important factor that contributes to the teratogenic potency of 13-cis-retinoic acid. Isomerization cannot explain the teratogenic effects of 13-cis-retinoic acid in the rat and rabbit. It is concluded that the high teratogenic activity of isotretinoin in sensitive species (human, monkey) is related to slow elimination of the 13-cis-isomer, to metabolism to the 4-oxo-derivative, to increased placental transfer, to continuous isomerization and significant exposure of the

  1. Natural Variation in Brachypodium disctachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on Natural variation in Brachypodium disctachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy Environment Meeting on March 27, 2013 in Walnut Creek, CA.

  2. Streptococcus mutans clonal variation revealed by multilocus sequence typing.

    PubMed

    Nakano, Kazuhiko; Lapirattanakul, Jinthana; Nomura, Ryota; Nemoto, Hirotoshi; Alaluusua, Satu; Grönroos, Lisa; Vaara, Martti; Hamada, Shigeyuki; Ooshima, Takashi; Nakagawa, Ichiro

    2007-08-01

    Streptococcus mutans is the major pathogen of dental caries, a biofilm-dependent infectious disease, and occasionally causes infective endocarditis. S. mutans strains have been classified into four serotypes (c, e, f, and k). However, little is known about the S. mutans population, including the clonal relationships among strains of S. mutans, in relation to the particular clones that cause systemic diseases. To address this issue, we have developed a multilocus sequence typing (MLST) scheme for S. mutans. Eight housekeeping gene fragments were sequenced from each of 102 S. mutans isolates collected from the four serotypes in Japan and Finland. Between 14 and 23 alleles per locus were identified, allowing us theoretically to distinguish more than 1.2 x 10(10) sequence types. We identified 92 sequence types in these 102 isolates, indicating that S. mutans contains a diverse population. Whereas serotype c strains were widely distributed in the dendrogram, serotype e, f, and k strains were differentiated into clonal complexes. Therefore, we conclude that the ancestral strain of S. mutans was serotype c. No geographic specificity was identified. However, the distribution of the collagen-binding protein gene (cnm) and direct evidence of mother-to-child transmission were clearly evident. In conclusion, the superior discriminatory capacity of this MLST scheme for S. mutans may have important practical implications.

  3. RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids

    PubMed Central

    Sample, Paul J.; Gaston, Kirk W.; Alfonzo, Juan D.; Limbach, Patrick A.

    2015-01-01

    Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm. (ii) Manual sequencing with real-time spectrum labeling and cumulative intensity scoring. (iii) A hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing. PMID:25820423

  4. Relating Human Genetic Variation to Variation in Drug Responses

    PubMed Central

    Madian, Ashraf G.; Wheeler, Heather E.; Jones, Richard Baker; Dolan, M. Eileen

    2012-01-01

    Although sequencing a single human genome was a monumental effort a decade ago, more than one thousand genomes have now been sequenced. The task ahead lies in transforming this information into personalized treatment strategies that are tailored to the unique genetics of each individual. One important aspect of personalized medicine is patient-to-patient variation in drug response. Pharmacogenomics addresses this issue by seeking to identify genetic contributors to human variation in drug efficacy and toxicity. Here, we present a summary of the current status of this field, which has evolved from studies of single candidate genes to comprehensive genome-wide analyses. Additionally, we discuss the major challenges in translating this knowledge into a systems-level understanding of drug physiology with the ultimate goal of developing more effective personalized clinical treatment strategies. PMID:22840197

  5. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    PubMed

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment; where 92 (82.1%) human and 204 (99%) animal strains were assigned to their respective cluster. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.

  6. TaALMT1 promoter sequence compositions, acid tolerance, and Al tolerance in wheat cultivars and landraces from Sichuan in China.

    PubMed

    Han, C; Dai, S F; Liu, D C; Pu, Z J; Wei, Y M; Zheng, Y L; Wen, D J; Zhao, L; Yan, Z H

    2013-11-18

    Previous genetic studies on wheat from various sources have indicated that aluminum (Al) tolerance may have originated independently in USA, Brazil, and China. Here, TaALMT1 promoter sequences of 92 landraces and cultivars from Sichuan, China, were sequenced. Five promoter types (I', II, III, IV, and V) were observed in 39 cultivars, and only three promoter types (I, II, and III) were observed in 53 landraces. Among the wheat collections worldwide, only the Chinese Spring (CS) landrace native to Sichuan, China, carried the TaALMT1 promoter type III. Besides CS, two other Sichuan-bred landraces and six cultivars with TaALMT1 promoter type III were identified in this study. In the phylogenetic tree constructed based on the TaALMT1 promoter sequences, type III formed a separate branch, which was supported by a high bootstrap value. It is likely that TaALMT1 promoter type III originated from Sichuan-bred wheat landraces of China. In addition, the landraces with promoter type I showed the lowest Al tolerance among all landraces and cultivars. Furthermore, the cultivars with promoter type IV showed better Al tolerance than landraces with promoter type II. A comparison of acid tolerance and Al tolerance between cultivars and landraces showed that the landraces had better acid tolerance than the cultivars, whereas the cultivars showed better Al tolerance than the landraces. Moreover, significant difference in Al tolerance was also observed between the cultivars raised by the National Ministry of Agriculture and by Sichuan Province. Among the landraces from different regions, those from the East showed better acid tolerance and Al tolerance than those from the South and West of Sichuan. Additional Al-tolerant and acid-tolerant wheat lines were also identified.

  7. Neocortical malformation as consequence of nonadaptive regulation of neuronogenetic sequence

    NASA Technical Reports Server (NTRS)

    Caviness, V. S. Jr; Takahashi, T.; Nowakowski, R. S.

    2000-01-01

    Variations in the structure of the neocortex induced by single gene mutations may be extreme or subtle. They differ from variations in neocortical structure encountered across and within species in that these "normal" structural variations are adaptive (both structurally and behaviorally), whereas those associated with disorders of development are not. Here we propose that they also differ in principle in that they represent disruptions of molecular mechanisms that are not normally regulatory to variations in the histogenetic sequence. We propose an algorithm for the operation of the neuronogenetic sequence in relation to the overall neocortical histogenetic sequence and highlight the restriction point of the G1 phase of the cell cycle as the master regulatory control point for normal coordinate structural variation across species and importantly within species. From considerations based on the anatomic evidence from neocortical malformation in humans, we illustrate in principle how this overall sequence appears to be disrupted by molecular biological linkages operating principally outside the control mechanisms responsible for the normal structural variation of the neocortex. MRDD Research Reviews 6:22-33, 2000. Copyright 2000 Wiley-Liss, Inc.

  8. Complete amino acid sequences of the ribosomal proteins L25, L29 and L31 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Kimura, M

    1988-03-15

    Ribosomal proteins were extracted from 50S ribosomal subunits of the archaebacterium Halobacterium marismortui by decreasing the concentration of Mg2+ and K+, and the proteins were separated and purified by ion-exchange column chromatography on DEAE-cellulose. Ten proteins were purified to homogeneity and three of these proteins were subjected to sequence analysis. The complete amino acid sequences of the ribosomal proteins L25, L29 and L31 were established by analyses of the peptides obtained by enzymatic digestion with trypsin, Staphylococcus aureus protease, chymotrypsin and lysylendopeptidase. Proteins L25, L29 and L31 consist of 84, 115 and 95 amino acid residues with the molecular masses of 9472 Da, 12293 Da and 10418 Da respectively. A comparison of their sequences with those of other large-ribosomal-subunit proteins from other organisms revealed that protein L25 from H. marismortui is homologous to protein L23 from Escherichia coli (34.6%), Bacillus stearothermophilus (41.8%), and tobacco chloroplasts (16.3%) as well as to protein L25 from yeast (38.0%). Proteins L29 and L31 do not appear to be homologous to any other ribosomal proteins whose structures are so far known.

  9. Factors affecting variations in the detailed fatty acid profile of Mediterranean buffalo milk determined by 2-dimensional gas chromatography.

    PubMed

    Pegolo, S; Stocco, G; Mele, M; Schiavon, S; Bittante, G; Cecchinato, A

    2017-04-01

    Buffalo milk is the world's second most widely produced milk, and increasing attention is being paid to its composition, particularly the fatty acid profile. The objectives of the present study were (1) to characterize the fatty acid composition of Mediterranean buffalo milk, and (2) to investigate potential sources of variation in the buffalo milk fatty acid profile. We determined the profile of 69 fatty acid traits in 272 individual samples of Mediterranean buffalo milk using gas chromatography. In total, 51 individual fatty acids were identified: 24 saturated fatty acids, 13 monounsaturated fatty acids, and 14 polyunsaturated fatty acids. The major individual fatty acids in buffalo milk were in the order 16:0, 18:1 cis-9, 14:0, and 18:0. Saturated fatty acids were the predominant fraction in buffalo milk fat (70.49%); monounsaturated and polyunsaturated fatty acids were at 25.95 and 3.54%, respectively. Adopting a classification based on carbon-chain length, we found that medium-chain fatty acids (11-16 carbons) represented the greater part (53.7%) of the fatty acid fraction of buffalo milk, whereas long-chain fatty acids (17-24 carbons) and short-chain fatty acids (4-10 carbons) accounted for 32.73 and 9.72%, respectively. The n-3 and n-6 fatty acids were 0.46 and 1.77%, respectively. The main conjugated linoleic acid, rumenic acid, represented 0.45% of total milk fatty acids. Herd/test date and stage of lactation were confirmed as important sources of variation in the fatty acid profile of buffalo milk. The percentages of short-chain and medium-chain fatty acids in buffalo milk increased in early lactation (+0.6 and +3.5%, respectively), whereas long-chain fatty acids decreased (-4.2%). The only exception to this pattern was butyric acid, which linearly decreased from the beginning of lactation, confirmation that its synthesis is independent of malonyl-CoA. These results seem to suggest that in early lactation the mobilization of energy reserves may have less

  10. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins.

    PubMed

    Sawle, Lucas; Ghosh, Kingshuk

    2015-08-28

    A general formalism to compute configurational properties of proteins and other heteropolymers with an arbitrary sequence of charges and non-uniform excluded volume interaction is presented. A variational approach is utilized to predict average distance between any two monomers in the chain. The presented analytical model, for the first time, explicitly incorporates the role of sequence charge distribution to determine relative sizes between two sequences that vary not only in total charge composition but also in charge decoration (even when charge composition is fixed). Furthermore, the formalism is general enough to allow variation in excluded volume interactions between two monomers. Model predictions are benchmarked against the all-atom Monte Carlo studies of Das and Pappu [Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013)] for 30 different synthetic sequences of polyampholytes. These sequences possess an equal number of glutamic acid (E) and lysine (K) residues but differ in the patterning within the sequence. Without any fit parameter, the model captures the strong sequence dependence of the simulated values of the radius of gyration with a correlation coefficient of R(2) = 0.9. The model is then applied to real proteins to compare the unfolded state dimensions of 540 orthologous pairs of thermophilic and mesophilic proteins. The excluded volume parameters are assumed similar under denatured conditions, and only electrostatic effects encoded in the sequence are accounted for. With these assumptions, thermophilic proteins are found-with high statistical significance-to have more compact disordered ensemble compared to their mesophilic counterparts. The method presented here, due to its analytical nature, is capable of making such high throughput analysis of multiple proteins and will have broad applications in proteomic studies as well as in other heteropolymeric systems.

  11. Genetic variation and DNA fingerprinting of durian types in Malaysia using simple sequence repeat (SSR) markers.

    PubMed

    Siew, Ging Yang; Ng, Wei Lun; Tan, Sheau Wei; Alitheen, Noorjahan Banu; Tan, Soon Guan; Yeap, Swee Keong

    2018-01-01

    Durian ( Durio zibethinus ) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in Genbank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, H E  = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3×10 -3 . Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only shows the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenges the current classification of durian types, e.g., on whether the different types should be called "clones", "varieties", or "cultivars". Such matters have a direct impact on the regulation and management of durian genetic resources in the region.

  12. Genetic variation and DNA fingerprinting of durian types in Malaysia using simple sequence repeat (SSR) markers

    PubMed Central

    Siew, Ging Yang; Tan, Sheau Wei; Tan, Soon Guan; Yeap, Swee Keong

    2018-01-01

    Durian (Durio zibethinus) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitation of morphological classification, there is a need to carry out genetic characterization of the various durian types. Such data is important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in Genbank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, HE = 0.35). The DNA fingerprinting power of SSR markers revealed by the combined probability of identity (PI) of all loci was 2.3×10−3. Unique DNA fingerprints were generated for 21 out of 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings in this preliminary study not only shows the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenges the current classification of durian types, e.g., on whether the different types should be called “clones”, “varieties”, or “cultivars”. Such matters have a direct impact on the regulation and management of durian genetic resources in the region. PMID:29511604

  13. Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the influenza A virus subtypes responsible for the 20th‐century pandemics

    PubMed Central

    Pasricha, Gunisha; Mishra, Akhilesh C.; Chakrabarti, Alok K.

    2012-01-01

    Please cite this paper as: Pasricha et al. (2012) Comprehensive global amino acid sequence analysis of PB1F2 protein of influenza A H5N1 viruses and the Influenza A virus subtypes responsible for the 20th‐century pandemics. Influenza and Other Respiratory Viruses 7(4), 497–505. Background  PB1F2 is the 11th protein of influenza A virus translated from +1 alternate reading frame of PB1 gene. Since the discovery, varying sizes and functions of the PB1F2 protein of influenza A viruses have been reported. Selection of PB1 gene segment in the pandemics, variable size and pleiotropic effect of PB1F2 intrigued us to analyze amino acid sequences of this protein in various influenza A viruses. Methods  Amino acid sequences for PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of the aforementioned subtypes were used to determine the size, variable and conserved domains and to perform mutational analysis. Results  Analysis showed that 96·4% of the H5N1 influenza viruses harbored full‐length PB1F2 protein. Except for the 2009 pandemic H1N1 virus, all the subtypes of the 20th‐century pandemic influenza viruses contained full‐length PB1F2 protein. Through the years, PB1F2 protein of the H1N1 and H3N2 viruses has undergone much variation. PB1F2 protein sequences of H5N1 viruses showed both human‐ and avian host‐specific conserved domains. Global database of PB1F2 protein revealed that N66S mutation was present only in 3·8% of the H5N1 strains. We found a novel mutation, N84S in the PB1F2 protein of 9·35% of the highly pathogenic avian influenza H5N1 influenza viruses. Conclusions  Varying sizes and mutations of the PB1F2 protein in different influenza A virus subtypes with pandemic potential were obtained. There was genetic divergence of the protein in various hosts which highlighted the host‐specific evolution of the virus

  14. Temporal Variations of Organic Acids in Sumac Fruit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robbins, C.; Mulcahy, F.; Somayajula, K.

    2006-10-01

    Extracts from staghorn sumac (Rhus typhina) fruits were obtained from fresh fruits obtained from June to October in two successive years. Total acidity, pH, and concentrations of malic and succinic acids determined using liquid chromatography were measured for each extract. Acidity and acid concentrations reached their maxima in late July, and declined slowly thereafter. Malic and succinic acid concentrations in the extracts reached maxima of about 4 and 0.2% (expressed per unit weight of fruit), respectively. Malic and succinic acids were the only organic acids observed in the extracts, and mass balance determinations indicate that these acids are most likelymore » the only ones present in appreciable amounts.« less

  15. [Variations on hemagglutinin gene of Zhejiang measles virus strains and differences with measles strains circulated both at home and abroad].

    PubMed

    Feng, Yan; Zhong, Shu-ling; Xu, Chang-ping; Lu, Yi-yu

    2013-07-01

    To investigate the variations on hemagglutinin (H) gene of measles virus (MV) in Zhejiang province, and to analyze the differences with strains circulated both at home and abroad. In total, 33 MV strains isolated in Zhejiang province between 1999 and 2011 were collected.RNA of the isolated MV strains was extracted and the complete sequences on H gene were amplified using RT-PCR assay. The products were compared with the Chinese vaccine strain Shanghai-191, which was downloaded from GenBank, and other 95 different MV strains from all over the world. 33 MV strains, isolated from the throat swab specimens collected from MV patients in Zhejiang province during 1999 to 2001, were used to conduct phylogenetic analysis with MV strains circulated in other areas of China during 1993 to 2007. The phylogenetic tree based on H gene sequences showed that all the Zhejiang MV strains located in H1a cluster, and no apparent time series and geographic restrictions were observed. Compared to the referenced vaccine strain Shanghai-191, the average variation rate on nucleotides and amino acids, and the evolutionary rate of H1a viruses from China during 2003 to 2011 were separately 5.15%, 4.44% and 5.81%, which were higher than the rates of H1a viruses during 1965 to 1993 (4.75%, 3.86% and 5.30%), and the rates of viruses during 1994 to 2002 (4.80%, 4.08% and 5.37%).However, the dn/ds ratios of strains within the three time periods were 0.19,0.21 and 0.23 respectively, which indicated that no evidence of positive selection was found on H1a MV strains during 1993 to 2011. A 24 stable amino acid variation sites on H gene was found between H1a viruses during 2003 to 2011 and the vaccine strain Shanghai-191. The largest variation occurred between vaccine and H1a strains, with 0.053 of the p-distance and 26-28 of amino acid mutations.However, only 15 stable amino acid variations were found between vaccine strain and genotype B3 or D4 strains.In addition, significant differences were found

  16. Variation of partial transferrin sequences and phylogenetic relationships among hares (Lepus capensis, Lagomorpha) from Tunisia.

    PubMed

    Awadi, Asma; Suchentrunk, Franz; Makni, Mohamed; Ben Slimen, Hichem

    2016-10-01

    North African hares are currently included in cape hares, Lepus capensis sensu lato, a taxon that may be considered a superspecies or a complex of closely related species. The existing molecular data, however, are not unequivocal, with mtDNA control region sequences suggesting a separate species status and nuclear loci (allozymes, microsatellites) revealing conspecificity of L. capensis and L. europaeus. Here, we study sequence variation in the intron 6 (468 bp) of the transferrin nuclear gene, of 105 hares with different coat colour from different regions in Tunisia with respect to genetic diversity and differentiation, as well as their phylogenetic status. Forty-six haplotypes (alleles) were revealed and compared phylogenetically to all available TF haplotypes of various Lepus species retrieved from GenBank. Maximum Likelihood, neighbor joining and median joining network analyses concordantly grouped all currently obtained haplotypes together with haplotypes belonging to six different Chinese hare species and the African scrub hare L. saxatilis. Moreover, two Tunisian haploypes were shared with L. capensis, L timidus, L. sinensis, L. yarkandensis, and L. hainanus from China. These results indicated the evolutionary complexity of the genus Lepus with the mixing of nuclear gene haplotypes resulting from introgressive hybridization or/and shared ancestral polymorphism. We report the presence of shared ancestral polymorphism between North African and Chinese hares. This has not been detected earlier in the mtDNA sequences of the same individuals. Genetic diversity of the TF sequences from the Tunisian populations was relatively high compared to other hare populations. However, genetic differentiation and gene flow analyses (AMOVA, F ST , Nm) indicated little divergence with the absence of geographically meaningful phylogroups and lack of clustering with coat colour types. These results confirm the presence of a single hare species in Tunisia, but a sound inference

  17. Carboxy-terminal sequence variation of LMP1 gene in Epstein-Barr-virus-associated mononucleosis and tumors from Serbian patients.

    PubMed

    Banko, Ana; Lazarevic, Ivana; Cupic, Maja; Stevanovic, Goran; Boricic, Ivan; Jovanovic, Tanja

    2012-04-01

    Seven strains of Epstein-Barr virus (EBV) are defined based on C-terminal sequence variations of the latent membrane protein 1 (LMP1). Some strains, especially those with a 30-bp deletion, are thought to be related to tumorigenic activity and geographical localization. The aims of the study were to determine the prevalence of different LMP1 strains and to investigate sequence variation in the C-terminal region of LMP1 in Serbian isolates. This study included 53 EBV-DNA-positive plasma and tissue block samples from patients with mononucleosis syndrome, renal transplantation, and tumors, mostly nasopharyngeal carcinoma. The sequence of the 506-bp fragment of LMP1 C terminus was used for phylogenetic analyses and identification of LMP1 strains, deletions, and mutations. The majority of isolates were non-deleted (66%), and the rest had 30-bp, rare 69-bp, or yet unknown 27-bp deletions, which were not related to malignant or non-malignant isolate origin. However, the majority of 69-bp deletion isolates were derived from patients with nasopharyngeal carcinoma. Less than five 33-bp repeats were found in the majority of non-deleted isolates (68.6%), whereas most 69-bp deletion isolates (75%) had five or six repeats. Serbian isolates were assigned to four LMP1 strains: B95-8 (32.1%), China 1 (24.5%), North Carolina (NC; 18.9%), and Mediterranean (Med; 24.5%). In NC isolates, three new mutations unique for this strain were identified. EBV EBNA2 genotypes 1 and 2 were both found, with dominance of genotype 1 (90.7%). This study demonstrated noticeable geographical-associated characteristics in the LMP1 C terminus of investigated isolates. Copyright © 2012 Wiley Periodicals, Inc.

  18. Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

    PubMed Central

    2012-01-01

    Background Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning of genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all the G-SSRs, while AG/GA dimers prevail in transcribed sequences, and account for 59.4% of all EST-SSRs. A total set of 310 SSRs is selected to amplify eight apple genotypes. Of these, 245 (79.0%) are found to be polymorphic among cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple, and should be preferentially selected for studies, such as genetic diversity and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8, and accounts for ~13.5% of the observed phenotypic variation. Conclusions This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. All SSR motifs identified in this study as well as those newly mapped SSRs will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding

  19. Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum)

    PubMed Central

    Cho, Kwang-Soo; Yun, Bong-Kyoung; Yoon, Young-Ho; Hong, Su-Young; Mekapogu, Manjulatha; Kim, Kyung-Hee; Yang, Tae-Jin

    2015-01-01

    We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compared this with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. Copy number variation of the 21-bp tandem repeat varied in F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Nucleotide and amino acid have highly conserved coding sequence with about 98% homology and four genes—rpoC2, ycf3, accD, and clpP—have high synonymous (Ks) value. PCR based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum. PMID:25966355

  20. Assignment of fatty acid-beta-oxidizing syntrophic bacteria to Syntrophomonadaceae fam. nov. on the basis of 16S rRNA sequence analyses

    NASA Technical Reports Server (NTRS)

    Zhao, H.; Yang, D.; Woese, C. R.; Bryant, M. P.

    1993-01-01

    After enrichment from Chinese rural anaerobic digestor sludge, anaerobic, sporing and nonsporing, saturated fatty acid-beta-oxidizing syntrophic bacteria were isolated as cocultures with H2- and formate-utilizing Methanospirillum hungatei or Desulfovibrio sp. strain G-11. The syntrophs degraded C4 to C8 saturated fatty acids, including isobutyrate and 2-methylbutyrate. They were adapted to grow on crotonate and were isolated as pure cultures. The crotonate-grown pure cultures alone did not grow on butyrate in either the presence or the absence of some common electron acceptors. However, when they were reconstituted with M. hungatei, growth on butyrate again occurred. In contrast, crotonate-grown Clostridium kluyveri and Clostridium sticklandii, as well as Clostridium sporogenes, failed to grow on butyrate when these organisms were cocultured with M. hungatei. The crotonate-grown pure subcultures of the syntrophs described above were subjected to 16S rRNA sequence analysis. Several previously documented fatty acid-beta-oxidizing syntrophs grown in pure cultures with crotonate were also subjected to comparative sequence analyses. The sequence analyses revealed that the new sporing and nonsporing isolates and other syntrophs that we sequenced, which had either gram-negative or gram-positive cell wall ultrastructure, all belonged to the phylogenetically gram-positive phylum. They were not closely related to any of the previously known subdivisions in the gram-positive phylum with which they were compared, but were closely related to each other, forming a new subdivision in the phylum. We recommend that this group be designated Syntrophomonadaceae fam. nov.; a description is given.