Günthard, H F; Wong, J K; Ignacio, C C; Havlir, D V; Richman, D D
1998-07-01
The performance of the high-density oligonucleotide array methodology (GeneChip) in detecting drug resistance mutations in HIV-1 pol was compared with that of automated dideoxynucleotide sequencing (ABI) of clinical samples, viral stocks, and plasmid-derived NL4-3 clones. Sequences from 29 clinical samples (plasma RNA, n = 17; lymph node RNA, n = 5; lymph node DNA, n = 7) from 12 patients, from 6 viral stock RNA samples, and from 13 NL4-3 clones were generated by both methods. Editing was done independently by a different investigator for each method before comparing the sequences. In addition, NL4-3 wild type (WT) and mutants were mixed in varying concentrations and sequenced by both methods. Overall, a concordance of 99.1% was found for a total of 30,865 bases compared. The comparison of clinical samples (plasma RNA and lymph node RNA and DNA) showed a slightly lower match of base calls, 98.8% for 19,831 nucleotides compared (protease region, 99.5%, n = 8272; RT region, 98.3%, n = 11,316), than for viral stocks and NL4-3 clones (protease region, 99.8%; RT region, 99.5%). Artificial mixing experiments showed a bias toward calling wild-type bases by GeneChip. Discordant base calls are most likely due to differential detection of mixtures. The concordance between GeneChip and ABI was high and appeared dependent on the nature of the templates (directly amplified versus cloned) and the complexity of mixes.
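A per-base concordance such as the 99.1% figure above is the number of matching calls divided by the number of compared calls. The sketch below assumes the GeneChip and ABI calls are already aligned to the same coordinates. The abstract does not say how IUPAC mixture codes were scored, so the `lenient_mixtures` option is a hypothetical choice and not the authors' rule.

```python
# IUPAC nucleotide codes mapped to the bases each code may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def concordance(calls_a, calls_b, lenient_mixtures=False):
    """Fraction of aligned positions at which two base-call strings agree.

    Positions with a gap ('-') in either string are skipped. With
    lenient_mixtures=True (hypothetical option), an ambiguity code counts as
    concordant with any base it can represent.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matched = 0
    for a, b in zip(calls_a.upper(), calls_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matched += 1
        elif lenient_mixtures and set(IUPAC.get(a, "")) & set(IUPAC.get(b, "")):
            matched += 1
    return matched / compared if compared else float("nan")


print(concordance("ACGTR", "ACGTA"))                         # 0.8 (strict)
print(concordance("ACGTR", "ACGTA", lenient_mixtures=True))  # 1.0 (R covers A)
```

Strict scoring counts every mixture call against a pure call as discordant, which is the situation the abstract points to when it attributes discordance to differential detection of mixtures.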
Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.
Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C
2018-01-01
This study presents a machine learning method that increases the number of identified bases in Sanger sequencing. The system post-processes a KB-basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A, C, G, T). An N-label correction is defined given an additional read of the same sequence and a human-finished sequence. Corrections are added to the dataset when an alignment determines that the additional read and the human agree on the identity of the N-label. KB must also rate the replacement with quality value of in the additional read. Corrections are only available during system training. To develop the system, nearly 850,000 N-labels were obtained from the Barcode of Life Datasystems, the premier database of genetic markers called DNA barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. In keeping with barcoding standards, our system maintains an error rate of percent. Our system applies corrections only when it estimates a low error rate. Tested on these data, our automation selects and recovers 79 percent of N-labels from COI (animal barcode), 80 percent from matK and rbcL (plant barcodes), and 58 percent from non-protein-coding sequences (across eukaryotes).
Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard
2010-08-01
Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.
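The deep-sequencing rule above (a sample is non-R5 when more than 2% of reads are inferred to be CXCR4-using) can be sketched as follows. The real PSSM^X4/R5 matrix values and the V3 alignment step are not reproduced: `toy_matrix` is hypothetical, reads are assumed to be pre-aligned to the matrix positions, and scores above the -6.96 cutoff are assumed to indicate X4.

```python
X4_CUTOFF = -6.96  # PSSM cutoff quoted in the abstract


def pssm_score(v3, matrix):
    """Sum position-specific scores for an already-aligned V3 sequence.

    matrix is a list with one {residue: score} dict per aligned position;
    residues missing from a column score 0.0 in this sketch.
    """
    return sum(column.get(residue, 0.0) for column, residue in zip(matrix, v3))


def call_deep_tropism(reads, matrix, cutoff=X4_CUTOFF, min_x4_fraction=0.02):
    """Return ('non-R5' or 'R5', fraction of reads scoring above the cutoff)."""
    if not reads:
        raise ValueError("no reads")
    x4_reads = sum(pssm_score(read, matrix) > cutoff for read in reads)
    fraction = x4_reads / len(reads)
    return ("non-R5" if fraction > min_x4_fraction else "R5"), fraction


toy_matrix = [{"R": 3.0, "S": -3.0}] * 3  # hypothetical 3-position matrix
print(call_deep_tropism(["SSS"] * 97 + ["RRR"] * 3, toy_matrix))  # ('non-R5', 0.03)
```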
Masking as an effective quality control method for next-generation sequencing data analysis.
Yun, Sajung; Yun, Sijung
2014-12-13
Next-generation sequencing produces base calls with low quality scores that can affect the accuracy of simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and their effect on the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low-quality base calls with 'N's (undetermined bases), whereas trimming removes low-quality bases, which results in shorter reads. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method changed the false-negative rate in SNP calling with statistical significance compared with data analysis without preprocessing. The false-positive and false-negative rates for small insertions and deletions did not differ between masking and trimming. Although trimming is currently more commonly used in the field, we recommend masking as the more effective preprocessing method for next-generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without increasing the false-negative rate. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
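The difference between the two preprocessing methods can be shown on a single read. This is a minimal sketch, not the authors' perl script. It assumes Phred qualities given as integers, a Q20 cutoff, and trimming from the 3' end only; all three are illustrative choices.

```python
def mask_low_quality(bases, quals, min_q=20):
    """Replace every base whose Phred quality is below min_q with 'N'.

    Read length and base positions are preserved.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and qualities differ in length")
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, quals))


def trim_low_quality_3prime(bases, quals, min_q=20):
    """Drop trailing bases below min_q, so the read becomes shorter."""
    if len(bases) != len(quals):
        raise ValueError("bases and qualities differ in length")
    end = len(bases)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return bases[:end]


read, quals = "ACGTA", [30, 10, 30, 30, 5]
print(mask_low_quality(read, quals))         # ANGTN
print(trim_low_quality_3prime(read, quals))  # ACGT
```

In this toy read, 3' trimming leaves the internal low-quality C in place, while masking removes it from variant calling. This is one way masking can lower false-positive SNP calls.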
Suckau, Detlev; Resemann, Anja
2009-12-01
The ability to match Top-Down protein sequencing (TDS) results obtained by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS allowed protein sequences to be matched quickly and efficiently, and recombinant protein structures, in particular protein termini, to be validated at the level of undigested proteins.
Lim, K Yoong; Kovarik, Ales; Matyasek, Roman; Chase, Mark W; Knapp, Sandra; McCarthy, Elizabeth; Clarkson, James J; Leitch, Andrew R
2006-12-01
Combining phylogenetic reconstructions of species relationships with comparative genomic approaches is a powerful way to decipher evolutionary events associated with genome divergence. Here, we reconstruct the history of karyotype and tandem repeat evolution in species of diploid Nicotiana section Alatae. By analysis of plastid DNA, we resolved two clades with high bootstrap support, one containing N. alata, N. langsdorffii, N. forgetiana and N. bonariensis (called the n = 9 group) and another containing N. plumbaginifolia and N. longiflora (called the n = 10 group). Despite little plastid DNA sequence divergence, we observed, via fluorescent in situ hybridization, substantial chromosomal repatterning, including altered chromosome numbers, structure and distribution of repeats. Effort was focussed on 35S and 5S nuclear ribosomal DNA (rDNA) and the HRS60 satellite family of tandem repeats comprising the elements HRS60, NP3R and NP4R. We compared divergence of these repeats in diploids and polyploids of Nicotiana. There are dramatic shifts in the distribution of the satellite repeats and complete replacement of intergenic spacers (IGSs) of 35S rDNA associated with divergence of the species in section Alatae. We suggest that sequence homogenization has replaced HRS60 family repeats at sub-telomeric regions, but that this process may not occur, or occurs more slowly, when the repeats are found at intercalary locations. Sequence homogenization acts more rapidly (by at least two orders of magnitude) on 35S rDNA than on 5S rDNA and sub-telomeric satellite sequences. This rapid rate of divergence is analogous to that found in polyploid species and is therefore, in plants, not associated only with polyploidy.
Hidden symmetries in N-layer dielectric stacks
NASA Astrophysics Data System (ADS)
Liu, Haihao; Shoufie Ukhtary, M.; Saito, Riichiro
2017-11-01
The optical properties of a multilayer system with arbitrary N layers of dielectric media are investigated. Each layer is one of two dielectric media, with a thickness of one quarter of the wavelength of light in that medium, corresponding to a central frequency f_0. Using the transfer matrix method, the transmittance T is calculated for all possible 2^N sequences for small N. Unexpectedly, it is found that instead of 2^N different values of T at f_0 (T_0), there are only (N/2 + 1) discrete values of T_0 for even N, and (N + 1) for odd N. We explain this high degeneracy in T_0 values by finding symmetry operations on the sequences that do not change T_0. Analytical formulae were derived for the T_0 values and their degeneracies as functions of N and an integer parameter for each sequence that we call 'charge'. Additionally, the bandwidth at f_0 and the filter response of the transmission spectra are investigated, revealing asymptotic behavior at large N.
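The enumeration described above can be sketched with the standard characteristic-matrix formulation at normal incidence. At f_0 every layer has phase thickness π/2, so each layer matrix is purely off-diagonal. This is a minimal sketch under several assumptions: lossless media, air on both sides (n_in = n_out = 1), and the hypothetical indices 1.5 and 2.5. For these values, the number of distinct T_0 values follows the N/2 + 1 (even N) and N + 1 (odd N) pattern.

```python
from itertools import product


def quarter_wave_matrix(n):
    """Characteristic matrix of one layer at f_0 (phase thickness pi/2).

    The general form [[cos d, -i sin d / n], [-i n sin d, cos d]] with d = pi/2.
    """
    return ((0j, -1j / n), (-1j * n, 0j))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def transmittance_t0(indices, n_in=1.0, n_out=1.0):
    """Transmittance at f_0 of a stack of quarter-wave layers with these indices."""
    m = ((1 + 0j, 0j), (0j, 1 + 0j))
    for n in indices:
        m = matmul(m, quarter_wave_matrix(n))
    (m11, m12), (m21, m22) = m
    t = 2 * n_in / (n_in * m11 + n_in * n_out * m12 + m21 + n_out * m22)
    return (n_out / n_in) * abs(t) ** 2


def distinct_t0_count(num_layers, n_a=1.5, n_b=2.5, tol=1e-9):
    """Count distinct T_0 values over all 2**num_layers two-medium sequences."""
    values = sorted(
        transmittance_t0(seq) for seq in product((n_a, n_b), repeat=num_layers)
    )
    distinct = []
    for value in values:
        if not distinct or value - distinct[-1] > tol:  # merge float round-off
            distinct.append(value)
    return len(distinct)


for num_layers in range(1, 9):
    print(num_layers, 2 ** num_layers, distinct_t0_count(num_layers))
```

Adjacent quarter-wave matrices multiply to a diagonal matrix that depends only on the ratio of their indices. This is why many different sequences collapse onto the same T_0.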
Fisher, Kevin E.; Zhang, Linsheng; Wang, Jason; Smith, Geoffrey H.; Newman, Scott; Schneider, Thomas M.; Pillai, Rathi N.; Kudchadkar, Ragini R.; Owonikoko, Taofeek K.; Ramalingam, Suresh S.; Lawson, David H.; Delman, Keith A.; El-Rayes, Bassel F.; Wilson, Malania M.; Sullivan, H. Clifford; Morrison, Annie S.; Balci, Serdar; Adsay, N. Volkan; Gal, Anthony A.; Sica, Gabriel L.; Saxe, Debra F.; Mann, Karen P.; Hill, Charles E.; Khuri, Fadlo R.; Rossi, Michael R.
2017-01-01
We tested and clinically validated a targeted next-generation sequencing (NGS) mutation panel using 80 formalin-fixed, paraffin-embedded (FFPE) tumor samples. Forty non-small cell lung carcinoma (NSCLC), 30 melanoma, and 30 gastrointestinal (12 colonic, 10 gastric, and 8 pancreatic adenocarcinoma) FFPE samples were selected from laboratory archives. After appropriate specimen and nucleic acid quality control, 80 NGS libraries were prepared using the Illumina TruSight tumor (TST) kit and sequenced on the Illumina MiSeq. Sequence alignment, variant calling, and sequencing quality control were performed using vendor software and laboratory-developed analysis workflows. TST generated ≥500× coverage for 98.4% of the 13,952 targeted bases. Reproducible and accurate variant calling was achieved at ≥5% variant allele frequency with 8 to 12 multiplexed samples per MiSeq flow cell. TST detected 112 variants overall, and confirmed all known single-nucleotide variants (n = 27), deletions (n = 5), insertions (n = 3), and multinucleotide variants (n = 3). TST detected at least one variant in 85.0% (68/80), and two or more variants in 36.2% (29/80), of samples. TP53 was the most frequently mutated gene in NSCLC (13 variants; 13/32 samples), gastrointestinal malignancies (15 variants; 13/25 samples), and overall (30 variants; 28/80 samples). BRAF mutations were most common in melanoma (nine variants; 9/23 samples). Clinically relevant NGS data can be obtained from routine clinical FFPE solid tumor specimens using TST, benchtop instruments, and vendor-supplied bioinformatics pipelines. PMID:26801070
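The two quality gates quoted above are a ≥500× depth at targeted bases and a ≥5% variant allele frequency. Both reduce to simple ratios, sketched below. The function names are hypothetical, and per-base depths and allele counts are assumed to come from an upstream coverage or pileup step. The vendor pipeline itself is not reproduced.

```python
def fraction_at_depth(depths, min_depth=500):
    """Fraction of targeted positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("no positions")
    return sum(depth >= min_depth for depth in depths) / len(depths)


def variant_passes(alt_reads, total_reads, min_vaf=0.05):
    """True when the variant allele frequency reaches the reporting threshold."""
    return total_reads > 0 and alt_reads / total_reads >= min_vaf


print(fraction_at_depth([600, 499, 500, 1000]))          # 0.75
print(variant_passes(25, 500), variant_passes(24, 500))  # True False
```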
Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O
2015-03-01
Mass spectrometric de novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' of great scientific and economic importance. Substantial deviations from the published amino acid sequence (Uniprot Q54181) were found: 46 additional amino acids at the N-terminus, including a so-called "His-tag", as well as partial N-terminal α-N-gluconoylation and α-N-phosphogluconoylation. The unexpected amino acid sequence of the commercial protein G' comprised 231 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Because of the higher mass caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the partial N-terminal α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to identify the expression vector pET-28b from Novagen, with the Xho I restriction enzyme cleavage site, as the best match for the vector used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not significantly change the binding affinity to immunoglobulins.
U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs.
Castro, Christina J; Ng, Terry Fei Fan
2017-11-01
Advances in next-generation sequencing technologies enable routine genome sequencing, generating millions of short reads. A crucial step for full genome analysis is de novo assembly, and the performance of different assembly methods is currently measured by a metric called N50. However, the N50 value can produce skewed, inaccurate results when complex data are analyzed, especially for viral and microbial datasets. To provide a better assessment of assembly output, we developed a new metric called U50. The U50 identifies unique, target-specific contigs by using a reference genome as a baseline, aiming to circumvent some limitations that are inherent to the N50 metric. Specifically, the U50 program removes overlapping sequence of multiple contigs by using a mask array, so the performance of the assembly is measured only by unique contigs. We compared simulated and real datasets by using U50 and N50, and our results demonstrated that U50 has the following advantages over N50: (1) reducing erroneously large N50 values due to a poor assembly, (2) eliminating overinflated N50 values caused by large measurements from overlapping contigs, (3) eliminating diminished N50 values caused by an abundance of small contigs, and (4) allowing comparisons across different platforms or samples based on the new percentage-based metric UG50%. The use of the U50 metric allows for a more accurate measure of assembly performance by analyzing only the unique, non-overlapping contigs. In addition, most viral and microbial sequencing datasets have high background noise (i.e., host and other non-target sequences), which skews the N50 value; this is corrected by U50. Also, the UG50% can be used to compare assembly results from different samples or studies, cross-comparisons that cannot be performed with N50.
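The mask-array idea can be contrasted with plain N50 in a few lines. This is a simplified sketch, not the U50 program. It assumes contigs are already mapped to the reference as half-open (start, end) intervals, and it processes them longest first. `unique_contig_lengths` is a hypothetical helper. The value printed last is N50 computed over unique lengths; the published U50 and UG50% definitions may normalize differently, for example by reference length.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def unique_contig_lengths(intervals, genome_length):
    """Mask reference positions, longest contig first; keep only newly covered bases."""
    mask = [False] * genome_length
    unique = []
    for start, end in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        new_bases = 0
        for pos in range(start, end):
            if not mask[pos]:
                mask[pos] = True
                new_bases += 1
        if new_bases:  # contigs fully covered by longer ones are dropped
            unique.append(new_bases)
    return unique


contigs = [(0, 60), (10, 50), (60, 90), (90, 95)]  # hypothetical mapped contigs
print(n50([end - start for start, end in contigs]))            # 40
print(n50(unique_contig_lengths(contigs, genome_length=100)))  # 60
```

In this toy case, the contig at 10 to 50 lies entirely inside 0 to 60. It lowers N50 but contributes nothing once overlaps are masked.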
On the derivatives of unimodular polynomials
NASA Astrophysics Data System (ADS)
Nevai, P.; Erdélyi, T.
2016-04-01
Let $D$ be the open unit disk of the complex plane; its boundary, the unit circle of the complex plane, is denoted by $\partial D$. Let $\mathscr{P}_n^c$ denote the set of all algebraic polynomials of degree at most $n$ with complex coefficients. For $\lambda \ge 0$, let $\mathscr{K}_n^\lambda \stackrel{\mathrm{def}}{=} \bigl\{P_n : P_n(z) = \sum_{k=0}^{n} a_k k^\lambda z^k,\ a_k \in \mathbb{C},\ |a_k| = 1\bigr\} \subset \mathscr{P}_n^c$. The class $\mathscr{K}_n^0$ is often called the collection of all (complex) unimodular polynomials of degree $n$. Given a sequence $(\varepsilon_n)$ of positive numbers tending to 0, we say that a sequence $(P_n)$ of polynomials $P_n \in \mathscr{K}_n^\lambda$ is $\{\lambda, (\varepsilon_n)\}$-ultraflat if $(1-\varepsilon_n)\frac{n^{\lambda+1/2}}{\sqrt{2\lambda+1}} \le \ldots \frac{n^{\lambda+1/2}}{\sqrt{2\lambda+1}}, \quad z \in \partial D,\ n \in \mathbb{N}_0$. Although we do not know, in general, whether or not $\{\lambda, (\varepsilon_n)\}$-ultraflat sequences of polynomials $P_n \in \mathscr{K}_n^\lambda$ exist for each fixed $\lambda > 0$, we make an effort to prove various interesting properties of them. These allow us to conclude that there are no sequences $(P_n)$ of either conjugate, or plain, or skew reciprocal unimodular polynomials $P_n \in \mathscr{K}_n^0$ such that $(Q_n)$ with $Q_n(z) \stackrel{\mathrm{def}}{=} zP_n'(z)+1$ is a $\{1, (\varepsilon_n)\}$-ultraflat sequence of polynomials. Bibliography: 18 titles.
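The normalizing factor $n^{\lambda+1/2}/\sqrt{2\lambda+1}$ in the ultraflatness condition is the asymptotic root-mean-square value of $|P_n|$ on $\partial D$. The following short Parseval computation is our own illustration and is not quoted from the paper. It takes $\lambda > 0$, so that the $k = 0$ term vanishes. For $\lambda = 0$ the sum is $n + 1$, with the same leading behavior.

```latex
% P_n \in \mathscr{K}_n^\lambda, \lambda > 0, |a_k| = 1:
\frac{1}{2\pi}\int_0^{2\pi} \bigl|P_n(e^{it})\bigr|^2\,dt
  = \sum_{k=0}^{n} |a_k|^2 k^{2\lambda}
  = \sum_{k=1}^{n} k^{2\lambda}
  = \frac{n^{2\lambda+1}}{2\lambda+1}\bigl(1 + O(n^{-1})\bigr),
\qquad\text{hence}\qquad
\Bigl(\frac{1}{2\pi}\int_0^{2\pi} \bigl|P_n(e^{it})\bigr|^2\,dt\Bigr)^{1/2}
  = \frac{n^{\lambda+1/2}}{\sqrt{2\lambda+1}}\bigl(1 + O(n^{-1})\bigr).
```

An ultraflat sequence is therefore one whose modulus stays uniformly within a factor $1 \pm \varepsilon_n$ of this mean value on the whole unit circle.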
Dunn, Robert A; Hernandez, Olga
2009-09-01
Low frequency northeastern Pacific blue whale calls were recorded near the northern East Pacific Rise (9 degrees N latitude) on 25 ocean-bottom-mounted hydrophones and three-component seismometers during a 5-day period (November 22-26, 1997). Call types A, B, C, and D were identified; the most common pattern being approximately 130-135 s repetitions of the AB sequence that, for any individual whale, persisted for hours. Up to eight individual blue whales were recorded near enough to the instruments to determine their locations and were tracked call-by-call using the B components of the calls and a Bayesian inversion procedure. For four of these eight whales, the entire call sequences and swim tracks were determined for 20-26-h periods; the other whales were tracked for much shorter periods. The eight whales moved into the area during a period of airgun activity conducted by the academic seismic ship R/V Maurice Ewing. The authors examined the whales' locations and call characteristics with respect to the periods of airgun activity. Although the data do not permit a thorough investigation of behavioral responses, no correlation in vocalization or movement with airgun activity was observed.
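Locating a call from arrival times at several instruments can be illustrated with a simple grid posterior. This is not the authors' Bayesian inversion. It is a minimal sketch that assumes flat 2-D geometry, a constant sound speed of 1500 m/s, Gaussian timing errors with a hypothetical σ = 0.05 s, and flat priors. Water and instrument depths are ignored.

```python
import math


def locate_call(stations, arrivals, sound_speed=1500.0, grid_step=100.0,
                extent=10_000.0, sigma=0.05):
    """Return (x, y, origin_time) at the grid node with the highest posterior.

    stations: (x, y) positions in metres; arrivals: arrival times in seconds.
    Under flat priors, integrating out the unknown origin time leaves a
    Gaussian misfit equal to the spread of (arrival - travel time) about
    its mean. That mean is the best-fitting origin time.
    """
    best = None
    n_steps = int(extent / grid_step) + 1
    for ix in range(n_steps):
        for iy in range(n_steps):
            x, y = ix * grid_step, iy * grid_step
            residuals = [t - math.hypot(x - sx, y - sy) / sound_speed
                         for (sx, sy), t in zip(stations, arrivals)]
            origin = sum(residuals) / len(residuals)
            misfit = sum((r - origin) ** 2 for r in residuals)
            log_post = -misfit / (2 * sigma ** 2)  # up to an additive constant
            if best is None or log_post > best[0]:
                best = (log_post, x, y, origin)
    return best[1:]


stations = [(0, 0), (10_000, 0), (0, 10_000), (10_000, 10_000)]
arrivals = [12.0 + math.hypot(3_000 - sx, 4_000 - sy) / 1500.0 for sx, sy in stations]
print(locate_call(stations, arrivals))  # close to (3000.0, 4000.0, 12.0)
```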
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by counting the occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods. The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Therefore, the Top-n-gram is a good building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, prediction of domain boundaries, design of knowledge-based potentials and prediction of protein binding sites.
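The conversion from a frequency profile to a fixed-dimension vector can be sketched as follows. The sketch assumes a Top-n-gram is the ordered tuple of the n most frequent residues in one profile column, with ties broken alphabetically. The PSI-BLAST, LSA and SVM steps are omitted, and `make_column` is a hypothetical helper for building toy profiles.

```python
from collections import Counter
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_n_grams(profile, n):
    """One Top-n-gram per profile column: its n most frequent residues, in order."""
    grams = []
    for column in profile:  # column: 20 frequencies in AMINO_ACIDS order
        ranked = sorted(range(len(AMINO_ACIDS)), key=lambda i: (-column[i], i))
        grams.append("".join(AMINO_ACIDS[i] for i in ranked[:n]))
    return grams


def top_n_gram_features(profile, n):
    """Occurrence count of every possible Top-n-gram (fixed-length vector)."""
    vocabulary = ["".join(p) for p in permutations(AMINO_ACIDS, n)]
    counts = Counter(top_n_grams(profile, n))
    return [counts[gram] for gram in vocabulary]


def make_column(**freqs):
    """Hypothetical helper: a 20-value column with the given residue frequencies."""
    return [freqs.get(aa, 0.0) for aa in AMINO_ACIDS]


profile = [make_column(A=0.5, C=0.3, D=0.2), make_column(W=0.6, A=0.4),
           make_column(A=0.7, C=0.3)]
print(top_n_grams(profile, 2))  # ['AC', 'WA', 'AC']
features = top_n_gram_features(profile, 2)
print(len(features), sum(features))  # 380 3
```

For n = 2 the vector has 20 × 19 = 380 entries whatever the protein length. This is what lets sequences of different lengths share one SVM input space.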
Mu, Wenbo; Lu, Hsiao-Mei; Chen, Jefferey; Li, Shuwei; Elliott, Aaron M
2016-11-01
Next-generation sequencing (NGS) has rapidly replaced Sanger sequencing as the method of choice for diagnostic gene-panel testing. For hereditary-cancer testing, the technical sensitivity and specificity of the assay are paramount as clinicians use results to make important clinical management and treatment decisions. There is significant debate within the diagnostics community regarding the necessity of confirming NGS variant calls by Sanger sequencing, considering that numerous laboratories report having 100% specificity from the NGS data alone. Here we report our results from 20,000 hereditary-cancer NGS panels spanning 47 genes, in which all 7845 nonpolymorphic variants were Sanger-sequenced. Of these, 98.7% were concordant between NGS and Sanger sequencing and 1.3% were identified as NGS false-positives, located mainly in complex genomic regions (A/T-rich regions, G/C-rich regions, homopolymer stretches, and pseudogene regions). Simulating a false-positive rate of zero by adjusting the variant-calling quality-score thresholds decreased the sensitivity of the assay from 100% to 97.8%, resulting in the missed detection of 176 Sanger-confirmed variants, the majority in complex genomic regions (n = 114) and mosaic mutations (n = 7). The data illustrate the importance of setting quality thresholds for panel testing only after thousands of samples have been processed and the necessity of Sanger confirmation of NGS variants to maintain the highest possible sensitivity. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
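The trade-off described above can be seen by raising the quality threshold just far enough to exclude every Sanger-refuted call and then recomputing sensitivity. The call list below is hypothetical toy data, and keeping calls with a quality strictly above the threshold is an assumption of this sketch.

```python
def evaluate_threshold(calls, threshold):
    """calls: (quality, sanger_confirmed) pairs; keep calls with quality > threshold.

    Returns (sensitivity over all confirmed variants, false positives kept).
    """
    total_true = sum(1 for _, confirmed in calls if confirmed)
    if total_true == 0:
        raise ValueError("no confirmed variants")
    kept = [confirmed for quality, confirmed in calls if quality > threshold]
    true_kept = sum(kept)
    return true_kept / total_true, len(kept) - true_kept


def zero_fp_threshold(calls):
    """Smallest threshold (under the strict '>' rule) that removes every false positive."""
    fp_qualities = [quality for quality, confirmed in calls if not confirmed]
    return max(fp_qualities) if fp_qualities else float("-inf")


calls = [(60, True), (50, True), (40, True), (35, False), (30, True)]
cut = zero_fp_threshold(calls)
print(cut, evaluate_threshold(calls, float("-inf")), evaluate_threshold(calls, cut))
# 35 (1.0, 1) (0.75, 0)
```

Any true variant whose quality is no higher than the best-scoring false positive is lost. In the study, that cost fell mostly on complex genomic regions.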
Wild Birds Use an Ordering Rule to Decode Novel Call Sequences.
Suzuki, Toshitaka N; Wheatcroft, David; Griesser, Michael
2017-08-07
The generative power of human language depends on grammatical rules, such as word ordering, that allow us to produce and comprehend even novel combinations of words [1-3]. Several species of birds and mammals produce sequences of calls [4-6], and, like words in human sentences, their order may influence receiver responses [7]. However, it is unknown whether animals use call ordering to extract meaning from truly novel sequences. Here, we use a novel experimental approach to test this in a wild bird species, the Japanese tit (Parus minor). Japanese tits are attracted to mobbing a predator when they hear conspecific alert and recruitment calls ordered as alert-recruitment sequences [7]. They also approach in response to recruitment calls of heterospecific individuals in mixed-species flocks [8, 9]. Using experimental playbacks, we assess their responses to artificial sequences in which their own alert calls are combined into different orderings with heterospecific recruitment calls. We find that Japanese tits respond similarly to mixed-species alert-recruitment call sequences and to their own alert-recruitment sequences. Importantly, however, tits rarely respond to mixed-species sequences in which the call order is reversed. Thus, Japanese tits extract a compound meaning from novel call sequences using an ordering rule. These results demonstrate a new parallel between animal communication systems and human language, opening new avenues for exploring the evolution of ordering rules and compositionality in animal vocal sequences. Copyright © 2017 Elsevier Ltd. All rights reserved.
Albayrak, Levent; Khanipov, Kamil; Pimenova, Maria; Golovko, George; Rojas, Mark; Pavlidis, Ioannis; Chumakov, Sergei; Aguilar, Gerardo; Chávez, Arturo; Widger, William R; Fofanov, Yuriy
2016-12-12
Low-abundance mutations in mitochondrial populations (mutations with minor allele frequency ≤ 1%) are associated with cancer, aging, and neurodegenerative disorders. While recent progress in high-throughput sequencing technology has significantly improved the heteroplasmy identification process, the ability of this technology to detect low-abundance mutations can be affected by the presence of similar sequences originating from nuclear DNA (nDNA). To determine to what extent nDNA can cause false-positive low-abundance heteroplasmy calls, we identified the mitochondrial locations of all subsequences that are common or similar (one mismatch allowed) between nDNA and mitochondrial DNA (mtDNA). The analysis revealed up to a 25-fold variation in the lengths of the longest common and longest similar (one mismatch allowed) subsequences across the mitochondrial genome. The length of the longest subsequences shared between nDNA and mtDNA in several regions of the mitochondrial genome was found to be as low as 11 bases, which not only allows these regions to be used to design new, very specific PCR primers, but also supports the hypothesis of the non-random introduction of mtDNA into the human nuclear DNA. Analysis of the mitochondrial locations of the subsequences shared between nDNA and mtDNA suggested that even very short (36-base) single-end sequencing reads can be used to identify low-abundance variation in 20.4% of the mitochondrial genome. For longer (76- and 150-base) reads, the proportion of the mitochondrial genome where nDNA presence will not interfere was found to be 44.5% and 67.9%, respectively, whereas low-abundance mutations at 100% of locations can be identified using 417-base single reads. This observation suggests that the analysis of low-abundance variation in mitochondrial populations can be extended to a variety of large data collections such as the NCBI Sequence Read Archive, European Nucleotide Archive, The Cancer Genome Atlas, and International Cancer Genome Consortium.
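The read-length dependence above can be illustrated by asking, for each mtDNA start position, whether the read-length k-mer starting there also occurs in nuclear sequence on either strand. This is a simplified sketch. It allows exact matches only (the study also allowed one mismatch), it treats the mitochondrial genome as linear, and it uses toy sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def interference_free_fraction(mt, nuclear_seqs, k):
    """Fraction of mtDNA start positions whose k-mer never occurs in nDNA (either strand)."""
    shared = set()
    for nuc in nuclear_seqs:
        shared |= kmer_set(nuc, k) | kmer_set(reverse_complement(nuc), k)
    starts = range(len(mt) - k + 1)
    if not starts:
        return 0.0
    free = sum(mt[i:i + k] not in shared for i in starts)
    return free / len(starts)


print(interference_free_fraction("ACGTACGGTT", ["TTTACGTAAA"], k=4))  # 4 of 7 positions
```

Longer k-mers are less likely to be shared, which is why the interference-free fraction grows with read length.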
ParticleCall: A particle filter for base calling in next-generation sequencing systems
2012-01-01
Background Next-generation sequencing systems are capable of rapid and cost-effective DNA sequencing, thus enabling routine sequencing tasks and taking us one step closer to personalized medicine. The accuracy and length of their reads, however, are yet to surpass those provided by the conventional Sanger sequencing method. This motivates the search for computationally efficient algorithms capable of reliable and accurate detection of the order of nucleotides in short DNA fragments from the acquired data. Results In this paper, we consider Illumina’s sequencing-by-synthesis platform, which relies on reversible terminator chemistry, and describe the acquired signal by reformulating its mathematical model as a Hidden Markov Model. Relying on this model and sequential Monte Carlo methods, we develop a parameter estimation and base calling scheme called ParticleCall. ParticleCall is tested on a data set obtained by sequencing phiX174 bacteriophage using Illumina’s Genome Analyzer II. The results show that the developed base calling scheme is significantly more computationally efficient than the best performing unsupervised method currently available, while achieving the same accuracy. Conclusions The proposed ParticleCall provides more accurate calls than Illumina’s base calling algorithm, Bustard. At the same time, ParticleCall is significantly more computationally efficient than other recent schemes with similar performance, rendering it more feasible for high-throughput sequencing data analysis. Improvement of base calling accuracy will have immediate beneficial effects on the performance of downstream applications such as SNP and genotype calling. ParticleCall is freely available at https://sourceforge.net/projects/particlecall. PMID:22776067
Random trinomial tree models and vanilla options
NASA Astrophysics Data System (ADS)
Ganikhodjaev, Nasir; Bayram, Kamola
2013-09-01
In this paper we introduce and study a random trinomial model. The usual trinomial model is prescribed by a triple of numbers (u, d, m). We call the triple (u, d, m) an environment of the trinomial model. A triple (U_n, D_n, M_n), where {U_n}, {D_n} and {M_n} are sequences of independent, identically distributed random variables with 0 < D_n < 1 < U_n and M_n = 1 for all n, is called a random environment, and a trinomial tree model with a random environment is called a random trinomial model. The random trinomial model is considered to produce more accurate results than the random binomial model or the usual trinomial model.
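A vanilla European call on such a tree can be valued by enumerating all 3^N paths for one sampled environment and then averaging over environments. This is a minimal sketch under assumptions the abstract does not specify. The middle-move probability p_m is fixed at 1/3, and p_u and p_d are then solved from the martingale condition. Each U_n is drawn uniformly from [1.05, 1.20] and each D_n from [0.85, 0.95]. These choices are illustrative, not the authors' model.

```python
import itertools
import random


def step_probabilities(u, d, r, p_m):
    """Solve p_u*u + p_m*1 + p_d*d = 1 + r and p_u + p_m + p_d = 1 for fixed p_m."""
    p_u = (1 + r - p_m - (1 - p_m) * d) / (u - d)
    p_d = 1 - p_m - p_u
    if p_u < 0 or p_d < 0:
        raise ValueError("no valid probabilities for this step")
    return p_u, p_m, p_d


def call_price_in_environment(s0, strike, environment, r=0.0, p_m=1 / 3):
    """European call value when step n multiplies the price by u_n, 1 or d_n."""
    steps = [list(zip((u, 1.0, d), step_probabilities(u, d, r, p_m)))
             for u, d in environment]
    value = 0.0
    for path in itertools.product(*steps):  # 3**N paths; fine for small N
        price, prob = s0, 1.0
        for factor, p in path:
            price *= factor
            prob *= p
        value += prob * max(price - strike, 0.0)
    return value / (1 + r) ** len(environment)


def random_trinomial_call(s0, strike, n_steps, r=0.0, p_m=1 / 3, n_envs=200, seed=0):
    """Average the call value over sampled i.i.d. environments (U_n, D_n)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_envs):
        env = [(rng.uniform(1.05, 1.20), rng.uniform(0.85, 0.95))
               for _ in range(n_steps)]
        total += call_price_in_environment(s0, strike, env, r, p_m)
    return total / n_envs


print(call_price_in_environment(100.0, 0.0, [(1.1, 0.9)] * 3, r=0.01))  # ~100.0
print(random_trinomial_call(100.0, 100.0, n_steps=4))
```

With a zero strike the discounted expectation returns the spot price, a quick check that the per-step probabilities are martingale weights.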
ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.
Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia
2017-12-01
Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlapping peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny that runs on the R platform for ease of use. As part of the interface, we added an automatic end-clipping option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction, and the two methods were then compared. Our program, ABI base recall, corrects ambiguities with high accuracy: it increases the rate of identity and coverage and minimizes the number of mismatches and gaps; hence it provides a solution to sequencing ambiguities and saves biologists' time and labor.
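The end-clipping step can be illustrated with a sliding-window rule. The abstract does not describe ABI base recall's algorithm, so the window rule, `window` and `max_ambiguous` below are hypothetical. The sketch covers end clipping only, not the N-correction step.

```python
def clip_ambiguous_ends(seq, window=10, max_ambiguous=1):
    """Clip both ends back to the first and last window with few non-ACGT calls."""
    seq = seq.upper()

    def is_clean(start):
        chunk = seq[start:start + window]
        return sum(base not in "ACGT" for base in chunk) <= max_ambiguous

    starts = range(len(seq) - window + 1)
    first = next((i for i in starts if is_clean(i)), None)
    if first is None:
        return ""  # no acceptable window anywhere in the read
    last = next(i for i in reversed(starts) if is_clean(i))
    return seq[first:last + window]


print(clip_ambiguous_ends("NNNANACGTACGTACGTNNGNN", window=4, max_ambiguous=0))
# ACGTACGTACGT
```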
Novel DNA packaging recognition in the unusual bacteriophage N15
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feiss, Michael; Geyer, Henriette, E-mail: henriettegeyer@gmail.com; Division of Viral Infections, Robert Koch Institute, Berlin
Phage lambda's cosB packaging recognition site is tripartite, consisting of 3 TerS binding sites, called R sequences. TerS binding to the critical R3 site positions the TerL endonuclease for nicking cosN to generate cohesive ends. The N15 cos (cos^N15) is closely related to cos^λ, but whereas the cosB^N15 subsite has R3, it lacks the R2 and R1 sites and the IHF binding site of cosB^λ. A bioinformatic study of N15-like phages indicates that cosB^N15 also has an accessory, remote rR2 site, which is proposed to increase packaging efficiency, like R2 and R1 of lambda. N15 plus five prophages all have the rR2 sequence, which is located in the TerS-encoding 1 gene, approximately 200 bp distal to R3. An additional set of four highly related prophages, exemplified by Monarch, has the R3 sequence but also has R2 and R1 sequences characteristic of cosB^λ. The DNA binding domain of TerS-N15 is a dimer. - Highlights: • There are two classes of DNA packaging signals in N15-related phages. • Phage N15's TerS binding site: a critical site and a possible remote accessory site. • Viral DNA recognition signals by the λ-like bacteriophages: the odd case of N15.
Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan
2013-11-01
Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and the emergence of new strains, and improve accuracy when designing preventive vaccines. This study investigated the use of deep sequencing on the next-generation sequencing (NGS) Illumina MiSeq platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from 6 acute respiratory clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed, and all were sequenced simultaneously. A total of 2.6 Gbases was obtained at a cluster density of 455 ± 14 K/mm², with 95.76% (8,571,655/8,950,724) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 contained high-quality sequences that were ≥Q30, a base calling QC score standard. Alignment analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. Nearly the entire genomes of all six virus isolates were covered at a sequence depth of 600-fold or greater. The MiSeq platform efficiently identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1 and influenza B in the DNA library mixtures. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
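A ≥Q30 figure like the one above can be computed from FASTQ quality strings. Q30 corresponds to an estimated error probability of 1 in 1000, since Q = -10 log10(p). The sketch assumes Phred+33 encoding and reports the per-base fraction at or above the threshold. Whether the study's 93.7% and 83.5% were per-base or per-read is not stated, so treat this as one plausible reading.

```python
def fraction_at_least_q(quality_strings, threshold=30, offset=33):
    """Fraction of base calls with Phred quality >= threshold (Phred+33 assumed)."""
    total = passing = 0
    for qual in quality_strings:
        for symbol in qual:
            total += 1
            passing += (ord(symbol) - offset) >= threshold
    return passing / total if total else 0.0


print(fraction_at_least_q(["??55", "II##"]))  # 0.5  ('?' = Q30, 'I' = Q40)
```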
Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros
2013-01-01
Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also translate the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, manual inspection of the generated sequence data is required. As the number and length of sequences increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To give users full control over the error correction process, no fully automated error correction algorithm has been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects the chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors. PMID:24688709
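The threshold-based pre-filter described above can be illustrated with a small sketch. It uses a hypothetical rule, not ChromatoGate's implementation: at every polymorphic alignment column, a character is flagged for chromatogram inspection when its relative frequency in that column is at or below a user-chosen threshold. Gaps (-) and N calls are ignored when counting.

```python
from collections import Counter


def flag_suspect_calls(msa, threshold=0.2):
    """Return (sequence_index, column) pairs worth checking in the chromatogram.

    Hypothetical rule: at a polymorphic column, flag every character whose
    relative frequency among informative calls is <= threshold.
    Gaps '-' and undetermined 'N' calls are not counted.
    """
    flagged = []
    for col in range(len(msa[0])):
        column = [seq[col] for seq in msa]
        counts = Counter(c for c in column if c not in "-N")
        if len(counts) < 2:  # monomorphic (or empty) column: nothing to inspect
            continue
        informative = sum(counts.values())
        for row, c in enumerate(column):
            if c in counts and counts[c] / informative <= threshold:
                flagged.append((row, col))
    return flagged


msa = [
    "ACGTAC",
    "ACGTAC",
    "ACGTAC",
    "ACGAAC",
    "ACGTAC",
]
print(flag_suspect_calls(msa))  # [(3, 3)]: the lone 'A' in column 3
```

Raising the threshold flags more characters (higher sensitivity) at the cost of more peaks to inspect by hand, which matches the sensitivity role the abstract assigns to the threshold.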
Gelada vocal sequences follow Menzerath's linguistic law.
Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J
2016-05-10
Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law stating that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
When seconds count: A study of communication variables in the opening segment of emergency calls.
Penn, Claire; Koole, Tom; Nattrass, Rhona
2017-09-01
The opening sequence of an emergency call influences ambulance dispatch time. The greeting sequences in 105 calls to a South African emergency service were analysed. Initial results suggested the advantage of a specific two-part opening sequence. An on-site experiment aimed at improving call efficiency was conducted during one shift (1100 calls). Results indicated fewer conversational repairs and a significant 4-second reduction in mean call length. Implications for systems and training are drawn.
Gaines, William A.; Marcotte, William R.
2010-01-01
Spider dragline silk is primarily composed of proteins called major ampullate spidroins (MaSp) that consist of a large repeat array flanked by non-repetitive N- and C-terminal domains. Until recently, there had been little evidence for more than one gene encoding each of the two major spidroin silk proteins, MaSp1 and MaSp2. Here, we report the deduced N-terminal domain sequences for two distinct MaSp1 genes from Nephila clavipes (MaSp1A and MaSp1B) and for MaSp2. All three MaSp genes are co-expressed in the major ampullate gland. A search of the GenBank database also revealed two distinct MaSp1 C-terminal domain sequences. Sequencing confirmed that both MaSp1 genes are present in all seven Nephila clavipes spiders examined. The presence of nucleotide polymorphisms in these genes confirmed that MaSp1A and MaSp1B are distinct genetic loci and not merely alleles of the same gene. We have experimentally determined the transcription start sites for all three MaSp genes and established preliminary pairing between the two MaSp1 N- and C-terminal domains. Phylogenetic analysis of these new sequences and other published MaSp N- and C-terminal domain sequences illustrated that duplications of MaSp genes may be widespread among spider species. PMID:18828837
Zhang, Ling-Ling; Wu, Mao; Hu, Bang-Chuan; Chen, Hua-Liang; Pan, Jin-Ren; Ruan, Wei; Yao, Li-Nong
2018-05-08
Naegleria fowleri (N. fowleri) is the only Naegleria species known to cause primary amebic meningoencephalitis (PAM), an acute, fulminant, and rapidly fatal central nervous system infection, in humans. In 2016, a suspected PAM patient was found in Zhejiang Province, China. The pathogen was identified by microscopic examination and PCR. The positive PCR products were sequenced, and the sequences were aligned using the NCBI BLAST programme. Homology and phylogenetic analyses were conducted using the MEGA 6 programme. Under the microscope, motile cells with pseudopodia were observed in the direct smear; the motion characteristics of the pseudopodia and the cell morphology suggested that the pathogens were amoeba trophozoites. Smears stained with Wright-Giemsa showed amoeba trophozoites of various shapes, measuring 10-25 μm and characterized by a prominent, centrally placed nucleolus and vacuolated cytoplasm. PCR was negative for E. histolytica and E. dispar but positive for Naegleria spp. and N. fowleri. The nucleotide sequences acquired in this study were submitted to GenBank under accession numbers KX909928 and KX909927, respectively. BLAST analysis revealed that the sequences KX909928 and KX909927 have 100% similarity with the N. fowleri gene sequence KT375442.1. Sequence alignment and the phylogenetic tree revealed that the N. fowleri collected in this study was classified as genotype 2 and was most closely related to N. lovaniensis. This study confirms N. fowleri as the agent responsible for this patient's infection. PAM normally progresses rapidly and is universally fatal within a week; this patient survived longer but still died two weeks after the onset of symptoms. Copyright © 2018. Published by Elsevier Ltd.
Structure and function of neonatal social communication in a genetic mouse model of autism.
Takahashi, T; Okabe, S; Broin, P Ó; Nishi, A; Ye, K; Beckert, M V; Izumi, T; Machida, A; Kang, G; Abe, S; Pena, J L; Golden, A; Kikusui, T; Hiroi, N
2016-09-01
A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in the 22q11.2 CNV), we demonstrate that a genetically triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less-than-optimal maternal care as a negative neonatal environmental factor.
Structure and function of neonatal social communication in a genetic mouse model of autism
Takahashi, Tomohisa; Okabe, Shota; Ó Broin, Pilib; Nishi, Akira; Ye, Kenny; Beckert, Michael V.; Izumi, Takeshi; Machida, Akihiro; Kang, Gina; Abe, Seiji; Pena, Jose L.; Golden, Aaron; Kikusui, Takefumi; Hiroi, Noboru
2015-01-01
A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in the 22q11.2 CNV), we demonstrate that a genetically-triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less-than-optimal maternal care as a negative neonatal environmental factor. PMID:26666205
Gelada vocal sequences follow Menzerath’s linguistic law
Gustison, Morgan L.; Semple, Stuart; Ferrer-i-Cancho, Ramon; Bergman, Thore J.
2016-01-01
Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath’s law is a linguistic law stating that the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath’s law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath’s law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language. PMID:27091968
van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y
2018-04-17
Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants still requires orthogonal confirmation. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine whether a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: ±0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing. 
This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to differentiate low from high confidence variants. Additionally, it reveals the importance of incorporating site-specific features as well as variant call features in such a model.
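The tuning goal described above (no false positives among high-confidence calls) can be sketched as choosing a score cut-off from Sanger-labelled training calls. The scores, labels, and function names below are hypothetical and stand in for the authors' model, whose features and outputs the abstract does not specify. The last two lines recompute the reported percentages from the counts given.

```python
def zero_fp_threshold(scores, confirmed):
    """Lowest score cut-off t such that every call scoring >= t was
    confirmed by Sanger, i.e. the high-confidence set has no false positives.

    scores    : hypothetical model scores (higher = more confident), non-empty
    confirmed : parallel booleans, True if Sanger confirmed the NGS call
    Returns the cut-off, or None if no non-empty set is free of false positives.
    """
    fp_scores = [s for s, ok in zip(scores, confirmed) if not ok]
    if not fp_scores:
        return min(scores)  # every call confirmed: accept them all
    worst_fp = max(fp_scores)  # the cut-off must lie above this score
    above = [s for s in scores if s > worst_fp]
    return min(above) if above else None


def split_calls(scores, cutoff):
    """Label each call 'high' (no orthogonal confirmation) or 'low'."""
    return ["high" if s >= cutoff else "low" for s in scores]


scores = [0.99, 0.95, 0.90, 0.70, 0.40, 0.20]
confirmed = [True, True, True, False, True, False]
cutoff = zero_fp_threshold(scores, confirmed)
print(cutoff)                       # 0.9
print(split_calls(scores, cutoff))  # ['high', 'high', 'high', 'low', 'low', 'low']

# Reported proportions, recomputed from the counts in the abstract
print(round(100 * 6622 / 7179, 1))  # 92.2
print(round(100 * 513 / 557, 1))    # 92.1
```

Note the trade-off: the confirmed call scoring 0.40 still lands in the low-confidence group, which is the price of a cut-off that admits no false positives.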
Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.
Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing
2015-08-05
To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing with a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as the HiSeq 2000, the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3% across the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validation rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than that of SNPs specific to TargetSeq-Proton (60.0%) or to SureSelect-HiSeq (88.3%). For 1-bp small insertions and deletions (InDels), the Sanger validation rates of concordant (100.0%) and SureSelect-HiSeq-specific (89.6%) variants were higher than that of TargetSeq-Proton-specific variants (15.8%). In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased variant calling specificity for concordant variant loci and sensitivity for variant loci called by either platform. 
However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, in particular, InDels in Ion Proton exome sequencing.
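Platform comparisons like this reduce to set operations on variant loci. The sketch below assumes that the "shared variant loci" rate is the intersection divided by the union of loci called by the two platforms, and that concordance means identical alleles at co-detected loci; the abstract states neither denominator, so both definitions are assumptions. The example calls are invented.

```python
def platform_overlap(calls_a, calls_b):
    """Compare variant calls from two platforms.

    Each call is a (chrom, pos, ref, alt) tuple. Returns
    (shared_loci_rate, concordance), defined here (as assumptions) as:
      shared_loci_rate = |loci_A & loci_B| / |loci_A | loci_B|
      concordance      = fraction of shared loci with identical ref/alt
    """
    alleles_a = {c[:2]: c[2:] for c in calls_a}  # locus -> (ref, alt)
    alleles_b = {c[:2]: c[2:] for c in calls_b}
    loci_a, loci_b = set(alleles_a), set(alleles_b)

    shared = loci_a & loci_b
    union = loci_a | loci_b
    shared_rate = len(shared) / len(union) if union else 0.0

    same = sum(1 for locus in shared if alleles_a[locus] == alleles_b[locus])
    concordance = same / len(shared) if shared else 0.0
    return shared_rate, concordance


a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
     ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")]
b = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
     ("chr2", 50, "G", "C"), ("chr4", 5, "A", "T")]
print(platform_overlap(a, b))  # 3 of 5 loci shared; 2 of those 3 agree
```

Loci in only one of the two sets correspond to the platform-specific variants that were sent for Sanger validation.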
USDA-ARS's Scientific Manuscript database
Bacterial type III secretion systems (T3SSs) deliver proteins called effectors into eukaryotic cells. Although N-terminal amino acid sequences are required for translocation, the mechanism of substrate recognition by the T3SS is unknown. Almost all actively deployed T3SS substrates in the plant path...
QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.
Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen
2015-11-10
Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier developed for Illumina sequencing platforms that uses the quantiles of the quality scores to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Our method was compared with the existing methods LoFreq, ShoRAH, and V-Phaser 2 on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using an SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve sensitivity, we used an SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, called SNVs were overruled when their frequency was below the 80th percentile of the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV with the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but produced more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with a lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) achieved a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. 
Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
John C. Meeks
2001-12-31
Nostoc punctiforme is a filamentous cyanobacterium with extensive phenotypic characteristics and a relatively large genome, approaching 10 Mb. The phenotypic characteristics include a photoautotrophic, diazotrophic mode of growth, but N. punctiforme is also facultatively heterotrophic; its vegetative cells have multiple developmental alternatives, including terminal differentiation into nitrogen-fixing heterocysts and transient differentiation into spore-like akinetes or motile filaments called hormogonia; and N. punctiforme has broad symbiotic competence with fungi and terrestrial plants, including bryophytes, gymnosperms and an angiosperm. The shotgun-sequencing phase of the N. punctiforme strain ATCC 29133 genome has been completed by the Joint Genome Institute. Annotation of an 8.9 Mb database yielded 7432 open reading frames, 45% of which encode proteins with known or probable known function and 29% of which are unique to N. punctiforme. Comparative analysis of the sequence indicates a genome that is highly plastic and in a state of flux, with numerous insertion sequences and multilocus repeats, as well as genes encoding transposases and DNA modification enzymes. The sequence also reveals the presence of genes encoding putative proteins that collectively define almost all characteristics of cyanobacteria as a group. N. punctiforme has an extensive potential to sense and respond to environmental signals, as reflected by the presence of more than 400 genes encoding sensor protein kinases, response regulators and other transcriptional factors. The signal transduction systems and any of the large number of unique genes may play essential roles in the cell differentiation and symbiotic interaction properties of N. punctiforme.
An Ecotype of Neorickettsia risticii Causing Potomac Horse Fever in Canada
Xiong, Qingming; Bekebrede, Hannah; Sharma, Pratibha; Arroyo, Luis G.; Baird, John D.
2016-01-01
ABSTRACT Neorickettsia (formerly Ehrlichia) risticii is an obligatory intracellular bacterium of digenetic trematodes. When a horse accidentally ingests aquatic insects containing encysted trematodes infected with N. risticii, the bacterium is transmitted from trematodes to horse cells and causes an acute and often fatal disease called Potomac horse fever (PHF). Since the discovery of N. risticii in the United States in 1984, using immunofluorescence and PCR assays, PHF has been increasingly recognized throughout North America and South America. However, so far, there exist only a few stable N. risticii culture isolates, all of which are from horses within the United States, and the strain diversity and environmental spreading and distribution of pathogenic N. risticii strains remain poorly understood. This paper reports the isolation of N. risticii from the blood of a horse with acute PHF in Ontario, Canada. Intracellular N. risticii colonies were detected in P388D1 cells after 47 days of culturing and 8 days after the addition of rapamycin. Molecular phylogenetic analysis based on amino acid sequences of major surface proteins P51 and Ssa1 showed that this isolate is distinct from any previously sequenced strains but closely related to midwestern U.S. strains. This is the first Canadian strain cultured, and a new method was developed to reactivate dormant N. risticii to improve culture isolation. IMPORTANCE Neorickettsia risticii is an environmental bacterium that lives inside flukes that are parasitic to aquatic snails, insects, and bats. When a horse accidentally ingests insects harboring flukes infected with N. risticii, the bacterium is transmitted to the horse and causes an acute and often fatal disease called Potomac horse fever. Although the disease has been increasingly recognized throughout North and South America, N. risticii has not been cultured outside the United States. 
This paper reports the first Canadian strain cultured and a new method to effectively isolate N. risticii in culture from the horse blood sample. Molecular analysis showed that the genotype of this Canadian strain is distinct from previously sequenced strains but closely related to midwestern U.S. strains. Culture isolation of N. risticii strains would confirm the geographic presence of pathogenic N. risticii, help elucidate N. risticii strain diversity and environmental spreading and distribution, and improve diagnosis and development of vaccines for this dreadful disease. PMID:27474720
An Ecotype of Neorickettsia risticii Causing Potomac Horse Fever in Canada.
Xiong, Qingming; Bekebrede, Hannah; Sharma, Pratibha; Arroyo, Luis G; Baird, John D; Rikihisa, Yasuko
2016-10-01
Neorickettsia (formerly Ehrlichia) risticii is an obligatory intracellular bacterium of digenetic trematodes. When a horse accidentally ingests aquatic insects containing encysted trematodes infected with N. risticii, the bacterium is transmitted from trematodes to horse cells and causes an acute and often fatal disease called Potomac horse fever (PHF). Since the discovery of N. risticii in the United States in 1984, using immunofluorescence and PCR assays, PHF has been increasingly recognized throughout North America and South America. However, so far, there exist only a few stable N. risticii culture isolates, all of which are from horses within the United States, and the strain diversity and environmental spreading and distribution of pathogenic N. risticii strains remain poorly understood. This paper reports the isolation of N. risticii from the blood of a horse with acute PHF in Ontario, Canada. Intracellular N. risticii colonies were detected in P388D1 cells after 47 days of culturing and 8 days after the addition of rapamycin. Molecular phylogenetic analysis based on amino acid sequences of major surface proteins P51 and Ssa1 showed that this isolate is distinct from any previously sequenced strains but closely related to midwestern U.S. strains. This is the first Canadian strain cultured, and a new method was developed to reactivate dormant N. risticii to improve culture isolation. Neorickettsia risticii is an environmental bacterium that lives inside flukes that are parasitic to aquatic snails, insects, and bats. When a horse accidentally ingests insects harboring flukes infected with N. risticii, the bacterium is transmitted to the horse and causes an acute and often fatal disease called Potomac horse fever. Although the disease has been increasingly recognized throughout North and South America, N. risticii has not been cultured outside the United States. 
This paper reports the first Canadian strain cultured and a new method to effectively isolate N. risticii in culture from the horse blood sample. Molecular analysis showed that the genotype of this Canadian strain is distinct from previously sequenced strains but closely related to midwestern U.S. strains. Culture isolation of N. risticii strains would confirm the geographic presence of pathogenic N. risticii, help elucidate N. risticii strain diversity and environmental spreading and distribution, and improve diagnosis and development of vaccines for this dreadful disease. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Fast single-pass alignment and variant calling using sequencing data
USDA-ARS's Scientific Manuscript database
Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. The new program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...
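Because the abstract is truncated, the following only illustrates the general idea of consulting a previously known variant list while tallying alleles. It is a hypothetical Python sketch, not findmap.f90's algorithm, and it assumes reads are already placed on the reference without gaps.

```python
from collections import defaultdict


def count_known_alleles(known_variants, aligned_reads):
    """Tally ref/alt/other bases at previously known variant positions.

    known_variants : {position: (ref_base, alt_base)}, 0-based positions
    aligned_reads  : iterable of (start_position, read_sequence), assuming a
                     gap-free placement on the reference (a simplification)
    """
    counts = defaultdict(lambda: {"ref": 0, "alt": 0, "other": 0})
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos not in known_variants:  # only known sites are tallied
                continue
            ref, alt = known_variants[pos]
            key = "ref" if base == ref else "alt" if base == alt else "other"
            counts[pos][key] += 1
    return dict(counts)


known = {5: ("A", "G"), 9: ("C", "T")}
reads = [(2, "TTTAAC"), (4, "CGGGGT"), (3, "TTGCCC")]
print(count_known_alleles(known, reads))
# {5: {'ref': 1, 'alt': 2, 'other': 0}, 9: {'ref': 0, 'alt': 1, 'other': 0}}
```

Restricting the tally to known positions is what keeps a single pass cheap: bases elsewhere in each read are skipped with one dictionary lookup.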
Peng, Yousong; Yang, Lei; Li, Honglei; Zou, Yuanqiang; Deng, Lizong; Wu, Aiping; Du, Xiangjun; Wang, Dayan; Shu, Yuelong; Jiang, Taijiao
2016-08-15
Timely surveillance of the antigenic dynamics of the influenza virus is critical for accurate selection of vaccine strains, which is important for effective prevention of viral spread and infection. Here, we provide a computational platform, called PREDAC-H3, for antigenic surveillance of human influenza A(H3N2) virus based on the sequence of the surface protein hemagglutinin (HA). PREDAC-H3 not only determines the antigenic variants and antigenic cluster (grouped for similar antigenicity) to which the virus belongs, based on HA sequences, but also allows visualization of the spatial distribution and temporal dynamics of antigenic clusters of viruses isolated from around the world, thus assisting in antigenic surveillance of human influenza A(H3N2) virus. It is publicly available from http://biocloud.hnu.edu.cn/influ411/html/index.php. Contact: yshu@cnic.org.cn or taijiao@moon.ibp.ac.cn. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Campbell's monkeys concatenate vocalizations into context-specific call sequences
Ouattara, Karim; Lemasson, Alban; Zuberbühler, Klaus
2009-01-01
Primate vocal behavior is often considered irrelevant in modeling human language evolution, mainly because of the caller's limited vocal control and apparent lack of intentional signaling. Here, we present the results of a long-term study on Campbell's monkeys, which has revealed an unrivaled degree of vocal complexity. Adult males produced six different loud call types, which they combined into various sequences in highly context-specific ways. We found stereotyped sequences that were strongly associated with cohesion and travel, falling trees, neighboring groups, nonpredatory animals, unspecific predatory threat, and specific predator classes. Within the responses to predators, we found that crowned eagles triggered four and leopards three different sequences, depending on how the caller learned about their presence. Callers followed a number of principles when concatenating sequences, such as nonrandom transition probabilities of call types, addition of specific calls into an existing sequence to form a different one, or recombination of two sequences to form a third one. We conclude that these primates have overcome some of the constraints of limited vocal control by combinatorial organization. As the different sequences were so tightly linked to specific external events, the Campbell's monkey call system may be the most complex example of ‘proto-syntax’ in animal communication known to date. PMID:20007377
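Nonrandom transition probabilities between call types can be estimated by counting adjacent call pairs across sequences (a first-order Markov model). The call-type labels and sequences below are placeholders, not the study's data or its call categories.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """First-order transition probabilities between call types.

    sequences : list of call-type sequences, e.g. ["B", "B", "K"]
    Returns {from_type: {to_type: probability}}; a type that never
    precedes another call does not appear as a key.
    """
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # adjacent call pairs
            pair_counts[a][b] += 1
    probs = {}
    for a, nexts in pair_counts.items():
        total = sum(nexts.values())
        probs[a] = {b: n / total for b, n in nexts.items()}
    return probs


seqs = [["B", "B", "K", "K"], ["B", "B", "H"], ["K", "K", "H"]]
print(transition_probabilities(seqs))
# {'B': {'B': 0.5, 'K': 0.25, 'H': 0.25}, 'K': {'K': 0.666..., 'H': 0.333...}}
```

Comparing these estimates with the uniform 1/k expected if k call types were ordered at random is one simple way to see whether transitions are nonrandom.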
Model-based quality assessment and base-calling for second-generation sequencing data.
Bravo, Héctor Corrada; Irizarry, Rafael A
2010-09-01
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. Comparative analysis at the sequence level of a large number of samples across multiple populations may be achievable within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. 
Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
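Base-call uncertainty is conventionally reported on the Phred scale, Q = -10 log10(p), where p is the probability that the call is wrong. The sketch below shows only that standard conversion and the expected number of errors in a read; it is not the model proposed in the article, and the function names are illustrative.

```python
import math

def phred_from_error(p_error):
    """Convert a base-call error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)

def error_from_phred(q):
    """Convert a Phred-scaled quality score back to an error probability."""
    return 10.0 ** (-q / 10.0)

def expected_errors(qualities):
    """Expected number of miscalled bases in a read, given per-base Phred scores."""
    return sum(error_from_phred(q) for q in qualities)
```

For example, a read with qualities 10, 20 and 30 is expected to contain 0.1 + 0.01 + 0.001 = 0.111 miscalled bases.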
Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling
Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien
2012-01-01
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for analyzing heterozygous base-calling fluorescence chromatogram data. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; and (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
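To illustrate what "identified as two distinct sequences" means at the simplest level, the sketch below splits a mixed base-call string written with standard IUPAC two-base ambiguity codes into two alleles, using an aligned reference to decide which base belongs to which allele. This is a hypothetical simplification, not MSR's algorithm: it assumes the mixed read and reference are already aligned with no indels, whereas MSR also handles indel-induced overlapping traces.

```python
# Standard IUPAC nucleotide codes for two-base mixtures.
IUPAC_PAIRS = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}

def split_heterozygous(mixed, reference):
    """Split a mixed (heterozygous) base-call string into two allele sequences.

    At each ambiguous position, the base matching the reference goes to allele 1
    and the other base to allele 2; if neither matches, IUPAC order is kept.
    Assumes `mixed` and `reference` are pre-aligned and contain no indels.
    """
    allele1, allele2 = [], []
    for base, ref in zip(mixed.upper(), reference.upper()):
        if base in IUPAC_PAIRS:
            first, second = IUPAC_PAIRS[base]
            if second == ref:
                # Put the reference-matching base on allele 1.
                first, second = second, first
            allele1.append(first)
            allele2.append(second)
        else:
            # Unambiguous call: both alleles carry the same base.
            allele1.append(base)
            allele2.append(base)
    return "".join(allele1), "".join(allele2)
```

For instance, the mixed read `ACRTG` against reference `ACGTG` splits into the reference allele `ACGTG` and the alternate allele `ACATG`.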
Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data
Flickinger, Matthew; Jun, Goo; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min
2015-01-01
DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%. PMID:26235984
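A minimal sketch of how contamination can be folded into genotype calling, under stated assumptions: a fraction `alpha` of the DNA comes from a single unknown contaminant whose genotype follows Hardy-Weinberg proportions at the population alternate-allele frequency `pop_freq`, and reads are binomially sampled with a symmetric error rate. The function names and parameters are hypothetical; this illustrates the general idea rather than reproducing the authors' method.

```python
from math import comb

def read_alt_prob(true_alt_fraction, error):
    """Chance that a single read shows the alternate allele, with symmetric error."""
    return true_alt_fraction * (1 - error) + (1 - true_alt_fraction) * error

def genotype_likelihood(genotype, alt_reads, depth, alpha, pop_freq, error=0.01):
    """Likelihood of the sample genotype (0, 1 or 2 alt alleles) given read counts.

    The contaminant genotype is unknown, so the binomial likelihood is averaged
    over its Hardy-Weinberg prior at frequency pop_freq.
    """
    contaminant_priors = {0: (1 - pop_freq) ** 2, 1: 2 * pop_freq * (1 - pop_freq), 2: pop_freq ** 2}
    total = 0.0
    for g_c, prior in contaminant_priors.items():
        # Expected alt-allele fraction in the mixed DNA.
        fraction = (1 - alpha) * genotype / 2 + alpha * g_c / 2
        p = read_alt_prob(fraction, error)
        total += prior * comb(depth, alt_reads) * p ** alt_reads * (1 - p) ** (depth - alt_reads)
    return total

def call_genotype(alt_reads, depth, alpha, pop_freq, error=0.01):
    """Maximum-likelihood genotype call (no genotype prior, for simplicity)."""
    return max((0, 1, 2), key=lambda g: genotype_likelihood(g, alt_reads, depth, alpha, pop_freq, error))
```

With 6 alternate reads out of 30, ignoring contamination (`alpha=0`) yields a heterozygous call, whereas assuming 20% contamination at `pop_freq=0.5` explains those reads as contaminant DNA and yields a homozygous-reference call.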
ERIC Educational Resources Information Center
Matlen, Bryan J.; Klahr, David
2013-01-01
We report the effect of different sequences of high vs low levels of instructional guidance on children's immediate learning and long-term transfer of simple experimental design procedures and concepts, often called "CVS" (Control of Variables Strategy). Third-grade children (N = 57) received instruction in CVS via one of four possible orderings…
Schel, Anne Marijke; Tranquilli, Sandra; Zuberbühler, Klaus
2009-05-01
Vervet monkey alarm calling has long been the paradigmatic example of how primates use vocalizations in response to predators. In vervets, there is a close and direct relationship between the production of distinct alarm vocalizations and the presence of distinct predator types. Recent fieldwork has however revealed the use of several additional alarm calling systems in primates. Here, the authors describe playback studies on the alarm call system of two colobine species, the King colobus (Colobus polykomos) of Taï Forest, Ivory Coast, and the Guereza colobus (C. guereza) of Budongo Forest, Uganda. Both species produce two basic alarm call types, snorts and acoustically variable roaring phrases, when confronted with leopards or crowned eagles. Neither call type is given exclusively to one predator, but the authors found strong regularities in call sequencing. Leopards typically elicited sequences consisting of a snort followed by few phrases, while eagles typically elicited sequences with no snorts and many phrases. The authors discuss how these call sequences have the potential to encode information at different levels, such as predator type, response-urgency, or the caller's imminent behavior. (PsycINFO Database Record (c) 2009 APA, all rights reserved).
VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research
Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.
2016-01-01
Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly with sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149
Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.
Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han
2017-01-01
The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants reported by the Ion Proton sequencer in 27 whole-exome sequencing datasets but not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with a better polymerase may improve the performance of the Ion Proton sequencing platform.
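One of the four error patterns, "SNV cluster", suggests a simple heuristic filter: flag calls that sit in unusually dense groups of variant calls. The sketch below is a hypothetical illustration; the `window` and `min_count` defaults are arbitrary and are not thresholds from the study, and a real pipeline would apply it per sample and per chromosome.

```python
from bisect import bisect_left, bisect_right

def flag_snv_clusters(positions, window=10, min_count=3):
    """Flag variant positions that fall in dense clusters of variant calls.

    A position is flagged when at least `min_count` calls (including itself)
    lie within +/- `window` bases of it. Positions are assumed to be on one
    chromosome of one sample.
    """
    ordered = sorted(positions)
    flagged = []
    for pos in ordered:
        # Binary search counts calls in [pos - window, pos + window].
        count = bisect_right(ordered, pos + window) - bisect_left(ordered, pos - window)
        if count >= min_count:
            flagged.append(pos)
    return flagged
```

Sorting once and then binary-searching keeps the scan at O(n log n), which matters little for an exome but keeps the filter cheap to run over many samples.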
Utilization of sequence on relatives to improve analysis of individuals' low-coverage NGS data
USDA-ARS's Scientific Manuscript database
Low-coverage sequence data is expected to have low call rates under the prevailing paradigm that genotypes are first “called” from sequence data of each individual independently and subsequent analyses (including determination of haplotypes) are dependent on those called genotypes. However, provide...
Mananga, Eugene S; Reid, Alicia E; Charpentier, Thibault
2012-02-01
This article describes the use of an alternative expansion scheme called the Floquet-Magnus expansion (FME) to study the dynamics of spin systems in solid-state NMR. The main tool used to describe the effect of time-dependent interactions in NMR is average Hamiltonian theory (AHT). However, some NMR experiments, such as sample rotation and pulse crafting, seem to be more conveniently described using Floquet theory (FT). Here, we present the first report highlighting the basics of the Floquet-Magnus expansion (FME) scheme and hint at its application to recoupling sequences that excite double-quantum coherences more efficiently, namely the BABA and C7 radiofrequency pulse sequences. The use of the $\Lambda_n(t)$ functions, available only in the FME scheme, allows the efficiency of the BABA and C7 sequences to be compared. Copyright © 2011 Elsevier Inc. All rights reserved.
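For reference, the standard first-order terms of the Floquet-Magnus expansion for a Hamiltonian $H(t)$ periodic with cycle time $t_c$ (with $\hbar = 1$) are shown below. This is the general textbook form, not the article's own derivation for BABA or C7; the symbols $F$ and $\Lambda(t)$ follow the usual FME notation.

```latex
U(t) = e^{-i\Lambda(t)}\, e^{-iFt}, \qquad \Lambda(0) = 0, \qquad
F = \sum_{n \ge 1} F_n, \qquad \Lambda(t) = \sum_{n \ge 1} \Lambda_n(t)

F_1 = \frac{1}{t_c} \int_0^{t_c} H(t')\, dt', \qquad
\Lambda_1(t) = \int_0^{t} H(t')\, dt' - t\, F_1
```

Because $\Lambda(t)$ is periodic and vanishes at $t = 0$, it vanishes at every full cycle, so $U(n t_c) = e^{-iF n t_c}$ and $F_1$ coincides with the lowest-order average Hamiltonian of AHT. The $\Lambda_n(t)$ terms carry the within-cycle (non-stroboscopic) behavior that AHT does not describe.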
Reid, Alicia E.; Charpentier, Thibault
2013-01-01
This article describes the use of an alternative expansion scheme called the Floquet-Magnus expansion (FME) to study the dynamics of spin systems in solid-state NMR. The main tool used to describe the effect of time-dependent interactions in NMR is average Hamiltonian theory (AHT). However, some NMR experiments, such as sample rotation and pulse crafting, seem to be more conveniently described using Floquet theory (FT). Here, we present the first report highlighting the basics of the Floquet-Magnus expansion (FME) scheme and hint at its application to recoupling sequences that excite double-quantum coherences more efficiently, namely the BABA and C7 radiofrequency pulse sequences. The use of the $\Lambda_n(t)$ functions, available only in the FME scheme, allows the efficiency of the BABA and C7 sequences to be compared. PMID:22197191
Code-Time Diversity for Direct Sequence Spread Spectrum Systems
Hassan, A. Y.
2014-01-01
Time diversity is achieved in direct sequence spread spectrum by receiving different faded, delayed copies of the transmitted symbols from different uncorrelated channel paths when the transmission signal bandwidth is greater than the coherence bandwidth of the channel. In this paper, a new time diversity scheme, called code-time diversity, is proposed for spread spectrum systems. In this scheme, N spreading codes are used to transmit one data symbol over N successive symbol intervals. The diversity order of the proposed scheme equals the number of spreading codes used, N, multiplied by the number of uncorrelated channel paths, L. The paper presents the transmitted signal model. Two demodulator structures are proposed, based on the received signal models for Rayleigh flat and frequency-selective fading channels. The probability of error of the proposed diversity scheme is also calculated for the same two fading channels. Finally, simulation results are presented and compared with those of maximal ratio combiner (MRC) and multiple-input multiple-output (MIMO) systems. PMID:24982925
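A minimal sketch of the core idea, assuming BPSK symbols, Walsh-Hadamard spreading codes, a Rayleigh flat-fading channel (L = 1) with an independent gain per symbol interval, and perfect channel knowledge at the receiver. Each of the N intervals carries the same symbol spread by a different code; the receiver despreads each interval with its own code and combines the results with maximal-ratio weights, giving diversity order N × L = N here. The function names are hypothetical, and this is not the paper's demodulator design.

```python
import random

def walsh_codes(length):
    """Walsh-Hadamard codes of the given length (a power of two), Sylvester construction."""
    codes = [[1]]
    while len(codes) < length:
        codes = [row + row for row in codes] + [row + [-c for c in row] for row in codes]
    return codes

def transmit(symbol, codes):
    """Spread one BPSK symbol (+1/-1) with a different code in each of N intervals."""
    return [[symbol * chip for chip in code] for code in codes]

def flat_fading_channel(intervals, gains, noise_std, rng):
    """Apply one complex fading gain per interval and add complex Gaussian noise."""
    return [
        [h * chip + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std)) for chip in chips]
        for chips, h in zip(intervals, gains)
    ]

def receive(received, codes, gains):
    """Despread each interval with its own code, then combine with MRC weights."""
    decision = 0j
    for chips, code, h in zip(received, codes, gains):
        despread = sum(r * c for r, c in zip(chips, code)) / len(code)
        decision += h.conjugate() * despread
    return 1 if decision.real >= 0 else -1
```

Usage: `codes = walsh_codes(8)[:4]` selects N = 4 mutually orthogonal codes of length 8. With one Rayleigh gain per interval drawn as `complex(rng.gauss(0, 1), rng.gauss(0, 1))`, a deep fade in one interval is compensated by the other three.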
Visvesvara, Govinda S; De Jonckheere, Johan F; Sriram, Rama; Daft, Barbara
2005-08-01
Naegleria fowleri causes an acute and rapidly fatal central nervous system infection called primary amebic meningoencephalitis (PAM) in healthy children and young adults. We describe here the identification of N. fowleri isolated from the brain of one of several cows that died of PAM based on sequencing of the internal transcribed spacers, including the 5.8S rRNA genes.
A new molecular evolution model for limited insertion independent of substitution.
Lèbre, Sophie; Michel, Christian J
2013-10-01
We recently introduced a new molecular evolution model called the IDIS model, for Insertion Deletion Independent of Substitution [13,14]. In the IDIS model, the three independent processes of substitution, insertion and deletion of residues have constant rates. In order to control genome expansion during evolution, we generalize the IDIS model here by introducing an insertion rate that decreases as the sequence grows and tends to 0 at a maximum sequence length $n_{\max}$. This new model, called LIIS for Limited Insertion Independent of Substitution, defines a matrix differential equation satisfied by a vector $P(t)$ describing the sequence content in each residue at evolution time $t$. An analytical solution is obtained for any diagonalizable substitution matrix $M$. Thus, the LIIS model gives an expression for the sequence content vector $P(t)$ in each residue at evolution time $t$ as a function of the eigenvalues and eigenvectors of matrix $M$, the residue insertion rate vector $R$, the total insertion rate $r$, the initial and maximum sequence lengths $n_0$ and $n_{\max}$, respectively, and the sequence content vector $P(t_0)$ at initial time $t_0$. The derivation of the analytical solution is much more technical than for the IDIS model, as it involves Gauss hypergeometric functions. Several propositions of the LIIS model are derived: proof that the IDIS model is a particular case of the LIIS model when the maximum sequence length $n_{\max}$ tends to infinity, fixed point, time scale, time step and time inversion. Using a relation between the sequence length $l$ and the evolution time $t$, an expression of the LIIS model as a function of the sequence length $l = n(t)$ is obtained. Formulas for 'insertion only', i.e. when the substitution rates are all equal to 0, are derived at evolution time $t$ and sequence length $l$. 
Analytical solutions of the LIIS model are explicitly derived, as a function of either evolution time t or sequence length l, for two classical substitution matrices: the 3-parameter symmetric substitution matrix [12] (LIIS-SYM3) and the HKY asymmetric substitution matrix [9] (LIIS-HKY). An evaluation of the LIIS model (precisely, LIIS-HKY) based on four statistical analyses of the GC content in complete genomes of four prokaryotic taxonomic groups, namely Chlamydiae, Crenarchaeota, Spirochaetes and Thermotogae, shows the improvement over the IDIS model that the theory of the LIIS model predicts. Copyright © 2013 Elsevier Inc. All rights reserved.
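The abstract does not state the exact form of the decreasing insertion rate, so the sketch below assumes, purely for illustration, a linear law dn/dt = r(1 - n/n_max). Under that assumption the length has the closed form n(t) = n_max - (n_max - n_0) e^{-rt/n_max}, which saturates at n_max and reduces to the constant-rate growth n_0 + rt as n_max tends to infinity, mirroring the IDIS limit described above. The functions are hypothetical and model sequence length only, not the residue-content vector P(t).

```python
import math

def length_closed_form(t, n0, n_max, r):
    """Sequence length under the assumed law dn/dt = r * (1 - n / n_max)."""
    return n_max - (n_max - n0) * math.exp(-r * t / n_max)

def length_euler(t, n0, n_max, r, steps=100_000):
    """Forward-Euler integration of the same assumed ODE, to check the closed form."""
    n, dt = float(n0), t / steps
    for _ in range(steps):
        n += r * (1 - n / n_max) * dt
    return n
```

For example, with `n0=100`, `n_max=1000` and `r=5`, the closed form and the Euler integration agree closely at `t=50`, and with a very large `n_max` the length after time `t=10` is essentially `100 + 5 * 10`.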
Variant calling in low-coverage whole genome sequencing of a Native American population sample.
Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C
2014-01-30
The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that make a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the GATK Unified Genotyper variant calling program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage-dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggest that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.
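Concordance between sequence-based and array-based genotypes is, at its simplest, the fraction of genotypes called by both platforms on which they agree. The sketch below assumes genotypes coded as alternate-allele counts (0, 1, 2) and keyed by hypothetical `(sample_id, site_id)` pairs; the study's exact concordance definition (for example, any stratification by allele frequency or coverage) is not reproduced.

```python
def genotype_concordance(seq_calls, array_calls):
    """Fraction of shared (sample, site) genotypes on which two call sets agree.

    Both inputs map (sample_id, site_id) -> genotype coded as the number of
    alternate alleles (0, 1, 2); missing calls are simply absent from the dict.
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        return float("nan")
    agree = sum(1 for key in shared if seq_calls[key] == array_calls[key])
    return agree / len(shared)
```

Only genotypes present in both call sets are compared, so missing calls lower the call rate but not the concordance; both quantities need to be reported to judge a low-coverage strategy.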
NASA Astrophysics Data System (ADS)
Noirel, Josselin; Simonson, Thomas
2008-11-01
Following Kimura's neutral theory of molecular evolution [M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge University Press, Cambridge, 1983) (reprinted in 1986)], it has become common to assume that the vast majority of viable mutations of a gene confer little or no functional advantage. Yet, in silico models of protein evolution have shown that mutational robustness of sequences could be selected for, even in the context of neutral evolution. The evolution of a biological population can be seen as a diffusion on the network of viable sequences. This network is called a "neutral network." Depending on the mutation rate μ and the population size N, the biological population can evolve purely randomly (μN ≪1) or it can evolve in such a way as to select for sequences of higher mutational robustness (μN ≫1). The stringency of the selection depends not only on the product μN but also on the exact topology of the neutral network, the special arrangement of which was named "superfunnel." Even though the relation between mutation rate, population size, and selection was thoroughly investigated, a study of the salient topological features of the superfunnel that could affect the strength of the selection was wanting. This question is addressed in this study. We use two different models of proteins: on lattice and off lattice. We compare neutral networks computed using these models to random networks. From this, we identify two important factors of the topology that determine the stringency of the selection for mutationally robust sequences. First, the presence of highly connected nodes ("hubs") in the network increases the selection for mutationally robust sequences. Second, the stringency of the selection increases when the correlation between a sequence's mutational robustness and its neighbors' increases. The latter finding relates a global characteristic of the neutral network to a local one, which is attainable through experiments or molecular modeling.
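To make "mutational robustness" and "the correlation between a sequence's mutational robustness and its neighbors'" concrete, the sketch below defines robustness as the fraction of single-point mutations that keep a sequence viable, then computes the Pearson correlation between each node's robustness and the mean robustness of its viable neighbors. The binary alphabet and the star-shaped viable set are hypothetical toy choices, not the lattice or off-lattice protein models used in the study.

```python
def neighbors(seq, alphabet):
    """All single-point mutants of seq over the given alphabet."""
    for i, current in enumerate(seq):
        for letter in alphabet:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]

def robustness(viable, alphabet):
    """Fraction of point mutations of each viable sequence that remain viable."""
    result = {}
    for seq in viable:
        muts = list(neighbors(seq, alphabet))
        result[seq] = sum(m in viable for m in muts) / len(muts)
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def robustness_neighbor_correlation(viable, alphabet):
    """Correlate each node's robustness with the mean robustness of its viable neighbors."""
    rob = robustness(viable, alphabet)
    xs, ys = [], []
    for seq in viable:
        viable_nbrs = [m for m in neighbors(seq, alphabet) if m in viable]
        if viable_nbrs:
            xs.append(rob[seq])
            ys.append(sum(rob[m] for m in viable_nbrs) / len(viable_nbrs))
    return pearson(xs, ys)

# Hypothetical toy neutral network: a hub (0000) linked to four single mutants.
viable = {"0000", "1000", "0100", "0010", "0001"}
```

In this star-shaped toy network the hub is fully robust while each leaf is not, so robustness is perfectly anti-correlated with neighbors' robustness (a correlation of -1). Per the finding above, such a topology would weaken selection for robust sequences compared with a network where robust nodes neighbor other robust nodes.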
Noirel, Josselin; Simonson, Thomas
2008-11-14
Following Kimura's neutral theory of molecular evolution [M. Kimura, The Neutral Theory of Molecular Evolution (Cambridge University Press, Cambridge, 1983) (reprinted in 1986)], it has become common to assume that the vast majority of viable mutations of a gene confer little or no functional advantage. Yet, in silico models of protein evolution have shown that mutational robustness of sequences could be selected for, even in the context of neutral evolution. The evolution of a biological population can be seen as a diffusion on the network of viable sequences. This network is called a "neutral network." Depending on the mutation rate μ and the population size N, the biological population can evolve purely randomly (μN ≪ 1) or it can evolve in such a way as to select for sequences of higher mutational robustness (μN ≫ 1). The stringency of the selection depends not only on the product μN but also on the exact topology of the neutral network, the special arrangement of which was named "superfunnel." Even though the relation between mutation rate, population size, and selection was thoroughly investigated, a study of the salient topological features of the superfunnel that could affect the strength of the selection was wanting. This question is addressed in this study. We use two different models of proteins: on lattice and off lattice. We compare neutral networks computed using these models to random networks. From this, we identify two important factors of the topology that determine the stringency of the selection for mutationally robust sequences. First, the presence of highly connected nodes ("hubs") in the network increases the selection for mutationally robust sequences. Second, the stringency of the selection increases when the correlation between a sequence's mutational robustness and its neighbors' increases. 
The latter finding relates a global characteristic of the neutral network to a local one, which is attainable through experiments or molecular modeling.
Goswami, Rashmi S; Patel, Keyur P; Singh, Rajesh R; Meric-Bernstam, Funda; Kopetz, E Scott; Subbiah, Vivek; Alvarez, Ricardo H; Davies, Michael A; Jabbar, Kausar J; Roy-Chowdhuri, Sinchita; Lazar, Alexander J; Medeiros, L Jeffrey; Broaddus, Russell R; Luthra, Rajyalakshmi; Routbort, Mark J
2015-06-01
We used a clinical next-generation sequencing (NGS) hotspot mutation panel to investigate clonal evolution in paired primary and metastatic tumors. A total of 265 primary and metastatic tumor pairs were sequenced using a 46-gene cancer mutation panel capable of detecting one or more single-nucleotide variants as well as small insertions/deletions. Mutations were tabulated together with tumor type and percentage, mutational variant frequency, time interval between onset of primary tumor and metastasis, and neoadjuvant therapy status. Of note, 227 of 265 (85.7%) tumor metastasis pairs showed identical mutation calls. Of the tumor pairs with identical mutation calls, 160 (60.4%) possessed defining somatic mutation signatures and 67 (25.3%) did not exhibit any somatic mutations. There were 38 (14.3%) cases that showed at least one novel mutation call between the primary and metastasis. Metastases were nearly twice as likely to show novel mutations (n = 20, 7.5%) as primary tumors (n = 12, 4.5%). TP53 was the most common additionally mutated gene in metastatic lesions, followed by PIK3CA and SMAD4. PIK3CA mutations were more often associated with metastasis in colon carcinoma samples. Clinical NGS hotspot panels can be useful in analyzing clonal evolution within tumors as well as in determining subclonal mutations that can expand in future metastases. PIK3CA, SMAD4, and TP53 are most often involved in clonal divergence, providing potential targets that may help guide the clinical management of tumor progression or metastases. ©2015 American Association for Cancer Research.
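As an arithmetic check, the reported percentages can be reproduced from the counts in the abstract (all as fractions of the 265 pairs). The check also shows that the novel-mutation ratio of metastases to primaries is 20/12, or about 1.7. No new data are introduced.

```python
PAIRS_TOTAL = 265
counts = {
    "identical calls": 227,
    "identical, with somatic mutations": 160,
    "identical, no somatic mutations": 67,
    "at least one novel call": 38,
    "novel in metastasis": 20,
    "novel in primary": 12,
}
# Percent of all pairs, rounded to one decimal as in the abstract.
percentages = {name: round(100 * n / PAIRS_TOTAL, 1) for name, n in counts.items()}
# How much more often metastases carried a novel mutation than primaries.
ratio = counts["novel in metastasis"] / counts["novel in primary"]
```

The identical-call subgroups (160 + 67) sum to 227, and 227 + 38 accounts for all 265 pairs. The 20 + 12 = 32 pairs with novel mutations on one side fall short of 38, presumably because some pairs had novel calls in both the primary and the metastasis.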
A Machine Learning Method for Power Prediction on the Mobile Devices.
Chen, Da-Ren; Chen, You-Shyang; Chen, Lin-Chih; Hsu, Ming-Yang; Chiang, Kai-Feng
2015-10-01
Energy profiling and estimation have been popular areas of research in multicore mobile architectures. While short sequences of system calls have been recognized by machine learning as pattern descriptions for anomaly detection, the power consumption of running processes with respect to system-call patterns is not well studied. In this paper, we propose a fuzzy neural network (FNN) for training and analyzing process execution behaviour with respect to series of system calls, their parameters and their power consumption. On the basis of the patterns of a series of system calls, we develop a power estimation daemon (PED) to analyze and predict the energy consumption of the running process. In the initial stage, PED categorizes sequences of system calls as functional groups and predicts their energy consumption using the FNN. In the operational stage, PED identifies the predefined sequences of system calls invoked by running processes and estimates their energy consumption.
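The operational stage (match predefined system-call sequences in a running process's trace, then attribute energy per functional group) can be sketched as follows. This is a hypothetical stand-in: a fixed lookup table replaces the trained FNN, and the patterns, group names and energy values are invented for illustration.

```python
# Hypothetical predefined system-call patterns -> (functional group, energy in mJ).
PATTERNS = {
    ("open", "read", "close"): ("file_io", 1.2),
    ("socket", "connect", "send"): ("network", 3.5),
}

def estimate_energy(trace, patterns=PATTERNS):
    """Scan a system-call trace and sum per-group energy estimates for matched patterns."""
    totals = {}
    i = 0
    while i < len(trace):
        for pattern, (group, energy) in patterns.items():
            if tuple(trace[i:i + len(pattern)]) == pattern:
                totals[group] = totals.get(group, 0.0) + energy
                i += len(pattern)  # consume the matched calls
                break
        else:
            i += 1  # no pattern starts here; move to the next call
    return totals
```

The `for ... else` clause advances by one call only when no pattern matched, so unrecognized calls (such as `getpid`) are skipped without contributing energy.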
Efficient computation of optimal oligo-RNA binding.
Hodas, Nathan O; Aalberts, Daniel P
2004-01-01
We present an algorithm that calculates the optimal binding conformation and free energy of two RNA molecules, one or both oligomeric. This algorithm has applications to modeling DNA microarrays, RNA splice-site recognitions and other antisense problems. Although other recent algorithms perform the same calculation in time proportional to the sum of the lengths cubed, $O((N_1 + N_2)^3)$, our oligomer binding algorithm, called bindigo, scales as the product of the sequence lengths, $O(N_1 N_2)$. The algorithm performs well in practice with the aid of a heuristic for large asymmetric loops. To demonstrate its speed and utility, we use bindigo to investigate the binding proclivities of U1 snRNA to mRNA donor splice sites.
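To show why a pairwise dynamic program over two strands costs $O(N_1 N_2)$, the sketch below computes a best local complementarity score between two RNAs, with the second strand reversed so the strands are compared antiparallel. It is a toy Smith-Waterman-style scoring with invented pair scores and gap penalty, not bindigo's nearest-neighbor free-energy model or its loop heuristic.

```python
# Toy pair scores (not thermodynamic parameters): GC > AU > GU wobble.
PAIR_SCORES = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2, ("G", "U"): 1, ("U", "G"): 1}
MISMATCH, GAP = -2, -3

def best_duplex_score(rna1, rna2):
    """Best local complementarity score between rna1 and rna2 (both given 5'->3').

    rna2 is reversed so the strands pair antiparallel. The DP visits
    len(rna1) * len(rna2) cells, keeping only one previous row in memory.
    """
    target = rna2[::-1]
    prev = [0] * (len(target) + 1)
    best = 0
    for a in rna1:
        cur = [0] * (len(target) + 1)
        for j, b in enumerate(target, start=1):
            pair = PAIR_SCORES.get((a, b), MISMATCH)
            # Extend a helix, open a bulge on either strand, or start fresh.
            cur[j] = max(0, prev[j - 1] + pair, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `GGGAAA` binds its perfect complement `UUUCCC` with score 3 × 3 + 3 × 2 = 15, while `AAAA` against `AAAA` forms no pairs and scores 0.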
Spin wave propagation spectra in Octonacci one-dimensional magnonic quasicrystals
NASA Astrophysics Data System (ADS)
Valeriano, Analine P.; Costa, Carlos H.; Bezerra, Claudionor G.
2018-06-01
In this paper, we study spin wave propagation in quasiperiodic magnonic superlattices that follow the so-called Octonacci quasiperiodic sequence, where the $N$-th stage is obtained through the recurrence rule $S_N = S_{N-1} S_{N-2} S_{N-1}$ for $N \geq 3$, starting with $S_1 = A$ and $S_2 = B$. The multilayered magnonic nanostructure is composed of two simple cubic ferromagnetic materials, labeled A and B, which interact through bilinear and biquadratic exchange couplings at their interfaces. The ferromagnetic materials are described by the Heisenberg model, and a transfer matrix treatment is employed, with the calculations performed for the exchange-dominated regime, taking the random phase approximation (RPA) into account. The obtained numerical results show the effects of both (i) the Octonacci quasiperiodic sequence and (ii) the biquadratic exchange coupling on the band structure and transmission spectra of spin waves. Comparisons are also performed with the spectra found in other periodic and quasiperiodic structures.
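The Octonacci recurrence stated above can be generated directly. The sketch below builds the layer sequence $S_N$ as a string of `A` and `B` layers, which is the stacking order a transfer-matrix calculation would then iterate over. The function name is illustrative.

```python
def octonacci(n, a="A", b="B"):
    """Return stage S_n of the Octonacci sequence: S_1 = a, S_2 = b, S_n = S_{n-1} S_{n-2} S_{n-1}."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if n == 1:
        return a
    older, newer = a, b  # S_{n-2}, S_{n-1}
    for _ in range(3, n + 1):
        older, newer = newer, newer + older + newer
    return newer
```

The first stages are `A`, `B`, `BAB`, `BABBBAB`, with lengths 1, 1, 3, 7, 17, 41, ..., so the number of layers grows roughly by a factor of 1 + √2 per stage.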
Standish, Kristopher A; Carland, Tristan M; Lockwood, Glenn K; Pfeiffer, Wayne; Tatineni, Mahidhar; Huang, C Chris; Lamberth, Sarah; Cherkas, Yauheniya; Brodmerkel, Carrie; Jaeger, Ed; Smith, Lance; Rajagopal, Gunaretnam; Curran, Mark E; Schork, Nicholas J
2015-09-22
Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study. We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.
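"Job-packing" generally means fitting several smaller jobs onto each compute node so that cores are not left idle. As a hypothetical illustration (the abstract does not describe the authors' packing scheme), the sketch below uses the classic first-fit-decreasing heuristic to assign jobs, described only by their core counts, to identical nodes. Job sizes and node size are invented.

```python
def pack_jobs(job_cores, node_cores):
    """First-fit-decreasing packing of jobs (by core count) onto identical nodes.

    Returns a list of nodes, each a list of the core counts of the jobs placed on it.
    """
    nodes, free = [], []
    for cores in sorted(job_cores, reverse=True):
        if cores > node_cores:
            raise ValueError(f"job needs {cores} cores, node has {node_cores}")
        for i, available in enumerate(free):
            if cores <= available:
                # First node with enough free cores takes the job.
                nodes[i].append(cores)
                free[i] -= cores
                break
        else:
            # No existing node fits: open a new one.
            nodes.append([cores])
            free.append(node_cores - cores)
    return nodes
```

For example, jobs needing 4, 8, 2, 2, 16 and 4 cores pack onto three hypothetical 16-core nodes as `[[16], [8, 4, 4], [2, 2]]`. Sorting largest-first is what lets the small jobs fill the gaps left by the big ones.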
TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.
Menges, Fabian; Narzisi, Giuseppe; Mishra, Bud
2011-09-01
Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information from both raw sequencing data and the reference genome that can yield better quality sequence reads, SNP-calls, variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms for interpreting analog signals representing sequences of the bases ({A, C, G, T}), while simultaneously aligning possible sequence reads to a source reference genome whenever available. Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined utilizing a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in software and hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Empirical results on real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) based on several criteria, particularly those important in clinical and scientific applications. Namely, it was evaluated for (i) its base-calling speed and throughput, (ii) its read accuracy and (iii) its specificity and sensitivity in variant calling. A software implementation of TotalReCaller, as well as additional information, is available at: http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/. Contact: fabian.menges@nyu.edu.
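The idea of combining an intensity error model with reference information in one Bayesian score can be sketched in a few lines. This toy assumes an idealized linear model (each base's clean signal is a unit vector in its own channel, Gaussian noise, no cross-talk or phasing) and a prior favoring the reference base at the read's candidate location. It replaces TotalReCaller's BWT-based alignment and branch-and-bound search with an exhaustive scan over reference offsets, so it shows the scoring idea only, not the published algorithm.

```python
import math

BASES = "ACGT"

def log_likelihood(intensity, base, sigma=0.3):
    """Gaussian log-likelihood (up to a constant) of a 4-channel intensity vector.

    Idealised linear model (an assumption): a clean call of `base` is the unit
    vector for that base's channel, with no cross-talk, phasing or fading.
    """
    expected = [1.0 if b == base else 0.0 for b in BASES]
    return -sum((x - e) ** 2 for x, e in zip(intensity, expected)) / (2 * sigma ** 2)

def cycle_score(intensity, base, ref_base, error):
    """Log posterior (up to a constant): intensity likelihood plus reference prior."""
    prior = math.log(1 - error) if base == ref_base else math.log(error / 3)
    return log_likelihood(intensity, base) + prior

def score_at(intensities, reference, pos, error):
    """Total score and per-cycle calls for a read placed at reference[pos:]."""
    total, calls = 0.0, []
    for k, intensity in enumerate(intensities):
        scores = {b: cycle_score(intensity, b, reference[pos + k], error) for b in BASES}
        best = max(scores, key=scores.get)
        total += scores[best]
        calls.append(best)
    return total, "".join(calls)

def call_read(intensities, reference, error=0.01):
    """Exhaustively try every reference offset; return (best_pos, calls)."""
    best = None
    for pos in range(len(reference) - len(intensities) + 1):
        total, calls = score_at(intensities, reference, pos, error)
        if best is None or total > best[0]:
            best = (total, pos, calls)
    return best[1], best[2]
```

Because base calls and placement are scored jointly, a read whose clean intensities spell `GTACG` is both called and placed at offset 2 of the toy reference `ACGTACGGTC` in a single pass.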
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stoiber, Marcus H.; Brown, James B.
This software implements the first base caller for nanopore data that calls bases directly from raw data. The basecRAWller algorithm has two major advantages over current nanopore base-calling software: (1) streaming base calling and (2) base calling from the information-rich raw signal. The ability to perform truly streaming base calling as the signal is received from the sequencer can be very powerful, as this is one of the major advantages of this technology compared to other sequencing technologies. Enabling as much streaming potential as possible will therefore be very important as this technology becomes more widely applied in the biosciences. All other base callers currently employ the Viterbi algorithm, which requires the whole sequence before the base-calling procedure can be completed and thus precludes natural streaming base calling. The other major advantage of the basecRAWller algorithm is the prediction of bases from the raw signal, which contains much richer information than the segmented chunks that current algorithms employ. This could lead to much more accurate base calls, which would make this technology much more valuable to its growing user base.
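The contrast between streaming and whole-read processing can be illustrated with Python generators: each base is emitted as soon as its signal segment closes, without waiting for the rest of the read. The sketch below is a hypothetical toy. It uses a running-mean change detector and a nearest-level rule with invented single-base current levels, whereas basecRAWller uses neural networks on the raw signal and real nanopore signals depend on k-mers.

```python
# Hypothetical current levels for a toy 1-mer model; real nanopore models are
# k-mer based and far noisier.
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def stream_events(samples, threshold):
    """Segment a raw signal into events on the fly, yielding each event's mean.

    An event ends when a new sample differs from the running event mean by more
    than `threshold`, so events are emitted without waiting for the full read.
    """
    total, count = 0.0, 0
    for x in samples:
        if count and abs(x - total / count) > threshold:
            yield total / count
            total, count = 0.0, 0
        total += x
        count += 1
    if count:
        yield total / count

def stream_bases(samples, threshold=0.5, levels=LEVELS):
    """Call one base per event, using the nearest level, as soon as each event closes."""
    for mean in stream_events(samples, threshold):
        yield min(levels, key=lambda base: abs(levels[base] - mean))
```

For example, calling `next()` on `stream_bases(iter(signal))` returns the first base after reading only the first sample of the second event, which is the behavior that a Viterbi pass over the complete read cannot provide.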
Bolzán, Alejandro D
2017-07-01
By definition, telomeric sequences are located at the very ends or terminal regions of chromosomes. However, several vertebrate species show blocks of (TTAGGG)n repeats in non-terminal regions of chromosomes, the so-called interstitial telomeric sequences (ITSs), interstitial telomeric repeats or interstitial telomeric bands. These include intrachromosomal telomeric-like repeats located near (pericentromeric ITSs) or within the centromere (centromeric ITSs), and telomeric repeats located between the centromere and the telomere (i.e., truly interstitial telomeric sequences) of eukaryotic chromosomes. According to their sequence organization, localization and flanking sequences, ITSs can be classified into four types: 1) short ITSs, 2) subtelomeric ITSs, 3) fusion ITSs, and 4) heterochromatic ITSs. The first three types have been described mainly in the human genome, whereas heterochromatic ITSs have been found in several vertebrate species but not in humans. Several lines of evidence suggest that ITSs play a significant role in genome instability and evolution. This review aims to summarize our current knowledge about the origin, function, instability and evolution of these telomeric-like repeats in vertebrate chromosomes. Copyright © 2017 Elsevier B.V. All rights reserved.
Visvesvara, Govinda S.; De Jonckheere, Johan F.; Sriram, Rama; Daft, Barbara
2005-01-01
Naegleria fowleri causes an acute and rapidly fatal central nervous system infection called primary amebic meningoencephalitis (PAM) in healthy children and young adults. We describe here the identification of N. fowleri isolated from the brain of one of several cows that died of PAM based on sequencing of the internal transcribed spacers, including the 5.8S rRNA genes. PMID:16081978
Yeo, Zhen Xuan; Wong, Joshua Chee Leong; Rozen, Steven G; Lee, Ann Siew Gek
2014-06-24
The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls when detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing. Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that false negative indels could be generated by TS under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling on the same data with the open-source variant callers GATK and SAMtools showed that false negatives could be minimised with appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer-associated errors, without compromising sensitivity. In our best-case scenario, which involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and a 29% False Discovery Rate (FDR) in indel calling across all 23 samples, which is a good performance for mutation screening using the PGM. New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterparts. However, the variant caller of TS exhibits lower sensitivity than GATK and SAMtools.
Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the usage of this technology for future clinical genetic testing.
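The two ideas in this abstract, a Quality-by-Depth filter and the sensitivity/specificity/FDR summary, can be sketched in a few lines of Python. This is an illustrative simplification: GATK's QD normalizes variant quality by the depth of the samples carrying the variant, whereas here it is simply `qual / depth` at the site, and the `min_qd=2.0` default is an assumption chosen to mirror GATK's commonly recommended hard filter, not a value taken from the study. VARW is not modelled. The metrics function also shows why 99.99% specificity can coexist with a 29% FDR: when almost every position is a true negative, a handful of false positives barely moves specificity but is a large share of the few positive calls.

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    pos: int
    qual: float  # Phred-scaled variant quality (QUAL)
    depth: int   # read depth at the site (simplification of GATK's QD denominator)


def quality_by_depth(call):
    """QD: variant quality normalized by depth (0.0 when depth is 0)."""
    return call.qual / call.depth if call.depth > 0 else 0.0


def qd_filter(calls, min_qd=2.0):
    """Keep calls whose QD reaches min_qd (the 2.0 default is an assumption)."""
    return [c for c in calls if quality_by_depth(c) >= min_qd]


def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and false discovery rate from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fdr": fp / (fp + tp),
    }
```

For example, hypothetical counts of 10 true positives, 4 false positives, 99,986 true negatives and no false negatives give a specificity above 99.99% but an FDR of 4/14, about 29%.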
Meaningful call combinations and compositional processing in the southern pied babbler
Engesser, Sabrina; Ridley, Amanda R.; Townsend, Simon W.
2016-01-01
Language’s expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a “mobbing sequence,” potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought. PMID:27155011
A family of Nikishin systems with periodic recurrence coefficients
DOE Office of Scientific and Technical Information (OSTI.GOV)
Delvaux, Steven; Lopez, Abey; Lopez, Guillermo L
2013-01-31
Suppose we have a Nikishin system of $p$ measures with the $k$th generating measure of the Nikishin system supported on an interval $\Delta_k \subset \mathbb{R}$ with $\Delta_k \cap \Delta_{k+1} = \emptyset$ for all $k$. It is well known that the corresponding staircase sequence of multiple orthogonal polynomials satisfies a $(p+2)$-term recurrence relation whose recurrence coefficients, under appropriate assumptions on the generating measures, have periodic limits of period $p$. (The limit values depend only on the positions of the intervals $\Delta_k$.) Taking these periodic limit values as the coefficients of a new $(p+2)$-term recurrence relation, we construct a canonical sequence of monic polynomials $\{P_n\}_{n=0}^{\infty}$, the so-called Chebyshev-Nikishin polynomials. We show that the polynomials $P_n$ themselves form a sequence of multiple orthogonal polynomials with respect to some Nikishin system of measures, with the $k$th generating measure being absolutely continuous on $\Delta_k$. In this way we generalize a result of the third author and Rocha [22] for the case $p=2$. The proof uses the connection with block Toeplitz matrices and with a certain Riemann surface of genus zero. We also obtain strong asymptotics and an exact Widom-type formula for functions of the second kind of the Nikishin system for $\{P_n\}_{n=0}^{\infty}$. Bibliography: 27 titles.
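Generating monic polynomials from a $(p+2)$-term recurrence with period-$p$ coefficients can be illustrated numerically. The Python sketch below assumes the recurrence is written as $x P_n = P_{n+1} + \sum_{k=0}^{p} c_{n,k} P_{n-k}$ with $P_0 = 1$, $P_m = 0$ for $m < 0$, and $c_{n,k} = c_{n \bmod p,\,k}$. The function name and any coefficient values passed to it are illustrative only; they are not the limit values derived in the paper.

```python
from fractions import Fraction


def periodic_recurrence_polys(coeffs, n_max):
    """Generate monic P_0..P_{n_max} from a (p+2)-term periodic recurrence.

    coeffs: list of p rows, each holding p+1 coefficients c_{r,0..p};
        step n uses row n % p, so the coefficients have period p.
    Polynomials are returned as coefficient lists, lowest degree first.
    Recurrence: P_{n+1} = x*P_n - sum_k c_{n,k} * P_{n-k}, with P_m = 0 for m < 0.
    """
    p = len(coeffs)
    if p == 0 or any(len(row) != p + 1 for row in coeffs):
        raise ValueError("need p >= 1 rows of p+1 coefficients each")
    polys = [[Fraction(1)]]  # P_0 = 1
    for n in range(n_max):
        row = coeffs[n % p]
        nxt = [Fraction(0)] + polys[n]  # multiply P_n by x
        for k, c in enumerate(row):
            m = n - k
            if m < 0:
                break  # P_m vanishes for negative m
            for i, v in enumerate(polys[m]):
                nxt[i] -= c * v
        polys.append(nxt)
    return polys
```

As a sanity check, $p = 1$ with constant coefficients $c_{n,0} = 0$, $c_{n,1} = 1/4$ reduces to the three-term recurrence of the monic Chebyshev polynomials of the second kind on $[-1, 1]$, giving $P_2 = x^2 - 1/4$ and $P_3 = x^3 - x/2$. Exact `Fraction` arithmetic avoids rounding noise in the coefficients.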
Genomic sequence of mandarin fish rhabdovirus with an unusual small non-transcriptional ORF.
Tao, Jian-Jun; Zhou, Guang-Zhou; Gui, Jian-Fang; Zhang, Qi-Ya
2008-03-01
The complete genome of the mandarin fish Siniperca chuatsi rhabdovirus (SCRV) was cloned and sequenced. It comprises 11,545 nucleotides and contains five genes encoding the nucleoprotein N, the phosphoprotein P, the matrix protein M, the glycoprotein G, and the RNA-dependent RNA polymerase protein L. At the 3' and 5' termini of the SCRV genome, the leader and trailer sequences show inverse complementarity. The N, P, M and G proteins share the highest sequence identities (ranging from 14.8 to 41.5%) with the respective proteins of rhabdovirus 903/87, whereas the L protein has the highest identity with those of vesiculoviruses, especially Chandipura virus (44.7%). Phylogenetic analysis of L proteins showed that SCRV clustered with spring viraemia of carp virus (SVCV) and was most closely related to viruses in the genus Vesiculovirus. In addition, an overlapping open reading frame (ORF) predicted to encode a protein similar to the vesicular stomatitis virus C protein is present within the P gene of SCRV. Furthermore, a small non-overlapping ORF (tentatively called orf4) is predicted downstream of the M ORF within the M gene. Therefore, the genomic organization of SCRV can be proposed as 3' leader-N-P/C-M-(orf4)-G-L-trailer 5'. Neither orf4 transcription nor translation products could be detected by Northern or Western blot, respectively, although an mRNA band similar to the M mRNA was found. This is the first report of a small non-overlapping ORF in the M gene of a fish rhabdovirus.
Weirathmueller, Michelle J.; Stafford, Kathleen M.; Wilcock, William S. D.; Hilmo, Rose S.; Dziak, Robert P.; Tréhu, Anne M.
2017-01-01
To study the long-term stability of fin whale (Balaenoptera physalus) singing behavior, the frequency and inter-pulse interval of fin whale 20 Hz vocalizations were observed over 10 years, from 2003 to 2013, using bottom-mounted hydrophones and seismometers in the northeast Pacific Ocean. The instrument locations extended from 40°N to 48°N and 130°W to 125°W, with water depths ranging from 1500 to 4000 m. The inter-pulse interval (IPI) of fin whale song sequences was observed to increase at a rate of 0.54 seconds/year over the decade of observation. During the same time period, peak frequency decreased at a rate of 0.17 Hz/year. Two primary call patterns were observed. During the earlier years, the more commonly observed pattern had a single frequency and a single IPI. In later years, a doublet pattern emerged, with two dominant frequencies and IPIs. Many call sequences in the intervening years appeared to represent a transitional state between the two patterns. The overall trend was consistent across the entire geographical span, although some regional differences exist. Understanding changes in acoustic behavior over long time periods is needed to help establish whether acoustic characteristics can be used to help determine population identity in a widely distributed, difficult-to-study species such as the fin whale. PMID:29073230
Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data
NASA Astrophysics Data System (ADS)
Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin
2017-02-01
Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies and recommendations, and thus also in their output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, re-sequencing on a different platform, and expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases, an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. The influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate a need to improve the reproducibility of results in the context of multithreading.
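Benchmarks of this kind boil down to comparing each tool's call set with the validated mutations. The Python sketch below is a minimal version under stated assumptions: variants are exact `(chrom, pos, ref, alt)` keys, calls arrive as a hypothetical dict of variant to allele frequency, and a 1% cutoff mirrors the lowest frequency mentioned in the abstract. Real comparisons usually also normalize indel representation (for example left-alignment) before matching, which is omitted here.

```python
def filter_by_vaf(calls, min_vaf=0.01):
    """Keep variants whose allele frequency reaches min_vaf (1% by default)."""
    return {variant for variant, vaf in calls.items() if vaf >= min_vaf}


def compare_to_truth(called, truth):
    """Compare a call set against validated mutations.

    called, truth: iterables of hashable variant keys, e.g. (chrom, pos, ref, alt).
    Returns TP/FP/FN counts plus sensitivity (recall) and precision.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if called else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}
```

Lowering `min_vaf` typically trades precision for sensitivity, which is the tension the study reports: no tool reached high sensitivity without a loss of precision.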
Ordulu, Zehra; Wong, Kristen E; Currall, Benjamin B; Ivanov, Andrew R; Pereira, Shahrin; Althari, Sara; Gusella, James F; Talkowski, Michael E; Morton, Cynthia C
2014-05-01
With recent rapid advances in genomic technologies, precise delineation of structural chromosome rearrangements at the nucleotide level is becoming increasingly feasible. In this era of "next-generation cytogenetics" (i.e., an integration of traditional cytogenetic techniques and next-generation sequencing), a consensus nomenclature is essential for accurate communication and data sharing. Currently, nomenclature for describing the sequencing data of these aberrations is lacking. Herein, we present a system called Next-Gen Cytogenetic Nomenclature, which is concordant with the International System for Human Cytogenetic Nomenclature (2013). This system starts with the alignment of rearrangement sequences by BLAT or BLAST (alignment tools) and arrives at a concise and detailed description of chromosomal changes. To facilitate usage and implementation of this nomenclature, we are developing a program designated BLA(S)T Output Sequence Tool of Nomenclature (BOSToN), a demonstrative version of which is accessible online. A standardized characterization of structural chromosomal rearrangements is essential both for research analyses and for application in the clinical setting. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Keogh, Michael J; Wei, Wei; Wilson, Ian; Coxhead, Jon; Ryan, Sarah; Rollinson, Sara; Griffin, Helen; Kurzawa-Akanbi, Marzena; Santibanez-Koref, Mauro; Talbot, Kevin; Turner, Martin R; McKenzie, Chris-Anne; Troakes, Claire; Attems, Johannes; Smith, Colin; Al Sarraj, Safa; Morris, Chris M; Ansorge, Olaf; Pickering-Brown, Stuart; Ironside, James W; Chinnery, Patrick F
2017-01-01
Given the central role of genetic factors in the pathogenesis of common neurodegenerative disorders, it is critical that mechanistic studies in human tissue are interpreted in a genetically enlightened context. To address this, we performed exome sequencing and copy number variant analysis on 1511 frozen human brains with a diagnosis of Alzheimer's disease (AD, n = 289), frontotemporal dementia/amyotrophic lateral sclerosis (FTD/ALS, n = 252), Creutzfeldt-Jakob disease (CJD, n = 239), Parkinson's disease (PD, n = 39), dementia with Lewy bodies (DLB, n = 58), other neurodegenerative, vascular, or neurogenetic disorders (n = 266), and controls with no significant neuropathology (n = 368). Genomic DNA was extracted from brain tissue in all cases before exome sequencing (Illumina Nextera 62 Mb capture) with variants called by FreeBayes; copy number variant (CNV) analysis (Illumina HumanOmniExpress-12 BeadChip); C9orf72 repeat expansion detection; and APOE genotyping. Established or likely pathogenic heterozygous, compound heterozygous, or homozygous variants, together with the C9orf72 hexanucleotide repeat expansions and a copy number gain of APP, were found in 61 brains. In addition to known risk alleles in 349 brains (23.9% of 1461 undergoing exome sequencing), we saw an association between rare variants in GRN and DLB. Rare CNVs were found in <1.5% of brains, including copy number gains of PRPH that were overrepresented in AD. Clinical, pathological, and genetic data are available, enabling the retrieval of specific frozen brains through the UK Medical Research Council Brain Banks Network. This allows direct access to pathological and control human brain tissue based on an individual's genetic architecture, thus enabling the functional validation of known genetic risk factors and potentially pathogenic alleles identified in future studies. © 2017 Keogh et al.; Published by Cold Spring Harbor Laboratory Press.
Distance Magic-Type and Distance Antimagic-Type Labelings of Graphs
NASA Astrophysics Data System (ADS)
Freyberg, Bryan J.
Generally speaking, a distance magic-type labeling of a graph G of order n is a bijection l from the vertex set of the graph to the first n natural numbers or to the elements of a group of order n, with the property that the weight of each vertex is the same. The weight of a vertex x is defined as the sum (or appropriate group operation) of all the labels of vertices adjacent to x. If instead we require that all weights differ, then we refer to the labeling as a distance antimagic-type labeling. This idea can be generalized to directed graphs, where the weight takes into consideration the direction of the arcs. In this manuscript, we provide new results for d-handicap labeling, a distance antimagic-type labeling, and introduce a new distance magic-type labeling called orientable Gamma-distance magic labeling. A d-handicap distance antimagic labeling (or just d-handicap labeling for short) of a graph G = (V, E) of order n is a bijection l from V to the set {1, 2, ..., n} with induced weight function [special characters omitted] such that l(x_i) = i and the sequence of weights w(x_1), w(x_2), ..., w(x_n) forms an arithmetic sequence with constant difference d of at least 1. If a graph G admits a d-handicap labeling, we say G is a d-handicap graph. A d-handicap incomplete tournament H(n, k, d) is an incomplete tournament of n teams ranked with the first n natural numbers such that each team plays exactly k games and the strength of schedule of the ith ranked team is d more than that of the (i+1)st ranked team. That is, strength of schedule increases arithmetically with strength of team. Constructing an H(n, k, d) is equivalent to finding a d-handicap labeling of a k-regular graph of order n. In Chapter 2, we provide general constructions for every d for large classes of both n and k, providing breadth and depth to the catalog of known H(n, k, d)'s. In Chapters 3-6, we introduce a new type of labeling called orientable Gamma-distance magic labeling.
Let Gamma be an abelian group of order n. If for a graph G = (V, E) of order n there exists an orientation of the edges of G and a companion bijection from V to Gamma with the property that there is an element mu of Gamma (called the magic constant) such that [special characters omitted] where w(x) is the weight of vertex x, we say that G is orientable Gamma-distance magic. In addition to introducing the concept, we provide numerous results on orientable Z_n-distance magic graphs, where Z_n is the cyclic group of order n. In Chapter 7, we summarize the results of this dissertation and provide suggestions for future work.
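The d-handicap condition is easy to check mechanically. The Python sketch below assumes the induced weight is the neighbour-label sum defined earlier in the abstract (the exact formula was lost in extraction) and that the weights must increase by d as the label index i grows. The 3-regular graph of order 8 used in the usage example was worked out by hand for this sketch; its weights are 10, 11, ..., 17, so it corresponds to a 1-handicap incomplete tournament H(8, 3, 1).

```python
def vertex_weights(adjacency, labels):
    """w(x) = sum of the labels of the neighbours of x."""
    return {v: sum(labels[u] for u in nbrs) for v, nbrs in adjacency.items()}


def is_d_handicap(adjacency, labels, d):
    """Check the d-handicap condition for a labeling with l(x_i) = i.

    adjacency: dict mapping each vertex to the set of its neighbours (undirected).
    labels: dict mapping each vertex to its label; must be a bijection onto 1..n.
    Returns True when the weights, ordered by label, increase by exactly d >= 1.
    """
    n = len(adjacency)
    if d < 1 or sorted(labels.values()) != list(range(1, n + 1)):
        return False
    # Undirected graph: every edge must appear in both neighbour sets.
    if any(v not in adjacency[u] for v, nbrs in adjacency.items() for u in nbrs):
        return False
    w = vertex_weights(adjacency, labels)
    by_label = sorted(adjacency, key=lambda v: labels[v])  # x_1, ..., x_n
    return all(w[by_label[i + 1]] - w[by_label[i]] == d for i in range(n - 1))


# Hand-built 3-regular graph on 8 vertices; vertex i carries label i.
G = {1: {2, 3, 5}, 2: {1, 4, 6}, 3: {1, 4, 7}, 4: {2, 3, 8},
     5: {1, 6, 7}, 6: {2, 5, 8}, 7: {3, 5, 8}, 8: {4, 6, 7}}
labels = {v: v for v in G}
print(is_d_handicap(G, labels, 1))  # True
```

Swapping any two labels breaks the arithmetic progression, which is why constructing such graphs for general n, k and d is the nontrivial part.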
Fan, Yu; Xi, Liu; Hughes, Daniel S T; Zhang, Jianjun; Zhang, Jianhua; Futreal, P Andrew; Wheeler, David A; Wang, Wenyi
2016-08-24
Subclonal mutations reveal important features of the genetic architecture of tumors. However, accurate detection of mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We develop MuSE (http://bioinformatics.mdanderson.org/main/MuSE), Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of the tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model that reflects the underlying tumor heterogeneity to greatly improve the overall accuracy. We demonstrate the accuracy of MuSE in calling subclonal mutations in the context of large-scale tumor sequencing projects using whole exome and whole genome sequencing.
Li, Runsheng; Hsieh, Chia-Ling; Young, Amanda; Zhang, Zhihong; Ren, Xiaoliang; Zhao, Zhongying
2015-01-01
Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, predominantly around 10 kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiencies of the long reads in these respects using an isogenic, gap-free C. elegans genome. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic regions. Second, the reads are able to reliably detect missing, but not extra, sequences in the C. elegans genome. Third, smaller reads are more capable of recovering repetitive sequences than larger ones. Fourth, at least 40 kbp of missing genomic sequence is recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 kbp can be achieved with 24× read coverage, but with substantial mis-assembly errors, highlighting a need for novel assembly algorithms for the long reads. PMID:26039588
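N50, the assembly statistic quoted above, is the contig length at which contigs sorted from longest to shortest first cover at least half of the total assembly length. The following is a minimal Python sketch of that standard definition; the function name is illustrative.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted longest first; N50 is the length of the contig at which
    the running total first reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For contigs of lengths 2 to 10 (total 54), the running sum 10 + 9 + 8 = 27 first reaches half the total, so the N50 is 8. Because N50 ignores correctness, a large N50 can coexist with the substantial mis-assembly errors the abstract reports.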
Antanaviciute, Agne; Baquero-Perez, Belinda; Watson, Christopher M; Harrison, Sally M; Lascelles, Carolina; Crinnion, Laura; Markham, Alexander F; Bonthron, David T; Whitehouse, Adrian; Carr, Ian M
2017-10-01
Recent methods for transcriptome-wide N6-methyladenosine (m6A) profiling have facilitated investigations into the RNA methylome and established m6A as a dynamic modification that has critical regulatory roles in gene expression and may play a role in human disease. However, bioinformatics resources available for the analysis of m6A sequencing data are still limited. Here, we describe m6aViewer, a cross-platform application for analysis and visualization of m6A peaks from sequencing data. m6aViewer implements a novel m6A peak-calling algorithm that identifies high-confidence methylated residues with more precision than previously described approaches. The application enables data analysis through a graphical user interface and thus, in contrast to other currently available tools, does not require the user to be skilled in computer programming. m6aViewer and test data can be downloaded here: http://dna2.leeds.ac.uk/m6a. © 2017 Antanaviciute et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Regulation of mIκBNS stability through PEST-mediated degradation by proteasome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Koog Chan; Jeong, Jiyeong; Kim, Keun Il, E-mail: kikim@sookmyung.ac.kr
2014-01-24
Highlights: • mIκBNS is degraded rapidly by the proteasome without ubiquitylation. • The N-terminal PEST sequence is responsible for the unstable nature of mIκBNS. • The PEST sequence is not critical for nuclear localization of mIκBNS. • There is a single bona fide NLS at the C-terminus of mIκBNS. - Abstract: Negative regulatory proteins in cytokine signaling play a critical role in restricting unwanted excess activation of the signaling pathway. At the same time, negative regulatory proteins need to be removed rapidly from cells so that the cells can respond properly to the next incoming signal. A nuclear IκB protein called IκBNS is known to inhibit a subset of NF-κB target genes upon its expression by NF-κB activation. Here, we show a mechanism controlling the stability of mIκBNS, which might be important for cells to prepare for the next round of signaling. We found that mIκBNS is a short-lived protein whose stability is controlled by the proteasome, independently of ubiquitylation. We identified the N-terminal PEST sequence in mIκBNS as critical for the regulation of its stability.
Characterization of a beta-lactamase produced in Mycobacterium fortuitum D316.
Amicosante, G; Franceschini, N; Segatore, B; Oratore, A; Fattorini, L; Orefici, G; Van Beeumen, J; Frere, J M
1990-01-01
A beta-lactamase from Mycobacterium fortuitum D316 was purified and some physico-chemical properties and substrate profile determined. On the basis of its N-terminal sequence and of its sensitivity to beta-iodopenicillanate inactivation, the enzyme appeared to be a class A beta-lactamase, but its substrate profile was quite unexpected, since nine cephalosporins were among the eleven best substrates. The enzyme also hydrolysed ureidopenicillins and some so-called 'beta-lactamase-stable' cephalosporins. PMID:2123098
Systematic comparison of variant calling pipelines using gold standard personal exome variants
Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.
2015-01-01
The success of clinical genomics using next-generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, and they show low concordance between their calls. Hence, a systematic comparison of variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confidence variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers (Genome Analysis Tool Kit HaplotypeCaller [GATK-HC], Samtools mpileup, Freebayes and Ion Proton Variant Caller [TVC]), on twelve data sets for the NA12878 genome sequenced by different platforms, including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed that the different variant callers were biased toward specific types of SNP genotyping errors. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes. PMID:26639839
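Profiling which types of SNP genotyping errors a pipeline makes can be sketched as a tally of discordant truth/call genotype pairs. In this minimal Python version, genotypes are encoded as alternate-allele counts (0 = hom-ref, 1 = het, 2 = hom-alt), and any site absent from either dict is treated as hom-ref. That last rule is an assumption that only holds inside the truth set's high-confidence regions; the function and key format are hypothetical.

```python
from collections import Counter

GT_NAMES = {0: "hom-ref", 1: "het", 2: "hom-alt"}


def genotype_error_profile(truth, calls):
    """Tally genotype discordances between a truth set and a pipeline's calls.

    truth, calls: dicts mapping a site key (e.g. (chrom, pos)) to the number of
    alternate alleles (0, 1 or 2). Sites missing from either dict are treated
    as hom-ref (an assumption valid only within high-confidence regions).
    Returns a Counter keyed by "truth->call" for discordant sites only.
    """
    errors = Counter()
    for site in set(truth) | set(calls):
        t = truth.get(site, 0)
        c = calls.get(site, 0)
        if t != c:
            errors[f"{GT_NAMES[t]}->{GT_NAMES[c]}"] += 1
    return errors
```

Comparing these counters across pipelines exposes caller-specific biases, for example one caller tending to report heterozygous sites as hom-alt while another misses them altogether.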
Isotropic probability measures in infinite-dimensional spaces
NASA Technical Reports Server (NTRS)
Backus, George
1987-01-01
Let $\mathbb{R}$ be the real numbers, $\mathbb{R}^n$ the linear space of all real $n$-tuples, and $\mathbb{R}^\infty$ the linear space of all infinite real sequences $x = (x_1, x_2, \ldots)$. Let $P_n : \mathbb{R}^\infty \to \mathbb{R}^n$ be the projection operator with $P_n(x) = (x_1, \ldots, x_n)$. Let $p_\infty$ be a probability measure on the smallest sigma-ring of subsets of $\mathbb{R}^\infty$ which includes all of the cylinder sets $P_n^{-1}(B_n)$, where $B_n$ is an arbitrary Borel subset of $\mathbb{R}^n$. Let $p_n$ be the marginal distribution of $p_\infty$ on $\mathbb{R}^n$, so $p_n(B_n) = p_\infty(P_n^{-1}(B_n))$ for each $B_n$. A measure on $\mathbb{R}^n$ is isotropic if it is invariant under all orthogonal transformations of $\mathbb{R}^n$. All members of the set of all isotropic probability distributions on $\mathbb{R}^n$ are described. The result calls into question both stochastic inversion and Bayesian inference, as currently used in many geophysical inverse problems.
Fantin, Yuri S.; Neverov, Alexey D.; Favorov, Alexander V.; Alvarez-Figueroa, Maria V.; Braslavskaya, Svetlana I.; Gordukova, Maria A.; Karandashova, Inga V.; Kuleshov, Konstantin V.; Myznikova, Anna I.; Polishchuk, Maya S.; Reshetov, Denis A.; Voiciehovskaya, Yana A.; Mironov, Andrei A.; Chulanov, Vladimir P.
2013-01-01
Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3–14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing. PMID:23382983
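The abstract does not spell out how BCV matches a chromatogram against its dictionary, so the Python sketch below swaps in a deliberately simple technique: model each position's normalized peak heights as a two-sequence mixture and choose the dictionary pair and mixing fraction with the smallest squared error by grid search. Peak detection, trace alignment and indel handling are assumed to be done already; all names are hypothetical, and this is not the BCV algorithm itself.

```python
BASES = "ACGT"


def expected_signal(seq_a, seq_b, frac_a):
    """Per-position expected [A, C, G, T] peak fractions for a two-sequence mix."""
    out = []
    for a, b in zip(seq_a, seq_b):
        vec = [0.0] * 4
        vec[BASES.index(a)] += frac_a
        vec[BASES.index(b)] += 1.0 - frac_a
        out.append(vec)
    return out


def best_mixture(observed, dictionary, steps=20):
    """Pick the dictionary pair and mixing fraction that best fit a trace.

    observed: list of [A, C, G, T] normalized peak heights, one per position.
    dictionary: aligned candidate sequences with the same length as observed.
    Searches all pairs (i <= j) and fractions k/steps by least squares and
    returns (i, j, fraction of sequence i).
    """
    best = None
    for i in range(len(dictionary)):
        for j in range(i, len(dictionary)):
            for k in range(steps + 1):
                f = k / steps
                exp = expected_signal(dictionary[i], dictionary[j], f)
                err = sum((o - e) ** 2
                          for ov, ev in zip(observed, exp)
                          for o, e in zip(ov, ev))
                if best is None or err < best[0]:
                    best = (err, i, j, f)
    return best[1], best[2], best[3]
```

The exhaustive search grows quadratically with dictionary size, which is acceptable for the small, targeted dictionaries this kind of diagnostic application implies; a pure sample is simply the pair (i, i).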
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base-triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where the symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primeval DNA repair system, could have made possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U, together with the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could have been induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present-day coding DNA sequences could also have existed in ancient coding DNA sequences. The phylogenetic analyses performed with metrics defined in the N-dimensional vector space $(B^3)^N$ of DNA sequences, and with the new evolutionary model presented here, also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
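Arithmetic on the five-letter alphabet can be sketched as arithmetic in GF(5), the integers modulo 5. The encoding D=0, A=1, C=2, G=3, U=4 below is an assumption for illustration. Under it, each Watson-Crick pair sums to 5 ≡ 0, so the complement of a base is its additive inverse, and D is its own complement. Codons are then treated as vectors in GF(5)^3 with componentwise addition.

```python
# Hypothetical encoding of the extended alphabet as elements of GF(5).
ENC = {"D": 0, "A": 1, "C": 2, "G": 3, "U": 4}
DEC = {v: k for k, v in ENC.items()}

def add(x, y):
    """Sum of two bases in GF(5)."""
    return DEC[(ENC[x] + ENC[y]) % 5]

def mul(x, y):
    """Product of two bases in GF(5)."""
    return DEC[(ENC[x] * ENC[y]) % 5]

def complement(x):
    """Additive inverse; with this encoding it maps A<->U, C<->G and fixes D."""
    return DEC[(-ENC[x]) % 5]

def codon_add(c1, c2):
    """Componentwise sum of two codons viewed as vectors in GF(5)^3."""
    return "".join(add(a, b) for a, b in zip(c1, c2))

print(codon_add("AUG", "UAC"))  # a codon plus its base-wise complement gives DDD
```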
Somatic Point Mutation Calling in Low Cellularity Tumors
Kassahn, Karin S.; Holmes, Oliver; Nones, Katia; Patch, Ann-Marie; Miller, David K.; Christ, Angelika N.; Harliwong, Ivon; Bruxner, Timothy J.; Xu, Qinying; Anderson, Matthew; Wood, Scott; Leonard, Conrad; Taylor, Darrin; Newell, Felicity; Song, Sarah; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Steptoe, Anita; Pajic, Marina; Cowley, Mark J.; Pinese, Mark; Chang, David K.; Gill, Anthony J.; Johns, Amber L.; Wu, Jianmin; Wilson, Peter J.; Fink, Lynn; Biankin, Andrew V.; Waddell, Nicola; Grimmond, Sean M.; Pearson, John V.
2013-01-01
Somatic mutation calling from next-generation sequencing data remains a challenge due to the difficulties of distinguishing true somatic events from artifacts arising from PCR, sequencing errors or mis-mapping. Tumor cellularity or purity, sub-clonality and copy number changes also confound the identification of true somatic events against a background of germline variants. We have developed a heuristic strategy and software (http://www.qcmg.org/bioinformatics/qsnp/) for somatic mutation calling in samples with low tumor content and we show the superior sensitivity and precision of our approach using a previously sequenced cell line, a series of tumor/normal admixtures, and 3,253 putative somatic SNVs verified on an orthogonal platform. PMID:24250782
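As a rough illustration of why low-cellularity calling tends to be heuristic, the filter below drops any variant-allele-fraction cutoff. It instead requires a minimum number of supporting tumor reads and no alt-allele evidence in an adequately covered matched normal. The thresholds and names are hypothetical. They are not qSNP's actual rules.

```python
from dataclasses import dataclass

@dataclass
class SiteCounts:
    tumor_ref: int
    tumor_alt: int
    normal_ref: int
    normal_alt: int

def passes_somatic_filters(s, min_tumor_alt=3, min_tumor_depth=10,
                           min_normal_depth=8, max_normal_alt=0):
    """Illustrative heuristic: accept low-VAF tumor calls if the alt allele has
    enough read support and the matched normal is covered but shows no alt reads."""
    tumor_depth = s.tumor_ref + s.tumor_alt
    normal_depth = s.normal_ref + s.normal_alt
    return (tumor_depth >= min_tumor_depth
            and s.tumor_alt >= min_tumor_alt
            and normal_depth >= min_normal_depth
            and s.normal_alt <= max_normal_alt)

print(passes_somatic_filters(SiteCounts(95, 5, 40, 0)))  # 5% VAF, clean normal
```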
Bifurcation Gaps in Asymmetric and High-Dimensional Hypercycles
NASA Astrophysics Data System (ADS)
Puig, Júlia; Farré, Gerard; Guillamon, Antoni; Fontich, Ernest; Sardanyés, Josep
Hypercycles are catalytic systems with cyclic architecture. These systems have been suggested to play a key role in the maintenance and increase of information in prebiotic replicators. It is known that for a large enough number of hypercycle species (n > 4), the coexistence of all hypercycle members is governed by a stable periodic orbit. Previous research has characterized saddle-node (s-n) bifurcations involving abrupt transitions from stable hypercycles to the extinction of all hypercycle members or, alternatively, involving the outcompetition of the hypercycle by so-called mutant sequences or parasites. Recently, a bifurcation gap between an s-n bifurcation of periodic orbits and an s-n bifurcation of fixed points has been described for symmetric five-member hypercycles. This gap was found between the value of the replication quality factor Q at which the periodic orbit vanishes ($Q_{PO}$) and the value at which two unstable (nonzero) equilibrium points collide ($Q_{SS}$). Here, we explore the persistence of this gap when considering asymmetries in replication rates in five-member hypercycles, as well as symmetric, larger hypercycles. Our results indicate that both the asymmetry in Malthusian replication constants and the increase in the number of hypercycle members enlarge this gap. The implications of this phenomenon are discussed in the context of delayed transitions associated with the so-called saddle remnants.
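For readers unfamiliar with hypercycle dynamics, the sketch below integrates the classic elementary (Eigen-Schuster) hypercycle on the simplex:

dx_i/dt = x_i (k_i x_{i-1} - phi), where phi = sum_j k_j x_j x_{j-1} and indices are taken cyclically.

Asymmetric Malthusian constants k_i are allowed. The model deliberately omits the replication quality factor Q and the mutant/parasite class studied in the paper, so it shows only the cyclic-coupling backbone. The forward-Euler step is a simplifying choice. With dt·max(k) < 1, it keeps every x_i positive and pulls the total back toward 1.

```python
def hypercycle_rhs(x, k):
    """Elementary hypercycle: dx_i/dt = x_i * (k_i * x_{i-1} - phi)."""
    n = len(x)
    growth = [k[i] * x[i] * x[i - 1] for i in range(n)]  # x[-1] wraps to x[n-1]
    phi = sum(growth)  # dilution flux that keeps sum(x) = 1
    return [growth[i] - x[i] * phi for i in range(n)]

def simulate(x0, k, dt=0.01, steps=20000):
    """Forward-Euler integration from x0 (assumed to lie on the simplex)."""
    x = list(x0)
    for _ in range(steps):
        dx = hypercycle_rhs(x, k)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# Five members with asymmetric replication constants.
print(simulate([0.3, 0.2, 0.2, 0.2, 0.1], [1.0, 1.2, 0.9, 1.1, 1.0]))
```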
A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine
NASA Astrophysics Data System (ADS)
Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing
2017-09-01
Dynamic monitoring of program behavior is widely used to discriminate between benign programs and malware. Such monitoring usually relies on dynamic characteristics of a program, such as its API call sequence or API call frequency. The key innovation of this paper is to consider the full Native API sequence and to use a support vector machine to detect shellcode. We also use a Markov chain to extract and digitize Native API sequence features. Our experimental results show that the proposed method achieves high accuracy and a low false-positive rate.
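One common way to "digitize" a call sequence with a Markov chain is to estimate first-order transition probabilities and flatten the matrix into a fixed-length feature vector. The sketch below does this under that assumption. The vocabulary is an illustrative subset of Native API names, not the paper's feature set.

```python
from collections import Counter

def transition_features(calls, vocab):
    """Flatten a first-order Markov transition matrix estimated from a call trace.
    Row i holds P(next = vocab[j] | current = vocab[i]); rows with no outgoing
    transitions (and calls outside vocab) contribute zeros."""
    index = {name: i for i, name in enumerate(vocab)}
    counts = Counter(zip(calls, calls[1:]))
    n = len(vocab)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        if a in index and b in index:
            matrix[index[a]][index[b]] = c
    for row in matrix:
        total = sum(row)
        if total:
            row[:] = [v / total for v in row]
    return [v for row in matrix for v in row]

vocab = ["NtAllocateVirtualMemory", "NtWriteVirtualMemory", "NtCreateThreadEx"]
trace = vocab + vocab[:2]  # allocate, write, create thread, allocate, write
print(transition_features(trace, vocab))
```

Vectors like these could then be passed to any SVM implementation, for example scikit-learn's `SVC`, trained on labeled benign and malicious traces.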
NASA Astrophysics Data System (ADS)
Xing, Pengwei; Su, Ran; Guo, Fei; Wei, Leyi
2017-04-01
N6-methyladenosine (m6A) refers to methylation of adenosine at the nitrogen-6 position. It plays an important role in a series of biological processes, such as splicing, mRNA export, nascent mRNA synthesis, nuclear translocation and translation. Since high-resolution mapping of m6A sites was established, numerous experiments have successfully characterized m6A sites within sequences. However, with the explosive growth of genomic sequences, identifying m6A sites experimentally is time-consuming and expensive. Thus, fast and accurate computational identification methods are highly desirable. In this study, we propose a sequence-based predictor called RAM-NPPS for identifying m6A sites within RNA sequences, in which we present a novel feature representation algorithm based on multi-interval nucleotide pair position specificity and use a support vector machine classifier to construct the prediction model. Comparison results show that our proposed method outperforms state-of-the-art predictors on three benchmark datasets across three species, indicating the effectiveness and robustness of our method. Moreover, an online webserver implementing the proposed predictor has been established at http://server.malab.cn/RAM-NPPS/. It is anticipated to be a useful prediction tool to help biologists reveal the mechanisms of m6A site functions.
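A position-specific, multi-interval nucleotide-pair encoding might look roughly like the sketch below. It is an assumption about the general idea, not RAM-NPPS's published formula. For each gap g, it pairs position i with position i + g + 1. It scores that pair by its frequency at position i in positive (m6A) training windows minus its frequency in negative windows, then concatenates the scores over gaps 0..max_gap. Equal-length windows are assumed.

```python
from collections import Counter

def pair_frequencies(seqs, gap):
    """Per-position frequency of the nucleotide pair (s[i], s[i + gap + 1])."""
    length = len(seqs[0])
    freqs = []
    for i in range(length - gap - 1):
        counts = Counter(s[i] + s[i + gap + 1] for s in seqs)
        freqs.append({pair: c / len(seqs) for pair, c in counts.items()})
    return freqs

def encode(seq, pos_freqs, neg_freqs, gap):
    """Score each position by f_pos(pair) - f_neg(pair) for its gapped pair."""
    return [pos_freqs[i].get(seq[i] + seq[i + gap + 1], 0.0)
            - neg_freqs[i].get(seq[i] + seq[i + gap + 1], 0.0)
            for i in range(len(seq) - gap - 1)]

def multi_interval_features(seq, pos_seqs, neg_seqs, max_gap=2):
    """Concatenate the per-gap encodings for gaps 0..max_gap."""
    features = []
    for gap in range(max_gap + 1):
        features += encode(seq, pair_frequencies(pos_seqs, gap),
                           pair_frequencies(neg_seqs, gap), gap)
    return features

print(multi_interval_features("GGACU", ["GGACU", "GGACA"], ["UUUUU", "AAAAA"]))
```

For windows of length L, gap g contributes L - g - 1 features, so the vector length is fixed and suitable for an SVM.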
Stam, Remco; Scheikl, Daniela; Tellier, Aurélien
2016-01-01
Nod-like receptors (NLRs) are proteins containing a nucleotide-binding domain and leucine-rich repeats, and they are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs, which can recognize pathogen molecules directly or indirectly. Consequently, divergence and copy-number variation at these genes are high between species. Within populations, positive and balancing selection are expected if plants coevolve with their pathogens. To understand the complexity of R-gene coevolution in wild non-model species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate polymorphism at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We combine enrichment sequencing with pooling of ten individuals to sequence NLR genes specifically, in a resource- and cost-effective manner. We focus on the effects that different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results were verified by Sanger sequencing of polymorphic gene fragments. They indicate that some NLRs, namely 13 of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes that have experienced natural selection in their habitat. PMID:27189991
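The πN/πS ratio compares nucleotide diversity π at nonsynonymous and synonymous sites. Its building block, π as the average number of pairwise differences per site, is sketched below for equal-length, gap-free aligned haplotypes. Computing πN and πS separately additionally requires classifying sites by codon position and effect, which is not shown. Pooled-sequencing data yield allele frequencies rather than haplotypes, so an estimator adapted to pools would be needed in practice.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi) for
    equal-length aligned sequences without gaps."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs * 4 sites)
```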
Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob
2016-01-01
Next-generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics that guide therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue to detect somatic mutations. The advent of many new somatic variant callers creates a need to compare and validate these tools, as no de facto standard for detecting somatic mutations exists and only limited comparisons have been reported. We performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to assess the performance of nine publicly available somatic variant callers (EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid) in detecting single nucleotide mutations and small deletions and insertions. We report large variation in the number of calls from the nine somatic variant callers on the same sequencing data, and highly variable agreement between them. Sequencing depth affected individual callers very differently; for some callers, increased sequencing depth greatly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior owing to its high sensitivity and robustness to changes in sequencing depth. PMID:27002637
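Comparisons of this kind typically reduce to set operations on call sets. The sketch below computes per-caller sensitivity and precision against a validated set, plus pairwise Jaccard agreement between callers. Representing a variant as a (chrom, pos, ref, alt) tuple and the caller names are assumptions for illustration.

```python
def compare_callers(calls, truth):
    """Per-caller sensitivity and precision against a validated call set,
    plus pairwise Jaccard agreement between callers.
    `calls` maps caller name -> set of (chrom, pos, ref, alt) tuples."""
    metrics = {}
    for name, called in calls.items():
        tp = len(called & truth)
        metrics[name] = {
            "sensitivity": tp / len(truth) if truth else 0.0,
            "precision": tp / len(called) if called else 0.0,
        }
    names = sorted(calls)
    jaccard = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            union = calls[a] | calls[b]
            jaccard[(a, b)] = len(calls[a] & calls[b]) / len(union) if union else 1.0
    return metrics, jaccard

truth = {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A")}
calls = {"caller_x": {("chr1", 100, "C", "T")},
         "caller_y": {("chr1", 100, "C", "T"), ("chr2", 50, "A", "G")}}
print(compare_callers(calls, truth))
```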
Verbist, Bie M P; Thys, Kim; Reumers, Joke; Wetzels, Yves; Van der Borght, Koen; Talloen, Willem; Aerssens, Jeroen; Clement, Lieven; Thas, Olivier
2015-01-01
In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from real low-frequency mutations. A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from FASTQ files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup reduces the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information between single-nucleotide polymorphisms is retained because variants are called at the codon level. This gives virologists an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup yields excellent sensitivity while maintaining good specificity for variants with frequencies down to 0.5%. VirVarSeq is available, together with a user's guide and test data, at SourceForge: http://sourceforge.net/projects/virtools/?source=directory. © The Author 2014. Published by Oxford University Press. All rights reserved.
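Quality-aware, codon-level counting can be sketched as follows. A Phred score Q corresponds to an error probability of 10^(-Q/10). The function keeps a read's codon only if all three bases pass a fixed threshold, which preserves linkage within the codon. The fixed `min_q=20` and the pre-aligned read representation are assumptions. Q-cpileup's adaptive, per-sample threshold is not reproduced here.

```python
from collections import Counter

def phred_to_error(q):
    """Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)

def codon_frequencies(reads, codon_start, min_q=20):
    """Codon frequencies at one codon position, keeping a read only when all
    three bases of the codon reach min_q. Each read is (sequence, qualities),
    assumed already aligned to the reference reading frame."""
    counts = Counter()
    for seq, quals in reads:
        codon = seq[codon_start:codon_start + 3]
        q = quals[codon_start:codon_start + 3]
        if len(codon) == 3 and min(q) >= min_q:
            counts[codon] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}

reads = [("ATGAAA", [30] * 6), ("ATGAAA", [30] * 6),
         ("ATAAAA", [30, 30, 10, 30, 30, 30]),  # low-quality third base: dropped
         ("ACGAAA", [30] * 6)]
print(codon_frequencies(reads, 0))
```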
Esselstyn, Jacob A; Evans, Ben J; Sedlock, Jodi L; Anwarali Khan, Faisal Ali; Heaney, Lawrence R
2012-09-22
Prospects for a comprehensive inventory of global biodiversity would be greatly improved by automating methods of species delimitation. The general mixed Yule-coalescent (GMYC) was recently proposed as a potential means of increasing the rate of biodiversity exploration. We tested this method with simulated data and applied it to a group of poorly known bats (Hipposideros) from the Philippines. We then used echolocation call characteristics to evaluate the plausibility of species boundaries suggested by GMYC. In our simulations, GMYC performed relatively well (errors in estimated species diversity less than 25%) when the product of the haploid effective population size ($N_e$) and speciation rate (SR; per lineage per million years) was less than or equal to $10^5$, while interspecific variation in $N_e$ was twofold or less. However, at higher but also biologically relevant values of $N_e$ × SR, and when $N_e$ varied tenfold among species, performance was very poor. GMYC analyses of mitochondrial DNA sequences from Philippine Hipposideros suggest actual diversity may be approximately twice the current estimate, and available echolocation call data are mostly consistent with GMYC delimitations. In conclusion, we consider the GMYC model useful under some conditions, but additional information on $N_e$, SR and/or corroboration from independent character data are needed to allow meaningful interpretation of results.
snpAD: An ancient DNA genotype caller.
Prüfer, Kay
2018-06-21
The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.
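The effect of position-dependent damage on genotype calling can be illustrated with diploid genotype likelihoods. The error model below is hypothetical: a C→T deamination rate that decays geometrically with distance from the read end, plus a uniform sequencing error. snpAD instead estimates its error profiles and genotype frequencies iteratively from the data. With this model, terminal T's over a C site are largely attributed to damage, while the same T's far from read ends support a heterozygote.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def base_prob(observed, true, dist_from_end, seq_err=0.01, damage=0.3, decay=0.5):
    """P(observed | true base) under a hypothetical model: C->T deamination at
    rate damage * decay**dist_from_end, plus uniform sequencing error."""
    ct = damage * decay ** dist_from_end if true == "C" else 0.0
    if true == "C" and observed == "T":
        return ct + (1 - ct) * seq_err / 3
    if observed == true:
        return (1 - ct) * (1 - seq_err)
    return (1 - ct) * seq_err / 3

def genotype_log_likelihoods(observations):
    """Log-likelihood of each diploid genotype given (base, distance) pairs;
    each read is assumed to sample either allele with probability 1/2."""
    result = {}
    for g in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for base, dist in observations:
            ll += math.log(0.5 * base_prob(base, g[0], dist)
                           + 0.5 * base_prob(base, g[1], dist))
        result["".join(g)] = ll
    return result

near_end = [("C", 10)] * 8 + [("T", 0)] * 2   # T's at the read terminus
interior = [("C", 10)] * 8 + [("T", 10)] * 2  # T's far from read ends
for obs in (near_end, interior):
    ll = genotype_log_likelihoods(obs)
    print(max(ll, key=ll.get))
```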
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing
Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.
2015-01-01
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged both to enhance CNV detection and to produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets, and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151
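Allele-specific read counts can refine a total copy-number call, as this simplified sketch shows. It is not AS-GENSENG's model. Given a total copy number c, the candidate splits are (c - b, b) reference/alternative copies. Each split implies an expected alt-allele fraction b / c. The sketch picks the split with the highest binomial log-likelihood for the observed alt reads at a heterozygous site. The binomial coefficient is omitted because it is the same for every split. `eps` is an arbitrary floor that avoids log(0).

```python
import math

def best_allele_split(total_cn, alt_reads, depth, eps=0.01):
    """Return (ref_copies, alt_copies) whose expected alt fraction best explains
    alt_reads out of depth under a binomial model."""
    best = None
    for b in range(total_cn + 1):
        p = b / total_cn if total_cn else 0.0
        p = min(max(p, eps), 1 - eps)  # keep probabilities away from 0 and 1
        ll = alt_reads * math.log(p) + (depth - alt_reads) * math.log(1 - p)
        if best is None or ll > best[0]:
            best = (ll, b)
    b = best[1]
    return (total_cn - b, b)

print(best_allele_split(3, 34, 100))  # a single-copy gain of the reference allele
```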
Continuous Influx of Genetic Material from Host to Virus Populations
Gilbert, Clément; Peccoud, Jean; Chateigner, Aurélien; Moumen, Bouziane; Cordaux, Richard; Herniou, Elisabeth A
2016-02-01
Many genes of large double-stranded DNA viruses have a cellular origin, suggesting that host-to-virus horizontal transfer (HT) of DNA is recurrent. Yet, the frequency of these transfers has never been assessed in viral populations. Here we used ultra-deep DNA sequencing of 21 baculovirus populations extracted from two moth species to show that a large diversity of moth DNA sequences (n = 86) can integrate into viral genomes during the course of a viral infection. The majority of the 86 different moth DNA sequences are transposable elements (TEs, n = 69) belonging to 10 superfamilies of DNA transposons and three superfamilies of retrotransposons. The remaining 17 sequences are moth sequences of unknown nature. In addition to bona fide DNA transposition, we uncover microhomology-mediated recombination as a mechanism explaining integration of moth sequences into viral genomes. Many sequences integrated multiple times at multiple positions along the viral genome. We detected a total of 27,504 insertions of moth sequences in the 21 viral populations and we calculate that on average, 4.8% of viruses harbor at least one moth sequence in these populations. Despite this substantial proportion, no insertion of moth DNA was maintained in any viral population after 10 successive infection cycles. Hence, there is a constant turnover of host DNA inserted into viral genomes each time the virus infects a moth. Finally, we found that at least 21 of the moth TEs integrated into viral genomes underwent repeated horizontal transfers between various insect species, including some lepidopterans susceptible to baculoviruses. Our results identify host DNA influx as a potent source of genetic diversity in viral populations. They also support a role for baculoviruses as vectors of DNA HT between insects, and call for an evaluation of possible gene or TE spread when using viruses as biopesticides or gene delivery vectors. PMID:26829124
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H
2017-01-01
Degradomics is a novel discipline that involves determining the protease/substrate fragmentation profile, called the substrate degradome, and it has recently been applied in different disciplines. A major application of degradomics is in the field of biomarkers, where the breakdown products (BDPs) of different proteases have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, identifying these BDPs with high precision and efficiency has been a challenge, especially when targeting a number of proteins at one time. In this chapter, we present a novel bioinformatic detection method that identifies BDPs accurately and efficiently, with validation against experimental data. This method aims to predict the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences using state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential fragments cleaved by a specific protease. This space- and time-efficient algorithm is flexible enough to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, where N is the number of protein sequences, m the length of the consensus sequence, and n the length of each protein sequence. Ultimately, this knowledge will feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
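The detect-then-cleave pipeline can be illustrated with a much simpler stand-in than the chapter's alignment-based method. The sketch uses a naive scan that tolerates up to k substitutions. It costs O(mn) time per protein, or O(Nmn) for N proteins. It then cuts after the last (P1) residue of each matched site. The caspase-3 consensus DEVD is used as the example motif. The protein strings are made up, and orientation handling is not modeled.

```python
def find_sites(protein, motif, max_mismatches=0):
    """Start indices where motif matches protein with at most max_mismatches
    substitutions (naive scan, O(m*n) per protein)."""
    m = len(motif)
    hits = []
    for i in range(len(protein) - m + 1):
        mismatches = sum(1 for a, b in zip(protein[i:i + m], motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits

def cleave(protein, motif, max_mismatches=0):
    """Cut after the last residue (P1) of every matched site and return the
    resulting fragments in order."""
    cuts = [i + len(motif) for i in find_sites(protein, motif, max_mismatches)]
    fragments, start = [], 0
    for cut in cuts:
        fragments.append(protein[start:cut])
        start = cut
    fragments.append(protein[start:])
    return [f for f in fragments if f]

print(cleave("MKDEVDGASDEVDA", "DEVD"))
```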
Mendel-GPU: haplotyping and genotype imputation on graphics processing units
Chen, Gary K.; Wang, Kai; Stram, Alex H.; Sobel, Eric M.; Lange, Kenneth
2012-01-01
Motivation: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Results: Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Availability: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. Contact: gary.k.chen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22954633
Miao, Wei; Simpson, Alastair G B; Fu, Chengjie; Lobban, Christopher S
2005-01-01
The small subunit rDNA sequence of Maristentor dinoferus (Lobban, Schefter, Simpson, Pochon, Pawlowski, and Foissner, 2002) was determined and compared with sequences from other Heterotrichea and Karyorelictea. Maristentor resembles Stentor in basic morphology and had been provisionally assigned to Stentoridae. However, our phylogenetic analyses show that Maristentor is more closely related to Folliculinidae. Our results support the creation of a separate family for Maristentor, Maristentoridae n. fam., and also confirm the phylogenetic grouping of Folliculinidae, Stentoridae, Blepharismidae, and Maristentoridae, which we informally call 'stentorids'. Maristentor, rather than Stentor itself, appears to be most significant in understanding the origins of folliculinids from their aloricate ancestors. Our analyses suggest continued uncertainty in the exact placement of the root of heterotrichs with this phylogenetic marker.
Lewis, William H; Sendra, Kacper M; Embley, T Martin; Esteban, Genoveva F
2018-01-01
Many anaerobic ciliated protozoa contain organelles of mitochondrial ancestry called hydrogenosomes. These organelles generate molecular hydrogen that is consumed by methanogenic Archaea living in endosymbiosis within many of these ciliates. Here we describe a new species of anaerobic ciliate, Trimyema finlayi n. sp., using silver impregnation and microscopy to conduct a detailed morphometric analysis. Comparisons with previously published morphological data for this species, as well as for the closely related species Trimyema compressum, demonstrated that, although the two are similar, both the mean cell size and the mean number of somatic kineties are lower for T. finlayi than for T. compressum, which suggests that they are distinct species. This was also supported by analysis of the 18S rRNA genes from these ciliates, the sequences of which are 97.5% identical (6 substitutions, 1479 compared bases), and in phylogenetic analyses these sequences grouped with other 18S rRNA genes sequenced from previous isolates of the same respective species. Together these data provide strong evidence that T. finlayi is a novel species of Trimyema, within the class Plagiopylea. Various microscopic techniques demonstrated that T. finlayi n. sp. contains polymorphic endosymbiotic methanogens, and analysis of the endosymbionts' 16S rRNA gene showed that they belong to the genus Methanocorpusculum, which was confirmed using fluorescence in situ hybridization with specific probes. Despite the degree of similarity and close relationship between these ciliates, T. compressum contains endosymbiotic methanogens from a different genus, Methanobrevibacter. In phylogenetic analyses of 16S rRNA genes, the Methanocorpusculum endosymbiont of T. finlayi n. sp. grouped with sequences from Methanomicrobia, including the endosymbiont of an earlier isolate of the same species, 'Trimyema sp.,' which was sampled approximately 22 years earlier at a distant (∼400 km) geographical location.
Identification of the same endosymbiont species in the two separate isolates of T. finlayi n. sp. provides evidence for spatial and temporal stability of the Methanocorpusculum-T. finlayi n. sp. endosymbiosis. T. finlayi n. sp. and T. compressum provide an example of two closely related anaerobic ciliates that have endosymbionts from different methanogen genera, suggesting that the endosymbionts have not co-speciated with their hosts.
Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies
Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland
2013-01-01
The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole-genome studies. Here we present a detailed comparison of the performance of all currently available whole-genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and rule out others.
This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689
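The coverage properties compared above can be summarized with a few simple statistics. The sketch computes the fraction of uncovered bases and the coefficient of variation of depth, a crude uniformity measure. It also reports mean depth in GC-rich versus GC-poor positions. Per-base GC flags are a simplification. In practice GC content is usually computed over windows, and these metrics are illustrative choices, not the paper's exact definitions.

```python
from statistics import mean, pstdev

def coverage_summary(depths, gc_flags):
    """Summarize per-base depths. gc_flags[i] is True for GC-rich positions."""
    zero_fraction = sum(d == 0 for d in depths) / len(depths)
    avg = mean(depths)
    cv = pstdev(depths) / avg if avg else float("inf")  # lower = more uniform
    rich = [d for d, g in zip(depths, gc_flags) if g]
    poor = [d for d, g in zip(depths, gc_flags) if not g]
    return {
        "zero_fraction": zero_fraction,
        "cv": cv,
        "mean_gc_rich": mean(rich) if rich else None,
        "mean_gc_poor": mean(poor) if poor else None,
    }

print(coverage_summary([30, 30, 0, 10], [False, False, True, True]))
```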
Discovering Motifs in Biological Sequences Using the Micron Automata Processor.
Roy, Indranil; Aluru, Srinivas
2016-01-01
Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can execute multiple NFA simultaneously in parallel. We demonstrate the capability to solve much larger instances of the (l, d) motif search problem using the resources available within a single Automata Processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
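The (l, d) problem statement translates directly into an exhaustive search. Every one of the 4^l candidate motifs is tested for whether it occurs, with at most d substitutions, in at least q sequences. This brute-force sketch is only feasible for small l, which is exactly why accelerators such as the Automata Processor are attractive. It is not the paper's NFA-based algorithm.

```python
from itertools import product

def occurs_within(motif, seq, d):
    """True if some length-l window of seq differs from motif in at most d positions."""
    l = len(motif)
    for i in range(len(seq) - l + 1):
        if sum(a != b for a, b in zip(motif, seq[i:i + l])) <= d:
            return True
    return False

def motif_search(seqs, l, d, q):
    """All length-l DNA motifs with a d-substitution occurrence in >= q sequences
    (exhaustive over 4**l candidates)."""
    return ["".join(c) for c in product("ACGT", repeat=l)
            if sum(occurs_within("".join(c), s, d) for s in seqs) >= q]

print(motif_search(["TTACGTTT", "GACGTGGG", "CCCCACGT"], l=4, d=0, q=3))
```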
SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.
Xu, Wenxuan; Zhang, Li; Lu, Yaping
2016-06-01
The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Shannon entropy from information theory has multiple uses in detailed bioinformatic analysis. Relative entropy estimators based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the n-mers that most effectively distinguish promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible enough to integrate new feature extraction or classification models freely. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
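One common statistical-divergence criterion is the Kullback-Leibler (relative entropy) divergence between n-mer distributions. The sketch below ranks n-mers by their individual contribution p·log(p/q) to the divergence of the promoter distribution p from a background distribution q. The pseudocount and the use of KL specifically are assumptions for illustration; the paper's particular SD estimators and its four sparse distributions are not reproduced.

```python
import math
from collections import Counter
from itertools import product

def nmer_distribution(seqs, n, pseudocount=1.0):
    """Smoothed frequency of every possible DNA n-mer across `seqs`."""
    counts = Counter(s[i:i + n] for s in seqs for i in range(len(s) - n + 1))
    alphabet = ["".join(p) for p in product("ACGT", repeat=n)]
    total = sum(counts[m] for m in alphabet) + pseudocount * len(alphabet)
    return {m: (counts[m] + pseudocount) / total for m in alphabet}

def rank_nmers_by_kl(promoters, background, n):
    """Rank n-mers by their term p*log(p/q) in KL(promoter || background), largest first."""
    p = nmer_distribution(promoters, n)
    q = nmer_distribution(background, n)
    terms = {m: p[m] * math.log(p[m] / q[m]) for m in p}
    return sorted(terms.items(), key=lambda kv: kv[1], reverse=True)
```

The pseudocount keeps q strictly positive, so n-mers absent from the background do not produce infinite terms.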
Super (a*, d*)-ℋ-antimagic total covering of second order of shackle graphs
NASA Astrophysics Data System (ADS)
Hesti Agustin, Ika; Dafik; Nisviasari, Rosanita; Prihandini, R. M.
2017-12-01
Let H be a simple and connected graph. A shackle of graph H, denoted by G = shack(H, v, n), is a graph G constructed from non-trivial graphs $H_1, H_2, \ldots, H_n$ such that, for every $1 \le s, t \le n$ with $|s - t| \ge 2$, $H_s$ and $H_t$ have no common vertex, and for every $1 \le i \le n - 1$, $H_i$ and $H_{i+1}$ share exactly one common vertex v, called the connecting vertex, where the $n - 1$ connecting vertices are all distinct. The graph G is said to be an (a*, d*)-H-antimagic total graph of second order if there exists a bijective function $f : V(G) \cup E(G) \to \{1, 2, \ldots, |V(G)| + |E(G)|\}$ such that, for all subgraphs isomorphic to H, the total H-weights $W(H) = \sum_{v \in V(H)} f(v) + \sum_{e \in E(H)} f(e)$ form an arithmetic sequence of second order $\{a^*, a^* + d^*, a^* + 3d^*, a^* + 6d^*, \ldots, a^* + \frac{n^2 - n}{2} d^*\}$, where a* and d* are positive integers and n is the number of all subgraphs isomorphic to H. An (a*, d*)-H-antimagic total labeling of second order f is called super if the smallest labels appear on the vertices. In this paper, we study a super (a*, d*)-H-antimagic total labeling of second order of G = shack(H, v, n) by using a partition technique of second order.
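The definition can be checked mechanically for a given labeling. The helpers below are a hypothetical illustration, not from the paper: they compute total H-weights from a label map, test whether the sorted weights follow the second-order pattern w_k = a* + k(k-1)/2 · d* for k = 1, …, n, and test the super condition. Enumerating the subgraphs of shack(H, v, n) isomorphic to H is left to the caller.

```python
def h_weights(labels, subgraphs):
    """Total H-weight W(H) = sum of vertex labels + sum of edge labels, per subgraph.

    labels: dict mapping each vertex and each edge of G to its label.
    subgraphs: list of (vertices, edges) pairs, one per subgraph isomorphic to H.
    """
    return [sum(labels[v] for v in vs) + sum(labels[e] for e in es) for vs, es in subgraphs]

def second_order_params(weights):
    """Return (a*, d*) if sorted weights equal a* + k(k-1)/2 * d* for k = 1..n, else None."""
    w = sorted(weights)
    if len(w) < 2:
        return None  # d* is undetermined with a single subgraph
    a, d = w[0], w[1] - w[0]
    if d <= 0:
        return None  # a* and d* must be positive integers
    ok = all(wk == a + (k * (k - 1) // 2) * d for k, wk in enumerate(w, start=1))
    return (a, d) if ok else None

def is_super(labels, vertices):
    """Super labeling: the vertices carry exactly the smallest labels 1..|V(G)|."""
    return sorted(labels[v] for v in vertices) == list(range(1, len(vertices) + 1))
```

For instance, weights 10, 12, 16, 22 give (a*, d*) = (10, 2), since the differences 2, 4, 6 grow by d* each step.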
Quick, Joshua; Quinlan, Aaron R; Loman, Nicholas J
2014-01-01
The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Two sequencing runs of the MinION™ are presented: one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods.
A short review of variants calling for single-cell-sequencing data with applications.
Wei, Zhuohui; Shu, Chang; Zhang, Changsheng; Huang, Jingying; Cai, Hongmin
2017-11-01
The field of single-cell sequencing is rapidly expanding, and many techniques have been developed in the past decade. With this technology, biologists can study not only the heterogeneity between two adjacent cells in the same tissue or organ, but also the evolutionary relationships and degenerative processes in a single cell. Calling variants is the main purpose of analyzing single-cell sequencing (SCS) data. Currently, some popular methods developed for bulk-cell sequencing data are applied directly to SCS data. However, SCS requires an extra step of genome amplification to obtain enough DNA for sequencing. The amplification introduces large biases and thus makes it challenging to use bulk-cell sequencing methods. This paper aims to bridge this gap by providing guidance both for the development of specialized analysis methods and for the use of currently developed tools for SNS. We first introduce two popular genome amplification methods and compare their capabilities. Then we introduce a few popular models for calling single-nucleotide polymorphisms and copy-number variations. Finally, breakthrough applications of SNS are summarized to demonstrate its potential in researching cell evolution. Copyright © 2017 Elsevier Ltd. All rights reserved.
Melnikova, Nataliya V.; Dmitriev, Alexey A.; Belenikin, Maxim S.; Koroban, Nadezhda V.; Speranskaya, Anna S.; Krinitsina, Anastasia A.; Krasnov, George S.; Lakunina, Valentina A.; Snezhkina, Anastasiya V.; Sadritdinova, Asiya F.; Kishlyan, Natalya V.; Rozhmina, Tatiana A.; Klimina, Kseniya M.; Amosova, Alexandra V.; Zelenin, Alexander V.; Muravenko, Olga V.; Bolsheva, Nadezhda L.; Kudryavtseva, Anna V.
2016-01-01
Cultivated flax (Linum usitatissimum L.) is an important plant valuable for industry. Some flax lines can undergo heritable phenotypic and genotypic changes (LIS-1 insertion being the most common) in response to nutrient stress and are called plastic lines. Offspring of plastic lines, which stably inherit the changes, are called genotrophs. MicroRNAs (miRNAs) are involved in a crucial regulatory mechanism of gene expression. They have previously been assumed to take part in the nutrient stress response and could, therefore, participate in genotroph formation. In the present study, we performed high-throughput sequencing of small RNAs (sRNAs) extracted from flax plants grown under normal, phosphate-deficient and nutrient-excess conditions to identify miRNAs and evaluate their expression. Our analysis revealed expression of 96 conserved miRNAs from 21 families in flax. Moreover, 475 novel potential miRNAs were identified for the first time, and their targets were predicted. However, none of the identified miRNAs were transcribed from LIS-1. Expression of seven miRNAs (miR168, miR169, miR395, miR398, miR399, miR408, and lus-miR-N1) with up- or down-regulation under nutrient stress (on the basis of high-throughput sequencing data) was evaluated on extended sampling using qPCR. Reference gene search identified the ETIF3H and ETIF3E genes as most suitable for this purpose. Down-regulation of novel potential lus-miR-N1 and up-regulation of conserved miR399 were revealed under phosphate-deficient conditions. In addition, negative correlations were observed between the expression of lus-miR-N1 and its predicted target, the ubiquitin-activating enzyme E1 gene, and between the expression of miR399 and its predicted target, the ubiquitin-conjugating enzyme E2 gene. Thus, in our study, miRNAs expressed in flax plastic lines and genotrophs were identified, and the expression of these miRNAs and their targets was evaluated using high-throughput sequencing and qPCR for the first time.
These data provide new insights into nutrient stress response regulation in plastic flax cultivars. PMID:27092149
NASA Technical Reports Server (NTRS)
Backus, George
1987-01-01
Let $\mathbb{R}$ be the real numbers, $\mathbb{R}^n$ the linear space of all real n-tuples, and $\mathbb{R}^\infty$ the linear space of all infinite real sequences $x = (x_1, x_2, \ldots)$. Let $P_n : \mathbb{R}^\infty \to \mathbb{R}^n$ be the projection operator with $P_n(x) = (x_1, \ldots, x_n)$. Let $p_\infty$ be a probability measure on the smallest sigma-ring of subsets of $\mathbb{R}^\infty$ which includes all of the cylinder sets $P_n^{-1}(B_n)$, where $B_n$ is an arbitrary Borel subset of $\mathbb{R}^n$. Let $p_n$ be the marginal distribution of $p_\infty$ on $\mathbb{R}^n$, so $p_n(B_n) = p_\infty(P_n^{-1}(B_n))$ for each $B_n$. A measure on $\mathbb{R}^n$ is isotropic if it is invariant under all orthogonal transformations of $\mathbb{R}^n$. All members of the set of all isotropic probability distributions on $\mathbb{R}^n$ are described. The result calls into question both stochastic inversion and Bayesian inference, as currently used in many geophysical inverse problems.
Syed, Mudasir Ahmad; Bhat, Farooz Ahmad; Balkhi, Masood-ul Hassan; Bhat, Bilal Ahmad
2016-01-01
Schizothoracine fish, commonly called snow trouts, inhabit the entire network of snow- and spring-fed cool waters of Kashmir, India. Of the more than 10 species reported earlier, only five have been found: Schizothorax niger, Schizothorax esocinus, Schizothorax plagiostomus, Schizothorax curvifrons and Schizothorax labiatus. The relationships among these species are contested. To understand the evolutionary relationships of these species, we examined the sequence of the mitochondrial D-loop in 25 individuals representing the five species. Sequence alignment showed that the D-loop region is highly variable, and length variation was observed in a dinucleotide (TA)n microsatellite between and within species. Interestingly, in all these species the (TA)n microsatellite is not associated with longer tandem repeats at the 3' end of the mitochondrial control region, and none of the species shows heteroplasmy. Our analysis also indicates the presence of four conserved sequence blocks (CSB-D, CSB-1, CSB-II and CSB-III), four termination-associated sequence (TAS) motifs and a 15-bp pyrimidine block within the mitochondrial control region, which are highly conserved within the genus Schizothorax when compared with other species. Phylogenetic analyses carried out by maximum likelihood (ML), neighbor joining (NJ) and Bayesian inference (BI) generated almost identical results. The resulting BI tree showed a close genetic relationship among all five species and supports two distinct groupings within S. esocinus. Besides the species relationships, the length variation in tandem repeats is attributed to differences in the predicted stability of secondary structures. The role of CSBs and TASs, reported so far as the main regulatory signals, would explain the conservation of these elements in evolution.
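Length variation in a (TA)n microsatellite is usually scored as the repeat count n. A minimal sketch, assuming the control-region sequence has already been extracted and that only perfect, uninterrupted TA repeats are counted:

```python
import re

def longest_ta_repeat(seq):
    """Return the largest n such that (TA)n occurs in seq (0 if there is none)."""
    # (?:TA)+ matches maximal runs of consecutive "TA" units, scanning left to right.
    runs = re.findall(r"(?:TA)+", seq.upper())
    return max((len(run) // 2 for run in runs), default=0)
```

Comparing this count across individuals gives the between- and within-species length variation described above; interrupted repeats (e.g. `TATAGTATA`) are deliberately split into separate runs here.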
NASA Astrophysics Data System (ADS)
Mirabi, Mohammad; Fatemi Ghomi, S. M. T.; Jolai, F.
2014-04-01
The flow-shop scheduling problem (FSP) deals with the scheduling of a set of n jobs that visit a set of m machines in the same order. As the FSP is NP-hard, no known efficient algorithm is guaranteed to reach the optimal solution. To minimize the holding, delay and setup costs of large permutation flow-shop scheduling problems with sequence-dependent setup times on each machine, this paper develops a novel hybrid genetic algorithm (HGA) with three genetic operators. The proposed HGA applies a modified approach to generate a pool of initial solutions and uses an improved heuristic, called the iterated swap procedure, to improve them. We consider a make-to-order production approach in which some job-to-job sequences are treated as tabu based on the maximum allowable setup cost. In addition, the results are compared with several recently developed heuristics, and computational experiments show that the proposed HGA performs very competitively in terms of solution accuracy and efficiency.
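To illustrate two ingredients named above, the sketch below evaluates a permutation schedule with sequence-dependent setup times and improves it with pairwise swaps. It is deliberately simplified: the objective is makespan rather than the paper's holding, delay and setup costs; setups are assumed anticipatory with no setup before the first job; and `swap_local_search` is a generic first-improvement swap heuristic, not the authors' iterated swap procedure or HGA.

```python
def completion_time(seq, proc, setup):
    """Makespan of permutation `seq` in a flow shop with sequence-dependent setups.

    proc[j][m]: processing time of job j on machine m.
    setup[m][i][j]: setup time on machine m when job j directly follows job i.
    Assumes setups may start before the job arrives and the first job needs none.
    """
    n_machines = len(proc[0])
    finish = [0] * n_machines  # when each machine finished its previous job
    prev = None
    for job in seq:
        ready = 0  # when this job leaves the previous machine
        for m in range(n_machines):
            s = setup[m][prev][job] if prev is not None else 0
            start = max(finish[m] + s, ready)
            finish[m] = start + proc[job][m]
            ready = finish[m]
        prev = job
    return finish[-1]

def swap_local_search(seq, cost):
    """Apply improving pairwise swaps until no swap lowers `cost`."""
    best = list(seq)
    best_cost = cost(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:]
                cand[i], cand[j] = cand[j], cand[i]
                c = cost(cand)
                if c < best_cost:
                    best, best_cost, improved = cand, c, True
    return best, best_cost
```

In a genetic algorithm of the kind described, a routine like this would typically be applied to initial or offspring solutions, with `cost` swapped for the real objective.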
DNA Base-Calling from a Nanopore Using a Viterbi Algorithm
Timp, Winston; Comer, Jeffrey; Aksimentiev, Aleksei
2012-01-01
Nanopore-based DNA sequencing is the most promising third-generation sequencing method. It has superior read length, speed, and sample requirements compared with state-of-the-art second-generation methods. However, base-calling still presents substantial difficulty because the resolution of the technique is limited compared with the measured signal/noise ratio. Here we demonstrate a method to decode 3-bp-resolution nanopore electrical measurements into a DNA sequence using a Hidden Markov model. This method shows tremendous potential for accuracy (∼98%), even with a poor signal/noise ratio. PMID:22677395
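The decoding step can be sketched with a standard Viterbi algorithm over k-mer states. This toy version is an illustration under stated assumptions, not the authors' model: it uses 2-mer states (the paper works at 3-bp resolution), hypothetical evenly spaced Gaussian current levels per k-mer, and uniform transitions restricted to k-mers that overlap by k-1 bases.

```python
import math
from itertools import product

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path for `obs`, computed in log space."""
    score = {s: log_start(s) + log_emit(s, obs[0]) for s in states}
    backpointers = []
    for x in obs[1:]:
        new_score, pointer = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + log_trans(r, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(s, x)
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]

# Toy nanopore model (hypothetical parameters): each 2-mer in the pore emits a
# Gaussian current level; consecutive 2-mers must overlap by one base.
K = 2
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
LEVEL = {kmer: float(i) for i, kmer in enumerate(KMERS)}  # made-up levels
SIGMA = 0.3

def log_emit(kmer, current):
    z = (current - LEVEL[kmer]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))

def log_trans(a, b):
    return math.log(0.25) if a[1:] == b[:-1] else float("-inf")

def log_start(kmer):
    return -math.log(len(KMERS))

def decode(currents):
    """Collapse the decoded k-mer path into a base sequence."""
    path = viterbi(currents, KMERS, log_start, log_trans, log_emit)
    return path[0] + "".join(kmer[-1] for kmer in path[1:])
```

The overlap constraint in `log_trans` is what lets a model with coarse, multi-base resolution still yield single-base output: each step can only append one new base.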
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.
Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J; Szatkiewicz, Jin P
2015-08-18
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
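The idea of combining total and allele-specific counts can be illustrated at a single heterozygous site. Given a total copy number (assumed known here) and reference/alternative read counts, the sketch below picks the allele-specific split whose expected alternative-allele fraction best explains the reads under a binomial model. The error floor `eps` and the per-site treatment are assumptions; AS-GENSENG's actual model is not reproduced.

```python
import math

def allele_specific_cn(ref_reads, alt_reads, total_cn, eps=0.01):
    """Choose (ref_copies, alt_copies) summing to total_cn that best explains the reads.

    The alternative-allele read fraction is expected to be alt_copies / total_cn;
    eps keeps the log-likelihood finite when that fraction is 0 or 1.
    """
    if total_cn < 1:
        raise ValueError("total_cn must be >= 1")
    best, best_ll = None, -math.inf
    for alt_copies in range(total_cn + 1):
        p = min(max(alt_copies / total_cn, eps), 1 - eps)
        # Binomial log-likelihood up to a constant that does not depend on p
        ll = alt_reads * math.log(p) + ref_reads * math.log(1 - p)
        if ll > best_ll:
            best, best_ll = (total_cn - alt_copies, alt_copies), ll
    return best
```

For example, 30 reference and 10 alternative reads at a site with total copy number 4 are best explained by three reference copies and one alternative copy, while 40 reference and 0 alternative reads at copy number 2 point to a copy-neutral loss of heterozygosity.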
VIPER: a web application for rapid expert review of variant calls.
Wöste, Marius; Dugas, Martin
2018-06-01
With the rapid development in next-generation sequencing, cost and time requirements for genomic sequencing are decreasing, enabling applications in many areas such as cancer research. Many tools have been developed to analyze genomic variation ranging from single nucleotide variants to whole chromosomal aberrations. As sequencing throughput increases, the number of variants called by such tools also grows. Manual inspection of such calls, a common practice, is thus becoming time-consuming. We developed the Variant InsPector and Expert Rating tool (VIPER) to speed up this process by integrating the Integrative Genomics Viewer into a web application. Analysts can then quickly iterate through variants, apply filters and make decisions based on the generated images and variant metadata. VIPER was successfully employed in analyses with manual inspection of more than 10 000 calls. VIPER is implemented in Java and JavaScript and is freely available at https://github.com/MarWoes/viper. marius.woeste@uni-muenster.de. Supplementary data are available at Bioinformatics online.
Yuan, Shuai; Johnston, H. Richard; Zhang, Guosheng; Li, Yun; Hu, Yi-Juan; Qin, Zhaohui S.
2015-01-01
With the rapid decline in sequencing cost, researchers today are rushing to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable to factor them into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice; it can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor. PMID:26267278
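The core idea of turning a haploid reference into a diploid one from prior genotypes can be sketched for SNVs. The hypothetical `personalize` function below assumes 0-based positions and phased genotypes written like `0|1`; it ignores indels and unphased calls, which a real tool such as RefEditor has to handle, and it is not RefEditor's code.

```python
def personalize(reference, variants):
    """Build two haplotype sequences from a haploid reference and phased SNVs.

    variants: iterable of (pos, ref_base, alt_base, genotype), where pos is
    0-based and genotype is a phased string such as "0|1" (0 = ref, 1 = alt).
    """
    haps = [list(reference), list(reference)]
    for pos, ref, alt, gt in variants:
        if reference[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for h, allele in enumerate(gt.split("|")):
            if allele == "1":
                haps[h][pos] = alt
    return "".join(haps[0]), "".join(haps[1])
```

Reads can then be mapped against both haplotypes, so that a read carrying the alternative allele no longer incurs a mismatch penalty at a known variant site.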
Evans, Teri; Johnson, Andrew D; Loose, Matthew
2018-01-12
Large, repeat-rich genomes present challenges for assembly using short-read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .
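The iterative extension step can be illustrated with a toy greedy k-mer walk: starting from a seed (for example an exon identified from the transcriptome), extend to the right while the reads support exactly one next base. This is a simplified sketch under stated assumptions (error-free reads, extension in one direction only, stopping on ambiguity or a repeated k-mer), not the iterassemble pipeline.

```python
from collections import Counter, defaultdict

def kmer_successors(reads, k):
    """Map each k-mer to a Counter of bases observed immediately after it."""
    succ = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            succ[read[i:i + k]][read[i + k]] += 1
    return succ

def extend_right(seed, reads, k):
    """Greedily extend `seed` while exactly one next base is supported by the reads."""
    if len(seed) < k:
        raise ValueError("seed must be at least k bases long")
    succ = kmer_successors(reads, k)
    contig = seed
    seen = {contig[-k:]}
    while True:
        nxt = succ.get(contig[-k:])
        if not nxt or len(nxt) != 1:
            break  # dead end, or an ambiguous branch such as a repeat
        base = next(iter(nxt))
        new_kmer = contig[len(contig) - (k - 1):] + base
        if new_kmer in seen:
            break  # would loop back through a k-mer already in the contig
        contig += base
        seen.add(new_kmer)
    return contig
```

Stopping at branches is the key behaviour for repeat-rich genomes: a repeat shared by many loci produces several competing successors, so the walk halts rather than joining unrelated sequence.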
Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes
Shringarpure, Suyash S.; Carroll, Andrew; De La Vega, Francisco M.; Bustamante, Carlos D.
2015-01-01
Population-scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remain a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in parallel, the data were processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future. PMID:26110529
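The per-sample figure corresponds to the cost of completed jobs divided by the number of samples; assuming the same division, charging all jobs (including those that did not complete) raises it to about 8.60 dollars per sample:

$$
\frac{\$18{,}590}{2{,}535\ \text{samples}} \approx \$7.33\ \text{per sample},
\qquad
\frac{\$21{,}805}{2{,}535\ \text{samples}} \approx \$8.60\ \text{per sample}.
$$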
Le Bras, Ronan J; Kuzma, Heidi; Sucic, Victor; Bokelmann, Götz
2016-05-01
A notable sequence of calls was encountered, spanning several days in January 2003, in the central part of the Indian Ocean on a hydrophone triplet recording acoustic data at a 250 Hz sampling rate. This paper presents signal processing methods applied to the waveform data to detect and group the recorded signals and to extract amplitude and bearing estimates for them. An approximate location for the source of the sequence of calls is inferred by extracting features from the waveform. As the source approaches the hydrophone triplet, the source level (SL) of the calls is estimated at 187 ± 6 dB re 1 μPa at 1 m in the 15-60 Hz frequency range. The calls are attributed to a subgroup of blue whales, Balaenoptera musculus, with a characteristic acoustic signature. A Bayesian location method using probabilistic models for bearing and amplitude is demonstrated on the call sequence. The method is applied to the case of detection at a single triad of hydrophones and results in a probability distribution map for the origin of the calls. It can be extended to detections at multiple triads, and because of the Bayesian formulation, additional modeling complexity can be built in as needed.
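Bearing estimation from a hydrophone triplet can be sketched under a far-field plane-wave assumption: the arrival-time differences at the three sensors determine a 2-D slowness vector, whose direction gives the bearing and whose magnitude gives the apparent sound speed. The sensor layout, the 1500 m/s speed used in the check and the function name are illustrative assumptions, not the paper's processing chain.

```python
import math

def plane_wave_bearing(positions, arrival_times):
    """Estimate source azimuth (degrees clockwise from north) and apparent speed.

    positions: three (x_east, y_north) sensor coordinates in metres.
    arrival_times: arrival time of the same call at each sensor, in seconds.
    Assumes a plane wave, so t_i - t_0 = (r_i - r_0) . s for slowness vector s.
    """
    (x0, y0), (x1, y1), (x2, y2) = positions
    t0, t1, t2 = arrival_times
    a11, a12, b1 = x1 - x0, y1 - y0, t1 - t0
    a21, a22, b2 = x2 - x0, y2 - y0, t2 - t0
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("sensors are collinear")
    # Cramer's rule for the 2x2 linear system A s = b
    sx = (b1 * a22 - a12 * b2) / det
    sy = (a11 * b2 - a21 * b1) / det
    # The wave travels along s, so the source lies in the direction of -s
    azimuth = math.degrees(math.atan2(-sx, -sy)) % 360.0
    speed = 1.0 / math.hypot(sx, sy)
    return azimuth, speed
```

A single triplet yields a bearing but not a range, which is why the abstract's location estimate also draws on amplitude information within the Bayesian model.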
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. However, its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely SNPiR, RVBoost, and MuTect2. Variants called by all 3 methods, which we termed Tier 1 variants, produced the highest precision, with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, makes it possible to uncover mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
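Tier 1, read as the variants reported by all three callers, is the intersection of their call sets, and precision and recall are then measured against DNA-validated mutations. A minimal sketch with hypothetical variant identifiers (a real pipeline would first normalize VCF records to comparable keys such as chromosome, position, reference and alternative allele):

```python
def tier1(*call_sets):
    """Variants reported by every caller (the intersection of all call sets)."""
    return set.intersection(*(set(c) for c in call_sets))

def precision_recall(calls, validated):
    """Precision and recall of `calls` against a set of validated mutations."""
    calls, validated = set(calls), set(validated)
    tp = len(calls & validated)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(validated) if validated else 0.0
    return precision, recall
```

Requiring agreement of all callers shrinks the call set, which tends to raise precision at the expense of recall; that is the trade-off the Tier 1 result reflects.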
Stam, Remco; Scheikl, Daniela; Tellier, Aurélien
2016-06-02
Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeat-containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variation at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling of ten individuals to specifically sequence NLR genes in a resource- and cost-effective manner. We focus on the effects that different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results were confirmed by Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
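Nucleotide diversity π, the quantity behind the πN/πS ratios mentioned above, is the average number of pairwise differences per site. The sketch below computes it from aligned, equal-length sequences. Computing πN/πS would additionally require splitting sites into nonsynonymous and synonymous classes, and pooled data such as this study's would normally estimate π from allele frequencies instead, so treat this as an illustrative simplification.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)
```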
Pure Perceptual-Based Sequence Learning: A Role for Visuospatial Attention
ERIC Educational Resources Information Center
Remillard, Gilbert
2009-01-01
Learning the structure of a sequence of target locations when target location is not the response dimension and the sequence of target locations is uncorrelated with the sequence of responses is called pure perceptual-based sequence learning. The paradigm introduced by G. Remillard (2003) was used to determine whether orienting of visuospatial…
Program Synthesizes UML Sequence Diagrams
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2006-01-01
A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse-engineering function that aids in the design documentation of the target Java program. Previously, the construction of sequence diagrams was a tedious manual process; Rational Sequence generates UML sequence diagrams automatically from the running Java code.
Stretching chimeric DNA: A test for the putative S-form
NASA Astrophysics Data System (ADS)
Whitelam, Stephen; Pronk, Sander; Geissler, Phillip L.
2008-11-01
Double-stranded DNA "overstretches" at a pulling force of about 65 pN, increasing in length by a factor of 1.7. The nature of the overstretched state is unknown, despite its considerable importance for DNA's biological function and technological application. Overstretching is thought by some to be a force-induced denaturation and by others to consist of a transition to an elongated, hybridized state called S-DNA. Within a statistical mechanical model, we consider the effect upon overstretching of extreme sequence heterogeneity. "Chimeric" sequences possessing halves of markedly different AT composition elongate under fixed external conditions via distinct, spatially segregated transitions. The corresponding force-extension data vary with pulling rate in a manner that depends qualitatively and strikingly upon whether the hybridized S-form is accessible. This observation implies a test for S-DNA that could be performed in experiment.
Augmented brain function by coordinated reset stimulation with slowly varying sequences
Zeitler, Magteld; Tass, Peter A.
2015-01-01
Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase-resetting stimuli are delivered to different subpopulations in a temporally coordinated way. In neural networks with spike timing-dependent plasticity, CR stimulation may eventually lead to anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence in which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or simply the sequence. So far, in simulations and in pre-clinical and clinical applications, CR has been applied either with fixed sequences or with rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings, SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS. PMID:25873867
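The three ways of ordering stimulation sites can be sketched as a schedule generator in which each cycle stimulates every site exactly once, in the order given by a permutation. The `repeats` parameter (how many cycles a sequence is kept in SVS mode) is a hypothetical fixed-length switching rule, since the abstract does not specify how switching times are chosen; a newly drawn sequence may by chance equal the previous one.

```python
import random

def cr_site_sequences(n_sites, n_cycles, mode, repeats=1, rng=None):
    """Stimulation-site order for each CR cycle.

    mode: "fixed" keeps one random permutation for all cycles; "rvs" draws a new
    permutation every cycle; "svs" keeps a permutation for `repeats` cycles
    before drawing a new one (hypothetical switching rule).
    """
    if mode not in ("fixed", "rvs", "svs"):
        raise ValueError(f"unknown mode: {mode}")
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    rng = rng or random.Random()
    sites = list(range(n_sites))
    current = rng.sample(sites, n_sites)
    schedule = []
    for cycle in range(n_cycles):
        switch = cycle > 0 and (mode == "rvs" or (mode == "svs" and cycle % repeats == 0))
        if switch:
            current = rng.sample(sites, n_sites)
        schedule.append(list(current))
    return schedule
```

With `repeats=1`, SVS reduces to RVS, and with `repeats` at least `n_cycles` it reduces to a fixed sequence, so the slowly varying case interpolates between the two earlier protocols.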
NASA Astrophysics Data System (ADS)
Cai, Lei; Yuan, Wei; Zhang, Zhou; He, Lin; Chou, Kuo-Chen
2016-11-01
Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools showed poor consensus on candidates (only 20% of calls were made by more than one caller). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rates of dbSNP presence and of calls with high alternative-allele counts in the control, demonstrating their superior sensitivity and accuracy. Combining different callers does increase the reliability of candidates, but narrows the list down to a very limited range of tumor read depth and variant allele frequency. Calling SNVs on UDT-Seq data, which had much higher read depth, discovered additional true-positive variants, albeit with an even larger increase in false-positive predictions. Our findings not only provide a valuable benchmark for state-of-the-art SNV calling methods, but also shed light on how to achieve more accurate SNV identification in the future.
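The consensus figure quoted above, the share of candidates reported by more than one caller, can be computed as follows. This is a sketch with hypothetical variant keys, assuming calls have already been normalized so that identical variants compare equal.

```python
from collections import Counter

def multi_caller_fraction(call_sets):
    """Fraction of distinct candidate variants reported by at least two callers."""
    support = Counter(v for calls in call_sets for v in set(calls))
    if not support:
        return 0.0
    return sum(1 for n in support.values() if n >= 2) / len(support)
```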
Ultraaccurate genome sequencing and haplotyping of single human cells.
Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun
2017-11-21
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and the reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and with DNA fragments averaging 500 kb in length that can be assembled into haplotype contigs with an N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
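Part of the error reduction comes from a simple fact. A polymerase error introduced while amplifying one strand is unlikely to recur at the same position on the independently amplified complementary strand. The sketch below shows only that strand-agreement idea, under simplifying assumptions: calls are pre-aligned and of equal length, and disagreements are masked as N. It omits the haplotype-based error removal and multi-compartment logic the method also relies on, and it is not the SISSOR pipeline itself.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_consensus(watson, crick):
    """Per-position agreement between calls from two complementary strands.

    Both inputs must already be in the same (forward/reference) orientation and aligned
    position by position. A base is kept only where both strands report the same A/C/G/T;
    otherwise the position is masked as 'N'.
    """
    if len(watson) != len(crick):
        raise ValueError("inputs must be aligned to the same length")
    return "".join(
        w if w == c and w in "ACGT" else "N"
        for w, c in zip(watson.upper(), crick.upper())
    )


# Hypothetical example: the Crick-derived read is reported 5'->3' on its own strand,
# so it is reverse-complemented before comparison; the third position disagrees.
watson_calls = "ACGTA"
crick_read = reverse_complement("TAAGT")
masked = strand_consensus(watson_calls, crick_read)
```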
Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I
2016-08-26
Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. 
They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.
STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.
Karczewski, Konrad J; Fernald, Guy Haskin; Martin, Alicia R; Snyder, Michael; Tatonetti, Nicholas P; Dudley, Joel T
2014-01-01
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving, and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a cloud computing solution with a graphical interface that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and takes 5-10 hours to process a full exome sequence, and costs approximately $30 and takes 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface on Amazon EC2.
Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.
Sakai, Ryo; Aerts, Jan
2014-01-01
The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare two or more sets of aligned sequences. We present a new visual representation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source programming environment Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher-level data exploration tasks.
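The "amount of information present at every position" that a sequence logo communicates is usually the column information content R_i = log2(N) - H_i. Here N = 4 for nucleotides or 20 for amino acids, and H_i is the Shannon entropy of column i. The sketch below computes it for two hypothetical alignment sets and takes the column-wise difference. It illustrates the quantity a logo encodes, not the Sequence Diversity Diagram's own encoding, and it omits the small-sample correction some logo tools apply.

```python
import math
from collections import Counter


def column_information(alignment, alphabet="ACGT"):
    """Per-column information content in bits, as used for sequence-logo stack heights.

    R_i = log2(|alphabet|) - H_i, with H_i the Shannon entropy of column i.
    Characters outside the alphabet (e.g. gaps '-') are ignored; no small-sample correction.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(len(alphabet))
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in alignment if s[i] in alphabet)
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Hypothetical alignments; a column-by-column comparison of two sets is the kind of
# comparison a single logo does not make easy.
set_1 = ["ACGT", "ACGA", "ACCC", "ACTG"]
set_2 = ["ACGT", "TCGT", "GCGT", "CCGT"]
difference = [a - b for a, b in zip(column_information(set_1), column_information(set_2))]
```

For protein alignments, pass the 20 amino-acid letters as `alphabet`.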
Hügel, Theresa; van Meir, Vincent; Muñoz-Meneses, Amanda; Clarin, B-Markus; Siemers, Björn M; Goerlitz, Holger R
2017-01-01
Animals can gain important information by attending to the signals and cues of other animals in their environment, with acoustic information playing a major role in many taxa. Echolocation call sequences of bats contain information about the identity and behaviour of the sender which is perceptible to close-by receivers. Increasing evidence supports the communicative function of echolocation within species, yet data about its role in interspecific information transfer are scarce. Here, we asked which information bats extract from heterospecific echolocation calls during foraging. In three linked playback experiments, we tested in the flight room and in the field whether foraging Myotis bats approached the foraging call sequences of conspecifics and of four heterospecifics that were similar in acoustic call structure only (acoustic similarity hypothesis), in foraging ecology only (foraging similarity hypothesis), in both, or in neither. Compared to the natural prey capture rate of 1.3 buzzes per minute of bat activity, our playbacks of foraging sequences with 23-40 buzzes/min simulated foraging patches with significantly higher profitability. In the flight room, M. capaccinii only approached call sequences of conspecifics and of the heterospecific M. daubentonii with similar acoustics and foraging ecology. In the field, M. capaccinii and M. daubentonii only showed a weak positive response to those two species. Our results confirm information transfer across species boundaries and highlight the importance of context for the studied behaviour, but cannot resolve whether information transfer in trawling Myotis is based on acoustic similarity only or on a combination of similarity in acoustics and foraging ecology. Animals transfer information, both voluntarily and inadvertently, and within and across species boundaries. 
In echolocating bats, acoustic call structure and foraging ecology are linked, making echolocation calls a rich source of information about species identity, ecology and activity of the sender, which receivers might exploit to find profitable foraging grounds. We tested in three lab and field experiments if information transfer occurs between bat species and if bats obtain information about ecology from echolocation calls. Myotis capaccinii/daubentonii bats approached call playbacks, but only those from con- and heterospecifics with similar call structure and foraging ecology, confirming interspecific information transfer. Reactions differed between lab and field, emphasising situation-dependent differences in animal behaviour, the importance of field research, and the need for further studies on the underlying mechanism of information transfer and the relative contributions of acoustic and ecological similarity.
DNA base-calling from a nanopore using a Viterbi algorithm.
Timp, Winston; Comer, Jeffrey; Aksimentiev, Aleksei
2012-05-16
Nanopore-based DNA sequencing is the most promising third-generation sequencing method. It has superior read length, speed, and sample requirements compared with state-of-the-art second-generation methods. However, base-calling still presents substantial difficulty because the resolution of the technique is limited compared with the measured signal/noise ratio. Here we demonstrate a method to decode 3-bp-resolution nanopore electrical measurements into a DNA sequence using a Hidden Markov model. This method shows tremendous potential for accuracy (~98%), even with a poor signal/noise ratio. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
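The decoding step can be illustrated with a textbook Viterbi decoder over a toy HMM whose hidden states are the k-mers in the pore. Every model detail here is a hypothetical assumption chosen for readability:

- K = 2 rather than the 3-bp resolution in the study.
- The current levels are evenly spaced and made up.
- The noise is Gaussian with an assumed SIGMA.
- Transitions between overlapping k-mers are uniform.

A real pore model would be estimated from measurements, and its levels would overlap. That overlap is what makes the HMM necessary.

```python
import math
from itertools import product


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most probable hidden-state path for an HMM, computed in log space.

    log_start: dict state -> log P(first state)
    log_trans(prev, cur): log P(cur | prev), may be -inf for forbidden moves
    log_emit(state, obs): log P(obs | state)
    """
    score = {s: log_start[s] + log_emit(s, observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans(p, cur))
            new_score[cur] = score[best_prev] + log_trans(best_prev, cur) + log_emit(cur, obs)
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    path = [max(states, key=score.get)]
    for pointers in reversed(backpointers):  # trace back from the last position
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy pore model: the signal depends on the K bases in the pore,
# and each k-mer has its own made-up mean current level with Gaussian noise.
K = 2
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
LEVEL = {kmer: float(i) for i, kmer in enumerate(KMERS)}
SIGMA = 0.3


def log_trans(prev, cur):
    # DNA advances one base per step, so consecutive k-mers must overlap by K - 1 bases.
    return math.log(0.25) if prev[1:] == cur[:-1] else -math.inf


def log_emit(kmer, current):
    z = (current - LEVEL[kmer]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def basecall(currents):
    """Decode a list of current measurements into a DNA string."""
    start = {s: -math.log(len(KMERS)) for s in KMERS}
    path = viterbi(currents, KMERS, start, log_trans, log_emit)
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


# Simulated signal for a known sequence, with small alternating offsets as "noise".
true_seq = "GATTACA"
signal = [LEVEL[true_seq[i:i + K]] + 0.2 * (-1) ** i for i in range(len(true_seq) - K + 1)]
```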
Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah
2014-01-01
This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of the GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.
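The "50-fold" and "∼750 million 100 bp paired-end reads" figures can be cross-checked with the standard mean-coverage formula C = N × L / G. This back-of-envelope sketch makes two assumptions: the 750 million refers to read pairs, and the haploid human genome is about 3.1 Gb. Under those assumptions it gives roughly 48-fold, consistent with the stated ~50-fold. Counting 750 million single reads would give about half that.

```python
def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Mean sequencing depth C = N * L / G, ignoring duplicates and unmapped reads."""
    return n_reads * read_length_bp / genome_size_bp


# Assumptions: 750 million read pairs (2 reads each), 100 bp per read,
# and a human genome size of about 3.1 Gb.
depth = mean_coverage(2 * 750_000_000, 100, 3.1e9)
```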
Degefu, Tulu; Wolde-Meskel, Endalkachew; Rasche, Frank
2018-01-01
Vigna unguiculata, Vigna radiata and Arachis hypogaea growing in Ethiopia are nodulated by a genetically diverse group of Bradyrhizobium strains. To determine the genetic identity and symbiotic effectiveness of these bacteria, a collection of 36 test strains originating from the root nodules of the three hosts was investigated using multilocus sequence analyses (MLSA) of core genes including 16S rRNA, recA, glnII, gyrB, atpD and dnaK. Sequence analyses of nodA and nifH genes, along with tests for symbiotic effectiveness using δ¹⁵N analysis, were also carried out. The phylogenetic trees derived from the MLSA grouped most test strains into four well-supported distinct positions designated as genospecies I-IV. The maximum likelihood (ML) tree constructed from the nodA gene sequences separated the test strains into two lineages, with the majority of the test strains clustered on a single well-supported large branch comprising Bradyrhizobium species from the tropics. This clearly suggests a monophyletic origin of the nodA genes within bradyrhizobia of tropical origin. The δ¹⁵N-based symbiotic effectiveness test of seven selected strains revealed that strains GN100 (δ¹⁵N = 0.73) and GN102 (δ¹⁵N = 0.79) were highly effective nitrogen fixers when inoculated onto cowpea and can thus be considered as inoculants in cowpea production. It was concluded that Ethiopian soils are a hotspot for rhizobial diversity. This calls for further research to unravel as yet unknown bradyrhizobia nodulating legume host species growing in the country. In this respect, prospective research should also address the mechanisms of symbiotic specificity that could lead to high nitrogen fixation in target legumes.
USDA-ARS's Scientific Manuscript database
Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lowering the cost of large genome sequencing projects. One of the advantages of NGS platforms is the possibility to sequence the samples with...
Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong
2012-07-24
Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid, with three major new sequencing platforms released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences among them in data quality and in the applications that data will support.
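A GC-bias check of this kind can be sketched in two steps: compute GC content in fixed windows, then average sequencing depth per GC bin. In a strongly AT-rich genome, low-GC bins with near-zero depth would signal the kind of coverage drop-out described for the PGM. The window size, the binning scheme and the source of per-window depths (e.g. from an alignment) are assumptions of this sketch, not the study's protocol.

```python
import math


def gc_fraction(seq):
    """GC fraction over unambiguous bases (A/C/G/T); NaN if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else float("nan")


def windowed_gc(seq, window, step=None):
    """(start, GC fraction) for each full window along the sequence."""
    step = step or window
    return [(start, gc_fraction(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def mean_depth_by_gc(gc_windows, window_depths, n_bins=10):
    """Average per-window depth within equal-width GC bins; empty bins are omitted."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for (_, gc), depth in zip(gc_windows, window_depths):
        if math.isnan(gc):
            continue  # window without unambiguous bases
        b = min(int(gc * n_bins), n_bins - 1)
        sums[b] += depth
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in range(n_bins) if counts[b]}
```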
In vitro selection using a dual RNA library that allows primerless selection
Jarosch, Florian; Buchner, Klaus; Klussmann, Sven
2006-01-01
High affinity target-binding aptamers are identified from random oligonucleotide libraries by an in vitro selection process called Systematic Evolution of Ligands by EXponential enrichment (SELEX). Since the SELEX process includes a PCR amplification step, the randomized region of the oligonucleotide libraries needs to be flanked by two fixed primer binding sequences. These primer binding sites are often difficult to truncate because they may be necessary to maintain the structure of the aptamer or may even be part of the target binding motif. We designed a novel type of RNA library that carries fixed sequences which constrain the oligonucleotides into a partly double-stranded structure, thereby minimizing the risk that the primer binding sequences become part of the target-binding motif. Moreover, the specific design of the library, including the use of tandem RNA Polymerase promoters, allows the selection of oligonucleotides without any primer binding sequences. The library was used to select aptamers to the mirror-image peptide of ghrelin. Ghrelin is a potent stimulator of growth-hormone release and food intake. After selection, the identified aptamer sequences were directly synthesized in their mirror-image configuration. The final 44 nt-Spiegelmer, named NOX-B11-3, blocks ghrelin action in a cell culture assay displaying an IC50 of 4.5 nM at 37°C. PMID:16855281
Carmichael, Stephen N; Bron, James E; Taggart, John B; Ireland, Jacqueline H; Bekaert, Michaël; Burgess, Stewart Tg; Skuce, Philip J; Nisbet, Alasdair J; Gharbi, Karim; Sturm, Armin
2013-06-18
Background Caligid copepods, also called sea lice, are fish ectoparasites, some species of which cause significant problems in the mariculture of salmon, where the annual cost of infection is in excess of €300 million globally. At present, caligid control on farms is mainly achieved using medicinal treatments. However, the continued use of a restricted number of medicine actives potentially favours the development of drug resistance. Here, we report transcriptional changes in a laboratory strain of the caligid Lepeophtheirus salmonis (Krøyer, 1837) that is moderately (~7-fold) resistant to the avermectin compound emamectin benzoate (EMB), a component of the anti-salmon louse agent SLICE® (Merck Animal Health). Results Suppression subtractive hybridisation (SSH) was used to enrich transcripts differentially expressed between EMB-resistant (PT) and drug-susceptible (S) laboratory strains of L. salmonis. SSH libraries were subjected to 454 sequencing. Further L. salmonis transcript sequences were available as expressed sequence tags (EST) from GenBank. Contiguous sequences were generated from both SSH and EST sequences and annotated. Transcriptional responses in PT and S salmon lice were investigated using custom 15 K oligonucleotide microarrays designed using the above sequence resources. In the absence of EMB exposure, 359 targets differed in transcript abundance between the two strains, these genes being enriched for functions such as calcium ion binding, chitin metabolism and muscle structure. γ-aminobutyric acid (GABA)-gated chloride channel (GABA-Cl) and neuronal acetylcholine receptor (nAChR) subunits showed significantly lower transcript levels in PT lice compared to S lice. Using RT-qPCR, the decrease in mRNA levels was estimated at ~1.4-fold for GABA-Cl and ~2.8-fold for nAChR. 
Salmon lice from the PT strain showed few transcriptional responses following acute exposure (1 or 3 h) to 200 μg L-1 of EMB, a drug concentration tolerated by PT lice, but toxic for S lice. Conclusions Avermectins are believed to exert their toxicity to invertebrates through interaction with glutamate-gated and GABA-gated chloride channels. Further potential drug targets include other Cys-loop ion channels such as nAChR. The present study demonstrates decreased transcript abundances of GABA-Cl and nAChR subunits in EMB-resistant salmon lice, suggesting their involvement in avermectin toxicity in caligids. PMID:23773482
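Fold changes such as the ~1.4-fold and ~2.8-fold decreases above are commonly derived from RT-qPCR threshold cycles (Ct) with the comparative 2^-ΔΔCt method. The abstract does not state which quantification model was used. The following is therefore only a sketch of that common approach, with made-up Ct values and the usual assumption of near-100% amplification efficiency.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression 2^-ddCt of a target gene in a test vs. a control group.

    Assumes ~100% amplification efficiency for both target and reference assays.
    """
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(delta_test - delta_ctrl)


# Hypothetical Ct values (not from the study): relative to the reference gene, the target
# crosses threshold one cycle later in the test strain, i.e. half the transcript level.
ratio = fold_change_ddct(27.0, 18.0, 26.0, 18.0)
fold_decrease = 1 / ratio
```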
NASA Technical Reports Server (NTRS)
Wallace, G. R.; Weathers, G. D.; Graf, E. R.
1973-01-01
The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
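A hybrid-sum generator can be sketched with linear feedback shift registers. Each component is a maximum-length sequence defined by a primitive polynomial over GF(2), and the output is their bitwise modulo-two sum. The polynomials x^3 + x + 1 and x^4 + x + 1 are standard primitive examples chosen here for illustration. The moving sum stands in for the filter, as convolution with a rectangular window (a difference of two unit steps). The report's statistical quality factor is not reproduced.

```python
def m_sequence(degree, low_terms, length):
    """Maximum-length sequence from a Fibonacci-style LFSR.

    The feedback polynomial x^degree + sum(x^e for e in low_terms) must be primitive over
    GF(2); the recurrence is a[k + degree] = XOR of a[k + e] for e in low_terms.
    """
    out = [1] + [0] * (degree - 1)  # any non-zero seed works
    while len(out) < length:
        k = len(out) - degree
        bit = 0
        for e in low_terms:
            bit ^= out[k + e]
        out.append(bit)
    return out[:length]


def hybrid_sum(components, length):
    """Modulo-two sum of several m-sequences given as (degree, low_terms) pairs."""
    seqs = [m_sequence(degree, terms, length) for degree, terms in components]
    return [sum(bits) % 2 for bits in zip(*seqs)]


def moving_sum(bits, width):
    """Map bits to +/-1 and sum over a sliding window (rectangular-window convolution)."""
    levels = [1 if b else -1 for b in bits]
    return [sum(levels[i:i + width]) for i in range(len(levels) - width + 1)]


m3 = m_sequence(3, (1, 0), 7)                       # x^3 + x + 1, period 7
h = hybrid_sum([(3, (1, 0)), (4, (1, 0))], 300)     # periods 7 and 15 -> period 105
```

Because the component periods (7 and 15) are coprime, the hybrid sum repeats only every 105 bits. That longer period is one reason to combine short registers.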
SeqMule: automated pipeline for analysis of human exome/genome sequencing data.
Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai
2015-09-18
Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and lack of access to high-performance computing facilities. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate a consensus set with high confidence. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations of them through a one-line command, therefore allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turnaround is needed, SeqMule generates annotated VCF files from a 30X whole-genome sequencing data set within a day; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and supports quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
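Intersecting call sets only works if every caller writes the same variant the same way. For example, one tool may report a SNV as REF=CAT, ALT=CGT at position 100, and another as A>G at position 101. The minimal sketch below covers the trimming part of such normalization, followed by an intersection. It assumes biallelic records and 1-based VCF positions. It does not left-align indels within repeats, which needs the reference sequence, and it is not SeqMule's own implementation.

```python
def normalize(pos, ref, alt):
    """Trim bases shared by REF and ALT (suffix first, then prefix), keeping >= 1 base each.

    Does NOT left-align indels inside repeats; that step requires the reference sequence.
    """
    ref, alt = ref.upper(), alt.upper()
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1  # dropping a leading base shifts the 1-based position right
    return pos, ref, alt


def intersect_calls(*callsets):
    """Variants present in every call set after normalization.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples.
    """
    normalized = [
        {(chrom,) + normalize(pos, ref, alt) for chrom, pos, ref, alt in calls}
        for calls in callsets
    ]
    return set.intersection(*normalized) if normalized else set()
```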
Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar
2016-01-01
Mango (Mangifera indica L.) is called "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and variety of end uses. Despite its huge economic importance, genomic resources in mango are scarce and the genetics of useful horticultural traits are poorly understood. Here we generated deep-coverage leaf RNA sequence data for the mango parental varieties 'Neelam' and 'Dashehari' and their hybrid 'Amrapali' using next generation sequencing technologies. De novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. In total, 5,465 SSR loci were identified in 4,912 unigenes, including 288 type I SSRs (n ≥ 20 bp). One hundred type I SSR markers were randomly selected, of which 43 yielded PCR amplicons of the expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high-quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides a useful genomic resource for the genetic improvement of mango. PMID:27736892
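Perfect SSRs like the type I markers above (total length n ≥ 20 bp) can be found with a simple scan for tandem copies of 1-6 bp motifs. This is a simplified, MISA-like sketch under assumed rules. It finds only perfect repeats, applies a single length threshold to all motif sizes, and skips motifs that are themselves repeats of a shorter unit (e.g. ATAT). The abstract does not state the study's actual tool or thresholds.

```python
def is_primitive(motif):
    """True if motif is not a repeat of a shorter unit (e.g. 'ATAT' is not primitive)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_len=20, max_motif=6):
    """Perfect tandem repeats of 1..max_motif bp with total length >= min_len.

    Returns sorted (start, motif, copies, total_length) tuples, 0-based starts.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            j = i + k
            while seq[j:j + k] == motif:  # extend while the next copy matches exactly
                j += k
            if j - i >= min_len and set(motif) <= set("ACGT") and is_primitive(motif):
                hits.append((i, motif, (j - i) // k, j - i))
                i = j  # skip past this repeat
            else:
                i += 1
    return sorted(hits)
```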
Citrus leprosis virus N: A New Dichorhavirus Causing Citrus Leprosis Disease.
Ramos-González, Pedro Luis; Chabi-Jesus, Camila; Guerra-Peraza, Orlene; Tassi, Aline Daniele; Kitajima, Elliot Watanabe; Harakava, Ricardo; Salaroli, Renato Barbosa; Freitas-Astúa, Juliana
2017-08-01
Citrus leprosis (CL) is a viral disease endemic to the Western Hemisphere that produces local necrotic and chlorotic lesions on leaves, branches, and fruit and causes serious yield reduction in citrus orchards. Samples of sweet orange (Citrus × sinensis) trees showing CL symptoms were collected during a survey in noncommercial citrus areas in the southeast region of Brazil from 2013 to 2016. Transmission electron microscopy analyses of foliar lesions confirmed the presence of rod-like viral particles commonly associated with CL in the nucleus and cytoplasm of infected cells. However, every attempt to identify these particles by reverse-transcription polymerase chain reaction tests failed, even though all described primers for the detection of known CL-causing cileviruses and dichorhaviruses were used. Next-generation sequencing of total RNA extracts from three symptomatic samples revealed the genome of distinct, although highly related (>92% nucleotide sequence identity), viruses whose genetic organization is similar to that of dichorhaviruses. The genome sequence of these viruses showed <62% nucleotide sequence identity with those of orchid fleck virus and coffee ringspot virus. Overall, the deduced amino acid sequences of the open reading frames they encode share 32.7 to 63.8% identity with the proteins of the dichorhavirids. Mites collected from both the naturally infected citrus trees and those used for the transmission of one of the characterized isolates to Arabidopsis plants were anatomically recognized as Brevipalpus phoenicis sensu stricto. Molecular and biological features indicate that the identified viruses belong to a new species of CL-associated dichorhavirus, which we propose to call Citrus leprosis N dichorhavirus. Our results, while emphasizing the increasing diversity of viruses causing CL disease, lead to a reevaluation of the nomenclature of those viruses assigned to the genus Dichorhavirus. In this regard, a comprehensive discussion is presented.
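Percent nucleotide identity, as in the >92% and <62% figures above, depends on the alignment and on how gaps are counted. The minimal sketch below works on a pairwise alignment given as two equal-length gapped strings. It uses one common convention: identical positions divided by alignment columns, excluding columns where both sequences have a gap. Other tools divide by the shorter sequence or ignore gapped columns, so their values may differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical positions / alignment columns (columns gapped in both are excluded), in %."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0
```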
Kulis-Horn, Robert Kasimir; Rückert, Christian; Kalinowski, Jörn; Persicke, Marcus
2017-07-18
The eighth step of L-histidine biosynthesis is carried out by an enzyme called histidinol-phosphate phosphatase (HolPase). Three unrelated HolPase families are known so far. Two of them are well studied: HAD-type HolPases known from Gammaproteobacteria like Escherichia coli or Salmonella enterica and PHP-type HolPases known from yeast and Firmicutes like Bacillus subtilis. However, the third family of HolPases, the inositol monophosphatase (IMPase)-like HolPases, present in Actinobacteria like Corynebacterium glutamicum (HisN) and plants, are poorly characterized. Moreover, there exist several IMPase-like proteins in bacteria (e.g. CysQ, ImpA, and SuhB) which are very similar to HisN but most likely do not participate in L-histidine biosynthesis. Deletion of hisN, the gene encoding the IMPase-like HolPase in C. glutamicum, does not result in complete L-histidine auxotrophy. Out of four hisN homologs present in the genome of C. glutamicum (impA, suhB, cysQ, and cg0911), only cg0911 encodes an enzyme with HolPase activity. The enzymatic properties of HisN and Cg0911 were determined, delivering the first available kinetic data for IMPase-like HolPases. Additionally, we analyzed the amino acid sequences of potential HisN, ImpA, SuhB, CysQ and Cg0911 orthologs from bacteria and identified six conserved sequence motifs for each group of orthologs. Mutational studies confirmed the importance of a highly conserved aspartate residue accompanied by several aromatic amino acid residues present in motif 5 for HolPase activity. Several bacterial proteins containing all identified HolPase motifs, but showing only moderate sequence similarity to HisN from C. glutamicum, were experimentally confirmed as IMPase-like HolPases, demonstrating the value of the identified motifs. Based on the confirmed IMPase-like HolPases, two profile Hidden Markov Models (HMMs) were built using an iterative approach. 
These HMMs allow the fast, reliable detection and differentiation of the two paralog groups from each other and from other IMPases. The kinetic data obtained for HisN from C. glutamicum, as an example of an IMPase-like HolPase, show remarkable differences in enzyme properties compared with HAD- or PHP-type HolPases. The six sequence motifs and the HMMs presented in this study can be used to reliably differentiate between IMPase-like HolPases and IMPase-like proteins with no such activity, with the potential to enhance current and future genome annotations. A phylogenetic analysis reveals that IMPase-like HolPases are not only present in Actinobacteria and plants but can be found in further bacterial phyla, including, among others, Proteobacteria, Chlorobi and Planctomycetes.
OTG-snpcaller: An Optimized Pipeline Based on TMAP and GATK for SNP Calling from Ion Torrent Data
Zhu, Pengyuan; He, Lingyu; Li, Yaqiao; Huang, Wenpan; Xi, Feng; Lin, Lin; Zhi, Qihuan; Zhang, Wenwei; Tang, Y. Tom; Geng, Chunyu; Lu, Zhiyuan; Xu, Xun
2014-01-01
Because the new Proton platform from Life Technologies produced data markedly different from those of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), with TMAP and GATK to call SNPs from Proton data. We sequenced four sets of exomes captured by Agilent SureSelect and NimbleGen SeqCap EZ Kit using Life Technologies’ Ion Proton sequencer. We then applied OTG-snpcaller and compared our results with those from Torrent Variant Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences. PMID:24824529
2007-02-01
almost identical system call sequences and triggering the same alarm at different hosts. The alarm propagation effect can be used to distinguish “true alarms” from “false positives”. At the host level, a new anomaly...
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.
Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C
2012-09-11
Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis of the newly sequenced isolates together with thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud
Karczewski, Konrad J.; Fernald, Guy Haskin; Martin, Alicia R.; Snyder, Michael; Tatonetti, Nicholas P.; Dudley, Joel T.
2014-01-01
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving, and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a cloud computing solution with a graphical interface that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, processing a full exome sequence with STORMSeq costs approximately $2 and takes 5–10 hours, and processing a whole genome sequence costs approximately $30 and takes 3–8 days. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756
Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa
2013-01-01
High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data are captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects had been submitted to SRA, double the number a year earlier. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest from SRA because the data structure is complicated, and experimental conditions accompanying the raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA without a quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and from full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from the articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
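The ID-extraction step described above can be approximated with regular expressions. This sketch is only an illustration, not the DBCLS SRA pipeline: it assumes INSDC-style SRA accessions (a repository letter S, E or D, then R, then a type letter such as P for study, X for experiment or R for run, followed by six or more digits) and PMIDs written as "PMID: <digits>", and it naively pairs every accession in a text with every PMID found in the same text.

```python
import re

# Assumed accession shape: [S|E|D] + "R" + type letter
# (A submission, P study, S sample, X experiment, R run) + 6 or more digits.
SRA_ID = re.compile(r"\b[SED]R[APSXR]\d{6,}\b")
# Assumed PMID mention style, e.g. "PMID: 24167589" or "PMID:24167589".
PMID = re.compile(r"\bPMID:\s*(\d+)\b")


def extract_pairs(text):
    """Return sorted (SRA ID, PMID) pairs that co-occur in one text.

    Every accession is paired with every PMID in the text; this is a
    deliberately naive rule chosen for illustration.
    """
    sra_ids = sorted(set(SRA_ID.findall(text)))
    pmids = sorted(set(PMID.findall(text)))
    return [(s, p) for s in sra_ids for p in pmids]
```

For example, `extract_pairs("Reads were deposited as SRP012345 and SRR1234567. PMID: 24167589")` yields one pair per accession, both linked to the same PMID. A real pipeline would also need to resolve run and experiment accessions to their parent study.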
Detailed temporal structure of communication networks in groups of songbirds.
Stowell, Dan; Gill, Lisa; Clayton, David
2016-06-01
Animals in groups often exchange calls, in patterns whose temporal structure may be influenced by contextual factors such as physical location and the social network structure of the group. We introduce a model-based analysis for temporal patterns of animal call timing, originally developed for networks of firing neurons. This has advantages over cross-correlation analysis in that it can correctly handle common-cause confounds and provides a generative model of call patterns with explicit parameters for the influences between individuals. It also has advantages over standard Markovian analysis in that it incorporates detailed temporal interactions which affect timing as well as sequencing of calls. Further, a fitted model can be used to generate novel synthetic call sequences. We apply the method to calls recorded from groups of domesticated zebra finch (Taeniopygia guttata) individuals. We find that the communication network in these groups has stable structure that persists from one day to the next, and that 'kernels' reflecting the temporal range of influence have a characteristic structure for a calling individual's effect on itself, its partner and on others in the group. We further find characteristic patterns of influences by call type as well as by individual. © 2016 The Authors.
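For readers unfamiliar with neural point-process models, the following sketch shows how pairwise "kernels" turn past calls into a current calling rate. It assumes a linear, Hawkes-style self- and cross-excitation model with exponential kernels; the model fitted in the study may use a different link function and kernel shape, so `mu`, `alpha` and `beta` are illustrative parameters rather than the authors' parameterization.

```python
import math


def intensity(i, t, events, mu, alpha, beta):
    """Conditional calling rate of individual i at time t (assumed model).

    events: list of (time, caller) pairs for all past calls in the group.
    mu[i]: baseline call rate of individual i.
    alpha[i][j]: total influence of one call by j on i's later calling.
    beta: decay rate of the exponential kernel, shared by all pairs here.
    """
    rate = mu[i]
    for t_k, j in events:
        if t_k < t:  # only calls made before t can influence the rate at t
            # beta * exp(-beta * dt) integrates to 1, so alpha[i][j]
            # is the expected number of extra calls by i per call by j.
            rate += alpha[i][j] * beta * math.exp(-beta * (t - t_k))
    return rate
```

In this form, `alpha[i][i]` plays the role of a caller's effect on itself and the off-diagonal entries describe effects on a partner or on others in the group; fitting these entries to recorded call times is what yields a communication network.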
Szipl, Georgine; Boeckle, Markus; Wascher, Claudia A.F.; Spreafico, Michela; Bugnyar, Thomas
2015-01-01
Upon discovering food, common ravens, Corvus corax, produce far-reaching ‘haa’ calls or yells, which are individually distinct and signal food availability to conspecifics. Here, we investigated whether ravens respond differently to ‘haa’ calls of known and unknown individuals. In a paired playback design, we tested responses to ‘haa’ call sequences in a group containing individually marked free-ranging ravens. We simultaneously played call sequences of a male and a female raven in two different locations and varied familiarity (known or unknown to the local group). Ravens responded most strongly to dyads containing familiar females, performing more scan flights above and perching more often in trees near the respective speaker. Acoustic analysis of the calls used as stimuli showed no sex-, age- or familiarity-specific acoustic cues, but highly significant classification results at the individual level. Taken together, our findings indicate that ravens respond to individual characteristics in ‘haa’ calls, and choose whom to approach for feeding, i.e. join social allies and avoid dominant conspecifics. This is the first study to investigate responses to ‘haa’ calls under natural conditions in a wild population containing individually marked ravens. PMID:25598542
Van Hoeck, Arne; Horemans, Nele; Monsieurs, Pieter; Cao, Hieu Xuan; Vandenhove, Hildegarde; Blust, Ronny
2015-01-01
Freshwater duckweeds, comprising the smallest, fastest-growing and simplest macrophytes, have various applications in agriculture, phytoremediation and energy production. Lemna minor, the so-called common duckweed, is a model system of these aquatic plants for ecotoxicological bioassays, genetic transformation tools and industrial applications. Given its ecotoxicological relevance and high potential for biomass production, whole-genome information for this cosmopolitan duckweed is needed. The 472 Mbp assembly of the L. minor genome (2n = 40; 98.1 % of the estimated 481 Mbp) contains 22,382 protein-coding genes and 61.5 % repetitive sequences. The repeat content explains 94.5 % of the genome size difference in comparison with the greater duckweed, Spirodela polyrhiza (2n = 40; 158 Mbp; 19,623 protein-coding genes; and 15.79 % repetitive sequences). Comparison with proteins from other monocot plants by ortholog identification with OrthoMCL suggests 1356 duckweed-specific groups (3367 proteins, 15.0 % of total L. minor proteins) and 795 Lemna-specific groups (2897 proteins, 12.9 % of total L. minor proteins). Interestingly, proteins involved in biosynthetic processes in response to various stimuli and hydrolase activities are enriched in the Lemna proteome in comparison with the Spirodela proteome. The genome sequence and annotation of L. minor protein-coding genes provide new insights into the biology and biomass-production applications of Lemna species.
Ron, Santiago R.; Venegas, Pablo J.; Ortega-Andrade, H. Mauricio; Gagliardi-Urrutia, Giussepe; Salerno, Patricia E.
2016-01-01
Ecnomiohyla tuberculosa is an Amazonian hylid of uncertain phylogenetic position. Herein DNA sequences of mitochondrial and nuclear genes are used to determine its phylogenetic relationships. New sequences and external morphology of Trachycephalus typhonius are also analyzed to assess the status of Ecuadorian and Peruvian populations. The phylogeny shows unequivocally that Ecnomiohyla tuberculosa is nested within the genus Tepuihyla, tribe Lophiohylini. This position was unexpected because the remaining species of Ecnomiohyla belong to the tribe Hylini. To resolve the paraphyly of the genus Ecnomiohyla, Ecnomiohyla tuberculosa is transferred to the genus Tepuihyla. Comparisons of DNA sequences, external morphology, and advertisement calls between populations of Ecnomiohyla tuberculosa from Ecuador and Peru indicate that the Peruvian population represents an undescribed species. The new species is described and a species account is provided for Ecnomiohyla tuberculosa. Trachycephalus typhonius is paraphyletic relative to Trachycephalus cunauaru, Trachycephalus hadroceps, and Trachycephalus resinifictrix. The phylogenetic position of populations from western Ecuador indicates that they represent a species separate from Trachycephalus typhonius sensu stricto. We resurrect the name Hyla quadrangulum (Trachycephalus quadrangulum comb. n.) for those populations. Amazonian populations of “Trachycephalus typhonius” from Ecuador and Peru are genetically and morphologically distinct from Trachycephalus typhonius sensu stricto and are conspecific with the holotype of Hyla macrotis. Therefore, we also resurrect Hyla macrotis, a decision that results in Trachycephalus macrotis comb. n. PMID:27917043
Identification and correction of systematic error in high-throughput sequence data
2011-01-01
Background: A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technologies sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results: We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions: Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low-coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972
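To make "statistically unlikely accumulations of errors" concrete, one generic check (not the SysCall classifier) compares the mismatch count at a site with a binomial model of independent per-base errors. The error rate and significance cutoff below are assumed values. A flagged site may be either a true heterozygous site or a systematic error; telling the two apart is what the classifier described above is for.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def flag_site(mismatches, depth, error_rate=0.01, alpha=1e-6):
    """Flag a site whose mismatch count is implausible under random error.

    error_rate and alpha are illustrative defaults, not values from the study.
    """
    return binomial_tail(mismatches, depth, error_rate) < alpha
```

With 100 reads and a 1 % error rate, a single mismatch is unremarkable, whereas 30 mismatches at the same position are vanishingly unlikely to arise from independent errors and would be flagged for closer inspection.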
Characterization of minimal sequences associated with self-similar interval exchange maps
NASA Astrophysics Data System (ADS)
Cobo, Milton; Gutiérrez-Romo, Rodolfo; Maass, Alejandro
2018-04-01
The construction of affine interval exchange maps (IEMs) with wandering intervals that are semi-conjugate to a given self-similar IEM is strongly related to the existence of the so-called minimal sequences associated with local potentials, which are certain elements of the substitution subshift arising from the given IEM. In this article, under the condition called unique representation property, we characterize such minimal sequences for potentials coming from non-real eigenvalues of the substitution matrix. We also give conditions on the slopes of the affine extensions of a self-similar IEM that determine whether it exhibits a wandering interval or not.
A non-parametric peak calling algorithm for DamID-Seq.
Li, Renhua; Hempel, Leonie U; Jiang, Tingbo
2015-01-01
Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not guaranteed to be symmetrically distributed at TFBS, which makes existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new peak calling algorithm. A key difficulty in peak calling from sequence data is estimating the average behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality checks and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computation of signal-to-noise fold changes; 3) filtering; and 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to assess the peaks called by NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the difference in fat body tissue between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by ChIP-Seq on S2 cells in terms of peak number, location, and peak width.
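The four NPPC steps listed above can be sketched on binned read counts. This is a simplified illustration, not the published implementation: the bin representation, the pseudocount, the thresholds (`min_reads`, `min_fold`, `p_cut`) and the use of an empirical bootstrap p-value as the "statistically significant threshold" are all assumptions.

```python
import random


def bootstrap_control(control_bins, n_boot=200, seed=0):
    """Step 1: resample control (Dam-only) reads with replacement.

    control_bins[b] is the number of control reads in bin b. Returns one
    list of re-binned counts per bootstrap replicate.
    """
    rng = random.Random(seed)
    reads = [b for b, c in enumerate(control_bins) for _ in range(c)]
    replicates = []
    for _ in range(n_boot):
        counts = [0] * len(control_bins)
        for b in rng.choices(reads, k=len(reads)):
            counts[b] += 1
        replicates.append(counts)
    return replicates


def call_peaks(sample_bins, control_bins, min_reads=5, min_fold=2.0,
               p_cut=0.01, pseudo=1.0):
    """Return indices of bins called as peaks (hypothetical parameters)."""
    reps = bootstrap_control(control_bins)
    # Step 2: scale the control to the sample's library size.
    scale = sum(sample_bins) / sum(control_bins)
    peaks = []
    for b, obs in enumerate(sample_bins):
        bg = [r[b] * scale for r in reps]
        fold = (obs + pseudo) / (sum(bg) / len(bg) + pseudo)
        # Step 3: filter out sparse bins and bins with low fold change.
        if obs < min_reads or fold < min_fold:
            continue
        # Step 4: empirical p-value, the share of resampled backgrounds
        # at least as large as the observed count.
        p = (1 + sum(x >= obs for x in bg)) / (1 + len(bg))
        if p < p_cut:
            peaks.append(b)
    return peaks
```

Because the background distribution comes from resampling the control reads rather than from a fitted parametric model, no assumption is made about the shape of the signal around a binding site, which is the property the abstract motivates.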
Effects of weather conditions on emergency ambulance calls for acute coronary syndromes
NASA Astrophysics Data System (ADS)
Vencloviene, Jone; Babarskiene, Ruta; Dobozinskas, Paulius; Siurkaite, Viktorija
2015-08-01
The aim of this study was to evaluate the relationship between weather conditions and daily emergency ambulance calls for acute coronary syndromes (ACS). The study included data on 3631 patients who called the ambulance for chest pain and were admitted to the department of cardiology as patients with ACS. We investigated the effect of daily air temperature (T), barometric pressure (BP), relative humidity, and wind speed (WS) to detect the risk areas for low and high daily volume (DV) of emergency calls. We used the classification and regression tree method as well as cluster analysis. The clusters were created by applying the k-means cluster algorithm using the standardized daily weather variables. The analysis was performed separately during cold (October-April) and warm (May-September) seasons. During the cold period, the greatest DV was observed on days of low T during the 3-day sequence, on cold and windy days, and on days of low BP and high WS during the 3-day sequence; low DV was associated with high BP and decreased WS on the previous day. During June-September, a lower DV was associated with low BP, windless days, and high BP and low WS during the 3-day sequence. During the warm period, the greatest DV was associated with increased BP and changing WS during the 3-day sequence. These results suggest that daily T, BP, and WS on the day of the ambulance call and on the two previous days may be prognostic variables for the risk of ACS.
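The clustering step (k-means on standardized daily weather variables) can be sketched in plain Python. The number of clusters, the seeding rule (the first k days) and the variable order (T, BP, relative humidity, WS) are assumptions for illustration; the study does not report these details here.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column (e.g. T, BP, humidity, WS) across all days."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids
```

Standardizing first matters here: without it, barometric pressure (values around 1000 hPa) would dominate the distance and the clusters would effectively ignore temperature and wind speed.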
Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier
2015-02-22
Deep sequencing allows an in-depth characterization of sequence variation in complex populations. However, technology-associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores, which for Illumina sequencing are derived from a quadruplet of intensities, one channel for each nucleotide type. The highest intensity of the four channels determines the base that is called. Mismatched bases can often be corrected by the second-best base, i.e. the base with the second-highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that uses quality scores and second-best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets), which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids, we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has superb sensitivity and specificity for variants with frequencies above 0.4%. However, compared with its competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4%, which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second-best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which, however, does not warrant the additional computational cost of running the offline base caller.
Much of the relevant information appears to be contained already in the quality scores, enabling the model-based clustering procedure to adjust for the majority of sequencing errors. Overall, the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to become the bottleneck for low-frequency variant detection.
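The base-calling rule described above (the highest of four channel intensities wins, with the runner-up as the "second-best base") and the codon-level view used by ViVaMBC can be illustrated with a few helper functions. This sketch does not implement ViVaMBC's model-based clustering of quality scores; the function names are hypothetical, and the reads are assumed to be aligned and in frame.

```python
from collections import Counter

BASES = "ACGT"


def call_base(intensities):
    """Return (best, second_best) bases from one A/C/G/T intensity quadruplet."""
    order = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    return BASES[order[0]], BASES[order[1]]


def codons(seq):
    """Split an in-frame read into codons (nucleotide triplets)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_frequencies(reads, codon_index):
    """Relative frequency of each codon observed at one codon position."""
    counts = Counter(codons(r)[codon_index] for r in reads
                     if len(r) >= 3 * (codon_index + 1))
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}
```

Counting whole triplets rather than single nucleotides is what allows a variant to be read directly as an amino acid change, for example when relating it to antiviral drug response.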
Engineered Biomimetic Polymers as Tunable Agents for Controlling CaCO₃ Mineralization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Chun-Long; Qi, Jiahui; Zuckermann, Ronald N.
2011-01-01
In nature, living organisms use peptides and proteins to precisely control the nucleation and growth of inorganic minerals and sequester CO₂ via mineralization of CaCO₃. Here we report the exploitation of a novel class of sequence-specific non-natural polymers called peptoids as tunable agents that dramatically control CaCO₃ mineralization. We show that amphiphilic peptoids composed of hydrophobic and anionic monomers exhibit both a high degree of control over calcite growth morphology and an unprecedented 23-fold acceleration of growth at a peptoid concentration of only 50 nM, while acidic peptides of similar molecular weight exhibited enhancement factors of only ~2 or less. We further show that both the morphology and rate controls depend on peptoid sequence, side-chain chemistry, chain length, and concentration. These findings provide guidelines for developing sequence-specific non-natural polymers that mimic the functions of natural peptides or proteins in their ability to direct mineralization of CaCO₃, with an eye toward their application to sequestration of CO₂ through mineral trapping.
An adaptive, object oriented strategy for base calling in DNA sequence analysis.
Giddings, M C; Brumley, R L; Haker, M; Smith, L M
1993-01-01
An algorithm has been developed for the determination of nucleotide sequence from data produced in fluorescence-based automated DNA sequencing instruments employing the four-color strategy. This algorithm takes advantage of object oriented programming techniques for modularity and extensibility. The algorithm is adaptive in that data sets from a wide variety of instruments and sequencing conditions can be used with good results. Confidence values are provided on the base calls as an estimate of accuracy. The algorithm iteratively employs confidence determinations from several different modules, each of which examines a different feature of the data for accurate peak identification. Modules within this system can be added or removed for increased performance or for application to a different task. In comparisons with commercial software, the algorithm performed well. PMID:8233787
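The modular, object-oriented design described above can be sketched as a set of pluggable scoring modules. The module names, the peak fields (`height`, `spacing`) and the rule of averaging module confidences are hypothetical choices made for illustration; the abstract does not specify which features the original modules examine or how the algorithm combines their confidence values.

```python
from abc import ABC, abstractmethod


class PeakModule(ABC):
    """One pluggable check that scores a candidate peak in [0, 1]."""

    @abstractmethod
    def confidence(self, peak: dict) -> float: ...


class HeightModule(PeakModule):
    """Hypothetical module: taller peaks are more trustworthy."""

    def __init__(self, min_height: float):
        self.min_height = min_height

    def confidence(self, peak):
        return min(1.0, peak["height"] / self.min_height)


class SpacingModule(PeakModule):
    """Hypothetical module: penalize peaks spaced unlike their neighbours."""

    def __init__(self, expected: float):
        self.expected = expected

    def confidence(self, peak):
        deviation = abs(peak["spacing"] - self.expected) / self.expected
        return max(0.0, 1.0 - deviation)


class BaseCaller:
    """Combines whatever modules it is given; modules can be swapped freely."""

    def __init__(self, modules):
        self.modules = list(modules)

    def score(self, peak):
        # Assumed combination rule: the mean of the module confidences.
        return sum(m.confidence(peak) for m in self.modules) / len(self.modules)
```

Adding a new check only requires a new `PeakModule` subclass; `BaseCaller` itself does not change, which is the extensibility argument the abstract makes for the object-oriented design.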
CoVaCS: a consensus variant calling system.
Chiara, Matteo; Gioiosa, Silvia; Chillemi, Giovanni; D'Antonio, Mattia; Flati, Tiziano; Picardi, Ernesto; Zambelli, Federico; Horner, David Stephen; Pesole, Graziano; Castrignanò, Tiziana
2018-02-05
The advent and ongoing development of next-generation sequencing (NGS) technologies has led to a rapid increase in the volume of human genome resequencing data, paving the way for personalized genomics and precision medicine. This growing body of data underlines the need for accurate and time-efficient bioinformatics systems for genotyping, a crucial prerequisite for the identification of candidate causal mutations in diagnostic screens. Here we present CoVaCS, a fully automated, highly accurate system with a web-based graphical interface for genotyping and variant annotation. Extensive tests on a gold-standard benchmark dataset (the NA12878 Illumina Platinum Genome) confirm that call sets based on our consensus strategy are fully in line with those attained by similar command-line approaches, and far more accurate than call sets from any individual tool. Importantly, our system exhibits better sensitivity and higher specificity than equivalent commercial software. CoVaCS offers optimized pipelines integrating state-of-the-art tools for variant calling and annotation of whole-genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca and offers the speed of an HPC facility, a crucial consideration when large numbers of samples must be analysed. All analyses are performed automatically, allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs .
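A consensus strategy over several variant callers can be sketched as a support vote. The rule below (keep a variant reported by at least two callers, matched exactly on chromosome, position, reference and alternate allele) is an assumption; the abstract does not state the exact consensus criterion CoVaCS uses.

```python
from collections import Counter


def consensus_calls(call_sets, min_support=2):
    """Keep variants reported by at least `min_support` callers.

    call_sets: one iterable per tool of (chrom, pos, ref, alt) tuples.
    Each tool votes at most once per variant.
    """
    support = Counter(v for calls in call_sets for v in set(calls))
    return {v for v, n in support.items() if n >= min_support}
```

Exact tuple matching is the simplest possible rule; in practice indels and multi-allelic sites usually need normalization (left-alignment, allele splitting) before votes from different tools can be compared fairly.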
Tryland, Morten; Beckmen, Kimberlee Beth; Burek-Huntington, Kathleen Ann; Breines, Eva Marie; Klein, Joern
2018-02-21
The zoonotic Orf virus (ORFV; genus Parapoxvirus, Poxviridae family) occurs worldwide and is transmitted between sheep and goats, wildlife and man. Archived tissue samples from 16 Alaskan wildlife cases, representing mountain goat (Oreamnos americanus, n = 8), Dall's sheep (Ovis dalli dalli, n = 3), muskox (Ovibos moschatus, n = 3), Sitka black-tailed deer (Odocoileus hemionus sitkensis, n = 1) and caribou (Rangifer tarandus granti, n = 1), were analyzed. Clinical signs and pathology were most severe in mountain goats, affecting most mucocutaneous regions, including palpebrae, nares, lips, anus, prepuce or vulva, as well as coronary bands. The proliferative masses were solid and nodular, covered by dark friable crusts. For Dall's sheep lambs and juveniles, the gross lesions were similar to those of mountain goats, but not as extensive. The muskoxen displayed ulcerative lesions on the legs. The caribou had two ulcerative lesions on the upper lip, as well as lesions on the distal part of the legs, around the main and dew claws. A large hairless spherical mass, with the characteristics of a fibroma, was sampled from a Sitka black-tailed deer, which did not show proliferative lesions typical of an ORFV infection. Polymerase chain reaction analyses for B2L, GIF, vIL-10 and ATI demonstrated ORFV-specific DNA in all cases. Sequences from Dall's sheep formed a separate cluster, comparable to ORFV from domestic sheep. Sequences from the other species were different from the Dall's sheep sequences, but almost identical to each other. This is the first major investigation of parapoxvirus infections in large Alaskan game species, and the first report of parapoxvirus infection in caribou and Sitka black-tailed deer. This study shows that most wild ruminant species in Alaska, from most parts of the state, can carry and be affected by ORFV.
These findings call for attention to transmission of ORFV from wildlife to livestock and to hunters, subsistence harvesters, and wildlife biologists.
Methods For Self-Organizing Software
Bouchard, Ann M.; Osbourn, Gordon C.
2005-10-18
A method for dynamically self-assembling and executing software is provided, containing machines that self-assemble execution sequences and data structures. In addition to ordered function calls (common in other software methods), mutual selective bonding between bonding sites of machines actuates one or more of the bonding machines. Two or more machines can be virtually isolated by a construct, called an encapsulant, containing a population of machines and potentially other encapsulants that can only bond with each other. A hierarchical software structure can be created using nested encapsulants. Multi-threading is implemented by populations of machines in different encapsulants that interact concurrently. Machines and encapsulants can move in and out of other encapsulants, thereby changing the functionality. Bonding between machines' sites can be deterministic or stochastic, with bonding triggering a sequence of actions that can be implemented by each machine. A self-assembled execution sequence occurs as a sequence of stochastic bonding events between machines followed by their deterministic actuation. It is the sequence of bonding of machines that determines the execution sequence, so the sequence of instructions need not be contiguous in memory.
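To make the bonding-driven execution model concrete, here is a toy simulation under heavy simplifying assumptions: each hypothetical `Machine` has a single bonding site, bonding requires mutually complementary site labels, encounters are drawn at random within one `Encapsulant`, and actuation merely appends the machine's name to a trace. It is not the patented method, only an illustration of how execution order can emerge from the sequence of bonds rather than from contiguous instructions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    site: str      # label of this machine's own bonding site
    binds_to: str  # label of the only site type it will bond with

    def actuate(self, trace: list[str]) -> None:
        # Deterministic action triggered by bonding; here it just records the machine.
        trace.append(self.name)


@dataclass
class Encapsulant:
    # Machines in one encapsulant can only encounter (and bond with) each other.
    machines: list[Machine] = field(default_factory=list)

    def run(self, steps: int, rng: random.Random) -> list[str]:
        trace: list[str] = []
        for _ in range(steps):
            a, b = rng.sample(self.machines, 2)  # stochastic encounter
            if a.binds_to == b.site and b.binds_to == a.site:  # mutual selective bonding
                a.actuate(trace)
                b.actuate(trace)
        return trace


pool = Encapsulant([
    Machine("load", site="L", binds_to="R"),
    Machine("store", site="R", binds_to="L"),
    Machine("idle", site="Z", binds_to="Q"),  # has no partner in this encapsulant
])
trace = pool.run(steps=50, rng=random.Random(0))
print(trace[:4])
```

Only the complementary `load`/`store` pair ever actuates; which of the two fires first on each bond depends on the random encounter order.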
Exome sequencing reveals novel genetic loci influencing obesity-related traits in Hispanic children
USDA-ARS's Scientific Manuscript database
To perform whole exome sequencing in 928 Hispanic children and identify variants and genes associated with childhood obesity. Single-nucleotide variants (SNVs) were identified from Illumina whole exome sequencing data using integrated read mapping, variant calling, and an annotation pipeline (Mercury...
Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun
2017-01-03
Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) efficiently captures rare mutation-containing DNA fragments in a mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low-level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing of a small but highly relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
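smCounter itself uses a Bayesian probabilistic model. The sketch below swaps in a much simpler majority-vote scheme, solely to show why molecular barcodes help: an error introduced during PCR or sequencing usually appears in only some reads of one barcode family, whereas a true variant appears in all reads derived from its molecule. The 0.8 agreement threshold, the function names and the barcode labels are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def barcode_consensus(reads: list[tuple[str, str]], min_frac: float = 0.8) -> dict[str, Optional[str]]:
    """Collapse reads sharing a molecular barcode into one base call per molecule.

    `reads` holds (barcode, base) pairs observed at one genomic position. A barcode
    family gets a consensus base only if that base makes up at least `min_frac`
    of its reads; otherwise the family is marked ambiguous (None).
    """
    families: dict[str, Counter] = defaultdict(Counter)
    for barcode, base in reads:
        families[barcode][base] += 1
    consensus: dict[str, Optional[str]] = {}
    for barcode, counts in families.items():
        base, n = counts.most_common(1)[0]
        consensus[barcode] = base if n / sum(counts.values()) >= min_frac else None
    return consensus


def molecule_allele_fraction(consensus: dict[str, Optional[str]], alt: str) -> float:
    """Fraction of unambiguous molecules (not reads) that carry `alt`."""
    calls = [b for b in consensus.values() if b is not None]
    return sum(b == alt for b in calls) / len(calls) if calls else 0.0


reads = ([("UMI1", "A")] * 4 + [("UMI2", "A")] * 4 + [("UMI2", "T")]
         + [("UMI3", "T")] * 3 + [("UMI4", "A"), ("UMI4", "G")])
cons = barcode_consensus(reads)
print(cons)  # {'UMI1': 'A', 'UMI2': 'A', 'UMI3': 'T', 'UMI4': None}
print(molecule_allele_fraction(cons, "T"))  # 0.3333333333333333
```

The single `T` read inside `UMI2` is outvoted as a likely error, while `UMI3` supports `T` consistently, so the allele fraction is counted over molecules rather than raw reads.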
Beetz, M Jerome; Kordes, Sebastian; García-Rosales, Francisco; Kössl, Manfred; Hechavarría, Julio C
2017-01-01
For the purpose of orientation, echolocating bats emit highly repetitive and spatially directed sonar calls. Echoes arising from call reflections are used to create an acoustic image of the environment. The inferior colliculus (IC) represents an important auditory stage for initial processing of echolocation signals. The present study addresses the following questions: (1) how does the temporal context of an echolocation sequence mimicking an approach flight of an animal affect neuronal processing of distance information to echo delays? (2) how does the IC process complex echolocation sequences containing echo information from multiple objects (multiobject sequence)? Here, we conducted neurophysiological recordings from the IC of ketamine-anaesthetized bats of the species Carollia perspicillata and compared the results from the IC with those from the auditory cortex (AC). Neuronal responses to an echolocation sequence were suppressed compared with the responses to temporally isolated and randomized segments of the sequence. The neuronal suppression was weaker in the IC than in the AC. In contrast to the cortex, IC activity reflects the time course of the acoustic events. In the IC, suppression sharpens the neuronal tuning to specific call-echo elements and increases the signal-to-noise ratio in the units' responses. When multiple-object sequences were presented, the neurons responded to each object-specific echo despite collicular suppression. This allows parallel processing of multiple echolocation streams at the IC level. Altogether, our data suggest that temporally precise neuronal responses in the IC could allow fast and parallel processing of multiple acoustic streams. PMID:29242823
Discrete sequence prediction and its applications
NASA Technical Reports Server (NTRS)
Laird, Philip
1992-01-01
Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.
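TDAG grows a graph of contexts with its own node-creation and pruning rules, which the abstract does not detail. As a hedged illustration of the underlying idea only, the hypothetical `ContextPredictor` below counts which symbol followed each recent context (up to a fixed depth) and predicts from the longest context it has already seen. It is a simplified variable-order model, not TDAG itself.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable
from typing import Optional


class ContextPredictor:
    """Predict the next symbol from counts of what followed recent contexts."""

    def __init__(self, max_depth: int = 3):
        self.max_depth = max_depth
        self.counts: dict[tuple, Counter] = defaultdict(Counter)
        self.history: list = []

    def _context(self, k: int) -> tuple:
        # The last k symbols of the history (k = 0 gives the empty context).
        return tuple(self.history[len(self.history) - k:])

    def update(self, symbol: Hashable) -> None:
        # Record `symbol` as the successor of every context of length 0..max_depth.
        for k in range(min(self.max_depth, len(self.history)) + 1):
            self.counts[self._context(k)][symbol] += 1
        self.history.append(symbol)

    def predict(self) -> Optional[Hashable]:
        # Use the longest context seen before, falling back to shorter ones.
        for k in range(min(self.max_depth, len(self.history)), -1, -1):
            ctx = self._context(k)
            if ctx in self.counts:
                return self.counts[ctx].most_common(1)[0][0]
        return None


p = ContextPredictor(max_depth=3)
for s in "abcabcab":
    p.update(s)
print(p.predict())  # c
```

Memory grows with the number of distinct contexts, which is the kind of cost that TDAG's selective node creation is meant to control.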
Gonzalez, Paulina; Vileno, Bertrand; Bossak, Karolina; El Khoury, Youssef; Hellwig, Petra; Bal, Wojciech; Hureau, Christelle; Faller, Peter
2017-12-18
Peptides and proteins with the N-terminal motifs NH2-Xxx-His and NH2-Xxx-Zzz-His form well-established Cu(II) complexes. The canonical peptides are Gly-His-Lys and Asp-Ala-His-Lys (from the wound healing factor and human serum albumin, respectively). Cu(II) is bound to NH2-Xxx-His via three nitrogens from the peptide and an external ligand in the equatorial plane (called the 3N form here). In contrast, Cu(II) is bound to NH2-Xxx-Zzz-His via four nitrogens from the peptide in the equatorial plane (called the 4N form here). These two motifs are not mutually exclusive, as peptides with the sequence NH2-Xxx-His-His contain both of them. However, this chimera has never been fully explored. In this work, we use a multispectroscopic approach to analyze Cu(II) binding to the chimeric peptide Ala-His-His (AHH). AHH can form both 3N- and 4N-type complexes in a pH-dependent manner: the 3N form predominates at pH ∼4-6.5 and the 4N form at pH ∼6.5-10. NMR experiments showed that at pH 8.5, where Cu(II) is almost exclusively bound in the 4N form, Cu(II) exchange for AHH and for the amidated AHH-NH2 is fast compared with the nonchimeric 4N peptide (AAH). Together, the results show that the chimeric AHH can access both Cu(II) coordination types, that minor changes in the second (or further) coordination sphere can considerably affect the equilibrium between the forms, and that Cu(II) exchange is fast even when Cu-AHH is mainly in the 4N form.
D-Ala-D-Ser VanN-Type Transferable Vancomycin Resistance in Enterococcus faecium
Lebreton, François; Depardieu, Florence; Bourdon, Nancy; Fines-Guyon, Marguerite; Berger, Pierre; Camiade, Sabine; Leclercq, Roland; Courvalin, Patrice; Cattoir, Vincent
2011-01-01
Enterococcus faecium UCN71, isolated from a blood culture, was resistant to low levels of vancomycin (MIC, 16 μg/ml) but susceptible to teicoplanin (MIC, 0.5 μg/ml). No amplification was observed with primers specific for the previously described glycopeptide resistance ligase genes, but a PCR product corresponding to a gene called vanN was obtained using degenerate primers and was sequenced. The deduced VanN protein was related (65% identity) to the D-alanine:D-serine VanL ligase. The organization of the vanN gene cluster, determined using degenerate primers and by thermal asymmetric interlaced (TAIL)-PCR, was similar to that of the vanC operons. A single promoter upstream from the resistance operon was identified by rapid amplification of cDNA ends (RACE)-PCR. The presence of peptidoglycan precursors ending in D-serine and D,D-peptidase activities in the absence of vancomycin indicated constitutive expression of the resistance operon. VanN-type resistance was transferable by conjugation to E. faecium. This is the first report of transferable D-Ala-D-Ser-type resistance in E. faecium. PMID:21807981
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuiken, Carla; Foley, Brian; Leitner, Thomas
This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by the end of 2009, a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.
Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick
1982-01-01
The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. PMID:16593262
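The step from "the only large open reading frame" to a predicted primary translation product can be illustrated with a small ORF scan. This is a simplified sketch, assuming ATG-initiated ORFs in the three forward frames only (no reverse strand, no alternative start codons); the function name and the toy sequence are illustrative, and no real psbA sequence is used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF in the three forward frames.

    `end` is exclusive and includes the stop codon; (-1, -1) means no ORF was found.
    """
    seq = seq.upper()
    best = (-1, -1)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)      # keep the longest ORF seen so far
                start = None                   # look for the next ATG in this frame
    return best


toy = "CCATGAAATTTTAAGG"
start, end = longest_orf(toy)
print(start, end, (end - start) // 3 - 1)  # 2 14 3  (three codons before the stop)
```

Applied to a real gene, the ORF length in codons minus the stop codon gives the residue count of the predicted protein.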
The interplay of homing and dispersal in green turtles: a focus on the southwestern Atlantic.
Naro-Maciel, Eugenia; Bondioli, Ana Cristina Vigliar; Martin, Meredith; de Pádua Almeida, Antônio; Baptistotte, Cecília; Bellini, Claudio; Marcovaldi, Maria Ângela; Santos, Armando José Barsante; Amato, George
2012-01-01
Current understanding of spatial ecology is insufficient for many threatened marine species, failing to provide a solid basis for conservation and management. To address this issue for globally endangered green turtles, we investigated their population distribution by sequencing a mitochondrial control region segment from the Rocas Atoll courtship area (n = 30 males) and four feeding grounds (FGs) in Brazil (n = 397), and compared our findings to published data (nesting, n = 1205; feeding, n = 1587). At Rocas Atoll, the first Atlantic courtship area sequenced to date, we found that males were differentiated from local juveniles but not from nesting females. In combination with tag data, this indicates possible male philopatry. The most common haplotypes detected at the study sites were CMA-08 and CMA-05, and no significant temporal variation was detected. Although feeding grounds were differentiated overall, intra-regional structure was less pronounced. Ascension was the primary natal source of the study FGs, with Surinam and Trindade as secondary sources. The study clarified the primary connectivity between Trindade and Brazil. Possible linkages to African populations were considered, but resolution was insufficient to determine this connection conclusively. The distribution of FG haplotype lineages was nonrandom and indicative of regional clustering. The study investigated the effects of population size, geographic distance, ocean currents, and juvenile natal homing on connectivity, addressed calls for increased genetic sampling in the southwestern Atlantic, and provided data important for the conservation of globally endangered green turtles.
BS-virus-finder: virus integration calling using bisulfite sequencing data.
Gao, Shengjie; Hu, Xuesong; Xu, Fengping; Gao, Changduo; Xiong, Kai; Zhao, Xiao; Chen, Haixiao; Zhao, Shancen; Wang, Mengyao; Fu, Dongke; Zhao, Xiaohui; Bai, Jie; Mao, Likai; Li, Bo; Wu, Song; Wang, Jian; Li, Shengbin; Yang, Huangming; Bolund, Lars; Pedersen, Christian N S
2018-01-01
DNA methylation plays a key role in the regulation of gene expression and in carcinogenesis. Bisulfite sequencing studies mainly focus on calling single-nucleotide polymorphisms, identifying differentially methylated regions, and finding allele-specific DNA methylation. Until now, only a few software tools have focused on virus integration using bisulfite sequencing data. We have developed a new and easy-to-use software tool, named BS-virus-finder (BSVF, RRID:SCR_015727), to detect viral integration breakpoints in whole human genomes. The tool is hosted at https://github.com/BGI-SZ/BSVF. BS-virus-finder demonstrates high sensitivity and specificity. It is useful in epigenetic studies and for revealing the relationship between viral integration and DNA methylation. BS-virus-finder is the first software tool to detect virus integration loci using bisulfite sequencing data. © The Authors 2017. Published by Oxford University Press.
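Part of what makes bisulfite data awkward for any alignment-based analysis is that unmethylated cytosines read as thymines. A common building block in bisulfite aligners is to convert sequences in silico before aligning; the sketch below shows only that conversion for one strand, assuming every CpG is either fully methylated or fully unmethylated. It is a general illustration, not BS-virus-finder's own algorithm, and the function name is hypothetical.

```python
def bisulfite_convert(seq: str, keep_methylated_cpg: bool = False) -> str:
    """Simulate bisulfite conversion of one (forward) DNA strand.

    Unmethylated C reads as T after bisulfite treatment and PCR. When
    `keep_methylated_cpg` is True, every C followed by G is assumed methylated
    and left unchanged; all other C are converted.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        in_cpg = seq[i + 1:i + 2] == "G"
        if base == "C" and not (keep_methylated_cpg and in_cpg):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


ref = "ACGTCCGA"
print(bisulfite_convert(ref))                            # ATGTTTGA
print(bisulfite_convert(ref, keep_methylated_cpg=True))  # ACGTTCGA
```

After conversion, host and viral sequences share a reduced, C-poor alphabet, which plausibly makes unique placement of reads spanning an integration breakpoint harder than with ordinary sequencing data.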
Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas
2016-01-01
Traditional Sanger sequencing and next-generation sequencing (NGS) have both been used to identify disease-causing mutations in human molecular research. Most currently available tools are developed for research and exploratory purposes and often do not provide a complete, efficient, one-stop solution. Because current tools focus mainly on NGS data analysis, they offer no integrated solution for Sanger data, and consequently no one-stop solution is available for analyzing reads from both sequencing platforms. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data on a single platform. It can analyze reads from multiple patients in a single run to create a list of potential disease-causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms: Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers (BWA, TMAP, Bowtie, Bowtie2 and GSNAP) and four variant-calling pipelines (GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2) are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid. PMID:26840129
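The abstract lists quality trimming among MutAid's steps without naming the algorithm. As an illustrative sketch only, here is the 3'-end trimming heuristic described for BWA's `-q` option, which cuts the read where the running sum of (threshold - quality), accumulated from the 3' end, is largest; the function name and the default threshold of 20 are assumptions.

```python
def quality_trim_3prime(quals: list[int], threshold: int = 20) -> int:
    """Return how many bases to keep after 3'-end quality trimming.

    Walking from the 3' end, accumulate (threshold - q). The read is cut where
    this running sum peaks; the scan stops once the sum goes negative, i.e. once
    good-quality bases outweigh the poor ones seen so far.
    """
    keep = len(quals)
    running = best = 0
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best, keep = running, i
    return keep


quals = [30, 30, 30, 30, 10, 25, 5]  # Phred scores; the 3' tail is poor
print(quality_trim_3prime(quals))    # 4 -> keep the first four bases
```

Note that the single good base (Q25) inside the poor tail is trimmed along with its neighbours, because the scan maximizes the cumulative deficit rather than cutting at the first low-quality base.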
Kelsen, Judith R.; Dawany, Noor; Moran, Christopher J.; Petersen, Britt-Sabina; Sarmady, Mahdi; Sasson, Ariella; Pauly-Hubbard, Helen; Martinez, Alejandro; Maurer, Kelly; Soong, Joanne; Rappaport, Eric; Franke, Andre; Keller, Andreas; Winter, Harland S.; Mamula, Petar; Piccoli, David; Artis, David; Sonnenberg, Gregory F.; Daly, Mark; Sullivan, Kathleen E.; Baldassano, Robert N.; Devoto, Marcella
2016-01-01
Background & Aims Very early onset inflammatory bowel disease (VEO-IBD), IBD diagnosed ≤5 y of age, frequently presents with a different and more severe phenotype than older-onset IBD. We investigated whether patients with VEO-IBD carry rare or novel variants in genes associated with immunodeficiencies that might contribute to disease development. Methods Patients with VEO-IBD and parents (when available) were recruited from the Children's Hospital of Philadelphia from March 2013 through July 2014. We analyzed DNA from 125 patients with VEO-IBD (ages 3 weeks to 4 y) and 19 parents, 4 of whom also had IBD. Exome capture was performed with Agilent SureSelect V4, and sequencing was performed on the Illumina HiSeq platform. Reads were aligned to human genome GRCh37, followed by post-processing and variant calling. After functional annotation, candidate variants were analyzed for change in protein function, minor allele frequency <0.1%, and scaled combined annotation dependent depletion scores ≤10. We focused on genes associated with primary immunodeficiencies and related pathways. An additional 210 exome samples from patients with pediatric IBD (n=45) or adult-onset Crohn's disease (n=20) and healthy individuals (controls, n=145) were obtained from the University of Kiel, Germany, and used as control groups. Results Four hundred genes and regions associated with primary immunodeficiency, covering approximately 6500 coding exons totaling >1 Mbp of coding sequence, were selected from the whole-exome data. Our analysis revealed novel and rare variants within these genes that could contribute to the development of VEO-IBD, including rare heterozygous missense variants in IL10RA and previously unidentified variants in MSH5 and CD19. Conclusions In an exome sequence analysis of patients with VEO-IBD and their parents, we identified variants in genes that regulate B- and T-cell functions and could contribute to pathogenesis. 
Our analysis could lead to the identification of previously unidentified IBD-associated variants. PMID:26193622
Gene calling and bacterial genome annotation with BG7.
Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo
2015-01-01
New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Bacterial genome annotation has therefore emerged as a key research area. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of non-protein-coding sequence) and of next-generation sequencing (NGS) technologies (sequencing errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool tolerates sequence errors, retaining its ability to annotate highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure based entirely on cloud computing (Amazon Web Services).
PANGEA: pipeline for analysis of next generation amplicons
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-01-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525
Penna, M; Lin, W Y; Feng, A S
2001-12-01
We investigated the response selectivities of single auditory neurons in the torus semicircularis (TS) of Batrachyla antartandica (a leptodactylid from southern Chile) to synthetic stimuli having diverse temporal structures. The advertisement call of this species is characterized by a long sequence of brief sound pulses with a dominant frequency of about 2000 Hz. We constructed five series of synthetic stimuli in which the following acoustic parameters were systematically modified, one at a time: pulse rate, pulse duration, pulse rise time, pulse fall time, and train duration. The carrier frequency of these stimuli was fixed at the characteristic frequency of the units under study (n=44). Response patterns of TS units to these synthetic call variants revealed different degrees of selectivity for each of the temporal variables. A substantial number of neurons showed a preference for pulse rates below 2 pulses/s, approximating the values found in natural advertisement calls. Tonic neurons generally preferred long pulse durations, long rise and fall times, and long train durations. In contrast, phasic and phasic-burst neurons preferred stimuli with short pulse durations, short rise and fall times, and short train durations.
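The stimulus design (one temporal parameter varied per series, carrier fixed) can be sketched as a small synthesizer. The linear ramps, the 44.1 kHz sampling rate, the function name and the specific parameter values below are assumptions for illustration; only the roughly 2000 Hz carrier and the low pulse rates echo the text.

```python
import math


def pulse_train(carrier_hz: float, pulses_per_s: float, pulse_dur: float,
                rise: float, fall: float, train_dur: float, fs: int = 44100) -> list[float]:
    """Synthesize a train of tone pulses with linear rise and fall ramps.

    Times are in seconds. Each pulse starts at a multiple of 1 / pulses_per_s.
    """
    if rise + fall > pulse_dur:
        raise ValueError("rise + fall must not exceed the pulse duration")
    period = 1.0 / pulses_per_s
    samples = []
    for k in range(int(round(train_dur * fs))):
        t = k / fs
        tp = t % period                    # time since the current pulse began
        if tp >= pulse_dur:
            env = 0.0                      # silent gap between pulses
        elif tp < rise:
            env = tp / rise                # linear rise
        elif tp > pulse_dur - fall:
            env = (pulse_dur - tp) / fall  # linear fall
        else:
            env = 1.0                      # plateau
        samples.append(env * math.sin(2 * math.pi * carrier_hz * t))
    return samples


# Hypothetical pulse-rate series: only the rate changes, everything else is fixed.
rate_series = {r: pulse_train(2000, r, 0.01, 0.002, 0.002, 2.0) for r in (0.5, 1, 2, 4)}
```

Building the other four series works the same way: hold every argument constant except the one under study (pulse duration, rise time, fall time or train duration).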
Tensor functors between Morita duals of fusion categories
NASA Astrophysics Data System (ADS)
Galindo, César; Plavnik, Julia Yael
2017-03-01
Given a fusion category $\mathcal{C}$ and an indecomposable $\mathcal{C}$-module category $\mathcal{M}$, the fusion category $\mathcal{C}^*_{\mathcal{M}}$ of $\mathcal{C}$-module endofunctors of $\mathcal{M}$ is called the (Morita) dual fusion category of $\mathcal{C}$ with respect to $\mathcal{M}$. We describe tensor functors between two arbitrary duals $\mathcal{C}^*_{\mathcal{M}}$ and $\mathcal{D}^*_{\mathcal{N}}$ in terms of data associated to $\mathcal{C}$ and $\mathcal{D}$. We apply the results to $G$-equivariantizations of fusion categories and to group-theoretical fusion categories. We describe the orbits of the action of the Brauer-Picard group on the set of module categories, and we propose a categorification of the Rosenberg-Zelinsky sequence for fusion categories.
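For reference, the dual construction can be written out as follows, using standard conventions (some authors compose functors in the opposite order). Here $\operatorname{Fun}_{\mathcal{C}}$ denotes $\mathcal{C}$-module functors; the last equivalence is the standard double-dual statement for an indecomposable module category over a fusion category, which is why $\mathcal{C}$ and $\mathcal{C}^*_{\mathcal{M}}$ are called Morita equivalent.

```latex
$$
\mathcal{C}^*_{\mathcal{M}} := \operatorname{Fun}_{\mathcal{C}}(\mathcal{M},\mathcal{M}),
\qquad F \otimes G := F \circ G,
\qquad \mathbf{1} := \mathrm{id}_{\mathcal{M}},
\qquad \bigl(\mathcal{C}^*_{\mathcal{M}}\bigr)^*_{\mathcal{M}} \simeq \mathcal{C}.
$$
```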
Identification of missing variants by combining multiple analytic pipelines.
Ren, Yingxue; Reddy, Joseph S; Pottier, Cyril; Sarangi, Vivekananda; Tian, Shulan; Sinnwell, Jason P; McDonnell, Shannon K; Biernacka, Joanna M; Carrasquillo, Minerva M; Ross, Owen A; Ertekin-Taner, Nilüfer; Rademakers, Rosa; Hudson, Matthew; Mainzer, Liudmila Sergeevna; Asmann, Yan W
2018-04-16
After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research of complex diseases has shifted to sequencing-based rare variants discovery. This requires large sample sizes for statistical power and has brought up questions about whether the current variant calling practices are adequate for large cohorts. It is well-known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants by one pipeline due to computational cost and assume that false negative calls are a small percent of total. We analyzed 10,000 exomes from the Alzheimer's Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50,100, 200, 500, 1000, and 1952 samples; and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50,100, 500, 2000, 5000 and 10,000 samples. We found that using a single pipeline missed increasing numbers of high-quality variants correlated with sample sizes. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low frequency (minor allele frequency [MAF] 1-5%) and rare (MAF < 1%) variants, which are the very type of variants of interest. In 660 Alzheimer's disease cases with earlier onset ages of ≤65, 4 out of 13 (31%) previously-published rare pathogenic and protective mutations in APP, PSEN1, and PSEN2 genes were undetected by the default one-pipeline approach but recovered by the multi-pipeline approach. 
Identification of the complete variant set from sequencing data is the prerequisite of genetic association analyses. The current analytic practice of calling genetic variants from sequencing data with a single bioinformatics pipeline is no longer adequate for increasingly large projects. The number and percentage of quality variants that pass quality filters but are missed by the one-pipeline approach increase rapidly with sample size.
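The "rescued" percentages above compare a multi-pipeline union of calls against a single default pipeline. A minimal Python sketch of that bookkeeping follows, assuming each pipeline's pass-QC calls are available as sets of (chromosome, position, ref, alt) tuples and assuming "rescued" means the share of the union that the default pipeline misses; the study's exact denominator and the pipeline names used here are hypothetical.

```python
from typing import Dict, Set, Tuple

# A variant keyed by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def rescued_fraction(calls: Dict[str, Set[Variant]], default: str) -> float:
    """Share of the multi-pipeline union that the default pipeline did not report.

    `calls` maps a (hypothetical) pipeline name to its set of pass-QC variants.
    """
    union: Set[Variant] = set().union(*calls.values())
    if not union:
        return 0.0
    missed = union - calls[default]  # variants only other pipelines found
    return len(missed) / len(union)


if __name__ == "__main__":
    calls = {
        "alignerA_joint": {("1", 100, "A", "G"), ("1", 200, "C", "T")},
        "alignerB_joint": {("1", 100, "A", "G"), ("2", 50, "G", "A")},
        "alignerA_single": {("1", 200, "C", "T"), ("3", 75, "T", "C")},
    }
    print(rescued_fraction(calls, default="alignerA_joint"))  # 0.5
```

Using sets makes the union and the "missed by default" difference one operation each; real pipelines would first need identical variant normalization (left-alignment, allele splitting) for the tuples to match.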
NASA Astrophysics Data System (ADS)
Poston, Chloe N.; Higgs, Richard E.; You, Jinsam; Gelfanova, Valentina; Hale, John E.; Knierman, Michael D.; Siegel, Robert; Gutierrez, Jesus A.
2014-07-01
De novo sequencing by mass spectrometry (MS) allows for the determination of the complete amino acid (AA) sequence of a given protein based on the mass difference of detected ions from MS/MS fragmentation spectra. The technique relies on obtaining specific masses that can be attributed to characteristic theoretical masses of AAs. A major limitation of de novo sequencing by MS is the inability to distinguish between the isobaric residues leucine (Leu) and isoleucine (Ile). Incorrect identification of Ile as Leu or vice versa often results in loss of activity in recombinant antibodies. This functional ambiguity is commonly resolved with costly and time-consuming AA mutation and peptide sequencing experiments. Here, we describe a set of orthogonal biochemical protocols, which experimentally determine the identity of Ile or Leu residues in monoclonal antibodies (mAb) based on the selectivity that leucine aminopeptidase shows for N-terminal Leu residues and the cleavage preference for Leu by chymotrypsin. The resulting observations are combined with germline frequencies and incorporated into a logistic regression model, called Predictor for Xle Sites (PXleS), to provide a statistical likelihood for the identity of Leu at an ambiguous site. We demonstrate that PXleS can generate a probability for an Xle site in mAbs with 96% accuracy. Using PXleS removes the need to express several candidate sequences and therefore reduces the overall time and resources required to go from spectra generation to a biologically active mAb sequence when an Ile or Leu residue is in question.
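PXleS is described as a logistic regression that combines biochemical observations with germline frequencies. The sketch below shows only how a fitted logistic model turns such evidence into a probability of Leu at an Xle site; the feature names and every coefficient are hypothetical placeholders, not the published PXleS model.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted PXleS values.
INTERCEPT = -0.5
WEIGHTS = {
    "lap_cleaved": 2.0,         # 1 if leucine aminopeptidase released the residue, else 0
    "chymo_cleaved": 1.5,       # 1 if chymotrypsin cleaved after the residue, else 0
    "germline_leu_logit": 1.0,  # log-odds of Leu at this position among germline genes
}


def prob_leu(features: dict) -> float:
    """Logistic model: P(Leu) = 1 / (1 + exp(-(b0 + sum_i w_i * x_i)))."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Aminopeptidase evidence only: z = -0.5 + 2.0 = 1.5, so P(Leu) is about 0.82.
    print(prob_leu({"lap_cleaved": 1, "chymo_cleaved": 0, "germline_leu_logit": 0.0}))
```

Features absent from the dictionary contribute nothing, so with no evidence at all the prediction falls back to the intercept alone.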
Ralph, Duncan K; Matsen, Frederick A
2016-01-01
VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In this paper, we find that large modern data sets do indeed support a model using parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM "factorization" strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
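Annotating each base as V, D, J or N-addition is a most-probable-path problem on an HMM. The toy sketch below runs the standard Viterbi algorithm in log space over four states with one categorical emission distribution per state; partis itself uses per-allele transitions and per-position mutation probabilities, so this is only an illustration of the inference step, and all probabilities here are made up.

```python
import math
from typing import Dict, List


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq: str, start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Most probable state path for `seq` under a discrete HMM (log space)."""
    states = list(emit)
    # score[s]: best log-probability of any path ending in state s at the current base
    score = {s: _log(start.get(s, 0.0)) + _log(emit[s].get(seq[0], 0.0)) for s in states}
    backptrs: List[Dict[str, str]] = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # best predecessor state for s
            prev = max(states, key=lambda r: score[r] + _log(trans[r].get(s, 0.0)))
            new_score[s] = (score[prev] + _log(trans[prev].get(s, 0.0))
                            + _log(emit[s].get(base, 0.0)))
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def _peaked(base: str) -> Dict[str, float]:
    # toy emission: the state's "own" base is likely, the others rare
    return {b: (0.85 if b == base else 0.05) for b in "ACGT"}


START = {"V": 1.0}
TRANS = {
    "V": {"V": 0.9, "N": 0.1},
    "N": {"N": 0.5, "D": 0.25, "J": 0.25},
    "D": {"D": 0.8, "N": 0.2},
    "J": {"J": 1.0},
}
EMIT = {state: _peaked(b) for state, b in {"V": "A", "N": "C", "D": "G", "J": "T"}.items()}

if __name__ == "__main__":
    print("".join(viterbi("AACGGCTT", START, TRANS, EMIT)))  # VVNDDNJJ
```

Working in log space avoids underflow on long reads; zero-probability transitions become negative infinity and are never chosen when a finite alternative exists.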
Using distances between Top-n-gram and residue pairs for protein remote homology detection.
Liu, Bin; Xu, Jinghao; Zou, Quan; Xu, Ruifeng; Wang, Xiaolong; Chen, Qingcai
2014-01-01
Protein remote homology detection is one of the central problems in bioinformatics, important for both basic research and practical applications. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve state-of-the-art performance. Exploring feature vectors that incorporate the position information of amino acids or other protein building blocks is a key step toward improving the performance of SVM-based methods. Two new methods for protein remote homology detection were proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method in which the feature vector representation of a protein is based on the distances between residue pairs. SVM-DT is a profile-based method that considers the distances between Top-n-gram pairs. A Top-n-gram can be viewed as a profile-based building block of proteins, calculated from the frequency profiles. Both methods are position-dependent approaches that incorporate the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. Experimental results showed that these two new methods are very promising: compared with position-independent methods, they give a clear performance improvement. Furthermore, the proposed methods can also provide useful insights for studying the features of protein families. The better performance of the proposed methods demonstrates that position-dependent approaches are efficient for protein remote homology detection. Another advantage of our methods arises from the explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp.
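One plausible reading of "distances between residue pairs" is a fixed-length vector counting how often residue a is followed by residue b exactly d positions later, for d up to some maximum. The sketch below builds such a vector under that assumption; the exact SVM-DR feature definition, normalization and maximum distance may differ, and the SVM training step is not shown.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_pair_distance_features(seq: str, max_dist: int) -> List[float]:
    """Frequency of each ordered residue pair (a, b) separated by exactly d positions.

    Returns 20 * 20 * max_dist values, ordered by distance, then first, then second residue.
    """
    counts: Counter = Counter()
    for d in range(1, max_dist + 1):
        for i in range(len(seq) - d):
            counts[(seq[i], seq[i + d], d)] += 1
    features: List[float] = []
    for d in range(1, max_dist + 1):
        total = max(len(seq) - d, 1)  # number of pairs at this distance (avoid divide-by-zero)
        for a, b in product(AMINO_ACIDS, repeat=2):
            features.append(counts[(a, b, d)] / total)
    return features


if __name__ == "__main__":
    f = residue_pair_distance_features("ACA", max_dist=2)
    print(len(f), f[1], f[20], f[400])  # 800 0.5 0.5 1.0
```

Because the vector has a fixed dimension and an explicit meaning per coordinate, it can be fed to any linear or kernel SVM, and large weights point back to specific residue pairs and distances.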
Nafissi, Nafiseh; Slavcev, Roderick
2012-12-06
While safer than their viral counterparts, conventional non-viral gene delivery DNA vectors offer a limited safety profile. They often result in the delivery of unwanted prokaryotic sequences, antibiotic resistance genes, and bacterial origins of replication to the target, which may stimulate unwanted immunological responses due to their chimeric DNA composition. Such vectors may also impart the potential for chromosomal integration, thus potentiating oncogenesis. We sought to engineer an in vivo system for the quick and simple production of safer DNA vector alternatives that were devoid of non-transgene bacterial sequences and would lethally disrupt the host chromosome in the event of an unwanted vector integration event. We constructed a parent eukaryotic expression vector possessing a specialized manufactured multi-target site called "Super Sequence", and engineered E. coli cells (R-cell) that conditionally produce the phage-derived recombinase Tel (PY54), TelN (N15), or Cre (P1). Passage of the parent plasmid vector through R-cells under optimized conditions resulted in rapid, efficient, one-step in vivo generation of mini lcc (linear covalently closed; Tel/TelN-cell) or mini ccc (circular covalently closed; Cre-cell) DNA constructs, separated from the backbone plasmid DNA. Site-specific integration of lcc plasmids into the host chromosome resulted in chromosomal disruption and 10^5-fold lower viability than that seen with the ccc counterpart. We offer a high-efficiency mini DNA vector production system that confers simple, rapid and scalable in vivo production of mini lcc DNA vectors that possess all the benefits of "minicircle" DNA vectors and virtually eliminate the potential for undesirable vector integration events.
Using Next-Generation Sequencing to Explore Genetics and Race in the High School Classroom
ERIC Educational Resources Information Center
Yang, Xinmiao; Hartman, Mark R.; Harrington, Kristin T.; Etson, Candice M.; Fierman, Matthew B.; Slonim, Donna K.; Walt, David R.
2017-01-01
With the development of new sequencing and bioinformatics technologies, concepts relating to personal genomics play an increasingly important role in our society. To promote interest and understanding of sequencing and bioinformatics in the high school classroom, we developed and implemented a laboratory-based teaching module called "The…
Eye movement sequence generation in humans: Motor or goal updating?
Quaia, Christian; Joiner, Wilsaan M.; FitzGibbon, Edmond J.; Optican, Lance M.; Smith, Maurice A.
2011-01-01
Saccadic eye movements are often grouped in pre-programmed sequences. The mechanism underlying the generation of each saccade in a sequence is currently poorly understood. Broadly speaking, two alternative schemes are possible: first, after each saccade the retinotopic location of the next target could be estimated, and an appropriate saccade could be generated. We call this the goal updating hypothesis. Alternatively, multiple motor plans could be pre-computed, and they could then be updated after each movement. We call this the motor updating hypothesis. We used McLaughlin’s intra-saccadic step paradigm to artificially create a condition under which these two hypotheses make discriminable predictions. We found that in human subjects, when sequences of two saccades are planned, the motor updating hypothesis predicts the landing position of the second saccade in two-saccade sequences much better than the goal updating hypothesis. This finding suggests that the human saccadic system is capable of executing sequences of saccades to multiple targets by planning multiple motor commands, which are then updated by serial subtraction of ongoing motor output. PMID:21191134
Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium
Grove, Megan L.; Yu, Bing; Cochran, Barbara J.; Haritunians, Talin; Bis, Joshua C.; Taylor, Kent D.; Hansen, Mark; Borecki, Ingrid B.; Cupples, L. Adrienne; Fornage, Myriam; Gudnason, Vilmundur; Harris, Tamara B.; Kathiresan, Sekar; Kraaij, Robert; Launer, Lenore J.; Levy, Daniel; Liu, Yongmei; Mosley, Thomas; Peloso, Gina M.; Psaty, Bruce M.; Rich, Stephen S.; Rivadeneira, Fernando; Siscovick, David S.; Smith, Albert V.; Uitterlinden, Andre; van Duijn, Cornelia M.; Wilson, James G.; O’Donnell, Christopher J.; Rotter, Jerome I.; Boerwinkle, Eric
2013-01-01
Genotyping arrays are a cost-effective approach when typing previously identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays with rare variants (e.g., minor allele frequency [MAF] <0.01) is that automated clustering algorithms have difficulty accurately detecting and assigning genotype calls. Combining intensity data from large numbers of samples may increase the ability to accurately call the genotypes of rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, concordance of genotypes in a subset of individuals having both exome chip and exome sequence data was analyzed. After exclusion of low-performing SNPs on the exome chip and non-overlap of SNPs derived from sequence data, genotypes of 185,119 variants (11,356 were monomorphic) were compared in 530 individuals that had whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested and 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling makes it possible to accurately genotype rare variants using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip. PMID:23874508
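The three reported categories (concordant, missing, discordant) are mutually exclusive and sum to 100% (99.77 + 0.14 + 0.09). A minimal Python sketch of that tally follows, assuming genotypes are "/"-separated unphased strings such as "0/1" and that None marks a missing call; the encoding is an assumption, not taken from the study.

```python
from typing import Dict, Iterable, Optional, Tuple

Genotype = Optional[str]  # e.g. "0/1"; None marks a missing call


def _norm(g: str) -> str:
    # unphased genotypes: "1/0" and "0/1" describe the same call
    return "/".join(sorted(g.split("/")))


def concordance_summary(pairs: Iterable[Tuple[Genotype, Genotype]]) -> Dict[str, float]:
    """Fractions of (chip, sequence) genotype pairs that are concordant, missing or discordant."""
    counts = {"concordant": 0, "missing": 0, "discordant": 0}
    n = 0
    for chip, seq in pairs:
        n += 1
        if chip is None or seq is None:
            counts["missing"] += 1
        elif _norm(chip) == _norm(seq):
            counts["concordant"] += 1
        else:
            counts["discordant"] += 1
    if n == 0:
        return {k: 0.0 for k in counts}
    return {k: v / n for k, v in counts.items()}


if __name__ == "__main__":
    pairs = [("0/0", "0/0"), ("0/1", "1/0"), (None, "0/1"), ("1/1", "0/1")]
    print(concordance_summary(pairs))
    # {'concordant': 0.5, 'missing': 0.25, 'discordant': 0.25}
```

Counting missing data as its own category, rather than silently dropping those pairs, keeps the three fractions summing to one, which matches how the percentages above are reported.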
Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.
Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe
2018-01-01
Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered.
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
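After reads are mapped, a consensus and its minority variants can be read off per-position base counts. The sketch below shows only that final step, assuming a pileup already summarized as one Counter per reference position; it is not shiver's own algorithm, and the "?" placeholder for uncovered positions, the minimum depth and the 10% minority threshold are hypothetical choices.

```python
from collections import Counter
from typing import List, Tuple


def consensus_with_minorities(columns: List[Counter], min_depth: int = 1,
                              minority_threshold: float = 0.1
                              ) -> Tuple[str, List[Tuple[int, str, float]]]:
    """Majority-base consensus plus (position, base, frequency) for minority variants."""
    consensus: List[str] = []
    minorities: List[Tuple[int, str, float]] = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("?")  # hypothetical marker for insufficient coverage
            continue
        base, _ = counts.most_common(1)[0]  # most frequent base at this position
        consensus.append(base)
        for other, c in counts.items():
            if other != base and c / depth >= minority_threshold:
                minorities.append((pos, other, c / depth))
    return "".join(consensus), minorities


if __name__ == "__main__":
    cols = [Counter(A=9, G=1), Counter(C=10), Counter()]
    print(consensus_with_minorities(cols))  # ('AC?', [(0, 'G', 0.1)])
```

Ties between equally frequent bases are broken by Counter insertion order here; a production tool would more likely emit an IUPAC ambiguity code instead.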
Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.
2013-01-01
Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. 
Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798
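The sensitivity and positive predictive value quoted above follow the usual definitions against a reference set (here, Sanger-confirmed variants). A minimal Python sketch follows; the counts in the usage line are hypothetical and are not the study's underlying tallies.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true (e.g. Sanger-confirmed) variants that the platform called."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of the platform's variant calls that are true."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(sensitivity(tp=95, fn=5), positive_predictive_value(tp=90, fp=10))  # 0.95 0.9
```

Sensitivity is bounded by variants in adequately sequenced regions, which is why the abstract restricts both metrics to those regions.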
The American cranberry: first insights into the whole genome of a species adapted to bog habitat.
Polashock, James; Zelzion, Ehud; Fajardo, Diego; Zalapa, Juan; Georgi, Laura; Bhattacharya, Debashish; Vorsa, Nicholi
2014-06-13
The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely-cultivated fruit crops native to North America; the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). In terms of taxonomy, cranberries are in the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Whereas next-generation sequencing technology is allowing the advancement of whole-genome sequencing, one major obstacle to the successful assembly from short-read sequence data of complex diploid (and higher ploidy) organisms is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing originating from the cultivar Ben Lear. The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The number of predicted genes was 36,364 and represents 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection to further study important biochemical pathways and cellular processes and to use for marker development for breeding and the study of horticultural characteristics, such as disease resistance. PMID:24927653
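The assembly is summarized by its scaffold N50 (4,237 bp). A minimal Python sketch of the common definition follows: sort scaffolds from longest to shortest and report the length at which the running total first reaches half the assembly size. Some tools use "strictly more than half" instead, which can shift the value slightly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example the total is 54, and 10 + 9 + 8 = 27 is the first running total to reach half of it, so the N50 is 8.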
Stefanowicz, Karolina; Lannoo, Nausicaä; Proost, Paul; Van Damme, Els J M
2012-01-01
The Arabidopsis thaliana genome contains a small group of bipartite F-box proteins, consisting of an N-terminal F-box domain and a C-terminal domain sharing sequence similarity with Nictaba, the jasmonate-induced glycan-binding protein (lectin) from tobacco. Based on the high sequence similarity between the C-terminal domain of these proteins and Nictaba, the hypothesis was put forward that the so-called F-box-Nictaba proteins possess carbohydrate-binding activity and accordingly can be considered functional homologs of the mammalian sugar-binding F-box or Fbs proteins, which are involved in proteasomal degradation of glycoproteins. To obtain experimental evidence for the carbohydrate-binding activity and specificity of the A. thaliana F-box-Nictaba proteins, both the complete F-box-Nictaba sequence of one selected Arabidopsis F-box protein (namely At2g02360) and the Nictaba-like domain alone were expressed in Pichia pastoris and analyzed by affinity chromatography, agglutination assays and glycan micro-array binding assays. These results demonstrated that the C-terminal Nictaba-like domain provides the F-box protein with a carbohydrate-binding activity that is specifically directed against N- and O-glycans containing N-acetyllactosamine (Galβ1-3GlcNAc and Galβ1-4GlcNAc) and poly-N-acetyllactosamine ([Galβ1-4GlcNAc]n) as well as Lewis A (Galβ1-3(Fucα1-4)GlcNAc), Lewis X (Galβ1-4(Fucα1-3)GlcNAc), Lewis Y (Fucα1-2Galβ1-4(Fucα1-3)GlcNAc) and blood type B (Galα1-3(Fucα1-2)Galβ1-3GlcNAc) motifs. Based on these findings, one can reasonably conclude that at least the A. thaliana F-box-Nictaba protein encoded by At2g02360 can act as a carbohydrate-binding protein. The results from the glycan array assays revealed differences in sugar-binding specificity between the F-box protein and Nictaba, indicating that the same carbohydrate-binding motif can accommodate unrelated oligosaccharides.
Factors influencing success of clinical genome sequencing across a broad spectrum of disorders
Lise, Stefano; Broxholme, John; Cazier, Jean-Baptiste; Rimmer, Andy; Kanapin, Alexander; Lunter, Gerton; Fiddy, Simon; Allan, Chris; Aricescu, A. Radu; Attar, Moustafa; Babbs, Christian; Becq, Jennifer; Beeson, David; Bento, Celeste; Bignell, Patricia; Blair, Edward; Buckle, Veronica J; Bull, Katherine; Cais, Ondrej; Cario, Holger; Chapel, Helen; Copley, Richard R; Cornall, Richard; Craft, Jude; Dahan, Karin; Davenport, Emma E; Dendrou, Calliope; Devuyst, Olivier; Fenwick, Aimée L; Flint, Jonathan; Fugger, Lars; Gilbert, Rodney D; Goriely, Anne; Green, Angie; Greger, Ingo H.; Grocock, Russell; Gruszczyk, Anja V; Hastings, Robert; Hatton, Edouard; Higgs, Doug; Hill, Adrian; Holmes, Chris; Howard, Malcolm; Hughes, Linda; Humburg, Peter; Johnson, David; Karpe, Fredrik; Kingsbury, Zoya; Kini, Usha; Knight, Julian C; Krohn, Jonathan; Lamble, Sarah; Langman, Craig; Lonie, Lorne; Luck, Joshua; McCarthy, Davis; McGowan, Simon J; McMullin, Mary Frances; Miller, Kerry A; Murray, Lisa; Németh, Andrea H; Nesbit, M Andrew; Nutt, David; Ormondroyd, Elizabeth; Oturai, Annette Bang; Pagnamenta, Alistair; Patel, Smita Y; Percy, Melanie; Petousi, Nayia; Piazza, Paolo; Piret, Sian E; Polanco-Echeverry, Guadalupe; Popitsch, Niko; Powrie, Fiona; Pugh, Chris; Quek, Lynn; Robbins, Peter A; Robson, Kathryn; Russo, Alexandra; Sahgal, Natasha; van Schouwenburg, Pauline A; Schuh, Anna; Silverman, Earl; Simmons, Alison; Sørensen, Per Soelberg; Sweeney, Elizabeth; Taylor, John; Thakker, Rajesh V; Tomlinson, Ian; Trebes, Amy; Twigg, Stephen RF; Uhlig, Holm H; Vyas, Paresh; Vyse, Tim; Wall, Steven A; Watkins, Hugh; Whyte, Michael P; Witty, Lorna; Wright, Ben; Yau, Chris; Buck, David; Humphray, Sean; Ratcliffe, Peter J; Bell, John I; Wilkie, Andrew OM; Bentley, David; Donnelly, Peter; McVean, Gilean
2015-01-01
To assess factors influencing the success of whole genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases across a broad spectrum of disorders in whom prior screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritisation. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and using familial transmission above biological plausibility contributed to accuracy. Overall, we identified disease causing variants in 21% of cases, rising to 34% (23/68) for Mendelian disorders and 57% (8/14) in trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, though only four were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis, but also highlight many outstanding challenges. PMID:25985138
Mapping Base Modifications in DNA by Transverse-Current Sequencing
NASA Astrophysics Data System (ADS)
Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.
2018-02-01
Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.
Divergence of a stereotyped call in northern resident killer whales.
Grebner, Dawn M; Parks, Susan E; Bradley, David L; Miksis-Olds, Jennifer L; Capone, Dean E; Ford, John K B
2011-02-01
Northern resident killer whale pods (Orcinus orca) have distinctive stereotyped pulsed call repertoires that can be used to distinguish groups acoustically. Repertoires are generally stable, with the same call types comprising the repertoire of a given pod over a period of years to decades. Previous studies have shown that some discrete pulsed calls can be subdivided into variants or subtypes. This study suggests that new stereotyped calls may result from the gradual modification of existing call types through subtypes. Vocalizations of individuals and small groups of killer whales were collected using a bottom-mounted hydrophone array in Johnstone Strait, British Columbia in 2006 and 2007. Discriminant analysis of slope variations of a predominant call type, N4, revealed the presence of four distinct call subtypes. Similar to previous studies, there was a divergence of the N4 call between members of different matrilines of the same pod. However, this study reveals that individual killer whales produced multiple subtypes of the N4 call, indicating that divergence in the N4 call is not the result of individual differences, but rather may indicate the gradual evolution of a new stereotyped call.
Gabrielse, Carrie; Miller, Charles T.; McConnell, Kristopher H.; DeWard, Aaron; Fox, Catherine A.; Weinreich, Michael
2006-01-01
Dbf4p is an essential regulatory subunit of the Cdc7p kinase required for the initiation of DNA replication. Cdc7p and Dbf4p orthologs have also been shown to function in the response to DNA damage. A previous Dbf4p multiple sequence alignment identified a conserved ∼40-residue N-terminal region with similarity to the BRCA1 C-terminal (BRCT) motif called “motif N.” BRCT motifs encode ∼100-amino-acid domains involved in the DNA damage response. We have identified an expanded and conserved ∼100-residue N-terminal region of Dbf4p that includes motif N but is capable of encoding a single BRCT-like domain. Dbf4p orthologs diverge from the BRCT motif at the C terminus but may encode a similar secondary structure in this region. We have therefore called this the BRCT and DBF4 similarity (BRDF) motif. The principal role of this Dbf4p motif was in the response to replication fork (RF) arrest; however, it was not required for cell cycle progression, activation of Cdc7p kinase activity, or interaction with the origin recognition complex (ORC) postulated to recruit Cdc7p–Dbf4p to origins. Rad53p likely directly phosphorylated Dbf4p in response to RF arrest and Dbf4p was required for Rad53p abundance. Rad53p and Dbf4p therefore cooperated to coordinate a robust cellular response to RF arrest. PMID:16547092
Farris, Hamilton E; Ryan, Michael J
2017-03-01
Perceptually, grouping sounds based on their sources is critical for communication. This is especially true in túngara frog breeding aggregations, where multiple males produce overlapping calls that consist of an FM 'whine' followed by harmonic bursts called 'chucks'. Phonotactic females use at least two cues to group whines and chucks: whine-chuck spatial separation and sequence. Spatial separation is a primitive cue, whereas sequence is schema-based, as chuck production is morphologically constrained to follow whines, meaning that males cannot produce the components simultaneously. When one cue is available, females perceptually group whines and chucks using relative comparisons: components with the smallest spatial separation or those closest to the natural sequence are more likely grouped. By simultaneously varying the temporal sequence and spatial separation of a single whine and two chucks, this study measured between-cue perceptual weighting during a specific grouping task. Results show that whine-chuck spatial separation is a stronger grouping cue than temporal sequence, as grouping is more likely for stimuli with smaller spatial separation and non-natural sequence than those with larger spatial separation and natural sequence. Compared to the schema-based whine-chuck sequence, we propose that spatial cues have less variance, potentially explaining their preferred use when grouping during directional behavioral responses.
Behavioral Context of Call Production by Eastern North Pacific Blue Whales
2007-01-25
pairs occurring in a repeated song sequence; B calls from a different blue whale are also evident; spectrogram parameters: fast Fourier transform (FFT... Acoustic data were viewed in spectrogram form (fast Fourier transform [FFT] length 1 s, 80% overlap, Hanning window) to determine the presence of calls... duration to song A and B units (Table 2), but the intermittent timing clearly distinguishes them from song. Whales producing singular calls were
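The spectrogram settings quoted in this snippet (1 s FFT length, 80% overlap, Hann window) can be reproduced with a short NumPy short-time Fourier transform. The sketch below assumes a mono recording `x` sampled at `fs` Hz. The function name `spectrogram_db` and the synthetic 17 Hz test tone are illustrative assumptions, not part of the study.

```python
import numpy as np

def spectrogram_db(x, fs, fft_seconds=1.0, overlap=0.8):
    """Power spectrogram in dB using a Hann window (illustrative sketch)."""
    nfft = int(round(fft_seconds * fs))               # FFT length in samples (1 s by default)
    hop = max(1, int(round(nfft * (1.0 - overlap))))  # 80% overlap -> hop of 20% of nfft
    if len(x) < nfft:
        raise ValueError("signal shorter than one FFT window")
    window = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop : i * hop + nfft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # one row per time frame
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + nfft / 2) / fs  # frame centres in seconds
    return freqs, times, 10.0 * np.log10(power + 1e-12)

# Synthetic low-frequency tone, 10 s at 1 kHz sampling (hypothetical data).
fs = 1000
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 17 * t)
freqs, times, S = spectrogram_db(x, fs)
print(S.shape, freqs[np.argmax(S.mean(axis=0))])
```

A 1 s window gives 1 Hz frequency resolution, which suits low-frequency baleen whale calls at the cost of coarse time resolution; the 80% overlap partly compensates by producing a frame every 0.2 s.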
Genotype calling from next-generation sequencing data using haplotype information of reads
Zhi, Degui; Wu, Jihua; Liu, Nianjun; Zhang, Kui
2012-01-01
Motivation: Low-coverage sequencing provides an economical strategy for whole-genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD)-based refinement of genotype calling is essential to improve accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites (PPSs). Reads that span multiple PPSs (jumping reads) can provide additional haplotype information overlooked by current methods. Results: In this article, we introduce a new Hidden Markov Model (HMM)-based method that can take into account jumping-read information across adjacent PPSs, and we implement it in the HapSeq program. Our method extends the HMM in Thunder and explicitly models jumping-read information as emission probabilities conditional on the states of adjacent PPSs. Our simulation results show that, compared with Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12% and 9%, from 2.24% and 2.76% to 1.97% and 2.50%, for individuals with European and African ancestry, respectively. We expect our program to improve the genotyping quality of the large number of ongoing and planned whole-genome sequencing projects. Contact: dzhi@ms.soph.uab.edu; kzhang@ms.soph.uab.edu Availability: The software package HapSeq and its manual can be downloaded at www.ssg.uab.edu/hapseq/. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22285565
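The percentages above are relative reductions of the error rate, not absolute differences in percentage points. A minimal check of that arithmetic, using only the error rates quoted in the abstract (the helper name `relative_reduction` is illustrative):

```python
def relative_reduction(old_rate, new_rate):
    """Relative reduction (%) of an error rate going from old_rate to new_rate."""
    return 100.0 * (old_rate - new_rate) / old_rate

# Genotyping error rates (%) quoted in the abstract: Thunder -> HapSeq.
comparisons = {
    "simulation": (0.86, 0.60),
    "1000G European": (2.24, 1.97),
    "1000G African": (2.76, 2.50),
}
for label, (old, new) in comparisons.items():
    print(f"{label}: {relative_reduction(old, new):.1f}% relative reduction")
```

The three values are about 30.2%, 12.1% and 9.4%, consistent with the rounded 30%, 12% and 9% reported, even though the absolute drops are only 0.26 to 0.27 percentage points.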
Number of 24-Hour Diet Recalls Needed to Estimate Energy Intake
Ma, Yunsheng; Olendzki, Barbara C.; Pagoto, Sherry L.; Hurley, Thomas G.; Magner, Robert P.; Ockene, Ira S.; Schneider, Kristin L.; Merriam, Philip A.; Hébert, James R.
2009-01-01
Purpose: Twenty-four-hour diet recall interviews (24HRs) are used to assess diet and to validate other diet assessment instruments. Therefore it is important to know how many 24HRs are required to describe an individual's intake. Method: Seventy-nine middle-aged white women completed seven 24HRs over a 14-day period, during which energy expenditure (EE) was determined by the doubly labeled water method (DLW). Mean daily intakes were compared to DLW-derived EE using paired t tests. Linear mixed models were used to evaluate the effect of call sequence and day of the week on 24HR-derived energy intake while adjusting for education, relative body weight, social desirability, and an interaction between call sequence and social desirability. Results: Mean EE from DLW was 2115 kcal/day. Adjusted 24HR-derived energy intake was lowest at call 1 (1501 kcal/day); significantly higher energy intake was observed at calls 2 and 3 (2246 and 2315 kcal/day, respectively). Energy intake on Friday was significantly lower than on Sunday. Averaging energy intake from the first two calls better approximated true energy expenditure than did the first call, and averaging the first three calls further improved the estimate (p = 0.02 for both comparisons). Additional calls did not improve estimation. Conclusions: Energy intake is underreported on the first 24HR. Three 24HRs appear optimal for estimating energy intake. PMID:19576535
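The effect of averaging successive recalls can be illustrated with the adjusted group means reported above. This is only a sketch: the study averaged each participant's own recalls, whereas the numbers below are cohort-level adjusted means, and the helper name `averaged_estimates` is illustrative.

```python
from statistics import mean

DLW_EE = 2115  # mean energy expenditure from doubly labeled water (kcal/day)
adjusted_intake_by_call = [1501, 2246, 2315]  # adjusted mean intake at calls 1-3 (kcal/day)

def averaged_estimates(values):
    """Intake estimate after averaging the first k recalls, for k = 1..len(values)."""
    return [mean(values[:k]) for k in range(1, len(values) + 1)]

for k, estimate in enumerate(averaged_estimates(adjusted_intake_by_call), start=1):
    print(f"first {k} call(s): {estimate:.1f} kcal/day, gap vs DLW {estimate - DLW_EE:+.1f}")
```

Because the first recall is markedly underreported, each additional recall pulls the average toward the DLW reference (gaps of roughly -614, -242 and -94 kcal/day), mirroring the direction of the reported result.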
Number of 24-hour diet recalls needed to estimate energy intake.
Ma, Yunsheng; Olendzki, Barbara C; Pagoto, Sherry L; Hurley, Thomas G; Magner, Robert P; Ockene, Ira S; Schneider, Kristin L; Merriam, Philip A; Hébert, James R
2009-08-01
Twenty-four-hour diet recall interviews (24HRs) are used to assess diet and to validate other diet assessment instruments. Therefore it is important to know how many 24HRs are required to describe an individual's intake. Seventy-nine middle-aged white women completed seven 24HRs over a 14-day period, during which energy expenditure (EE) was determined by the doubly labeled water method (DLW). Mean daily intakes were compared to DLW-derived EE using paired t tests. Linear mixed models were used to evaluate the effect of call sequence and day of the week on 24HR-derived energy intake while adjusting for education, relative body weight, social desirability, and an interaction between call sequence and social desirability. Mean EE from DLW was 2115 kcal/day. Adjusted 24HR-derived energy intake was lowest at call 1 (1501 kcal/day); significantly higher energy intake was observed at calls 2 and 3 (2246 and 2315 kcal/day, respectively). Energy intake on Friday was significantly lower than on Sunday. Averaging energy intake from the first two calls better approximated true energy expenditure than did the first call, and averaging the first three calls further improved the estimate (p=0.02 for both comparisons). Additional calls did not improve estimation. Energy intake is underreported on the first 24HR. Three 24HRs appear optimal for estimating energy intake.
Best practices for evaluating single nucleotide variant calling methods for microbial genomics
Olson, Nathan D.; Lund, Steven P.; Colman, Rebecca E.; Foster, Jeffrey T.; Sahl, Jason W.; Schupp, James M.; Keim, Paul; Morrow, Jayne B.; Salit, Marc L.; Zook, Justin M.
2015-01-01
Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards. PMID:26217378
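The core of evaluating a variant caller against a reference dataset is counting true positives, false positives and false negatives, then deriving precision, recall and F1. The sketch below assumes calls and truth are already normalized so that identical variants have identical (chrom, pos, ref, alt) tuples; real benchmarking tools also reconcile differing indel representations, which this deliberately omits. The names `Variant` and `evaluate_calls` are illustrative.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str

def evaluate_calls(calls, truth):
    """Compare a call set to a truth set by exact (chrom, pos, ref, alt) matching."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "precision": precision, "recall": recall, "F1": f1}

# Hypothetical toy data for a single bacterial chromosome.
truth = {Variant("chr", 100, "A", "G"), Variant("chr", 200, "C", "T"), Variant("chr", 300, "G", "A")}
calls = {Variant("chr", 100, "A", "G"), Variant("chr", 200, "C", "T"), Variant("chr", 400, "T", "C")}
print(evaluate_calls(calls, truth))
```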
Use of Security Officers on Inpatient Psychiatry Units.
Lawrence, Ryan E; Perez-Coste, Maria M; Arkow, Stan D; Appelbaum, Paul S; Dixon, Lisa B
2018-04-02
Violent and aggressive behaviors are common among psychiatric inpatients. Hospital security officers are sometimes used to address such behaviors. Research on the role of security in inpatient units is scant. This study examined when security is utilized and what happens when officers arrive. The authors reviewed the security logbook and the medical records for all patients discharged from an inpatient psychiatry unit over a six-month period. Authors recorded when security calls happened, what behaviors triggered security calls, what outcomes occurred, and whether any patient characteristics were associated with security calls. A total of 272 unique patients were included. A total of 49 patients (18%) generated security calls (N=157 calls). Security calls were most common in the first week of hospitalization (N=45 calls), and roughly half of the patients (N=25 patients) had only one call. The most common inciting behavior was "threats to persons" (N=34 calls), and the most common intervention was intramuscular antipsychotic injection (N=49 calls). The patient variables associated with security calls were having more than one prior hospitalization (odds ratio [OR]=4.56, p=.001, 95% confidence interval [CI]=1.80-11.57), involuntary hospitalization (OR=5.09, p<.001, CI=2.28-11.33), and going to court for any reason (OR=5.80, p=.004, CI=1.75-19.15). Security officers were often called for threats of violence and occasionally called for actual violence. Patient variables associated with security calls are common among inpatients, and thus clinicians should stay attuned to patients' moment-to-moment care needs.
Trembizki, Ella; Smith, Helen; Lahra, Monica M; Chen, Marcus; Donovan, Basil; Fairley, Christopher K; Guy, Rebecca; Kaldor, John; Regan, David; Ward, James; Nissen, Michael D; Sloots, Theo P; Whiley, David M
2014-06-01
Neisseria gonorrhoeae antimicrobial resistance (AMR) is a global problem heightened by emerging resistance to ceftriaxone. Appropriate molecular typing methods are important for understanding the emergence and spread of N. gonorrhoeae AMR. We report on the development, validation and testing of a Sequenom MassARRAY iPLEX method for multilocus sequence typing (MLST)-style genotyping of N. gonorrhoeae isolates. An iPLEX MassARRAY method (iPLEX14SNP) was developed targeting 14 informative gonococcal single nucleotide polymorphisms (SNPs) previously shown to predict MLST types. The method was initially validated using 24 N. gonorrhoeae control isolates and was then applied to 397 test isolates collected throughout Queensland, Australia in the first half of 2012. The iPLEX14SNP method provided 100% accuracy for the control isolates, correctly identifying all 14 SNPs for all 24 isolates (336/336). For the 397 test isolates, the iPLEX14SNP assigned results for 5461 of the possible 5558 SNPs (SNP call rate 98.25%), with complete 14 SNP profiles obtained for 364 isolates. Based on the complete SNP profile data, there were 49 different sequence types identified in Queensland, with 11 of the 49 SNP profiles accounting for the majority (n = 280; 77%) of isolates. AMR was dominated by several geographically clustered sequence types. Using the iPLEX14SNP method, up to 384 isolates could be tested within 1 working day for less than Aus$10 per isolate. The iPLEX14SNP offers an accurate and high-throughput method for the MLST-style genotyping of N. gonorrhoeae and may prove particularly useful for large-scale studies investigating the emergence and spread of gonococcal AMR. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
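The call-rate and proportion figures in this abstract follow directly from the 14-SNP panel size. A quick arithmetic check using only the reported counts (the helper name `call_rate` is illustrative):

```python
N_SNPS = 14  # SNPs targeted by the iPLEX14SNP assay

def call_rate(called, isolates, snps_per_isolate=N_SNPS):
    """Percentage of attempted SNP calls that returned a result."""
    return 100.0 * called / (isolates * snps_per_isolate)

control_calls = 24 * N_SNPS                        # possible calls for the 24 control isolates
test_rate = round(call_rate(5461, 397), 2)         # 5461 of 397 * 14 = 5558 possible calls
dominant_share = round(100 * 280 / 364)            # isolates in the 11 dominant profiles
print(control_calls, test_rate, dominant_share)
```

This confirms the 336 control calls, the 98.25% call rate and the 77% share (280 of the 364 isolates with complete profiles).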
Tribonacci-Like Sequences and Generalized Pascal's Pyramids
ERIC Educational Resources Information Center
Anatriello, Giuseppina; Vincenzi, Giovanni
2014-01-01
A well-known result of Feinberg and Shannon states that the tribonacci sequence can be detected by the so-called "Pascal's pyramid." Here we will show that any tribonacci-like sequence can be obtained by the diagonals of the "Feinberg's triangle" associated to a suitable "generalized Pascal's pyramid."…
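One standard way to see tribonacci numbers inside Pascal's pyramid: the pyramid's entries are trinomial coefficients (i+j+k)!/(i! j! k!), and summing them over all i + 2j + 3k = n counts the ways to write n as an ordered sum of 1s, 2s and 3s, which satisfies the tribonacci recurrence. The sketch below checks that identity numerically; it is not necessarily the diagonal construction used by Anatriello and Vincenzi, and it does not cover their generalization to arbitrary tribonacci-like sequences. Function names are illustrative.

```python
from math import factorial

def tribonacci_like(a, b, c, n_terms):
    """Sequence with t[k] = t[k-1] + t[k-2] + t[k-3], starting from a, b, c."""
    seq = [a, b, c]
    while len(seq) < n_terms:
        seq.append(seq[-1] + seq[-2] + seq[-3])
    return seq[:n_terms]

def pyramid_plane_sum(n):
    """Sum of trinomial coefficients (i+j+k)!/(i! j! k!) over all i + 2j + 3k = n."""
    total = 0
    for k in range(n // 3 + 1):
        for j in range((n - 3 * k) // 2 + 1):
            i = n - 2 * j - 3 * k
            total += factorial(i + j + k) // (factorial(i) * factorial(j) * factorial(k))
    return total

# With starting values 0, 0, 1 the n-th plane sum equals term n + 2 of the sequence.
print([pyramid_plane_sum(n) for n in range(8)])
print(tribonacci_like(0, 0, 1, 10)[2:])
```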
USDA-ARS's Scientific Manuscript database
The tomato genome sequence was undertaken at a time when state-of-the-art sequencing methodologies were undergoing a transition to so-called next-generation methodologies. The result was an international consortium undertaking a strategy merging both old and new approaches. Because biologists were...
Gapped pulses for frequency-swept MRI
NASA Astrophysics Data System (ADS)
Idiyatullin, Djaudat; Corum, Curt; Moeller, Steen; Garwood, Michael
2008-08-01
A recently introduced method called SWIFT (SWeep Imaging with Fourier Transform) is a fundamentally different approach to MRI which is particularly well suited to imaging objects with extremely fast spin-spin relaxation rates. The method exploits a frequency-swept excitation pulse and virtually simultaneous signal acquisition in a time-shared mode. Correlation of the spin system response with the excitation pulse function is used to extract the signals of interest. With SWIFT, image quality is highly dependent on producing uniform and broadband spin excitation. These requirements are satisfied by using frequency-modulated pulses belonging to the hyperbolic secant family (HSn pulses). This article describes the experimental steps needed to properly implement HSn pulses in SWIFT. In addition, properties of HSn pulses in the rapid passage, linear region are investigated, followed by an analysis of the pulses after inserting the "gaps" needed for time-shared excitation and acquisition. Finally, compact expressions are presented to estimate the amplitude and flip angle of the HSn pulses, as well as the relative energy deposited by the SWIFT sequence.
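To make "frequency-swept HSn pulse with gaps" concrete, the sketch below uses a common definition of HSn pulses on a normalized time axis τ in [-1, 1]: amplitude sech(β τ^n), with the instantaneous frequency proportional to the running integral of sech²(β τ^n). The value β = 5.3, the sample count, the bandwidth and the 50% duty cycle are illustrative assumptions, and zeroing samples in each segment is a simplification of how SWIFT actually interleaves excitation and acquisition; the paper's compact amplitude and flip-angle expressions are not reproduced here.

```python
import numpy as np

def hsn_pulse(n_points, order=1, beta=5.3, bandwidth_hz=1.0):
    """Normalized HSn amplitude and frequency sweep on tau in [-1, 1] (sketch)."""
    tau = np.linspace(-1.0, 1.0, n_points)
    amplitude = 1.0 / np.cosh(beta * tau**order)      # sech(beta * tau^n)
    sech2 = amplitude**2
    # Cumulative trapezoidal integral of sech^2 along tau.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (sech2[1:] + sech2[:-1]) * np.diff(tau))))
    freq = bandwidth_hz * (cum / cum[-1] - 0.5)       # sweep -bw/2 .. +bw/2
    return tau, amplitude, freq

def insert_gaps(waveform, segment_len, duty_cycle=0.5):
    """Zero the tail of every segment so RF is on only for duty_cycle of each segment."""
    gapped = waveform.copy()
    on = int(round(segment_len * duty_cycle))
    for start in range(0, len(gapped), segment_len):
        gapped[start + on : start + segment_len] = 0.0
    return gapped

tau, amp, freq = hsn_pulse(1001, order=1, bandwidth_hz=10e3)
gapped_amp = insert_gaps(amp, segment_len=20)
print(amp.max(), freq[0], freq[-1])
```

The same function covers higher orders (for example `order=4`), which flatten the amplitude envelope and make the sweep more nearly linear over the central part of the pulse.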
Hybrid de novo genome assembly of the Chinese herbal fleabane Erigeron breviscapus
Zhang, Guanghui; Zhang, Jing; Liu, Hui; Chen, Wei; Wang, Xiao; Li, Yahe
2017-01-01
Background: The plants in the Erigeron genus of the Compositae (Asteraceae) family are commonly called fleabanes, possibly due to the belief that certain chemicals in these plants repel fleas. In traditional Chinese medicine, Erigeron breviscapus, which is native to China, has been widely used in the treatment of cerebrovascular disease. A handful of bioactive compounds, including scutellarin, 3,5-dicaffeoylquinic acid, and 3,4-dicaffeoylquinic acid, have been isolated from the plant. With the aim of finding novel medicinal compounds and understanding their biosynthetic pathways, we proposed to sequence the genome of E. breviscapus. Findings: We assembled the highly heterozygous E. breviscapus genome using a combination of PacBio single-molecule real-time sequencing and next-generation sequencing on the Illumina HiSeq platform. The final draft genome is approximately 1.2 Gb, with contig and scaffold N50 sizes of 18.8 kb and 31.5 kb, respectively. Further analyses predicted 37 504 protein-coding genes in the E. breviscapus genome and 8172 gene families shared among Compositae species. Conclusions: The E. breviscapus genome provides a valuable resource for the investigation of novel bioactive compounds in this Chinese herb. PMID:28431028
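The contig and scaffold N50 values quoted above summarize assembly contiguity: N50 is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the assembled bases. A minimal sketch of the standard calculation, with toy lengths rather than the E. breviscapus assembly:

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half of the assembly."""
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):  # longest first
        running += length
        if running >= half:
            return length

# Toy contig lengths (kb); total 54, so we need 27 kb: 10 + 9 + 8 reaches it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```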
Su, Qingxian; Ma, Chun; Domingo-Félez, Carlos; Kiil, Anne Sofie; Thamdrup, Bo; Jensen, Marlene Mark; Smets, Barth F
2017-10-15
Nitrous oxide (N2O) production from autotrophic nitrogen conversion processes, especially nitritation systems, can be significant, so it needs to be understood and mitigated. In this study, the rates and pathways of N2O production were quantified in two lab-scale sequencing batch reactors operated with intermittent feeding and demonstrating long-term and high-rate nitritation. The resulting reactor biomass was highly enriched in ammonia-oxidizing bacteria, and converted ∼93 ± 14% of the oxidized ammonium to nitrite. The low DO set-point combined with intermittent feeding was sufficient to maintain high nitritation efficiency and high nitritation rates at 20-26 °C over a period of ∼300 days. Even at the high nitritation efficiencies, net N2O production was low (∼2% of the oxidized ammonium). Net N2O production rates transiently increased with a rise in pH after each feeding, suggesting a potential effect of pH on N2O production. In situ application of 15N-labeled substrates revealed nitrifier denitrification as the dominant pathway of N2O production. Our study highlights operational conditions that minimize N2O emission from two-stage autotrophic nitrogen removal systems. Copyright © 2017 Elsevier Ltd. All rights reserved.
An investigation of the uniform random number generator
NASA Technical Reports Server (NTRS)
Temple, E. C.
1982-01-01
Most random number generators in use today are of the congruential form X(i+1) = (AX(i) + C) mod M, where A, C, and M are nonnegative integers. If C = 0, the generator is called the multiplicative type, and those for which C ≠ 0 are called mixed congruential generators. It is easy to see that congruential generators will repeat a sequence of numbers after at most M values have been generated. The number of values that a procedure generates before restarting the sequence is called the length or the period of the generator. Generally, it is desirable to make the period as long as possible. A detailed discussion of congruential generators is given. Also, several promising procedures that differ from the multiplicative and mixed procedures are discussed.
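A small linear congruential generator makes the period discussion concrete. The sketch below implements X(i+1) = (AX(i) + C) mod M and measures the length of the cycle the sequence enters. The tiny modulus M = 16 is chosen only so the cycles are easy to inspect; it is not a parameter from the report.

```python
from itertools import islice

def lcg(a, c, m, seed):
    """Yield X(i+1) = (a * X(i) + c) mod m indefinitely, starting after seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(a, c, m, seed):
    """Length of the cycle the generator eventually enters (at most m)."""
    seen = {}          # value -> step at which it first appeared
    x, step = seed, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]

print(period(5, 3, 16, 0))               # mixed generator (C != 0)
print(period(5, 0, 16, 1))               # multiplicative generator (C = 0)
print(list(islice(lcg(5, 0, 16, 1), 4)))
```

With A = 5, C = 3, M = 16 the mixed generator reaches the full period of 16 (these values satisfy the Hull-Dobell full-period conditions), whereas the multiplicative generator with A = 5 cycles through only 5, 9, 13, 1 from seed 1, illustrating why parameter choice matters for period length.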
Kim, Junho; Maeng, Ju Heon; Lim, Jae Seok; Son, Hyeonju; Lee, Junehawk; Lee, Jeong Ho; Kim, Sangwoo
2016-10-15
Advances in sequencing technologies have remarkably lowered the allele-frequency detection limit for somatic variants. However, calling mutations at such low frequencies is still confounded by many factors, including environmental contamination. Vector contamination is a persistent issue and is especially problematic because vector inserts are hardly distinguishable from the sample sequences. Such inserts, which may harbor polymorphisms and engineered functional mutations, can result in false variant calls at the corresponding sites. Numerous vector-screening methods have been developed, but none can handle contamination from inserts because they focus on vector backbone sequences alone. We developed a novel method, Vecuum, that identifies vector-originated reads and the resulting false variants. Since vector inserts are generally constructed from intron-less cDNAs, Vecuum identifies vector-originated reads by inspecting the clipping patterns at exon junctions. False variant calls are further detected based on the biased distribution of mutant alleles to vector-originated reads. Tests on simulated and spike-in experimental data showed that Vecuum could detect 93% of vector contaminants and could remove up to 87% of variant-like false calls with 100% precision. Application to public sequence datasets demonstrated the utility of Vecuum in detecting false variants resulting from various types of external contamination. A Java-based implementation of the method is available at http://vecuum.sourceforge.net/ Contact: swkim@yuhs.ac Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
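The two ideas described above, flagging reads whose clipping lines up with exon junctions (a signature of intron-less cDNA inserts) and then checking whether a variant's supporting reads are biased toward those flagged reads, can be sketched as follows. This is a simplified illustration under stated assumptions, not Vecuum's implementation: the `Read` structure, the exact-match junction rule and the `min_fraction` threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Read:
    name: str
    clip_positions: tuple  # hypothetical: genomic coordinates where soft-clipping starts/ends
    alleles: dict          # hypothetical: variant position -> base observed in this read

def flag_vector_reads(reads, exon_junctions):
    """Reads whose soft-clip boundary falls exactly on an annotated exon junction."""
    return {r.name for r in reads if any(p in exon_junctions for p in r.clip_positions)}

def is_suspect_variant(reads, pos, alt, vector_reads, min_fraction=1.0):
    """Flag a variant whose alt-supporting reads are (mostly) vector-derived."""
    alt_reads = [r.name for r in reads if r.alleles.get(pos) == alt]
    if not alt_reads:
        return False
    frac = sum(name in vector_reads for name in alt_reads) / len(alt_reads)
    return frac >= min_fraction

# Toy example: two reads clipped at exon junctions carry T at position 1500.
junctions = {1000, 2000}
reads = [
    Read("r1", (1000,), {1500: "T"}),
    Read("r2", (1000, 2000), {1500: "T"}),
    Read("r3", (), {1500: "C", 1700: "G"}),
    Read("r4", (), {1500: "C", 1700: "G"}),
    Read("r5", (1000,), {1700: "G"}),
]
vector = flag_vector_reads(reads, junctions)
print(sorted(vector), is_suspect_variant(reads, 1500, "T", vector))
```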
Gordon, Shira D; Ter Hofstede, Hannah M
2018-03-22
Animals co-occur with multiple predators, making sensory systems that can encode information about diverse predators advantageous. Moths in the families Noctuidae and Erebidae have ears with two auditory receptor cells (A1 and A2) used to detect the echolocation calls of predatory bats. Bat communities contain species that vary in echolocation call duration, and the dynamic range of A1 is limited by the duration of sound, suggesting that A1 provides less information about bats with shorter echolocation calls. To test this hypothesis, we obtained intensity-response functions for both receptor cells across many moth species for sound pulse durations representing the range of echolocation call durations produced by bat species in northeastern North America. We found that the threshold and dynamic range of both cells varied with sound pulse duration. The number of A1 action potentials per sound pulse increases linearly with increasing amplitude for long-duration pulses, saturating near the A2 threshold. For short sound pulses, however, A1 saturates with only a few action potentials per pulse at amplitudes far lower than the A2 threshold for both single sound pulses and pulse sequences typical of searching or approaching bats. Neural adaptation was only evident in response to approaching bat sequences at high amplitudes, not search-phase sequences. These results show that, for short echolocation calls, a large range of sound levels cannot be coded by moth auditory receptor activity, resulting in no information about the distance of a bat, although differences in activity between ears might provide information about direction. © 2018. Published by The Company of Biologists Ltd.
[Research advances of genomic GYP coding MNS blood group antigens].
Liu, Chang-Li; Zhao, Wei-Jun
2012-02-01
The MNS blood group system includes more than 40 antigens, of which M, N, S and s are the most significant. The antigenic determinants of the M and N antigens lie at the top of glycophorin A (GPA) on the surface of red blood cells, while those of the S and s antigens lie at the top of glycophorin B (GPB). The GYPA gene encoding GPA and the GYPB gene encoding GPB are located on the long arm of chromosome 4 and share 95% sequence homology; both genes also lie close to the GYPE gene, which does not express a product. These three genes form the "GYPA-GYPB-GYPE" structure called the GYP genome. This review focuses on the molecular basis of the GYP genome and on how its variation underlies the diversity of MNS blood group antigens. The molecular basis of Miltenberger hybrid glycophorin polymorphism is described in detail.
DIVWAG Model Documentation. Volume II. Programmer/Analyst Manual. Part 3. Chapter 9 Through 12.
1976-07-01
entered through a routine, NAM2, that calls the segment-controlling routine NBARAS. (4) Segment 3, controlled by the routine NFIRE, simulates round... nuclear fire, NAM calls in sequence the routines NFIRE (segment 3), ASUNIT (segment 2), SASSMT (segment 4), and NFIRE (segment 3). These calls simulate... this is a call to NFIRE (ISEG equals one or two), control goes to block L2. (2) Block 2. If this is to assess a unit passing through a nuclear barrier
Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P
2007-03-15
Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/), directly identify frequent patterns in unaligned biological sequences without attempting to align them. Here we propose a new algorithm that is more efficient and more functional than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.
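The basic pattern-growth idea is to start from frequent single residues and extend each frequent pattern one symbol at a time, keeping only extensions that still occur in at least a minimum number of sequences. Because an extension can never be supported by more sequences than its parent, infrequent branches are pruned early. The sketch below handles only contiguous patterns (no wildcards or gaps) and is not the authors' six algorithms or their type I/II/III pattern definitions; the name `frequent_patterns` and its parameters are illustrative.

```python
def frequent_patterns(seqs, min_support, max_len=None):
    """Contiguous patterns occurring in at least min_support sequences, grown left to right."""
    alphabet = sorted(set("".join(seqs)))

    def support(pattern):
        return sum(1 for s in seqs if pattern in s)  # number of sequences containing pattern

    result = {}
    frontier = []
    for symbol in alphabet:
        sup = support(symbol)
        if sup >= min_support:
            result[symbol] = sup
            frontier.append(symbol)
    while frontier:
        next_frontier = []
        for pattern in frontier:
            if max_len is not None and len(pattern) >= max_len:
                continue
            for symbol in alphabet:
                extended = pattern + symbol
                sup = support(extended)
                if sup >= min_support:  # anti-monotone: only frequent patterns are grown further
                    result[extended] = sup
                    next_frontier.append(extended)
        frontier = next_frontier
    return result

# Toy unaligned sequences sharing the motif "ACD" at different offsets.
print(frequent_patterns(["ACDE", "XACDY", "ACDZ"], min_support=3))
```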
Clark, A M; Jacobsen, K R; Bostwick, D E; Dannenhoffer, J M; Skaggs, M I; Thompson, G A
1997-07-01
Sieve elements in the phloem of most angiosperms contain proteinaceous filaments and aggregates called P-protein. In the genus Cucurbita, these filaments are composed of two major proteins: PP1, the phloem filament protein, and PP2, the phloem lectin. The gene encoding the phloem filament protein in pumpkin (Cucurbita maxima Duch.) has been isolated and characterized. Nucleotide sequence analysis of the reconstructed gene gPP1 revealed a continuous 2430 bp protein coding sequence, with no introns, encoding an 809 amino acid polypeptide. The deduced polypeptide had characteristics of PP1 and contained a 15 amino acid sequence determined by N-terminal peptide sequence analysis of PP1. The sequence of PP1 was highly repetitive with four 200 amino acid sequence domains containing structural motifs in common with cysteine proteinase inhibitors. Expression of the PP1 gene was detected in roots, hypocotyls, cotyledons, stems, and leaves of pumpkin plants. PP1 and its mRNA accumulated in pumpkin hypocotyls during the period of rapid hypocotyl elongation, after which mRNA levels declined while protein levels remained elevated. PP1 was immunolocalized in slime plugs and P-protein bodies in sieve elements of the phloem. Occasionally, PP1 was detected in companion cells. PP1 mRNA was localized by in situ hybridization in companion cells at early stages of vascular differentiation. The developmental accumulation and localization of PP1 and its mRNA paralleled the phloem lectin, further suggesting an interaction between these phloem-specific proteins.
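The coding-sequence length and polypeptide length reported above are mutually consistent if the 2430 bp includes the stop codon: 2430 / 3 = 810 codons, one of which is the stop. A tiny check, assuming the stop codon is counted in the reported length (the helper name is illustrative):

```python
def coding_length_to_residues(cds_bp, includes_stop=True):
    """Number of amino acids encoded by a coding sequence of cds_bp nucleotides."""
    if cds_bp % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons  # the stop codon encodes no residue

print(coding_length_to_residues(2430))
```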
A new estimator of the discovery probability.
Favaro, Stefano; Lijoi, Antonio; Prünster, Igor
2012-12-01
Species sampling problems have a long history in ecological and biological studies, and a number of issues, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety, need to be addressed. Such inferential problems have recently emerged in genomic applications as well, where they exhibit some peculiar features that make them more challenging: specifically, one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes), of which only a small portion has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it provides the degree of flexibility typically needed in this framework. Based on an observed sample of size n, the focus is on predicting a key aspect of the outcome of an additional sample of size m, namely the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n+m+1)th observation, species that have been observed with any given frequency in the enlarged sample of size n+m. Such an estimator admits a closed-form expression that can be exactly evaluated. The result we obtain allows us to quantify both the rate at which rare species are detected and the achieved sample coverage of abundant species, as m increases. Natural applications include the estimation of the probability of discovering rare genes within genomic libraries, and the results are illustrated by means of two expressed sequence tag datasets. © 2012, The International Biometric Society.
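For orientation, the classical frequentist baseline for the discovery probability is the Good-Turing estimator: with n observations and m_l species seen exactly l times, the probability that the next draw is a species seen l times is estimated by (l + 1) m_(l+1) / n, and l = 0 gives the probability of a new species, m_1 / n. The sketch below implements that baseline for the case with no additional sample (m = 0); it is not the Bayesian nonparametric estimator derived in this paper, and the toy sample is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def good_turing_discovery(sample, freq):
    """Good-Turing estimate of P(next draw is a species seen exactly `freq` times).

    freq = 0 gives the probability of discovering a new species, m_1 / n.
    """
    n = len(sample)
    counts_of_counts = Counter(Counter(sample).values())  # m_l: species seen exactly l times
    return Fraction((freq + 1) * counts_of_counts.get(freq + 1, 0), n)

# Toy "library" sample of gene tags (hypothetical): a x2, b x1, c x3, d x1.
sample = ["a", "a", "b", "c", "c", "c", "d"]
for l in range(4):
    print(l, good_turing_discovery(sample, l))
```

Here two species are singletons, so the estimated chance that the next tag is a previously unseen gene is 2/7.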
Traut, Walther; Ahola, Virpi; Smith, David A S; Gordon, Ian J; Ffrench-Constant, Richard H
2017-01-01
The number of sequenced lepidopteran genomes is increasing rapidly. However, the corresponding assemblies rarely represent whole chromosomes and generally also lack the highly repetitive W sex chromosome. Knowledge of the karyotypes can facilitate genome assembly and further our understanding of sex chromosome evolution in Lepidoptera. Here, we describe the karyotypes of the Glanville fritillary Melitaea cinxia (n = 31), the monarch Danaus plexippus (n = 30), and the African queen D. chrysippus (2n = 60 or 59, depending on the source population). We show by FISH that the telomeres are of the (TTAGG)n type, as found in most insects. M. cinxia and D. plexippus have "conventional" W chromosomes which are heterochromatic in meiotic and somatic cells. In D. chrysippus, the W is inconspicuous. Neither telomeres nor W chromosomes are represented in the published genomes of M. cinxia and D. plexippus. Representation analysis in sequenced female and male D. chrysippus genomes detected an evolutionarily old autosome-Z chromosome fusion in Danaus. Conserved synteny of whole chromosomes, so-called "macro synteny", in Lepidoptera permitted us to identify the chromosomes involved in this fusion. An additional and more recent sex chromosome fusion was found in D. chrysippus by karyotype analysis and classical genetics. In a hybrid population between 2 subspecies, D. c. chrysippus and D. c. dorippus, the W chromosome was fused to an autosome that carries a wing colour locus. Thus, cytogenetics and the present state of genome data complement one another to reveal the evolutionary history of the species. © 2017 S. Karger AG, Basel.
CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks
2017-01-01
Many structural variation (SV) detection methods have been proposed owing to the popularization of next-generation sequencing (NGS). These SV calling methods use different SV-property-dependent features; however, they all suffer from poor accuracy when run on low-coverage sequences. The union of results from these tools achieves fairly high sensitivity but still produces low accuracy on low-coverage sequence data; that is, the combined results contain many false positives. In this paper, we present CNNdel, an approach for calling deletions from paired-end reads. CNNdel gathers SV candidates reported by multiple tools and then extracts features from aligned BAM files at the candidate positions. Using labeled, feature-expressed candidates as a training set, CNNdel trains convolutional neural networks (CNNs) to classify unlabeled candidates as true or false. Results show that CNNdel works well with NGS reads from 26 low-coverage genomes of the 1000 Genomes Project. The paper demonstrates that convolutional neural networks can automatically assign priorities to SV features and effectively reduce false positives. PMID:28630866
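The first step described above, pooling deletion candidates reported by several callers, needs a rule for deciding when two calls describe the same event. A common choice is reciprocal overlap; the sketch below merges calls that overlap each other by at least 50% in both directions and records which tools support each cluster. The 50% threshold, the greedy clustering and the tool names are assumptions for illustration, not details taken from the CNNdel paper; feature extraction from BAM files and the CNN itself are not shown.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals a and b, given as (chrom, start, end)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def union_candidates(callsets, min_ro=0.5):
    """Greedily merge deletion calls from several tools into candidate clusters."""
    clusters = []
    for tool, calls in callsets.items():
        for call in calls:
            for cluster in clusters:
                if reciprocal_overlap(cluster["representative"], call) >= min_ro:
                    cluster["tools"].add(tool)
                    break
            else:  # no existing cluster matched: start a new candidate
                clusters.append({"representative": call, "tools": {tool}})
    return clusters

# Hypothetical deletion calls from two callers.
callsets = {
    "toolA": [("chr1", 100, 200)],
    "toolB": [("chr1", 110, 205), ("chr1", 500, 600)],
}
for c in union_candidates(callsets):
    print(c["representative"], sorted(c["tools"]))
```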
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N (San Leandro, CA); Mariella, Jr., Raymond P.; Christian, Allen T (Tracy, CA); Young, Jennifer A (Berkeley, CA); Clague, David S (Livermore, CA)
2011-01-18
A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.
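The segment-splitting idea in the claim above can be illustrated with a short Python sketch. It is a simplification, not the patented protocol: it lays out segments of length n with a user-chosen overlap on a single strand and checks that the overlaps reassemble the target, whereas real polymerase-based assembly relies on hybridization between complementary strands. The function names `split_with_overlap` and `assemble` are hypothetical.

```python
def split_with_overlap(target: str, seg_len: int, overlap: int) -> list:
    """Cut target into segments of seg_len bases; consecutive segments share overlap bases."""
    if not 0 < overlap < seg_len:
        raise ValueError("overlap must be between 1 and seg_len - 1")
    step = seg_len - overlap
    segments = []
    start = 0
    while True:
        segments.append(target[start:start + seg_len])
        if start + seg_len >= len(target):
            break  # the last segment reaches the end of the target
        start += step
    return segments


def assemble(segments: list, overlap: int) -> str:
    """Join segments by checking and collapsing their shared overlaps."""
    seq = segments[0]
    for seg in segments[1:]:
        if seq[-overlap:] != seg[:overlap]:
            raise ValueError("adjacent segments do not share the expected overlap")
        seq += seg[overlap:]
    return seq


target = "ATGCGTACGTTAGCCATGGCATTACGGA"
parts = split_with_overlap(target, seg_len=10, overlap=4)
print(parts)
print(assemble(parts, overlap=4) == target)  # True
```

The overlap length plays the role of the user-specified hybridizing overlap; segments of mixed lengths (n, n+1, ...) would need a variable step, which this sketch omits.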
ERIC Educational Resources Information Center
Küçüközer, Asuman
2006-01-01
This study aims to better understand the construction of the meaning of physics concepts in mechanics during a teaching sequence at the upper secondary school level. In the teaching sessions, students were introduced to the concepts of interaction and force. During this teaching sequence the models called "interactions" and "laws of…
Formulaic Sequences Used by Native English Speaking Teachers in a Thai Primary School
ERIC Educational Resources Information Center
Steyn, Sunee; Jaroongkhongdach, Woravut
2016-01-01
The use of formulaic sequences in English as a Foreign Language (EFL) lessons plays an integral role in language teaching and learning, but it still seems to be widely neglected in the Thai school context. To call attention to this issue, this study aims at identifying formulaic sequences used in a Thai primary school. The data were taken from three…
Sequencing Conservation Actions Through Threat Assessments in the Southeastern United States
Robert D. Sutter; Christopher C. Szell
2006-01-01
The identification of conservation priorities is one of the leading issues in conservation biology. We present a project of The Nature Conservancy, called Sequencing Conservation Actions, which prioritizes conservation areas and identifies foci for crosscutting strategies at various geographic scales. We use the term "Sequencing" to mean an ordering of actions over...
Mallory, Melanie A; Lucic, Danijela X; Sears, Mitchell T; Cloherty, Gavin A; Hillyard, David R
2014-05-01
HCV genotyping is a critical tool for guiding initiation of therapy and selecting the most appropriate treatment regimen. This study evaluated the concordance between the Abbott GT II assay and genotyping by sequencing subregions of the HCV 5'UTR, core and NS5B. The Abbott assay was used to genotype 127 routine patient specimens and 35 patient specimens with unusual subtypes and mixed infection. Abbott results were compared to genotyping by 5'UTR, core and NS5B sequencing. Sequences were genotyped using the NCBI non-redundant database and the online genotyping tool COMET. Among routine specimens, core/NS5B sequencing identified 93 genotype 1s, 13 genotype 2s, 15 genotype 3s, three genotype 4s, two genotype 6s and one recombinant specimen. Genotype calls by 5'UTR, core, NS5B sequencing and the Abbott assay were 97.6% concordant. Core/NS5B sequencing identified two discrepant samples as genotype 6 (subtypes 6l and 6u) while Abbott and 5'UTR sequencing identified these samples as genotype 1 with no subtype. The Abbott assay subtyped 91.4% of genotype 1 specimens. Among the 35 rare specimens, the Abbott assay inaccurately genotyped 3k, 6e, 6o, 6q and one genotype 4 variant; gave indeterminate results for 3g, 3h, 4r, 6m, 6n, and 6q specimens; and agreed with core/NS5B sequencing for mixed specimens. The Abbott assay is an automated HCV genotyping method with improved accuracy over 5'UTR sequencing. Samples identified by the Abbott assay as genotype 1 with no subtype may be rare subtypes of other genotypes and thus require confirmation by another method. Copyright © 2014 Elsevier B.V. All rights reserved.
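A concordance figure such as the 97.6% above is the share of specimens on which all methods agree. The abstract does not give the exact counting rule, so this Python sketch assumes exact string agreement at the chosen resolution (genotype only here) over specimens typed by every method; the input format and `percent_concordance` are hypothetical.

```python
def percent_concordance(calls_by_method: dict) -> float:
    """Percentage of specimens for which every method reports the same call.

    calls_by_method: dict method name -> dict specimen id -> genotype string
    (hypothetical format). Only specimens typed by every method are compared.
    """
    shared = set.intersection(*(set(calls) for calls in calls_by_method.values()))
    if not shared:
        return 0.0
    agree = sum(1 for s in shared
                if len({calls[s] for calls in calls_by_method.values()}) == 1)
    return 100.0 * agree / len(shared)


# Toy data: specimen s3 mirrors a genotype 1 versus genotype 6 discrepancy.
abbott = {"s1": "1", "s2": "2", "s3": "1", "s4": "3"}
core_ns5b = {"s1": "1", "s2": "2", "s3": "6", "s4": "3"}
print(percent_concordance({"Abbott": abbott, "core/NS5B": core_ns5b}))  # 75.0
```

Comparing at subtype resolution instead would simply mean passing subtype strings such as "1a", which typically lowers the figure.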
Shark Ig light chain junctions are as diverse as in heavy chains.
Fleurant, Marshall; Changchien, Lily; Chen, Chin-Tung; Flajnik, Martin F; Hsu, Ellen
2004-11-01
We have characterized a small family of four genes encoding one of the three nurse shark Ig L chain isotypes, called NS5. All NS5 cDNA sequences are encoded by three loci, of which two are organized as conventional clusters, each consisting of a V and J gene segment that can recombine and one C region exon; the third contains a germline-joined VJ in-frame and the fourth locus is a pseudogene. This is the second nurse shark L chain type where both germline-joined and split V-J organizations have been found. Since there are only two rearranging Ig loci, it was possible for the first time to examine junctional diversity in defined fish Ig genes, comparing productive vs nonproductive rearrangements. N region addition was found to be considerably more extensive in length and in frequency than in any other vertebrate L chain so far reported and rivals that in H chain. We put forth the speculation that the unprecedented efficiency of N region addition (87-93% of NS5 sequences) may be a result not only of simultaneous H and L chain rearrangement in the shark but also of processing events that afford greater accessibility of the V or J gene coding ends to terminal deoxynucleotidyltransferase.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ton, H.; Yeung, E.S.
1997-02-15
An integrated on-line prototype for coupling a microreactor to capillary electrophoresis for DNA sequencing has been demonstrated. A dye-labeled terminator cycle-sequencing reaction is performed in a fused-silica capillary. Subsequently, the sequencing ladder is directly injected into a size-exclusion chromatographic column operated at nearly 95 °C for purification. On-line injection to a capillary for electrophoresis is accomplished at a junction set at nearly 70 °C. High temperature at the purification column and injection junction prevents the renaturation of DNA fragments during on-line transfer without affecting the separation. The high solubility of DNA in 1× TE buffer and the buffer's relatively low ionic strength permit both effective purification and electrokinetic injection of the DNA sample. The system is compatible with highly efficient separations by a replaceable poly(ethylene oxide) polymer solution in uncoated capillary tubes. Future automation and adaptation to a multiple-capillary array system should allow high-speed, high-throughput DNA sequencing from templates to called bases in one step. 32 refs., 5 figs.
Ward, T R; Hoang, M L; Prusty, R; Lau, C K; Keil, R L; Fangman, W L; Brewer, B J
2000-07-01
In the ribosomal DNA of Saccharomyces cerevisiae, sequences in the nontranscribed spacer 3' of the 35S ribosomal RNA gene are important to the polar arrest of replication forks at a site called the replication fork barrier (RFB) and also to the cis-acting, mitotic hyperrecombination site called HOT1. We have found that the RFB and HOT1 activity share some but not all of their essential sequences. Many of the mutations that reduce HOT1 recombination also decrease or eliminate fork arrest at one of two closely spaced RFB sites, RFB1 and RFB2. A simple model for the juxtaposition of RFB and HOT1 sequences is that the breakage of strands in replication forks arrested at RFB stimulates recombination. Contrary to this model, we show here that HOT1-stimulated recombination does not require the arrest of forks at the RFB. Therefore, while HOT1 activity is independent of replication fork arrest, HOT1 and RFB require some common sequences, suggesting the existence of a common trans-acting factor(s).
Tso, Kai-Yuen; Lee, Sau Dan; Lo, Kwok-Wai; Yip, Kevin Y
2014-12-23
Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain varying amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data. We found the "filtering" and "combined reference" strategies performed better than aligning reads directly to the human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false-negative variant calls without significantly increasing the false-positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it was found to be crucial when false non-synonymous SNVs should be minimized, especially in exome sequencing. Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.
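The "filtering" strategy mentioned above can be reduced to a per-read decision once each read has been aligned separately to the human and mouse references. This Python sketch assumes best alignment scores are already available (the input format and `classify_reads` are hypothetical) and keeps a read as human only when it aligns strictly better to human; ties are flagged rather than guessed. It is not the exact procedure evaluated by the authors.

```python
def classify_reads(scores: dict) -> dict:
    """Label each read by comparing its best human and best mouse alignment scores.

    scores: dict read id -> (best_human_score, best_mouse_score), with None
    meaning the read did not align to that reference (hypothetical format).
    Higher scores are assumed to be better.
    """
    labels = {}
    for read_id, (human, mouse) in scores.items():
        if human is None and mouse is None:
            label = "unmapped"
        elif mouse is None or (human is not None and human > mouse):
            label = "human"       # keep for human variant calling
        elif human is None or mouse > human:
            label = "mouse"       # discard as likely mouse contamination
        else:
            label = "ambiguous"   # equal scores: cannot tell the species apart
        labels[read_id] = label
    return labels


print(classify_reads({"r1": (148, 120), "r2": (90, 140),
                      "r3": (100, 100), "r4": (None, None)}))
# {'r1': 'human', 'r2': 'mouse', 'r3': 'ambiguous', 'r4': 'unmapped'}
```

How ambiguous reads are handled (dropped or kept) is a design choice that trades false negatives against false positives, which is the trade-off the abstract reports.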
Gopinath, T; Mote, Kaustubh R; Veglia, Gianluigi
2015-05-01
We present a new method called DAISY (Dual Acquisition orIented ssNMR spectroScopY) for the simultaneous acquisition of 2D and 3D oriented solid-state NMR experiments for membrane proteins reconstituted in mechanically or magnetically aligned lipid bilayers. DAISY utilizes dual acquisition of sine and cosine dipolar or chemical shift coherences and long-lived 15N longitudinal polarization to obtain two multi-dimensional spectra simultaneously. In these new experiments, the first acquisition gives the polarization inversion spin exchange at the magic angle (PISEMA) or heteronuclear correlation (HETCOR) spectra, and the second acquisition gives PISEMA-mixing or HETCOR-mixing spectra, where the mixing element enables inter-residue correlations through 15N-15N homonuclear polarization transfer. The analysis of the two 2D spectra (first and second acquisitions) enables one to distinguish 15N-15N inter-residue correlations for sequential assignment of membrane proteins. DAISY can be implemented in 3D experiments that include the polarization inversion spin exchange at magic angle via I spin coherence (PISEMAI) sequence, as we show for the simultaneous acquisition of 3D PISEMAI-HETCOR and 3D PISEMAI-HETCOR-mixing experiments.
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
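The "base-calling algorithms with quality scores" mentioned above conventionally report Phred-scaled quality values, Q = -10 log10(P(error)). This Python sketch shows that standard relation and the Phred+33 character encoding used in modern FASTQ files; it does not model any particular ABI base caller, and the function names are illustrative.

```python
import math

PHRED_OFFSET = 33  # Sanger / Illumina 1.8+ FASTQ quality encoding


def phred_quality(p_error: float) -> float:
    """Phred quality Q = -10 * log10(P(error))."""
    return -10.0 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Inverse transform: P(error) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def encode_fastq_quality(q: int) -> str:
    """Single-character FASTQ encoding of an integer quality score."""
    return chr(q + PHRED_OFFSET)


print(round(phred_quality(0.001), 6))  # 30.0, i.e. 1 error in 1,000 calls
print(encode_fastq_quality(30))        # ?
```

A Q20 base therefore has an estimated 1% chance of being wrong and Q30 a 0.1% chance.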
Effective normalization for copy number variation detection from whole genome sequencing.
Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka
2012-01-01
Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools have been developed to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to the genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates, but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies on two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations. We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07-0.3 range, whereas using a control genome normalization yields Jaccard index values around 0.4 with normalization based on GC content. The most critical impact of using mappability as a normalization factor is a substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable gene and CNV region calls.
Choice of read-count normalization methodology has a substantial effect on CNV calls and the use of genomic mappability or an appropriately chosen control genome can optimize the output of CNV analysis.
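The abstract reports agreement between CNV call sets as Jaccard indices but does not define them further. A minimal Python sketch, assuming a base-pair-level index over half-open (chrom, start, end) intervals, computes |A ∩ B| / |A ∪ B| after merging overlapping calls on each chromosome. Gains and losses are not distinguished here, and all function names are hypothetical.

```python
from collections import defaultdict


def _merge(intervals):
    """Sort and merge overlapping half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def _length(intervals):
    return sum(end - start for start, end in intervals)


def _intersection_length(a, b):
    """Total overlap of two sorted, merged interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def cnv_jaccard(calls_a, calls_b):
    """Base-pair Jaccard index of two CNV call sets of (chrom, start, end) tuples."""
    by_a, by_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in calls_a:
        by_a[chrom].append((start, end))
    for chrom, start, end in calls_b:
        by_b[chrom].append((start, end))
    inter = union = 0
    for chrom in set(by_a) | set(by_b):
        a, b = _merge(by_a[chrom]), _merge(by_b[chrom])
        shared = _intersection_length(a, b)
        inter += shared
        union += _length(a) + _length(b) - shared
    return inter / union if union else 0.0


print(cnv_jaccard([("chr1", 0, 100)], [("chr1", 50, 150)]))  # about 0.333
```

A region-level index (counting calls rather than bases) is an equally common alternative and can give quite different values for the same call sets.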
Estimating genotype error rates from high-coverage next-generation sequence data.
Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil
2014-11-01
Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
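Replicate sequencing gives a lower bound on error because a genotype that differs between two runs of the same sample must be wrong at least once, while errors repeated in both runs stay invisible. This Python sketch shows that logic; it is not necessarily the estimator Wall et al. used, and the genotype dictionaries (assumed already normalized, e.g. "1/0" written as "0/1") and `replicate_discordance` are hypothetical.

```python
def replicate_discordance(rep1: dict, rep2: dict):
    """Compare genotype calls from two replicate runs of one individual.

    rep1 / rep2: dict site id -> unphased genotype string such as "0/1".
    Considers sites called in both replicates with a non-reference genotype
    in at least one. Returns (discordance rate, crude per-call error lower bound).
    """
    sites = [s for s in rep1.keys() & rep2.keys()
             if rep1[s] != "0/0" or rep2[s] != "0/0"]
    if not sites:
        return 0.0, 0.0
    discordant = sum(1 for s in sites if rep1[s] != rep2[s])
    rate = discordant / len(sites)
    # Each discordant site implies at least one wrong call out of the two made there.
    return rate, discordant / (2 * len(sites))


blood = {1: "0/1", 2: "1/1", 3: "0/0", 4: "0/1"}
saliva = {1: "0/1", 2: "0/1", 3: "0/0", 4: "0/1"}
print(replicate_discordance(blood, saliva))  # rate 1/3, lower bound 1/6
```

Stratifying the same calculation by genotype quality, depth or allele frequency is how trends like those listed in (3)-(5) would be examined.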
Detection of microRNAs in color space.
Marco, Antonio; Griffiths-Jones, Sam
2012-02-01
Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3' end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ (contact: antonio.marco@manchester.ac.uk). Supplementary data are available at Bioinformatics online.
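The color-space encoding described above can be made concrete. In the standard SOLiD two-base scheme, with A=0, C=1, G=2, T=3, each color is the XOR of the codes of two adjacent bases, so decoding needs a known starting base and proceeds left to right. This Python sketch is illustrative only and is not SeqTrimMap's code; the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_color_space(seq: str) -> str:
    """Encode bases as the first base followed by one color (0-3) per adjacent pair."""
    seq = seq.upper()
    colors = (str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + "".join(colors)


def from_color_space(csread: str) -> str:
    """Decode a color-space read whose first character is a known base."""
    base = CODE[csread[0].upper()]
    out = [BASES[base]]
    for color in csread[1:]:
        base ^= int(color)  # next base = previous base XOR color
        out.append(BASES[base])
    return "".join(out)


print(to_color_space("ATGGC"))     # A3103
print(from_color_space("A3103"))   # ATGGC
print(from_color_space("A3203"))   # ATCCG: one wrong color corrupts every later base
```

The last line shows why naive decoding is fragile: a single color-call error changes all downstream bases, whereas a true SNP changes two adjacent colors, which is the basis of the error-versus-polymorphism advantage mentioned in the abstract.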
Goordial, J; Altshuler, Ianina; Hindson, Katherine; Chan-Yam, Kelly; Marcolefas, Evangelos; Whyte, Lyle G
2017-01-01
Significant progress is being made in the development of the next generation of low-cost life detection instrumentation with much smaller size, mass and energy requirements. Here, we describe in situ life detection and sequencing in the field in soils overlying ice wedges in polygonal permafrost terrain on Axel Heiberg Island, located in the Canadian high Arctic (79°26'N), an analog to the polygonal permafrost terrain observed on Mars. The life detection methods used here include (1) the cryo-iPlate for culturing microorganisms using diffusion of in situ nutrients into semi-solid media, (2) a Microbial Activity Microassay (MAM) plate (BIOLOG Ecoplate) for detecting viable extant microorganisms through a colourimetric assay, and (3) the Oxford Nanopore MinION for nucleic acid detection and sequencing of environmental samples and the products of MAM plate and cryo-iPlate. We obtained 39 microbial isolates using the cryo-iPlate, which included several putatively novel strains based on the 16S rRNA gene, including a Pedobacter sp. (96% closest similarity in GenBank) which we partially genome sequenced using the MinION. The MAM plate successfully identified an active community capable of L-serine metabolism, which was used for metagenomic sequencing with the MinION to identify the active and enriched community. A metagenome on environmental ice wedge soil samples was completed, with base calling and uplink/downlink carried out via satellite internet. Validation of MinION sequencing using the Illumina MiSeq platform gave results consistent with those obtained with the MinION. The instrumentation and technology utilized here are pre-existing, low cost, low mass, low volume, and offer the prospect of equipping micro-rovers and micro-penetrators with aggressive astrobiological capabilities.
Since potentially habitable astrobiology targets have been identified (RSLs on Mars, near subsurface water ice on Mars, the plumes and oceans of Europa and Enceladus), future astrobiology missions will certainly target these areas and there is a need for direct life detection instrumentation.
Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls.
Buckley, Alexandra R; Standish, Kristopher A; Bhutani, Kunal; Ideker, Trey; Lasken, Roger S; Carter, Hannah; Harismendy, Olivier; Schork, Nicholas J
2017-06-12
Cancer research to date has largely focused on somatically acquired genetic aberrations. In contrast, the degree to which germline, or inherited, variation contributes to tumorigenesis remains unclear, possibly due to a lack of accessible germline variant data. Here we called germline variants on 9618 cases from The Cancer Genome Atlas (TCGA) database representing 31 cancer types. We identified batch effects affecting loss of function (LOF) variant calls that can be traced back to differences in the way the sequence data were generated both within and across cancer types. Overall, LOF indel calls were more sensitive to technical artifacts than LOF Single Nucleotide Variant (SNV) calls. In particular, whole genome amplification of DNA prior to sequencing led to an artificially increased burden of LOF indel calls, which confounded association analyses relating germline variants to tumor type despite stringent indel filtering strategies. The samples affected by these technical artifacts include all acute myeloid leukemia and practically all ovarian cancer samples. We demonstrate how technical artifacts induced by whole genome amplification of DNA can lead to false positive germline-tumor type associations and suggest TCGA whole genome amplified samples be used with caution. This study draws attention to the need to be sensitive to problems associated with a lack of uniformity in data generation in TCGA data.
A remark on copy number variation detection methods.
Li, Shuo; Dou, Xialiang; Gao, Ruiqi; Ge, Xinzhou; Qian, Minping; Wan, Lin
2018-01-01
Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) have been applied to detect genome-wide copy number losses. Although progress has been made with both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform an in-depth analysis of copy number losses in 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project overlap by less than 30%, even though both projects required cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental support for their reports and employed state-of-the-art calling methods. On the other hand, almost all copy number losses found directly from HapMap microarray data by an accurate algorithm, CNVhac, show lower read-mapping depth in the NGS data; furthermore, 88% of them are supported by sequences with breakpoints in the NGS data. Our results support the ability of microarrays to call CNVs and suggest that the unnecessary requirement of additional cross-platform support may introduce false negatives. The inconsistency between the CNV reports of the HapMap Project and the 1000 Genomes Project might result from inadequate information in the microarray data, inconsistent detection criteria, or the filtering effect of requiring cross-platform support. Statistical tests on CNVs called by CNVhac show that microarray data can offer reliable CNV reports, and that the majority of CNV candidates can be confirmed by raw sequences.
Therefore, CNV candidates given by a good caller could be highly reliable without cross-platform support, so additional experimental information should be sought as needed rather than required by default.
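The "< 30% overlap" above compares two call sets, but the abstract does not state the matching rule. A common convention, assumed here, is 50% reciprocal overlap: two losses match if each covers at least half of the other. This Python sketch (function names hypothetical) counts the fraction of one set matched by the other; note the measure is asymmetric.

```python
def reciprocal_overlap(a, b) -> float:
    """Smaller of the two overlap fractions between half-open (chrom, start, end) intervals."""
    (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a, b
    if chrom_a != chrom_b:
        return 0.0
    overlap = max(0, min(end_a, end_b) - max(start_a, start_b))
    return min(overlap / (end_a - start_a), overlap / (end_b - start_b))


def fraction_supported(calls_a, calls_b, min_ro: float = 0.5) -> float:
    """Fraction of calls in calls_a matched by some call in calls_b at >= min_ro."""
    if not calls_a:
        return 0.0
    hits = sum(1 for a in calls_a
               if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b))
    return hits / len(calls_a)


# Toy call sets: the first loss is matched (overlap 60 bp of 100 and of 90), the second is not.
hapmap_losses = [("chr1", 0, 100), ("chr1", 500, 600)]
kg_losses = [("chr1", 40, 130)]
print(fraction_supported(hapmap_losses, kg_losses))  # 0.5
```

The pairwise loop is quadratic, which is fine for illustration; at genome scale an interval tree or a sorted sweep would be used instead.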
2011-01-01
Background: Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments, including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included in or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues. Results: We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring devices. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring, such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur in both single- and multi-membrane-spanning proteins, essentially in any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion: For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches, based on assessment of hydrophobicity and sequence complexity of the TM sequence segments. Reviewers: This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092
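The criterion above rests on two per-segment descriptors, hydrophobicity and sequence complexity. The abstract does not give the authors' formulas or thresholds, so this Python sketch substitutes two generic proxies, both assumptions: mean Kyte-Doolittle hydropathy and the Shannon entropy (in bits) of the residue composition. It only computes the descriptors and does not reproduce the paper's simple/complex decision rule; function names are hypothetical.

```python
import math
from collections import Counter

# Kyte & Doolittle hydropathy scale (standard 20 amino acids only).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value; raises KeyError on non-standard residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment.upper()) / len(segment)


def shannon_entropy(segment: str) -> float:
    """Compositional entropy in bits: 0 for a homopolymer, higher for varied segments."""
    counts = Counter(segment.upper())
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


simple_like = "LLLLLLLLLLLLLLLLLLLL"    # aliphatic, low complexity
complex_like = "LLGSTLAVNPLLFYAQLLAG"   # contains polar and special residues
for name, seg in [("simple-like", simple_like), ("complex-like", complex_like)]:
    print(name, round(mean_hydropathy(seg), 2), round(shannon_entropy(seg), 2))
```

A simple-TM-like stretch should score high on hydropathy and low on entropy; any cut-offs separating the two classes would have to come from the paper itself.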
Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy
2014-01-01
The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold more nucleotide MVs and 2.9-fold more amino acid MVs than 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.
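At its simplest, minority-variant detection reports non-majority alleles at a position whose frequency clears a threshold. This Python sketch shows only that thresholding step, using 0.5% as the default to mirror the control-library level quoted above; it ignores the error modelling that tools such as snp-assess or V-Phaser apply, and `minority_variants` and its `min_depth` floor are hypothetical.

```python
def minority_variants(counts: dict, min_freq: float = 0.005,
                      min_depth: int = 1000) -> dict:
    """Non-majority alleles at one position with frequency >= min_freq.

    counts: dict base -> read count at the position.
    min_depth: hypothetical coverage floor; below it nothing is reported,
    because a 0.5% variant is not resolvable from a handful of reads.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return {}
    major = max(counts, key=counts.get)
    return {base: n / depth for base, n in counts.items()
            if base != major and n / depth >= min_freq}


print(minority_variants({"A": 9900, "G": 80, "T": 20}))  # {'G': 0.008}
```

The depth floor illustrates why coverage matters for sensitivity: Illumina's higher depth is what makes a sub-1% limit of detection statistically meaningful.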
Scaling exponents for ordered maxima
Ben-Naim, E.; Krapivsky, P. L.; Lemons, N. W.
2015-12-22
We study extreme value statistics of multiple sequences of random variables. For each sequence with N variables, independently drawn from the same distribution, the running maximum is defined as the largest variable to date. We compare the running maxima of m independent sequences and investigate the probability $S_N$ that the maxima are perfectly ordered, that is, the running maximum of the first sequence is always larger than that of the second sequence, which is always larger than the running maximum of the third sequence, and so on. The probability $S_N$ is universal: it does not depend on the distribution from which the random variables are drawn. For two sequences, $S_N \sim N^{-1/2}$, and in general, the decay is algebraic, $S_N \sim N^{-\sigma_m}$, for large N. We analytically obtain the exponent $\sigma_3 \cong 1.302931$ as the root of a transcendental equation. Moreover, the exponents $\sigma_m$ grow with m, and we show that $\sigma_m \sim m$ for large m.
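Because $S_N$ is distribution-free, it depends only on the relative ranks of the $mN$ values, so for very small $N$ it can be computed exactly by enumerating every rank order. The brute-force Python sketch below does this (the function name is hypothetical); it is only feasible for tiny $mN$ and serves to check intuition, not to estimate the exponents. For two sequences it gives $S_1 = 1/2$ and $S_2 = 3/8$, and for three sequences $S_1 = 1/6$.

```python
from fractions import Fraction
from itertools import permutations


def ordered_maxima_probability(m: int, n: int) -> Fraction:
    """Exact S_N for m sequences of n i.i.d. continuous variables.

    Ranks 0..m*n-1 stand in for the values (ties have probability zero);
    sequence k is ranks[k*n:(k+1)*n], read in time order.
    """
    total = favorable = 0
    for ranks in permutations(range(m * n)):
        total += 1
        seqs = [ranks[k * n:(k + 1) * n] for k in range(m)]
        running = [-1] * m
        ordered = True
        for t in range(n):
            for k in range(m):
                running[k] = max(running[k], seqs[k][t])
            # Require max_1 > max_2 > ... > max_m at every step t.
            if any(running[k] <= running[k + 1] for k in range(m - 1)):
                ordered = False
                break
        favorable += ordered
    return Fraction(favorable, total)


for m, n in [(2, 1), (2, 2), (3, 1)]:
    print(m, n, ordered_maxima_probability(m, n))
```

The cost grows as $(mN)!$, so studying the large-$N$ decay requires Monte Carlo simulation or the analytical approach of the paper.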
Yang, Ming Ru; Zhou, Zhi Jun; Chang, Yan Lin; Zhao, Le Hong
2012-08-01
To help determine whether the typical arthropod arrangement was a synapomorphy for the whole Tettigoniidae, we sequenced the mitochondrial genome (mitogenome) of the quiet-calling katydid Xizicus fascipes (Orthoptera: Tettigoniidae: Meconematinae). The 16,166-bp X. fascipes mitogenome has the typical gene content, gene order, base composition, and codon usage found in arthropod mitogenomes. As a whole, the X. fascipes mitogenome has a lower A+T content (70.2%) than the other complete orthopteran mitogenomes determined to date. All protein-coding genes start with a typical ATN codon. Ten of the 13 protein-coding genes have a complete termination codon, but the remaining three genes (COIII, ND5 and ND4) terminate with an incomplete T. All tRNAs have the typical cloverleaf structure of mitogenome tRNAs, except for tRNA(Ser(AGN)), which has a lengthened anticodon stem (9 bp) with a bulged nucleotide in the middle, an unusual T-stem (6 bp in contrast to the normal 5 bp), a mini DHU arm (2 bp) and no connector nucleotides. In the A+T-rich region, two (TA)n conserved blocks that were previously described in Ensifera and two 150-bp tandem repeats, plus a partial copy composed of the first 61 bp of the repeat, were present. Phylogenetic analysis found: i) the monophyly of Conocephalinae was interrupted by Elimaea cheni from Phaneropterinae; and ii) Meconematinae was the most basal group among these five subfamilies.
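Several of the descriptive statistics above (A+T content, ATN start codons, incomplete T stop codons) are simple sequence checks. This Python sketch shows one way to compute them; the function names are hypothetical, and real annotation would work from gene coordinates rather than bare strings.

```python
def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("A") + seq.count("T")) / acgt


def has_atn_start(cds: str) -> bool:
    """True if the coding sequence begins with an ATN codon (ATA, ATC, ATG or ATT)."""
    return len(cds) >= 3 and cds[:2].upper() == "AT" and cds[2].upper() in "ACGT"


def has_incomplete_stop(cds: str) -> bool:
    """True if a lone T follows the last full codon; polyadenylation completes it to TAA."""
    return len(cds) % 3 == 1 and cds.upper().endswith("T")


print(round(at_content("AATTGC"), 3))  # 0.667
print(has_atn_start("ATGAAAT"), has_incomplete_stop("ATGAAAT"))  # True True
```

Ambiguity codes such as N are excluded from the A+T denominator here, which is one of several reasonable conventions.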
A snapshot of the emerging tomato genome sequence
USDA-ARS's Scientific Manuscript database
The genome of tomato (Solanum lycopersicum) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States) as part of a larger initiative called the ‘International Solanaceae Genome Proje...
Medical Operations Console Procedure Evaluation: BME Response to Crew Call Down for an Emergency
NASA Technical Reports Server (NTRS)
Johnson-Troop; Pettys, Marianne; Hurst, Victor, IV; Smaka, Todd; Paul, Bonnie; Rosenquist, Kevin; Gast, Karin; Gillis, David; McCulley, Phyllis
2006-01-01
International Space Station (ISS) Mission Operations are managed by multiple flight control disciplines located at the lead Mission Control Center (MCC) at NASA-Johnson Space Center (JSC). ISS Medical Operations are supported by the complementary roles of Flight Surgeons (Surgeon) and Biomedical Engineer (BME) flight controllers. The Surgeon, a board certified physician, oversees all medical concerns of the crew and the BME provides operational and engineering support for Medical Operations Crew Health Care System. ISS Medical Operations is currently addressing the coordinated response to a crew call down for an emergent medical event, in particular when the BME is the only Medical Operations representative in MCC. In this case, the console procedure BME Response to Crew Call Down for an Emergency will be used. The procedure instructs the BME to contact a Surgeon as soon as possible, coordinate with other flight disciplines to establish a Private Medical Conference (PMC) for the crew and Surgeon, gather information from the crew if time permits, and provide Surgeon with pertinent console resources. It is paramount that this procedure is clearly written and easily navigated to assist the BME to respond consistently and efficiently. A total of five BME flight controllers participated in the study. Each BME participant sat in a simulated MCC environment at a console configured with resources specific to the BME MCC console and was presented with two scripted emergency call downs from an ISS crew member. Each participant used the procedure while interacting with analog MCC disciplines to respond to the crew call down. Audio and video recordings of the simulations were analyzed and each BME participant's actions were compared to the procedure. Structured debriefs were conducted at the conclusion of both simulations. The procedure was evaluated for its ability to elicit consistent responses from each BME participant. 
Trials were examined for deviations in procedure task completion and/or navigation, in particular the execution of the Surgeon call sequence. Debrief comments were used to analyze unclear procedural steps and to discern any discrepancies between the procedure and generally accepted BME actions. The sequence followed by BME participants differed considerably from the sequence intended by the procedure. Common deviations included the call sequence used to contact Surgeon, the content of BME and crew interaction and the gathering of pertinent console resources. Differing perceptions of task priority and imprecise language seem to have caused multiple deviations from the procedure's intended sequence. The study generated 40 recommendations for the procedure, of which 34 are being implemented. These recommendations address improving the clarity of the instructions, identifying training considerations, expediting Surgeon contact, improving cues for anticipated flight control team communication and identifying missing console tools.
FaStore - a space-saving solution for raw sequencing data.
Roguski, Lukasz; Ochoa, Idoia; Hernaez, Mikel; Deorowicz, Sebastian
2018-03-29
The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw sequencing data. These data must be stored, processed, and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. FaStore does not use any reference sequences for compression, and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. FaStore in the lossless mode achieves a significant improvement in compression ratio with respect to previously proposed algorithms. We perform an analysis on the effect that the different lossy modes have on variant calling, the most widely used application for clinical decision making, especially important in the era of precision medicine. We show that lossy compression can offer significant compression gains, while preserving the essential genomic information and without affecting the variant calling performance. FaStore can be downloaded from https://github.com/refresh-bio/FaStore. sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online.
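FaStore's lossy modes are not specified in detail here. As a generic, hypothetical illustration of why lossy quality handling improves compression, the sketch below coarsens Phred+33 quality characters into a few bins and compares zlib-compressed sizes. The bin boundaries and representative values are invented for this example and are not FaStore's.

```python
import random
import zlib

# Hypothetical bins: (lowest Phred score in the bin, representative score).
BINS = [(0, 2), (10, 15), (20, 25), (30, 35), (40, 40)]


def bin_quality(qual: str, offset: int = 33) -> str:
    """Replace each quality character by the representative of its bin (lossy)."""
    out = []
    for ch in qual:
        q = ord(ch) - offset
        rep = BINS[0][1]
        for lower, value in BINS:
            if q >= lower:
                rep = value
        out.append(chr(rep + offset))
    return "".join(out)


def compressed_size(text: str) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(text.encode("ascii"), 9))


rng = random.Random(0)
raw = "".join(chr(33 + rng.randint(2, 41)) for _ in range(5000))  # toy qualities
binned = bin_quality(raw)
print(compressed_size(raw), compressed_size(binned))
```

Binning shrinks the alphabet of quality symbols, which lowers their entropy; the bases themselves are untouched, which is why such modes can leave downstream variant calls largely unaffected.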
Czar, Michael J; Cai, Yizhi; Peccoud, Jean
2009-07-01
Chemical synthesis of custom DNA made to order calls for software streamlining the design of synthetic DNA sequences. GenoCAD (www.genocad.org) is a free web-based application to design protein expression vectors, artificial gene networks and other genetic constructs composed of multiple functional blocks called genetic parts. By capturing design strategies in grammatical models of DNA sequences, GenoCAD guides the user through the design process. By successively clicking on icons representing structural features or actual genetic parts, complex constructs composed of dozens of functional blocks can be designed in a matter of minutes. GenoCAD automatically derives the construct sequence from its comprehensive libraries of genetic parts. Upon completion of the design process, users can download the sequence for synthesis or further analysis. Users who elect to create a personal account on the system can customize their workspace by creating their own parts libraries, adding new parts to the libraries, or reusing designs to quickly generate sets of related constructs.
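GenoCAD's grammars are richer than this, but the idea of a grammatical model of a construct can be sketched with a hypothetical toy grammar: nonterminals are rewritten step by step (as a user would by clicking icons) until only part categories remain, and the final sequence is derived from a parts library. All rule names, part names, and sequences below are made up for illustration.

```python
# Hypothetical toy grammar: nonterminal -> list of alternative right-hand sides.
RULES = {
    "construct": [["cassette"], ["cassette", "construct"]],
    "cassette": [["promoter", "rbs", "gene", "terminator"]],
}

# Hypothetical parts library: terminal category -> {part name: DNA sequence}.
PARTS = {
    "promoter": {"pDemo": "TTGACAATATAAT"},
    "rbs": {"rbsDemo": "AGGAGG"},
    "gene": {"geneDemo": "ATGAAATAA"},
    "terminator": {"termDemo": "GCGCGCTTTTGCGCGC"},
}


def rewrite(symbols, position, choice):
    """Replace the nonterminal at `position` with its `choice`-th alternative."""
    head = symbols[position]
    if head not in RULES:
        raise ValueError(f"{head!r} is not a nonterminal")
    return symbols[:position] + RULES[head][choice] + symbols[position + 1:]


def assemble(symbols, picks):
    """Concatenate chosen parts once only terminal categories remain."""
    if any(s in RULES for s in symbols):
        raise ValueError("design still contains nonterminals")
    if len(picks) != len(symbols):
        raise ValueError("need exactly one part per category")
    # A part picked for the wrong category raises KeyError: the grammar constrains choices.
    return "".join(PARTS[cat][name] for cat, name in zip(symbols, picks))


design = rewrite(["construct"], 0, 0)   # construct -> cassette
design = rewrite(design, 0, 0)          # cassette -> promoter rbs gene terminator
dna = assemble(design, ["pDemo", "rbsDemo", "geneDemo", "termDemo"])
print(design, dna)
```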
Cohen, Paul A; Flowers, Nicola; Tong, Stephen; Hannan, Natalie; Pertile, Mark D; Hui, Lisa
2016-08-24
Non-invasive prenatal testing (NIPT) identifies fetal aneuploidy by sequencing cell-free DNA in the maternal plasma. Pre-symptomatic maternal malignancies have been incidentally detected during NIPT based on abnormal genomic profiles. This low coverage sequencing approach could have potential for ovarian cancer screening in the non-pregnant population. Our objective was to investigate whether plasma DNA sequencing with a clinical whole genome NIPT platform can detect early- and late-stage high-grade serous ovarian carcinomas (HGSOC). This is a case-control study of prospectively collected biobank samples comprising preoperative plasma from 32 women with HGSOC (16 'early cancer' (FIGO I-II) and 16 'advanced cancer' (FIGO III-IV)) and 32 benign controls. Plasma DNA from cases and controls was sequenced using a commercial NIPT platform, and chromosome dosage was measured. Sequencing data were blindly analyzed with two methods: (1) Subchromosomal changes were called using an open-source algorithm, WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR). Genomic gains or losses ≥ 15 Mb were prespecified as "screen positive" calls, and mapped to recurrent copy number variations reported in an ovarian cancer genome atlas. (2) Selected whole chromosome gains or losses were reported using the routine NIPT pipeline for fetal aneuploidy. We detected 13/32 cancer cases using the subchromosomal analysis (sensitivity 40.6 %, 95 % CI, 23.7-59.4 %), including 6/16 early and 7/16 advanced HGSOC cases. Two of 32 benign controls had subchromosomal gains ≥ 15 Mb (specificity 93.8 %, 95 % CI, 79.2-99.2 %). Twelve of the 13 true positive cancer cases exhibited specific recurrent changes reported in HGSOC tumors. The NIPT pipeline resulted in one "monosomy 18" call from the cancer group, and two "monosomy X" calls in the controls. Low coverage plasma DNA sequencing used for prenatal testing detected 40.6 % of all HGSOC, including 38 % of early stage cases.
Our findings demonstrate the potential of a high throughput sequencing platform to screen for early HGSOC in plasma based on characteristic multiple segmental chromosome gains and losses. The performance of this approach may be further improved by refining bioinformatics algorithms and targeting selected cancer copy number variations.
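The reported proportions follow directly from the counts: 13 of 32 cancers were flagged (40.6%) and 30 of 32 benign controls had no qualifying gain or loss (93.8%). The sketch below reproduces these and adds an exact (Clopper-Pearson) binomial interval computed by bisection. The abstract does not name its interval method, so that choice is an assumption and the printed bounds may differ slightly from the published ones.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, iters: int = 100) -> float:
    """Bisection for g(p) = target on [0, 1], assuming g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided interval for x successes in n trials (assumed method)."""
    # Lower bound solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2, i.e. P(X > x | p) = 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


sensitivity = 13 / 32          # cancers flagged by the subchromosomal analysis
specificity = (32 - 2) / 32    # benign controls without a >= 15 Mb gain or loss
sens_ci = clopper_pearson(13, 32)
spec_ci = clopper_pearson(30, 32)
print(f"sensitivity {sensitivity:.1%}, 95% CI {sens_ci[0]:.1%}-{sens_ci[1]:.1%}")
print(f"specificity {specificity:.1%}, 95% CI {spec_ci[0]:.1%}-{spec_ci[1]:.1%}")
```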
Mapping Ribonucleotides Incorporated into DNA by Hydrolytic End-Sequencing.
Orebaugh, Clinton D; Lujan, Scott A; Burkholder, Adam B; Clausen, Anders R; Kunkel, Thomas A
2018-01-01
Ribonucleotides embedded within DNA render the DNA sensitive to the formation of single-stranded breaks under alkali conditions. Here, we describe a next-generation sequencing method called hydrolytic end sequencing (HydEn-seq) to map ribonucleotides inserted into the genome of Saccharomyces cerevisiae strains deficient in ribonucleotide excision repair. We use this method to map several genomic features in wild-type and replicase variant yeast strains.
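The end-mapping step can be pictured as tallying read 5′ ends per genome position and strand. This sketch is a simplification under stated assumptions: reads are given as hypothetical (strand, 0-based 5′-end coordinate) pairs, and each ribonucleotide is assigned a fixed offset upstream of the read start on the same strand. The real offset and strand convention depend on the library chemistry and should be taken from the original protocol.

```python
from collections import Counter


def rnmp_counts(read_ends, upstream_offset=1):
    """Tally inferred ribonucleotide positions from (strand, five_prime_pos) pairs.

    Assumption (hypothetical convention): the ribonucleotide lies
    `upstream_offset` nucleotides 5' of the read start on the same strand,
    i.e. at pos - offset on '+' and pos + offset on '-' (genome coordinates).
    """
    counts = Counter()
    for strand, pos in read_ends:
        if strand == "+":
            counts[("+", pos - upstream_offset)] += 1
        elif strand == "-":
            counts[("-", pos + upstream_offset)] += 1
        else:
            raise ValueError(f"unknown strand {strand!r}")
    return counts


reads = [("+", 10), ("+", 10), ("-", 5)]
print(rnmp_counts(reads))
```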
Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun
2016-10-01
As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is significantly important for both basic biomedical research and practical drug development. In this study, we designed a computational-based method, called TargetM6A, to rapidly and accurately target m6A sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a much more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting m6A sites and remarkably improved the prediction performances, with MCC = 0.526 and AUC = 0.818. We also provided a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
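A position-specific nucleotide propensity (PSNP) feature is commonly defined as the difference between a nucleotide's frequency at a given position in positive (modified) and negative training windows. The sketch follows that common formulation; TargetM6A's exact definition, window length, and the dinucleotide (PSDP) variant are not reproduced, and the toy sequences are invented.

```python
from collections import Counter


def psnp_table(positives, negatives, alphabet="ACGU"):
    """Per-position propensity: frequency in positives minus frequency in negatives."""
    length = len(positives[0])
    if any(len(s) != length for s in positives + negatives):
        raise ValueError("all sequences must share one window length")
    table = []
    for i in range(length):
        pos_counts = Counter(s[i] for s in positives)
        neg_counts = Counter(s[i] for s in negatives)
        table.append({
            nt: pos_counts[nt] / len(positives) - neg_counts[nt] / len(negatives)
            for nt in alphabet
        })
    return table


def psnp_encode(seq, table):
    """Feature vector: the propensity of the nucleotide observed at each position."""
    return [table[i].get(nt, 0.0) for i, nt in enumerate(seq)]


table = psnp_table(["GAC", "GAC"], ["UAC", "GAU"])  # toy training windows
print(psnp_encode("GAC", table), psnp_encode("UAU", table))
```

Vectors like these can then be concatenated with composition features and passed to a classifier such as an SVM.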
Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver
Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter
2018-01-01
Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered.
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver. PMID:29876136
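shiver's own consensus-calling rules are not described in this abstract. As a generic illustration of the last step, a consensus base and minority-variant frequencies can be derived from per-position base counts of mapped reads; the minimum-coverage threshold, the use of N for low coverage, and the pileup layout are all assumptions.

```python
from collections import Counter


def consensus_and_minorities(pileup, min_coverage=10):
    """pileup: one string per position holding the bases observed there (hypothetical layout)."""
    consensus, minorities = [], []
    for column in pileup:
        counts = Counter(b for b in column if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_coverage:
            consensus.append("N")      # too few reads to call a base (assumed rule)
            minorities.append({})
            continue
        base, _ = counts.most_common(1)[0]  # ties resolve to the first base seen
        consensus.append(base)
        minorities.append({b: c / depth for b, c in counts.items() if b != base})
    return "".join(consensus), minorities


seq, minor = consensus_and_minorities(["A" * 12, "A" * 9 + "G" * 3, "C" * 3])
print(seq, minor)
```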
Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping
2016-05-15
Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf
2014-01-01
CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234
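CanvasDB itself is queried from R; the Python sketch below only illustrates the kind of cross-sample filter described, keeping variants present in every affected family member and absent from all unaffected samples. The data layout (a mapping from hypothetical sample IDs to sets of variant keys) is an assumption, not CanvasDB's schema.

```python
def candidate_variants(calls, affected, unaffected):
    """calls: dict sample_id -> set of (chrom, pos, ref, alt) keys (hypothetical layout)."""
    if not affected:
        return set()
    shared = set.intersection(*(calls[s] for s in affected))       # in all affected
    seen_in_unaffected = set().union(*(calls[s] for s in unaffected))
    return shared - seen_in_unaffected                              # absent from unaffected


calls = {
    "kid1": {("1", 100, "A", "G"), ("3", 7, "C", "T"), ("2", 5, "G", "A")},
    "kid2": {("1", 100, "A", "G"), ("3", 7, "C", "T")},
    "parent": {("3", 7, "C", "T")},
}
print(candidate_variants(calls, ["kid1", "kid2"], ["parent"]))
```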
Butler, Kimberly S; Young, Megan Y L; Li, Zhihua; Elespuru, Rosalie K; Wood, Steven C
2016-02-01
Next-Generation Sequencing is a rapidly advancing technology that has research and clinical applications. For many cancers, it is important to know the precise mutation(s) present, as specific mutations could indicate or contra-indicate certain treatments as well as be indicative of prognosis. Using the Ion Torrent Personal Genome Machine and the AmpliSeq Cancer Hotspot panel v2, we sequenced two pancreatic cancer cell lines, BxPC-3 and HPAF-II, alone or in mixtures, to determine the error rate, sensitivity, and reproducibility of this system. The system resulted in coverage averaging 2000× across the various amplicons and was able to reliably and reproducibly identify mutations present at a rate of 5%. Identification of mutations present at a lower rate was possible by altering the parameters by which calls were made, but with an increase in erroneous, low-level calls. The panel was able to identify known mutations in these cell lines that are present in the COSMIC database. In addition, other, novel mutations were also identified that may prove clinically useful. The system was assessed for systematic errors such as homopolymer effects, end of amplicon effects and patterns in NO CALL sequence. Overall, the system is adequate at identifying the known, targeted mutations in the panel. Published by Elsevier Inc.
Diffusion modulation of DNA by toehold exchange
NASA Astrophysics Data System (ADS)
Rodjanapanyakul, Thanapop; Takabatake, Fumi; Abe, Keita; Kawamata, Ibuki; Nomura, Shinichiro M.; Murata, Satoshi
2018-05-01
We propose a method to control the diffusion speed of DNA molecules with a target sequence in a polymer solution. The interaction between solute DNA and diffusion-suppressing DNA that has been anchored to a polymer matrix is modulated by the concentration of a third DNA molecule, called the competitor, through a mechanism called toehold exchange. Experimental results show that the sequence-specific modulation of the diffusion coefficient is successfully achieved. The diffusion coefficient can be modulated up to sixfold by changing the concentration of the competitor. The specificity of the modulation is also verified under the coexistence of a set of DNA with noninteracting base sequences. With this mechanism, we are able to control the diffusion coefficient of individual DNA species by the concentration of another DNA species. This methodology introduces programmability into DNA-based reaction-diffusion systems.
Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing
Matochko, Wadim L.; Derda, Ratmir
2013-01-01
Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, compared with the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an $N \times 1$ frequency vector $n = \|n_i\|$, where $n_i$ is the copy number of the $i$th sequence and $N$ is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on $n$. Selection, amplification, or sequencing can be described as the product of an $N \times N$ matrix and a stochastic sampling operator ($S_a$). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of $S_a$ and use them to define the sequencing operator ($\mathrm{Seq}$). Sequencing without any bias and errors is $\mathrm{Seq} = S_a I_N$, where $I_N$ is the $N \times N$ identity (unity) matrix. Any bias in sequencing changes $I_N$ to a non-identity matrix. We identified a diagonal censorship matrix ($\mathrm{CEN}$), which describes elimination or statistically significant downsampling of specific reads during the sequencing process. PMID:24416071
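To make the operator picture concrete, the sketch below represents a toy library as a frequency vector, treats a diagonal matrix acting on it as elementwise multiplication, and models the stochastic sampling operator as multinomial draws of a fixed number of reads (a common but here assumed choice). A censorship diagonal is applied before sampling, matching the description of biased sequencing as sampling applied to a non-identity matrix. The library size and retention factors are invented.

```python
import random


def apply_diagonal(diag, n):
    """A diagonal N x N matrix acting on vector n reduces to elementwise products."""
    return [d * x for d, x in zip(diag, n)]


def sample_reads(n, depth, rng):
    """Assumed sampling operator S_a: draw `depth` reads with probability proportional to n."""
    counts = [0] * len(n)
    for i in rng.choices(range(len(n)), weights=n, k=depth):
        counts[i] += 1
    return counts


def sequence(n, depth, rng, censorship=None):
    """Seq applied to n: S_a (I_N n) when unbiased, S_a (CEN n) with a censorship diagonal."""
    diag = censorship if censorship is not None else [1.0] * len(n)
    return sample_reads(apply_diagonal(diag, n), depth, rng)


rng = random.Random(1)
library = [100, 100, 100, 100]   # toy library with N = 4 sequences
cen = [1.0, 1.0, 0.5, 0.0]       # hypothetical: third sequence downsampled, fourth eliminated
unbiased = sequence(library, 1000, rng)
censored = sequence(library, 1000, rng, cen)
print(unbiased, censored)
```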
The Planning, Implementation, and Movement of an Academic Library Collection.
ERIC Educational Resources Information Center
Kurkul, Donna Lee
1983-01-01
Discusses methodology, logistics, and time/cost study of planning, implementation, and relocation of the 682,810-volume Smith College Library collection into its newly constructed and renovated facility. Call number sequence location, collection movement phasing and formulas for sequence distribution, and personnel requirements are noted. Elementary…
Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.
Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant
2017-11-28
Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.
Castillo, J-M; Agard, C; Artifoni, M; Brisseau, J-M; Connault, J; Durant, C; Espitia, O; Masseau, A; Neel, A; Perrin, F; Pistorius, M-A; Planchon, B; Ponge, T; Hamidou, M; Pottier, P
2016-05-01
Clinical reasoning and treatment challenges in general practice led Nantes University Hospital to develop an internal medicine assistance line. The primary outcome of this study was to describe the callers' profiles, their requests, and the answers provided. A prospective, cross-sectional, observational, descriptive study was undertaken. For each call, the following were recorded: the calling physician, her/his specialty and work setting, the purpose of the call and its adequacy, the answer provided, the time needed to connect with the assistance line, the time the internal medicine physician devoted to answering the request, and whether the assistance line prevented a visit to the emergency room. Each calling physician was then called back to obtain demographic and professional characteristics, and data relating to the call and to the assistance line. Sixty-three days were analyzed and 276 calls identified. The 237 identified calling physicians were mainly women (54%, n=93), with a mean age of 46 years; most had graduated from Nantes University (65%, n=86) and practiced ambulatory general medicine (69%, n=164) in the Loire-Atlantique department (82%, n=176), for a mean duration of 15 years. Calls were mostly associated with diagnostic challenges (61%, n=166) concerning clinical issues (57%, n=155). Telephone advice alone was the main type of answer provided (56%, n=147), and a visit to the emergency room was prevented for 17% of calls. The assistance line's activity is consistent with its missions and seems to facilitate patients' healthcare delivery, which argues for the development of similar structures in other units. Improvements relating to information, availability, and physicians' training should be considered. Copyright © 2015 Société nationale française de médecine interne (SNFMI). Published by Elsevier SAS. All rights reserved.
Kelsen, Judith R; Dawany, Noor; Moran, Christopher J; Petersen, Britt-Sabina; Sarmady, Mahdi; Sasson, Ariella; Pauly-Hubbard, Helen; Martinez, Alejandro; Maurer, Kelly; Soong, Joanne; Rappaport, Eric; Franke, Andre; Keller, Andreas; Winter, Harland S; Mamula, Petar; Piccoli, David; Artis, David; Sonnenberg, Gregory F; Daly, Mark; Sullivan, Kathleen E; Baldassano, Robert N; Devoto, Marcella
2015-11-01
Very early onset inflammatory bowel disease (VEO-IBD), IBD diagnosed at 5 years of age or younger, frequently presents with a different and more severe phenotype than older-onset IBD. We investigated whether patients with VEO-IBD carry rare or novel variants in genes associated with immunodeficiencies that might contribute to disease development. Patients with VEO-IBD and parents (when available) were recruited from the Children's Hospital of Philadelphia from March 2013 through July 2014. We analyzed DNA from 125 patients with VEO-IBD (age, 3 wk to 4 y) and 19 parents, 4 of whom also had IBD. Exome capture was performed with Agilent SureSelect V4, and sequencing was performed using the Illumina HiSeq platform. Reads were aligned to the human genome (GRCh37), followed by postprocessing and variant calling. After functional annotation, candidate variants were analyzed for change in protein function, minor allele frequency less than 0.1%, and scaled combined annotation-dependent depletion scores of 10 or less. We focused on genes associated with primary immunodeficiencies and related pathways. An additional 210 exome samples from patients with pediatric IBD (n = 45) or adult-onset Crohn's disease (n = 20) and healthy individuals (controls, n = 145) were obtained from the University of Kiel, Germany, and used as control groups. Four hundred genes and regions associated with primary immunodeficiency, covering approximately 6500 coding exons totaling more than 1 Mbp of coding sequence, were selected from the whole-exome data. Our analysis showed novel and rare variants within these genes that could contribute to the development of VEO-IBD, including rare heterozygous missense variants in IL10RA and previously unidentified variants in MSH5 and CD19. In an exome sequence analysis of patients with VEO-IBD and their parents, we identified variants in genes that regulate B- and T-cell functions and could contribute to pathogenesis.
Our analysis could lead to the identification of previously unidentified IBD-associated variants. Copyright © 2015 AGA Institute. Published by Elsevier Inc. All rights reserved.
On the dynamics of composition of entire functions
NASA Astrophysics Data System (ADS)
Singh, Anand Prakash
2003-01-01
Let $f$ be an entire function. For $n \in \mathbb{N}$, let $f^{n}$ denote the $n$th iterate of $f$. The set $F(f) = \{z : \{f^{n}\} \text{ is normal in some neighbourhood of } z\}$ is the Fatou set or the set of normality, and its complement $J(f)$ is the Julia set. If $U$ is a component of $F(f)$, then $f(U)$ lies in some component $V$ of $F(f)$. If $U_n \cap U_m = \emptyset$ for $n \neq m$, where $U_n$ denotes the component of $F(f)$ which contains $f^{n}(U)$, then $U$ is called a wandering domain; otherwise $U$ is called a pre-periodic domain, and if $U_n = U$ for some $n \in \mathbb{N}$, then $U$ is called a periodic domain.
In silico segmentations of lentivirus envelope sequences
Boissin-Quillon, Aurélia; Piau, Didier; Leroux, Caroline
2007-01-01
Background The gene encoding the envelope of lentiviruses exhibits a considerable plasticity, particularly the region which encodes the surface (SU) glycoprotein. Interestingly, mutations do not appear uniformly along the sequence of SU, but they are clustered in restricted areas, called variable (V) regions, which are interspersed with relatively more stable regions, called constant (C) regions. We look for specific signatures of C/V regions, using hidden Markov models constructed with SU sequences of the equine, human, small ruminant and simian lentiviruses. Results Our models yield clear and accurate delimitations of the C/V regions, when the test set and the training set were made up of sequences of the same lentivirus, but also when they were made up of sequences of different lentiviruses. Interestingly, the models predicted the different regions of lentiviruses such as the bovine and feline lentiviruses, not used in the training set. Models based on composite training sets produce accurate segmentations of sequences of all these lentiviruses. Conclusion Our results suggest that each C/V region has a specific statistical oligonucleotide composition, and that the C (respectively V) regions of one of these lentiviruses are statistically more similar to the C (respectively V) regions of the other lentiviruses, than to the V (respectively C) regions of the same lentivirus. PMID:17376229
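The segmentation idea can be illustrated with a two-state hidden Markov model (C for constant, V for variable) decoded with the Viterbi algorithm. The authors' models capture oligonucleotide composition learned from real SU sequences; here the emissions are single nucleotides and every probability is invented, so the sketch shows only the decoding mechanics.

```python
import math

STATES = ("C", "V")
START = {"C": 0.5, "V": 0.5}
TRANS = {"C": {"C": 0.95, "V": 0.05}, "V": {"C": 0.05, "V": 0.95}}
# Hypothetical single-nucleotide emission probabilities.
EMIT = {
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "V": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq):
    """Most probable C/V state path for a nucleotide sequence, computed in log space."""
    log = math.log
    scores = [{s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}]
    back = [{}]
    for ch in seq[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p][s]))
            cur[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][ch])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back[1:]):  # trace back pointers to position 0
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


toy = "GC" * 5 + "AT" * 7 + "GC" * 5
print(viterbi(toy))
```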
Using Next Generation Sequencing for Multiplexed Trait-Linked Markers in Wheat
Bernardo, Amy; Wang, Shan; St. Amand, Paul; Bai, Guihua
2015-01-01
With the advent of next generation sequencing (NGS) technologies, single nucleotide polymorphisms (SNPs) have become the major type of marker for genotyping in many crops. However, the availability of SNP markers for important traits of bread wheat ( Triticum aestivum L.) that can be effectively used in marker-assisted selection (MAS) is still limited and SNP assays for MAS are usually uniplex. A shift from uniplex to multiplex assays will allow the simultaneous analysis of multiple markers and increase MAS efficiency. We designed 33 locus-specific markers from SNP or indel-based marker sequences that are linked to 20 different quantitative trait loci (QTL) or genes of agronomic importance in wheat and analyzed the amplicon sequences using an Ion Torrent Proton Sequencer and a custom allele detection pipeline to determine the genotypes of 24 selected germplasm accessions. Among the 33 markers, 27 were successfully multiplexed and 23 had 100% SNP call rates. Results from analysis of "kompetitive allele-specific PCR" (KASP) and sequence tagged site (STS) markers developed from the same loci fully verified the genotype calls of 23 markers. The NGS-based multiplexed assay developed in this study is suitable for rapid and high-throughput screening of SNPs and some indel-based markers in wheat. PMID:26625271
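The custom allele detection pipeline is not described in the abstract. As a hypothetical illustration, a genotype can be called from per-accession allele read counts with a minimum-depth rule and allele-fraction cutoffs, and a marker's SNP call rate is the fraction of accessions that receive a call. All thresholds and genotype labels below are invented.

```python
def call_genotype(ref_reads, alt_reads, min_depth=10, homo_cutoff=0.9):
    """Return 'AA', 'BB', 'AB', or None (no call); thresholds are hypothetical."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None                     # too few amplicon reads to call
    alt_fraction = alt_reads / depth
    if alt_fraction >= homo_cutoff:
        return "BB"                     # homozygous for the alternative allele
    if alt_fraction <= 1 - homo_cutoff:
        return "AA"                     # homozygous for the reference allele
    return "AB"                         # intermediate fraction: heterozygous call


def call_rate(genotypes):
    """Fraction of accessions with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


marker_calls = [call_genotype(r, a) for r, a in [(50, 0), (0, 30), (25, 25), (3, 2)]]
print(marker_calls, call_rate(marker_calls))
```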
Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V
2003-01-01
In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly used idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continuing the alignment process in a bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of progressive alignment with the 1-median (center) of the cluster. The 1-median of a set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments, which has been successfully applied to large search problems in general metric spaces. In particular, a clustering algorithm called Antipole tree and an approximate linear-time 1-median computation are used. Compared with Clustal W, a widely used MSA tool, our algorithm shows better running times with fully comparable alignment quality. A successful biological application showing high amino acid conservation during the evolution of Xenopus laevis SOD2 is also cited.
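The 1-median definition can be made concrete with edit distance between sequences. The sketch computes the exact 1-median from all pairwise distances (quadratic in |S|) and a randomized-tournament approximation in the spirit described: sequences are shuffled into small groups, each group's local 1-median advances, and rounds repeat until one remains. The group size and the use of Levenshtein distance are assumptions, not details taken from AntiClusAl.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def exact_one_median(seqs):
    """Element minimising summed (equivalently average) distance to the set: O(|S|^2) distances."""
    return min(seqs, key=lambda s: sum(edit_distance(s, t) for t in seqs))


def tournament_one_median(seqs, group_size=3, rng=None):
    """Approximate 1-median: keep each random small group's local 1-median until one is left."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    rng = rng or random.Random()
    players = list(seqs)
    while len(players) > 1:
        rng.shuffle(players)
        players = [exact_one_median(players[k:k + group_size])
                   for k in range(0, len(players), group_size)]
    return players[0]


seqs = ["ACGT", "ACGA", "ACGG", "TTTT"]
print(exact_one_median(seqs), tournament_one_median(seqs, rng=random.Random(0)))
```

Each tournament round computes distances only within small groups, so the total work grows roughly linearly with |S| for a fixed group size, at the cost of returning an approximate center.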
Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster
Zhu, Yuan; Bergland, Alan O.; González, Josefa; Petrov, Dmitri A.
2012-01-01
The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated the allele frequencies of the SNPs from the “pooled” data and compared them with “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency, with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be pooled, because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive. PMID:22848651
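The binomial error model mentioned above can be made concrete. The sketch below assumes that, at a given site, each of `depth` reads independently carries the alternate allele with probability p (the true pool frequency), so the read-count estimate has standard error sqrt(p(1 - p) / depth). It deliberately ignores the unequal DNA contribution of individual strains that the abstract identifies as an extra noise source when few strains are pooled; the function names are hypothetical.

```python
import math
import random


def binomial_af_se(p: float, depth: int) -> float:
    """Binomial standard error of an allele frequency estimated from `depth` reads."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(p * (1.0 - p) / depth)


def simulate_pooled_af_sd(p: float, depth: int, trials: int = 2000, seed: int = 1) -> float:
    """Empirical SD of alt_reads / depth when each read independently
    carries the alternate allele with probability p (idealized pool)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        alt_reads = sum(rng.random() < p for _ in range(depth))
        estimates.append(alt_reads / depth)
    mean = sum(estimates) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (trials - 1))


print(binomial_af_se(0.5, 100))  # 0.05
print(binomial_af_se(0.5, 400))  # 0.025
```

Quadrupling depth halves the standard error, the usual square-root scaling of binomial sampling; in a real pool the per-strain DNA variation adds noise on top of this floor.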
Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity.
Rodriguez-R, Luis M; Gunturu, Santosh; Tiedje, James M; Cole, James R; Konstantinidis, Konstantinos T
2018-01-01
Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (N_d) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that N_d additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limited statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
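For reference, the Shannon index that the Nonpareil diversity metric is compared against can be computed from a taxon count profile (for example, OTU counts from a 16S rRNA gene survey) as H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The sketch assumes natural logarithms (some studies use log base 2) and does not implement the Nonpareil metric itself, whose calculation from Nonpareil curves is not described in this abstract.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one taxon must have a positive count")
    return -sum((c / total) * math.log(c / total) for c in positive)


print(shannon_index([10, 10, 10, 10]))  # ln(4), about 1.386
print(shannon_index([90, 10]) < shannon_index([50, 50]))  # True: uneven profiles score lower
```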
Fast imputation using medium- or low-coverage sequence data
USDA-ARS's Scientific Manuscript database
Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...
USDA-ARS's Scientific Manuscript database
We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (“Assessing Changes to Exons”) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detect...
A new species of Pseudopaludicola (Anura, Leiuperinae) from Espírito Santo, Brazil
Baldo, Diego; Pupin, Nadya; Gasparini, João Luiz; Baptista Haddad, Célio F.
2018-01-01
We describe a new anuran species of the genus Pseudopaludicola that inhabits sandy areas in restingas associated with the Atlantic Forest biome in the state of Espírito Santo, Brazil. The new species is characterized by: SVL 11.7–14.6 mm in males, 14.0–16.7 mm in females; body slender; fingertips knobbed, with a central groove; hindlimbs short; abdominal fold complete; arytenoid cartilages wide; prepollex with base and two segments; prehallux with base and one segment; frontoparietal fontanelle partially exposed; advertisement call with one note composed of two isolated pulses per call; call dominant frequency ranging from 4,380 to 4,884 Hz; diploid chromosome number 22; and Ag-NORs on 8q subterminal. In addition, its 16S rDNA sequence shows high genetic distances when compared to sequences of related species, which provides strong evidence that the new species is an independent lineage. PMID:29785347
Zhang, Xiao-Xuan; Cong, Wei; Liu, Guo-Hua; Ni, Xiao-Ting; Ma, Jian-Gang; Zheng, Wen-Bin; Zhao, Quan; Zhu, Xing-Quan
2016-03-01
Enterocytozoon bieneusi is one of the most important zoonotic pathogens and can infect almost all animals, including humans. However, little information is available regarding the prevalence and genotypes of E. bieneusi in sika deer. In the present study, the prevalence of E. bieneusi infection in sika deer in Jilin province, Northeastern China, was examined using PCR amplification of the internal transcribed spacer (ITS) region of the ribosomal RNA (rRNA) gene. Of 326 samples, 23 (7.06%) tested positive for E. bieneusi, and the risk factor significantly associated with E. bieneusi prevalence was the age of the sika deer. Sequence analysis of the ITS rRNA gene identified 8 genotypes of E. bieneusi: five known genotypes, namely J (n = 11), BEB6 (n = 4), EbpC (n = 1), CHN-DC1 (n = 1) and KIN-1 (n = 1), and three novel genotypes, namely JLD-1 (n = 2), JLD-2 (n = 2) and JLD-3 (n = 1). Phylogenetic analysis indicated that genotypes CHN-DC1, KIN-1, EbpC, JLD-2 and JLD-3 fell into group 1, while the other three genotypes (J, BEB6 and JLD-1) clustered into group 2 (the so-called bovine-specific group). These findings indicate the presence of zoonotic E. bieneusi in Jilin province, Northeastern China. Effective strategies should be implemented to control E. bieneusi infection in sika deer, other animals and humans.
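The reported counts are internally consistent, which a quick tally confirms. The sketch uses only the numbers given in the abstract and assumes that each positive sample yielded exactly one genotype.

```python
# Genotype counts and phylogenetic groups as reported in the abstract.
genotype_counts = {
    "J": 11, "BEB6": 4, "EbpC": 1, "CHN-DC1": 1, "KIN-1": 1,  # known genotypes
    "JLD-1": 2, "JLD-2": 2, "JLD-3": 1,                        # novel genotypes
}
group_1 = {"CHN-DC1", "KIN-1", "EbpC", "JLD-2", "JLD-3"}
group_2 = {"J", "BEB6", "JLD-1"}
samples_tested = 326

positives = sum(genotype_counts.values())
prevalence_pct = round(100 * positives / samples_tested, 2)
group_1_n = sum(genotype_counts[g] for g in group_1)
group_2_n = sum(genotype_counts[g] for g in group_2)
print(positives, prevalence_pct, group_1_n, group_2_n)  # prints: 23 7.06 6 17
```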
Leveraging Call Center Logs for Customer Behavior Prediction
NASA Astrophysics Data System (ADS)
Parvathy, Anju G.; Vasudevan, Bintu G.; Kumar, Abhishek; Balakrishnan, Rajesh
Most major businesses use business process outsourcing to perform a process, or part of a process, including financial services such as mortgage processing, loan origination, finance and accounting, and transaction processing. Call centers are used to receive and transmit a large volume of requests through inbound and outbound calls to customers on behalf of a business. In this paper we deal specifically with call center notes from banks. Banks, as financial institutions, provide loans to non-financial businesses and individuals. Their call centers act as the nuclei of their client service operations and log the transactions between the customer and the bank. These conversations can be exploited to predict a customer's behavior, which in turn helps these businesses decide on the next action to take. The banks thus save considerable time and effort in tracking delinquent customers and can keep subsequent defaults to a minimum. Most of the time, call center notes are very brief, often misspelled, and full of domain-specific acronyms. In this paper we introduce a novel domain-specific spelling correction algorithm that corrects the misspelled words in the call center logs to meaningful ones. We also discuss a procedure that builds behavioral history sequences for the customers by categorizing the logs into one of the predefined behavioral states. We then describe a pattern-based predictive algorithm that uses temporal behavioral patterns mined from these sequences to predict the customer's next behavioral state.
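The idea of predicting a customer's next behavioral state from mined temporal patterns can be sketched with a generic frequency-based approach. This is not the paper's algorithm: it simply counts which state follows each contiguous pattern of up to `max_len` states and, at prediction time, uses the longest recent pattern seen in training, backing off to shorter ones. The state names and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def mine_successors(histories: Sequence[Sequence[str]],
                    max_len: int = 2) -> DefaultDict[Pattern, Counter]:
    """For every contiguous pattern of 1..max_len states, count the next state."""
    table: DefaultDict[Pattern, Counter] = defaultdict(Counter)
    for seq in histories:
        for i in range(1, len(seq)):
            for k in range(1, min(max_len, i) + 1):
                table[tuple(seq[i - k:i])][seq[i]] += 1
    return table


def predict_next(history: Sequence[str], table, max_len: int = 2) -> Optional[str]:
    """Use the longest recent pattern seen in training; back off to shorter ones."""
    for k in range(min(max_len, len(history)), 0, -1):
        pattern = tuple(history[-k:])
        if pattern in table:  # membership test does not insert into the defaultdict
            return table[pattern].most_common(1)[0][0]
    return None


histories = [
    ["contacted", "promised_to_pay", "paid"],
    ["contacted", "promised_to_pay", "paid"],
    ["contacted", "promised_to_pay", "broke_promise"],
    ["no_answer", "no_answer", "broke_promise"],
]
table = mine_successors(histories)
print(predict_next(["contacted", "promised_to_pay"], table))  # paid
```

Longer patterns capture more context but are seen less often, which is why the sketch backs off to shorter ones when a long pattern is unseen.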
Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi
2013-04-10
Sequencing of microbial genomes is important because microbes can carry genes with antibiotic and pathogenic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still in progress. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) that integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI-prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects, in contigs or scaffolds, can be submitted to the Web server, which returns functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for the analysis of ongoing genome projects. Researchers can obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Grasso, Christopher; Page, Dennis; O'Reilly, Taifun; Fteichert, Ralph; Lock, Patricia; Lin, Imin; Naviaux, Keith; Sisino, John
2005-01-01
Virtual Machine Language (VML) is a mission-independent, reusable software system for programming spacecraft operations. Features of VML include a rich set of data types, named functions, parameters, IF and WHILE control structures, polymorphism, and on-the-fly creation of spacecraft commands from calculated values. Spacecraft functions can be abstracted into named blocks that reside in files aboard the spacecraft. These named blocks accept parameters and execute in a repeatable fashion. The sizes of uplink products are minimized by the ability to call blocks that implement most of the command steps. This block approach also enables some autonomous operations aboard the spacecraft, such as aerobraking, telemetry conditional monitoring, and anomaly response, without developing autonomous flight software. Operators on the ground write blocks and command sequences in a concise, high-level, human-readable programming language (also called VML). A compiler translates the human-readable blocks and command sequences into binary files (the operations products). The flight portion of VML interprets the uplinked binary files. The ground subsystem of VML also includes an interactive sequence-execution tool hosted on workstations, which runs sequences at several thousand times real-time speed, affords debugging, and generates reports. This tool enables iterative development of blocks and sequences on a timescale of seconds.
Robustness of Massively Parallel Sequencing Platforms
Kavak, Pınar; Yüksel, Bayram; Aksu, Soner; Kulekci, M. Oguzhan; Güngör, Tunga; Hach, Faraz; Şahinalp, S. Cenk; Alkan, Can; Sağıroğlu, Mahmut Şamil
2015-01-01
Improvements in high-throughput sequencing (HTS) technologies have made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although there have been significant improvements in the accuracy and reproducibility of HTS-based analyses, using these data for diagnostic and prognostic applications requires near-perfect data generation. To assess the robustness of a widely used HTS platform for accurate and reproducible clinical applications, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals at two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed a significant number of discrepancies in the call sets. As expected, most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, and only a small fraction of the discordant variants were within exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful to provide data for first-pass clinical tests, variant predictions still need to be confirmed using orthogonal methods before they are used in clinical applications. PMID:26382624
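Discrepancies between two call sets, such as those produced at the two sequencing centers above, are often summarized by set overlap. The sketch below is a minimal exact-match comparison under stated assumptions: both sets use the same reference and have already been normalized (for example, indels left-aligned), and "concordance" is taken here as shared calls divided by the union (a Jaccard index), one of several conventions in use. The coordinates are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def compare_call_sets(a: Iterable[Variant], b: Iterable[Variant]) -> Dict[str, float]:
    """Exact-match comparison of two normalized variant call sets."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "concordance": len(shared) / len(union) if union else 1.0,
    }


center_1 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")}
center_2 = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "GA"), ("chr3", 7, "T", "C")}
print(compare_call_sets(center_1, center_2))
# {'shared': 2, 'only_a': 1, 'only_b': 1, 'concordance': 0.5}
```

Without normalization, the same indel written at different positions would be counted as a discrepancy, inflating disagreement in repetitive regions.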
ERIC Educational Resources Information Center
Wood, Marianne
2007-01-01
This article presents a lesson called Memory Palaces. A memory palace is a memory tool used to remember information, usually as visual images, in a sequence that is logical to the person remembering it. In his book, "In the Palaces of Memory", George Johnson calls them "...structure(s) for arranging knowledge. Lots of connections to language arts,…
Ikeda, Minami; Kobayashi, Tamaki; Arai, Shinpei; Mukai, Saki; Takezawa, Yuka; Terasawa, Fumiko; Okumura, Nobuo
2014-08-01
We examined a 6-month-old girl with an inherited fibrinogen abnormality and no history of bleeding or thrombosis. Routine coagulation screening tests showed a markedly low plasma fibrinogen level by functional measurement and a low level by antigenic measurement (functional/antigenic ratio = 0.295), suggesting hypodysfibrinogenemia. DNA sequence analysis was performed and, based on the results, γT305A fibrinogen was synthesized in Chinese hamster ovary cells. We then analyzed its function and compared it with that of recombinant γN308K fibrinogen, a nearby variant. DNA sequence analysis revealed a heterozygous γT305A substitution (mature protein residue number). The γT305A fibrinogen showed markedly impaired thrombin-catalyzed fibrin polymerization both in the presence and absence of 1 mM calcium ion compared with γN308K fibrinogen. Protection from plasmin degradation in the presence of calcium ion or Gly-Pro-Arg-Pro peptide (an analogue of the so-called knob 'A') and factor XIIIa-catalyzed fibrinogen crosslinking demonstrated that the calcium binding sites, hole 'a' and D:D interaction sites were all markedly impaired, whereas γN308K was impaired at the latter two sites. Molecular modeling demonstrated that γT305 is located closer than γN308 to the high-affinity calcium binding site and hole 'a'. Our findings suggest that γT305 might be important for constructing the overall structure of the γ module of fibrinogen. The γT305A substitution leads to both dysfibrinogenemic and hypofibrinogenemic characteristics, namely hypodysfibrinogenemia. We have already reported that recombinant γT305A fibrinogen was synthesized normally but that its secretion was slightly, yet significantly, reduced. Copyright © 2014 Elsevier Ltd. All rights reserved.
Flexible DNA binding of the BTB/POZ-domain protein FBI-1.
Pessler, Frank; Hernandez, Nouria
2003-08-01
POZ-domain transcription factors are characterized by the presence of a protein-protein interaction domain called the POZ or BTB domain at their N terminus and zinc fingers at their C terminus. Despite the large number of POZ-domain transcription factors that have been identified to date and the significant insights that have been gained into their cellular functions, relatively little is known about their DNA binding properties. FBI-1 is a BTB/POZ-domain protein that has been shown to modulate HIV-1 Tat trans-activation and to repress transcription of some cellular genes. We have used various viral and cellular FBI-1 binding sites to characterize the interaction of a POZ-domain protein with DNA in detail. We find that FBI-1 binds to inverted sequence repeats downstream of the HIV-1 transcription start site. Remarkably, it binds efficiently to probes carrying these repeats in various orientations and spacings with no particular rotational alignment, indicating that its interaction with DNA is highly flexible. Indeed, FBI-1 binding sites in the adenovirus 2 major late promoter, the c-fos gene, and the c-myc P1 and P2 promoters reveal variously spaced direct, inverted, and everted sequence repeats with the consensus sequence G(A/G)GGG(T/C)(C/T)(T/C)(C/T) for each repeat.
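The repeat consensus above uses (X/Y) to mean "either base at this position", which maps directly onto a regular expression character class. The sketch below assumes that reading, scans both strands of a DNA string for individual repeats, and reports forward-strand 0-based start coordinates; it does not model the spacing or orientation of repeat pairs that the abstract describes as flexible. The helper names are hypothetical.

```python
import re
from typing import List, Tuple

FBI_1_CONSENSUS = "G(A/G)GGG(T/C)(C/T)(T/C)(C/T)"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus: str) -> str:
    """Turn '(X/Y)' alternatives into '[XY]' character classes."""
    return re.sub(r"\(([ACGT])/([ACGT])\)", r"[\1\2]", consensus)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, consensus: str) -> List[Tuple[int, str, str]]:
    """Return (start, strand, site) for matches on both strands.
    The zero-width lookahead lets overlapping matches be reported."""
    rx = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in rx.finditer(seq)]
    for m in rx.finditer(reverse_complement(seq)):
        # Map the end of the reverse-strand match back to a forward-strand start.
        hits.append((len(seq) - m.end(1), "-", m.group(1)))
    return sorted(hits)


print(consensus_to_regex(FBI_1_CONSENSUS))               # G[AG]GGG[TC][CT][TC][CT]
print(find_sites("AAGAGGGTCTCAA", FBI_1_CONSENSUS))      # [(2, '+', 'GAGGGTCTC')]
```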
Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu
2017-01-01
A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single-nucleotide variants and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059
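The sensitivity and specificity of variant calling referred to above are simple ratios over a set of sites with known status. The sketch assumes a per-site binary comparison against a reference (for example, treating microarray genotypes as the truth set, which is an assumption here) and uses hypothetical function names.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool], called: Sequence[bool]) -> Tuple[float, float]:
    """Per-site sensitivity and specificity of variant calls versus a reference.

    truth[i]:  site i carries a variant in the reference genotypes
    called[i]: the pipeline reported a variant at site i
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    tp = sum(t and c for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    tn = sum(not t and not c for t, c in zip(truth, called))
    fp = sum(not t and c for t, c in zip(truth, called))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both variant and non-variant reference sites")
    return tp / (tp + fn), tn / (tn + fp)


truth = [True] * 4 + [False] * 6
called = [True, True, True, False] + [False] * 5 + [True]
print(sensitivity_specificity(truth, called))  # sensitivity 0.75, specificity 5/6
```

One intuition for why specificity matters so much for rare-variant burden tests: non-variant sites vastly outnumber true rare variants, so even a specificity of 0.999 across a million non-variant sites implies about 1,000 false-positive calls that dilute the aggregated signal.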
Gultekin, Yasemin B; Hage, Steffen R
2018-04-01
Human vocal development is dependent on learning by imitation through social feedback between infants and caregivers. Recent studies have revealed that vocal development is also influenced by parental feedback in marmoset monkeys, suggesting vocal learning mechanisms in nonhuman primates. Marmoset infants that experience more contingent vocal feedback than their littermates develop vocalizations more rapidly, and infant marmosets with limited parental interaction exhibit immature vocal behavior beyond infancy. However, it is yet unclear whether direct parental interaction is an obligate requirement for proper vocal development because all monkeys in the aforementioned studies were able to produce the adult call repertoire after infancy. Using quantitative measures to compare distinct call parameters and vocal sequence structure, we show that social interaction has a direct impact not only on the maturation of the vocal behavior but also on acoustic call structures during vocal development. Monkeys with limited parental interaction during development show systematic differences in call entropy, a measure for maturity, compared with their normally raised siblings. In addition, different call types were occasionally uttered in motif-like sequences similar to those exhibited by vocal learners, such as birds and humans, in early vocal development. These results indicate that a lack of parental interaction leads to long-term disturbances in the acoustic structure of marmoset vocalizations, suggesting an imperative role for social interaction in proper primate vocal development. PMID:29651461
Novel division level bacterial diversity in a Yellowstone hot spring.
Hugenholtz, P; Pitulle, C; Hershberger, K L; Pace, N R
1998-01-01
A culture-independent molecular phylogenetic survey was carried out for the bacterial community in Obsidian Pool (OP), a Yellowstone National Park hot spring previously shown to contain remarkable archaeal diversity (S. M. Barns, R. E. Fundyga, M. W. Jeffries, and N. R. Pace, Proc. Natl. Acad. Sci. USA 91:1609-1613, 1994). Small-subunit rRNA genes (rDNA) were amplified directly from OP sediment DNA by PCR with universally conserved or Bacteria-specific rDNA primers and cloned. Unique rDNA types among >300 clones were identified by restriction fragment length polymorphism, and 122 representative rDNA sequences were determined. These were found to represent 54 distinct bacterial sequence types or clusters (≥98% identity) of sequences. A majority (70%) of the sequence types were affiliated with 14 previously recognized bacterial divisions (main phyla; kingdoms); 30% were unaffiliated with recognized bacterial divisions. The unaffiliated sequence types (represented by 38 sequences) nominally comprise 12 novel, division level lineages termed candidate divisions. Several OP sequences were nearly identical to those of cultivated chemolithotrophic thermophiles, including the hydrogen-oxidizing Calderobacterium and the sulfate reducers Thermodesulfovibrio and Thermodesulfobacterium, or belonged to monophyletic assemblages recognized for a particular type of metabolism, such as the hydrogen-oxidizing Aquificales and the sulfate-reducing delta-Proteobacteria. The occurrence of such organisms is consistent with the chemical composition of OP (high in reduced iron and sulfur) and suggests a lithotrophic base for primary productivity in this hot spring, through hydrogen oxidation and sulfate reduction. Unexpectedly, no archaeal sequences were encountered in OP clone libraries made with universal primers. Hybridization analysis of amplified OP DNA with domain-specific probes confirmed that the analyzed community rDNA from OP sediment was predominantly bacterial.
These results substantially expand our knowledge of the extent of bacterial diversity and call into question the commonly held notion that Archaea dominate hydrothermal environments. Finally, the currently known extent of division level bacterial phylogenetic diversity is collated and summarized.
Bolevenine, a toxic protein from the Japanese toadstool Boletus venenatus.
Matsuura, Masanori; Yamada, Mina; Saikawa, Yoko; Miyairi, Kazuo; Okuno, Toshikatsu; Konno, Katsuhiro; Uenishi, Jun'ichi; Hashimoto, Kimiko; Nakata, Masaya
2007-03-01
A toxic protein, called bolevenine, was isolated from the toxic mushroom Boletus venenatus based on its lethal effects on mice. On SDS-PAGE, in either the presence or absence of 2-mercaptoethanol, this protein showed a single band of approximately 12 kDa. In contrast, based on gel filtration and MALDI-TOFMS, its relative molecular mass was estimated to be approximately 30 kDa and approximately 33 kDa, respectively, indicating that the protein consists of three identical subunits. This toxin exhibited lethal activity in mice following injection at 10 mg/kg. The N-terminal amino acid sequence was determined up to residue 18 and found to be similar to that of the previously reported bolesatine, a toxic compound isolated from Boletus satanas.
Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.
Wrzeszczynski, Kazimierz O; Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A; Moore Vogel, Julia L; Bruce, Jeffrey N; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V; Zody, Michael C; Jobanputra, Vaidehi; Royyuru, Ajay K; Darnell, Robert B
2017-08-01
We aimed to analyze a glioblastoma tumor specimen with 3 different platforms and to compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. NCT02725684.
Aung, Win Pa Pa; Htoon, Thi Thi; Tin, Htay Htay; Thinn, Kyi Kyi; Sanpool, Oranuch; Jongthawin, Jurairat; Sadaow, Lakkhana; Phosuk, Issarapong; Rodpai, Rutchanee; Intapan, Pewpan M; Maleewong, Wanchai
2017-01-01
Opisthorchis viverrini is endemic in the South East Asian region, especially in Cambodia, Lao People's Democratic Republic, Vietnam and Thailand, but there have been no previous records from Myanmar. During stool surveys of rural populations in three regions of Lower Myanmar, Opisthorchis-like eggs were found in 34 out of 364 (9.3%) participants by stool microscopy after using the modified formalin-ether concentration technique. DNA was extracted from these positive stool samples and a portion of the mitochondrial cytochrome c oxidase subunit I (cox1) gene was amplified using the polymerase chain reaction and then sequenced. DNA sequences, successfully obtained from 18 of 34 positive samples (Bago Region, n = 13; Mon State, n = 3; Yangon Region, n = 2), confirmed that the eggs were of O. viverrini. Sequences showed 99.7% identity with O. viverrini mitochondrial cox1 (GenBank accession no. JF739555) but 95%, 88.7%, 82.6% and 81.4% identities with those of Opisthorchis lobatus from Lao People's Democratic Republic (GenBank accession nos. HQ328539-HQ328541), Metorchis orientalis from China (KT239342), Clonorchis sinensis from China (JF729303) and Opisthorchis felineus from Russia (EU921260), respectively. Alignment with other Opisthorchiidae trematodes revealed 81% similarity with Metorchis bilis from the Czech Republic (GenBank accession nos. KT740966, KT740969, KT740970) and Slovakia (GenBank accession nos. KT740971-KT740973), 84.6% similarity with Metorchis xanthosomus from the Czech Republic (GenBank accession no. KT740974), 78.6% similarity with M. xanthosomus from Poland (GenBank accession no. KT740968) and 82.2% similarity with Euamphimerus pancreaticus from the Czech Republic (GenBank accession no. KT740975). This study demonstrated, for the first time, O. viverrini infection in rural people in Myanmar using molecular methods, and it is an urgent call for surveillance and control activities against opisthorchiasis in Myanmar.
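Percent identity figures like those above depend on how gap columns are treated, and the abstract does not state its convention. The sketch below assumes one common choice: identical, non-gap columns divided by the total alignment length (gap columns included), on two sequences that are already aligned to equal length. Other tools exclude gap columns from the denominator, and "similarity" scores may count conservative substitutions as well, so the numbers are not interchangeable. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical non-gap columns divided by alignment length, as a percentage."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("inputs must be non-empty and of equal aligned length")
    matches = sum(x == y and x != gap
                  for x, y in zip(aligned_a.upper(), aligned_b.upper()))
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGT", "ACGA"))  # 75.0
print(percent_identity("AC-T", "ACGT"))  # 75.0 (the gap column counts in the denominator)
```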
Tracking polypeptide folds on the free energy surface: effects of the chain length and sequence.
Brukhno, Andrey V; Ricchiuto, Piero; Auer, Stefan
2012-07-26
Characterization of the folding transition in polypeptides and assessing the thermodynamic stability of their structured folds are of primary importance for approaching the problem of protein folding. We use molecular dynamics simulations for a coarse grained polypeptide model in order to (1) obtain the equilibrium conformation diagram of homopolypeptides in a broad range of the chain lengths, N = 10, ..., 100, and temperatures, T (in a multicanonical ensemble), and (2) determine free energy profiles (FEPs) projected onto an optimal, so-called "natural", reaction coordinate that preserves the height of barriers and the diffusion coefficients on the underlying free energy hyper-surface. We then address the following fundamental questions. (i) How well does a kinetically determined free energy landscape of a single chain represent the polypeptide equilibrium (ensemble) behavior? In particular, under which conditions might the correspondence be lost, and what are the possible implications for the folding processes? (ii) How does the free energy landscape depend on the chain length (homopolypeptides) and the monomer interaction sequence (heteropolypeptides)? Our data reveal that at low T values equilibrium structures adopted by relatively short homopolypeptides (N < 60) are dominated by α-helical folds which correspond to the primary and secondary minima of the FEP. In contrast, longer homopolypeptides (N > 70), upon quasi-equilibrium cooling, fold preferentially in β-bundles with small helical portions, while the FEPs exhibit no distinct global minima. Moreover, subject to the choice of the initial configuration, at sufficiently low T, essentially metastable structures can be found and prevail far from the true thermodynamic equilibrium. 
We also show that, by sequence-enabling the polypeptide model, it is possible to restrict the chain to a very specific part of the configuration space, which results in substantial simplification and smoothing of the free energy landscape as compared to the case of the corresponding homopolypeptide.
Governor Bush makes first phone call to KSC using new area code
NASA Technical Reports Server (NTRS)
1999-01-01
In the videoconference room at Headquarters, key representatives of KSC contractors, along with KSC directorates, fill the room during an early morning phone call from Florida Governor Jeb Bush (seen on the video screen) in Tallahassee, Fla. The call is to inaugurate the change of KSC's area code from 407 to 321, effective today. Deputy Director for Business Operations Jim Jennings (fourth from right) received the call. Next to Jennings (at his right) is seated Robert Osband, Florida Space Institute, who suggested the 3-2-1 sequence to reflect the importance of the space industry to Florida's space coast.
Robot Sequencing and Visualization Program (RSVP)
NASA Technical Reports Server (NTRS)
Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C
2013-01-01
The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.
Lin, Ya-Ying
2017-01-01
A portion of the mitochondrial cytochrome c oxidase I (CO1) gene was sequenced using both genomic DNA (gDNA) and complementary DNA (cDNA) from three planktonic copepod Neocalanus species (N. cristatus, N. plumchrus, and N. flemingeri). Small but critical sequence differences in CO1 were observed between gDNA and cDNA from N. plumchrus. Furthermore, careful observation revealed the presence of recombination between sequences in gDNA from N. plumchrus. Moreover, a chimera of the N. cristatus and N. plumchrus sequences was obtained from N. plumchrus gDNA. The observed phenomena are best explained by the preferential amplification of nuclear mitochondrial pseudogenes from the gDNA of N. plumchrus. Two conclusions can be drawn from the observations. First, nuclear mitochondrial pseudogenes are pervasive in N. plumchrus. Second, a mating between a female N. cristatus and a male N. plumchrus produced viable offspring, which further backcrossed to an N. plumchrus individual. These observations not only demonstrate intriguing mating behavior in these species but also emphasize the importance of careful interpretation of species marker sequences amplified from gDNA. PMID:28231343
van Geest, Geert; Voorrips, Roeland E; Esselink, Danny; Post, Aike; Visser, Richard Gf; Arens, Paul
2017-08-07
Cultivated chrysanthemum is an outcrossing hexaploid (2n = 6× = 54) with a disputed mode of inheritance. In this paper, we present a single nucleotide polymorphism (SNP) selection pipeline that was used to design an Affymetrix Axiom array with 183 k SNPs from RNA sequencing data (1). With this array, we genotyped four bi-parental populations (with sizes of 405, 53, 76 and 37 offspring plants, respectively) and a cultivar panel of 63 genotypes. Further, we present a method for dosage scoring in hexaploids from signal intensities of the array based on mixture models (2) and a validation of selection steps in the SNP selection pipeline (3). The resulting genotypic data are used to draw conclusions on the mode of inheritance in chrysanthemum (4) and to make an inference on allelic expression bias (5). Using the mixture model approach, we successfully called the dosage of 73,936 out of 183,130 SNPs (40.4%) that segregated in any of the bi-parental populations. To investigate the mode of inheritance, we analysed markers that segregated in the large bi-parental population (n = 405). Analysis of the segregation of duplex x nulliplex SNPs provided evidence for genome-wide hexasomic inheritance. This evidence was substantiated by the absence of strong linkage between markers in repulsion, which indicated the absence of full disomic inheritance. We present the success rate of SNP discovery from RNA sequencing data as affected by different selection steps, including SNP coverage over genotypes and the use of different types of sequence read mapping software. Genomic dosage correlated highly with relative allele coverage in the RNA sequencing data, indicating that most alleles are expressed according to their genomic dosage. The large population, genotyped with a very large number of markers, is a unique framework for extensive genetic analyses in hexaploid chrysanthemum. As a starting point, we show conclusive evidence for genome-wide hexasomic inheritance.
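Why duplex x nulliplex markers discriminate between inheritance modes can be seen from the expected offspring dosages. A minimal Python sketch, assuming textbook polysomic inheritance with random chromosome segregation and no double reduction (the study's own models may include more), in which a gamete draws three of the parent's six chromosomes; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def gamete_dosage_probs(parent_dosage, ploidy=6):
    """Gamete dosage distribution under polysomic inheritance.

    Assumes random chromosome segregation without double reduction: a gamete
    receives ploidy/2 of the parent's chromosomes, drawn without replacement,
    so the number of allele copies it carries is hypergeometric.
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    return {
        d: Fraction(comb(parent_dosage, d) * comb(ploidy - parent_dosage, half - d), total)
        for d in range(min(parent_dosage, half) + 1)
    }


def offspring_dosage_probs(p1, p2, ploidy=6):
    """Offspring dosage distribution: convolution of both parents' gametes."""
    out = {}
    for d1, q1 in gamete_dosage_probs(p1, ploidy).items():
        for d2, q2 in gamete_dosage_probs(p2, ploidy).items():
            out[d1 + d2] = out.get(d1 + d2, 0) + q1 * q2
    return out


# Duplex x nulliplex: expected nulliplex:simplex:duplex ratio of 1:3:1
print(offspring_dosage_probs(2, 0))
```

Under full disomic inheritance the same duplex parent would instead give 0:1:0 (both copies on one homologous pair) or 1:2:1 (copies on different pairs), which is the contrast that the segregation of duplex x nulliplex markers can test.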
Churkin, Alexander; Barash, Danny
2008-01-01
Background RNAmute is an interactive Java application which, given an RNA sequence, calculates the secondary structure of all single point mutations and organizes them into categories according to their similarity to the predicted structure of the wild type. The secondary structure predictions are performed using the Vienna RNA package. A more efficient implementation of RNAmute is needed, however, to extend from the case of single point mutations to the general case of multiple point mutations, which may often be desired for computational predictions alongside mutagenesis experiments. But analyzing multiple point mutations, a process that requires traversing all possible mutations, becomes highly expensive, since the running time is O(n^m) for a sequence of length n with m-point mutations. Using Vienna's RNAsubopt, we present a method that selects, based on stability considerations, only those mutations that are likely to cause a conformational rearrangement. The approach is best examined using the dot plot representation for RNA secondary structure. Results Using RNAsubopt, the suboptimal solutions for a given wild-type sequence are calculated once. Then, specific mutations are selected that are most likely to cause a conformational rearrangement. For an RNA sequence of about 100 nts and 3-point mutations (n = 100, m = 3), for example, the proposed method reduces the running time from several hours or even days to several minutes, thus enabling the practical application of RNAmute to the analysis of multiple-point mutations. Conclusion A highly efficient addition to RNAmute that is as user friendly as the original application but that facilitates the practical analysis of multiple-point mutations is presented. Such an extension can now be exploited prior to site-directed mutagenesis experiments by virologists, for example, who investigate the change of function in an RNA virus via mutations that disrupt important motifs in its secondary structure.
A complete explanation of the application, called MultiRNAmute, is available at [1]. PMID:18445289
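The combinatorics behind the running-time argument can be checked directly. A minimal Python sketch, assuming only base substitutions (no insertions or deletions): choosing m of n positions and one of three alternative bases at each gives C(n, m)·3^m distinct mutants, which grows roughly as n^m for fixed m. The function name is hypothetical and not part of RNAmute.

```python
from math import comb


def num_point_mutants(n, m, alphabet_size=4):
    """Number of sequences differing from a length-n wild type at exactly m
    positions: choose m positions, then one of (alphabet_size - 1) bases each."""
    return comb(n, m) * (alphabet_size - 1) ** m


print(num_point_mutants(100, 1))  # 300
print(num_point_mutants(100, 3))  # 4365900
```

For n = 100 and m = 3 this is over four million candidate sequences, each needing its own structure prediction in an exhaustive scan, which is consistent with the hours-to-days running time quoted above.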
Koko, Mahmoud; Abdallah, Mohammed O E; Amin, Mutaz; Ibrahim, Muntaser
2018-01-15
The conventional variant calling of pathogenic alleles in exome and genome sequencing requires the non-pathogenic alleles to be present as genome references. This hinders the correct identification of variants with minor and/or pathogenic reference alleles, warranting additional approaches to variant calling. More than 26,000 Exome Aggregation Consortium (ExAC) variants have a minor reference allele, including variants with known ClinVar disease alleles. For instance, in a number of variants related to clotting disorders, the phenotype-associated allele is the human genome reference allele (rs6025, rs6003, rs1799983, and rs2227564 in the hg19 assembly). We highlighted how current variant calling standards miss homozygous reference disease variants at these sites and provided a bioinformatic panel that can be used to screen these variants using commonly available variant callers. We present exome sequencing results from an individual with venous thrombosis to emphasize how pathogenic alleles in clinically relevant variants escape variant calling while non-pathogenic alleles are detected. This article highlights the importance of specialized variant calling strategies for clinical variants with minor reference alleles, especially in the context of personal genomes and exomes. We provide here a simple strategy to screen potential disease-causing variants when present in the homozygous reference state.
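The screening idea can be sketched as a post-processing step on caller output. A minimal Python sketch under stated assumptions: the panel, the set of rsIDs at which the caller reported an alternate allele, and per-site read depth (for example taken from an all-sites or gVCF output) are supplied as plain Python objects. `PanelSite`, `flag_hom_ref_risk` and the `min_depth` cutoff are hypothetical names and values, not the authors' panel or any caller's API.

```python
from dataclasses import dataclass


@dataclass
class PanelSite:
    rsid: str
    ref_is_risk: bool  # True when the reference base is the phenotype-associated allele


def flag_hom_ref_risk(panel, called_rsids, depth, min_depth=10):
    """rsIDs where the sample is inferred homozygous for a risk reference allele.

    A site is flagged when its reference base is the risk allele, the caller
    reported no alternate allele there, and read depth reaches `min_depth`.
    Low-coverage sites stay unflagged: a missing call there is not evidence
    of a homozygous reference genotype.
    """
    flagged = []
    for site in panel:
        if not site.ref_is_risk or site.rsid in called_rsids:
            continue
        if depth.get(site.rsid, 0) >= min_depth:
            flagged.append(site.rsid)
    return flagged


panel = [PanelSite(r, ref_is_risk=True) for r in ("rs6025", "rs6003", "rs1799983", "rs2227564")]
called = {"rs6003"}  # hypothetical: an alternate allele was called here
depth = {"rs6025": 40, "rs6003": 35, "rs1799983": 5, "rs2227564": 30}  # made-up depths
print(flag_hom_ref_risk(panel, called, depth))  # ['rs6025', 'rs2227564']
```

Here rs6003 is skipped because an alternate allele was called, and rs1799983 is skipped for insufficient coverage.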
Composeable Chat over Low-Bandwidth Intermittent Communication Links
2007-04-01
Compression (STC), introduced in this report, is a data compression algorithm intended to compress alphanumeric... Ziv-Lempel coding, the grandfather of most modern general-purpose file compression programs, watches for input symbol sequences that have previously... data. This section applies these techniques to create a new compression algorithm called Small Text Compression. Various sequence compression
USDA-ARS's Scientific Manuscript database
Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low per-sample genotyping cost, but missing data and under-calling of heterozygotes complicate the creation of GBS linkage maps for highly heterozygous species. To overcome these issues, we developed ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.
The Relativistic Heavy Ion Collider (RHIC) has a high-level, asynchronous timeline driven by a controlling program called the "Sequencer". Most high-level magnet and beam-related operations are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, its operator interface, its operational impact and our experience with it.
The Babushka Concept--An Instructional Sequence to Enhance Laboratory Learning in Science Education
ERIC Educational Resources Information Center
Gårdebjer, Sofie; Larsson, Anette; Adawi, Tom
2017-01-01
This paper deals with a novel method for improving the traditional "verification" laboratory in science education. Drawing on the idea of integrated instructional units, we describe an instructional sequence which we call the Babushka concept. This concept consists of three integrated instructional units: a start-up lecture, a laboratory…
JGI Plant Genomics Gene Annotation Pipeline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Shengqiang; Rokhsar, Dan; Goodstein, David
2014-07-14
Plant genomes vary in size and are highly complex, with many repeats, genome duplications and tandem duplications. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality, stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and their transcriptomes have been sequenced as well. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It uses several gene predictors based on homologous peptides and transcript ORFs. See Methods for details. Here we present the genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which was annotated by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front-page snapshot are shown below.
Exploiting three kinds of interface propensities to identify protein binding sites.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2009-08-01
Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles, which uses the evolutionary information in protein sequence frequency profiles, and apply this building block to produce a class of propensities called order profile interface propensities. For comparison, we revisit the use of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensity, combined with sequence profiles and accessible surface areas, is input into an SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities outperform residue interface propensities and binary profile interface propensities. Therefore, the order profile is a suitable profile-level building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, domain boundary prediction, the design of knowledge-based potentials and protein remote homology detection.
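Residue interface propensities, one of the baselines compared above, are commonly computed as the log-ratio of a residue type's frequency at the interface to its frequency on the whole solvent-exposed surface. A minimal Python sketch under that common definition; the paper's exact normalization may differ, its order-profile propensities (which work on profile-level building blocks) are not reproduced, and the input strings of one-letter residue codes are hypothetical.

```python
import math
from collections import Counter


def interface_propensities(interface_residues, surface_residues):
    """Log-ratio propensity of each residue type for the interface relative to
    the whole surface: ln[(n_int(a) / N_int) / (n_surf(a) / N_surf)].
    Positive values mean the residue type is enriched at the interface."""
    n_int = Counter(interface_residues)
    n_surf = Counter(surface_residues)
    total_int = sum(n_int.values())
    total_surf = sum(n_surf.values())
    return {
        aa: math.log((n_int[aa] / total_int) / (n_surf[aa] / total_surf))
        for aa in n_int
        if n_surf[aa] > 0
    }


# Hypothetical residues; the surface set includes the interface residues.
props = interface_propensities("WWLK", "WWLLKKKKEEEE")
print(props)  # W enriched (ln 3), K depleted (ln 0.75)
```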
Analysis of the origin of predictability in human communications
NASA Astrophysics Data System (ADS)
Zhang, Lin; Liu, Yani; Wu, Ye; Xiao, Jinghua
2014-01-01
Human behaviors in daily life can be traced through communications via electronic devices. E-mails, short messages and cell-phone calls can be used to investigate the predictability of communication partners' patterns, because these three are the most representative and common behaviors in daily communications. In this paper, we show that all three channels have apparent predictability in partners' patterns, and moreover, short message users' sequences have the highest predictability among the three. We also reveal that people with fewer communication partners have higher predictability. Finally, we investigate the origin of predictability, which comes from two aspects. One is the intrinsic pattern in the partner sequence, that is, people tend to communicate with a particular partner after another particular one. The other aspect is burstiness, that is, communicating with the same partner several times in a row. The high burstiness of short-message communication is one of the main reasons for its high predictability, the intrinsic pattern in e-mail partner sequences is the main reason for their predictability, and the predictability of cell-phone call partner sequences comes from both aspects.
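The two sources of predictability named above can be illustrated on a toy partner sequence. A minimal Python sketch, assuming two simple proxies rather than the paper's own measures: the fraction of consecutive contacts that repeat the same partner (bursts), and the in-sample accuracy of guessing each next partner as the most frequent successor of the current one (a first-order sequence pattern). The second proxy also counts repeated partners as transitions, so it does not separate the two aspects the way the paper does; all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def burst_fraction(partners):
    """Share of consecutive contact pairs that repeat the same partner."""
    pairs = list(zip(partners, partners[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def first_order_predictability(partners):
    """In-sample accuracy of always guessing the most frequent successor of the
    current partner; a crude stand-in for sequence-level predictability."""
    successors = defaultdict(Counter)
    for a, b in zip(partners, partners[1:]):
        successors[a][b] += 1
    transitions = len(partners) - 1
    if transitions <= 0:
        return 0.0
    hits = sum(c.most_common(1)[0][1] for c in successors.values())
    return hits / transitions


log = ["A", "A", "B", "A", "A", "B"]  # hypothetical partner sequence
print(burst_fraction(log), first_order_predictability(log))  # 0.4 0.6
```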
Dinh, Hieu; Rajasekaran, Sanguthevar
2011-07-15
Exact-match overlap graphs have been broadly used in the context of DNA assembly and the shortest superstring problem, where the number of strings n ranges from thousands to billions. The length ℓ of the strings is from 25 to 1000, depending on the DNA sequencing technology. However, many DNA assemblers using overlap graphs suffer from the need for too much time and space in constructing the graphs. It is nearly impossible for these DNA assemblers to handle the huge amount of data produced by next-generation sequencing technologies, where the number n of strings could be several billion. If the overlap graph is explicitly stored, it would require Ω(n²) memory, which could be prohibitive in practice when n is greater than a hundred million. In this article, we propose a novel data structure with which the overlap graph can be compactly stored. This data structure requires only linear time to construct and linear memory to store. For a given set of input strings (also called reads), we can informally define an exact-match overlap graph as follows. Each read is represented as a node in the graph, and there is an edge between two nodes if the corresponding reads overlap sufficiently. A formal description follows. The maximal exact-match overlap of two strings x and y, denoted by ov_max(x, y), is the longest string which is a suffix of x and a prefix of y. The exact-match overlap graph of n given strings of length ℓ is an edge-weighted graph in which each vertex is associated with a string and there is an edge (x, y) of weight ω = ℓ − |ov_max(x, y)| if and only if ω ≤ λ, where |ov_max(x, y)| is the length of ov_max(x, y) and λ is a given threshold. In this article, we show that the exact-match overlap graphs can be represented by a compact data structure that can be stored using at most (2λ − 1)(2⌈log n⌉ + ⌈log λ⌉)n bits, with a guarantee that the basic operation of accessing an edge takes O(log λ) time.
We also propose two algorithms for constructing the data structure for the exact-match overlap graph. The first algorithm runs in O(λℓn log n) worst-case time and requires O(λ) extra memory. The second one runs in O(λℓn) time and requires O(n) extra memory. Our experimental results on a huge amount of simulated data from sequence assembly show that the data structure can be constructed efficiently in time and memory. Our DNA sequence assembler that incorporates the data structure is freely available on the web at http://www.engr.uconn.edu/~htd06001/assembler/leap.zip
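The formal definition above translates directly into a brute-force construction, which also shows why explicit storage does not scale: every ordered pair of reads is compared and up to n(n − 1) edges may be stored. This naive Python sketch assumes equal-length reads and exact matching; it implements the definition only, not the paper's compact bit-level structure or its construction algorithms, and the function names are hypothetical.

```python
def max_overlap(x, y):
    """|ov_max(x, y)|: length of the longest suffix of x that is also a prefix of y."""
    for k in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:k]):
            return k
    return 0


def overlap_graph(reads, lam):
    """Naive exact-match overlap graph over equal-length reads.

    Returns {(i, j): w} for every ordered pair i != j with
    w = read_len - |ov_max(reads[i], reads[j])| and w <= lam.
    """
    read_len = len(reads[0])
    edges = {}
    for i, x in enumerate(reads):
        for j, y in enumerate(reads):
            if i != j:
                w = read_len - max_overlap(x, y)
                if w <= lam:
                    edges[(i, j)] = w
    return edges


reads = ["ACGTAC", "GTACGG", "TTTTTT"]
print(overlap_graph(reads, lam=2))  # {(0, 1): 2}
```

Here "ACGTAC" ends with "GTAC", the first four bases of "GTACGG", so the overlap is 4 and the weight is 6 − 4 = 2.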
Zhu, Ruo-Lin; Lei, Xiao-Ying; Ke, Fei; Yuan, Xiu-Ping; Zhang, Qi-Ya
2011-02-01
The genomic sequence of Scophthalmus maximus rhabdovirus (SMRV), isolated from diseased turbot, has been characterized. The complete genome of SMRV comprises 11,492 nucleotides and encodes five typical rhabdovirus genes: N, P, M, G and L. In addition, two open reading frames (ORFs) are predicted to overlap the P gene: one upstream of P and smaller than P (temporarily called Ps), and another within the P gene, which may encode a protein similar to the vesicular stomatitis virus C protein. The C ORF is contained within the P ORF. The five typical proteins share the highest sequence identities (48.9%) with the corresponding proteins of rhabdoviruses in the genus Vesiculovirus. Phylogenetic analysis of a partial L protein sequence indicates that SMRV is close to the genus Vesiculovirus. The first 13 nucleotides at the two ends of the SMRV genome are perfectly inverse-complementary. The gene junctions between the five genes show a conserved polyadenylation signal (CATGA(7)) and intergenic dinucleotide (CT), followed by a putative transcription initiation sequence A(A/G)(C/G)A(A/G/T), which differ from those of known rhabdoviruses. The entire Ps ORF was cloned and expressed, and used to generate polyclonal antibody in mice. A distinct band was detected in SMRV-infected carp leucocyte cells (CLCs) by anti-Ps/C serum via Western blot, and the Ps-GFP fusion protein showed a cytoplasmic distribution as multiple punctate or doughnut-shaped foci of uneven size. Copyright © 2010 Elsevier B.V. All rights reserved.
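The gene-junction motifs and the terminal complementarity described above are easy to scan for once written as patterns. A minimal Python sketch, assuming the genome is given as a DNA-alphabet string in the sense in which the motifs are reported, and reading "CATGA(7)" as CATG followed by seven A's (an assumption about the notation); the helper names are hypothetical.

```python
import re

# A(A/G)(C/G)A(A/G/T) as a regex; the lookahead reports overlapping hits too.
TRANSCRIPTION_START = re.compile(r"(?=A[AG][CG]A[AGT])")
# Assumed reading of "CATGA(7)": CATG followed by seven A's.
POLYA_SIGNAL = re.compile(r"(?=CATGA{7})")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def motif_positions(seq, pattern):
    """0-based start positions of every (possibly overlapping) match."""
    return [m.start() for m in pattern.finditer(seq)]


def termini_inverse_complementary(genome, k=13):
    """True if the first k bases are the reverse complement of the last k bases."""
    return genome[:k] == reverse_complement(genome[-k:])


print(motif_positions("CCAGCAGTT", TRANSCRIPTION_START))  # [2]
print(termini_inverse_complementary("ACGTTTTTACGT", k=4))  # True
```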
Murphy, James; Klumpp, Jochen; Mahony, Jennifer; O'Connell-Motherway, Mary; Nauta, Arjen; van Sinderen, Douwe
2014-10-01
So-called 936-type phages are among the most frequently isolated phages in dairy facilities utilising Lactococcus lactis starter cultures. Despite extensive efforts to control phage proliferation and decades of research, these phages continue to negatively impact cheese production in terms of final product quality and, consequently, monetary return. Whole genome sequencing and in silico analysis of three 936-type phage genomes identified several putative (orphan) methyltransferase (MTase)-encoding genes located within the packaging and replication regions of the genome. Utilising SMRT sequencing, methylome analysis was performed on all three phages, allowing the identification of adenine modifications consistent with N6-methyladenine sequence methylation, which in some cases could be attributed to these phage-encoded MTases. Heterologous gene expression revealed that M.Phi145I/M.Phi93I and M.Phi93DAM, encoded by genes located within the packaging module, provide protection against the restriction enzymes HphI and DpnII, respectively, representing the first functional MTases identified in members of 936-type phages. SMRT sequencing technology enabled the identification of the target motifs of MTases encoded by the genomes of three lytic 936-type phages, and these MTases represent the first functional MTases identified in this species of phage. The presence of these MTase-encoding genes on 936-type phage genomes is assumed to represent an adaptive response to circumvent host-encoded restriction-modification systems, thereby increasing the fitness of the phages in a dynamic dairy environment.
NASA Astrophysics Data System (ADS)
Ismail, Roslina; Omar, Ghazali; Jalar, Azman; Majlis, Burhanuddin Yeop
2015-07-01
Wire bonding processes have been widely adopted in micro-electromechanical systems (MEMS) packaging, especially in biomedical devices, for the integration of components. In the first process sequence in wire bonding, the zone along the wire near the melted tip is called the heat-affected zone (HAZ). The HAZ is an important factor influencing the looping profiles of the wire bonding process. This paper investigates the effect of dopants on microstructures in the HAZ. One percent palladium (Pd) was added to the as-drawn 4N gold wire and annealed at 600°C. The addition of Pd moderated grain growth in the HAZ by retarding heat propagation to the wire. In the formation of the looping profile, the first bending point of the loop is closely associated with the length of the HAZ. The alloyed gold wire (2N gold) has a sharp angle at a distance of about 30 μm from the neck of the wire, with a measured bending radius of about 40 mm and a bending angle of about 40° clockwise from the vertical axis, while the 4N gold wire bends at a longer distance. The results also show that the HAZ of the 4N gold wire is longer than that of the 2N gold wire.
2014-01-01
and distance between all of the vector ambiguity pairs for the combined N sequences. To simplify our derivation, we define the center of ambiguity (COA... modulo N. The resulting structure of the N sequences ensures that two successive RSNS vectors (paired terms from all N sequences) when considered... represented by a vector, X_h = [x_{1,h}, x_{2,h}, ..., x_{N,h}]^T, of N paired integers from each sequence at h. For example, a left-shifted, three-sequence
preAssemble: a tool for automatic sequencer trace data processing.
Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V
2006-01-17
Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information that can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands. Assessing sequencing quality against various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small- to large-scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and the base-calling program Phred are used in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience; however, options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and Linux operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site, where preAssemble jobs can be run on the project server. preAssemble is a tool for performing quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible, since both interactive jobs on the preAssemble server and the stand-alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job; on the other hand, options for parameter tuning are provided. Consequently, preAssemble can be used as efficiently for just a few trace files as for large-scale sequence processing.
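Base-call quality in this kind of pipeline is expressed on the Phred scale, where a quality Q corresponds to an estimated error probability P = 10^(−Q/10) (Q20 means a 1% chance the call is wrong). A minimal Python sketch of that conversion and of a simple per-read summary; the summary (count of bases at or above Q20 and expected number of errors) is an illustrative assumption, not the criteria preAssemble, Phred or Pregap4 actually apply, and the function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality for an error probability: Q = -10 log10(P)."""
    return -10 * math.log10(p)


def summarize_qualities(quals, threshold=20):
    """Per-read summary: number of bases at or above `threshold`, and the
    expected number of base-call errors (sum of per-base error probabilities)."""
    n_high = sum(q >= threshold for q in quals)
    expected_errors = sum(phred_to_error(q) for q in quals)
    return n_high, expected_errors


print(summarize_qualities([10, 20, 30, 40]))  # 3 bases >= Q20, about 0.111 expected errors
```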
DeNovoGUI: An Open Source Graphical User Interface for de Novo Sequencing of Tandem Mass Spectra
2013-01-01
De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com. PMID:24295440
DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.
Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald
2014-02-07
De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com.
Visibility graphs and symbolic dynamics
NASA Astrophysics Data System (ADS)
Lacasa, Lucas; Just, Wolfram
2018-07-01
Visibility algorithms are a family of geometric and ordering criteria by which a real-valued time series of N data points is mapped into a graph of N nodes. This graph has been shown to often inherit in its topology nontrivial properties of the series structure, and can thus be seen as a combinatorial representation of a dynamical system. Here we explore in some detail the relation between visibility graphs and symbolic dynamics. To do that, we consider the degree sequence of horizontal visibility graphs generated by the one-parameter logistic map, for a range of parameter values for which the map shows chaotic behaviour. Numerically, we observe that in the chaotic region the block entropies of these sequences systematically converge to the Lyapunov exponent of the time series. Hence, Pesin's identity suggests that these block entropies are converging to the Kolmogorov-Sinai entropy of the physical measure, which ultimately suggests that the algorithm is implicitly and adaptively constructing phase space partitions which might have the generating property. To give analytical insight, we explore the relation k(x), x ∈ [0, 1], that, for a given datum with value x, assigns in graph space a node with degree k. In the case of the out-degree sequence, such a relation is indeed a piecewise constant function. By making use of explicit methods and tools from symbolic dynamics, we are able to show analytically that the algorithm indeed performs an effective partition of the phase space and that such a partition is naturally expressed as a countable union of subintervals, where the endpoints of each subinterval are related to the fixed point structure of the iterates of the map and the subinterval enumeration is associated with particular ordering structures that we call motifs.
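A horizontal visibility graph links two points i < j whenever every value strictly between them lies below both endpoints. A minimal Python sketch, assuming that standard criterion, computes the degree sequence of a logistic-map series and the block entropies of that sequence. `hvg_degrees`, `logistic_series` and `block_entropy` are hypothetical helper names, the loop is quadratic in the worst case and chosen for clarity, and the sketch makes no claim about how fast these entropies approach the Lyapunov exponent for any particular series length.

```python
import math
from collections import Counter


def hvg_degrees(series):
    """Degree of each node in the horizontal visibility graph of `series`.

    Nodes i < j are linked when every value strictly between them is lower
    than min(series[i], series[j]).
    """
    n = len(series)
    degrees = [0] * n
    for i in range(n - 1):
        highest_between = -math.inf
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                degrees[i] += 1
                degrees[j] += 1
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # every later j is hidden from i
    return degrees


def logistic_series(x0, r, n, burn_in=1000):
    """n iterates of x -> r x (1 - x) after discarding a transient."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)
    out = []
    for _ in range(n):
        out.append(x)
        x = r * x * (1 - x)
    return out


def block_entropy(symbols, block_len):
    """Shannon entropy (nats) of the overlapping length-block_len words."""
    words = Counter(tuple(symbols[i:i + block_len])
                    for i in range(len(symbols) - block_len + 1))
    total = sum(words.values())
    return -sum(c / total * math.log(c / total) for c in words.values())


degrees = hvg_degrees(logistic_series(0.3, 3.9, 5000))  # r = 3.9 lies in the chaotic regime
for m in (1, 2, 3):
    print(m, block_entropy(degrees, m))
```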
The disorderly conduct of Hsc70 and its interaction with the Alzheimer's related Tau protein.
Taylor, Isabelle R; Ahmad, Atta; Wu, Taia; Nordhues, Bryce A; Bhullar, Anup; Gestwicki, Jason E; Zuiderweg, Erik R P
2018-05-15
Hsp70 chaperones bind to various protein substrates for folding, trafficking, and degradation. Considerable structural information is available about how prokaryotic Hsp70 (DnaK) binds substrates, but less is known about mammalian Hsp70s, of which there are 13 isoforms encoded in the human genome. Here, we report the interaction between the human Hsp70 isoform heat shock cognate 71 kDa protein (Hsc70 or HSPA8) and peptides derived from the microtubule-associated protein tau, which is linked to Alzheimer's disease. For structural studies, we used an Hsc70 construct (called BETA) comprising the substrate-binding domain but lacking the lid. Importantly, we found that truncating the lid does not significantly impair Hsc70's chaperone activity or allostery in vitro. Using NMR, we show that BETA is partially dynamically disordered in the absence of substrate and that binding of the tau sequence GKVQIINKKG (KD = 500 nM) causes dramatic rigidification of BETA. Nuclear Overhauser effect distance measurements revealed that tau binds to the canonical substrate-binding cleft, similar to the binding observed with DnaK. To further develop BETA as a tool for studying Hsc70 interactions, we also measured BETA binding, in NMR and fluorescence competition assays, to peptides derived from huntingtin, insulin, a second tau-recognition sequence, and a KFERQ-like sequence linked to chaperone-mediated autophagy. We found that the insulin C-peptide binds BETA with high affinity (KD < 100 nM), whereas the others do not (KD > 100 μM). Together, our findings reveal several similarities and differences in how prokaryotic and mammalian Hsp70 isoforms interact with different substrate peptides. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.
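To see what the reported dissociation constants mean at a given peptide concentration, the standard single-site binding isotherm θ = [L] / ([L] + K_D) can be evaluated. This is a minimal Python sketch that assumes simple 1:1 binding with the peptide in excess, so free and total peptide concentrations are taken as equal; it is not the fitting model used in the study, and the 1 μM peptide concentration is a made-up value.

```python
def fraction_bound(ligand_conc, kd):
    """Equilibrium fraction of protein bound for simple 1:1 binding, assuming
    free ligand concentration ~ total (ligand in excess). Same units for both."""
    return ligand_conc / (ligand_conc + kd)


peptide = 1e-6  # 1 uM, a hypothetical concentration
for kd in (500e-9, 100e-6):  # the tau peptide's K_D vs the >100 uM bound for weak binders
    print(kd, fraction_bound(peptide, kd))
```

At 1 μM peptide, K_D = 500 nM gives about two-thirds of BETA bound, whereas K_D = 100 μM gives under 1%, which is why the weaker peptides register as non-binders.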
Identification of genomic indels and structural variations using split reads
2011-01-01
Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then, to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g., scoring events gapped in the center of a read more highly). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g., calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g., long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs.
Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful. PMID:21787423
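The calibration step can be illustrated with a small sketch. It assumes that sensitivity and positive predictive value (PPV) are estimated per event class from simulated reads with known planted events, and that an observed call count is then corrected as observed × PPV / sensitivity. The function names, the class keys and this correction formula are illustrative assumptions, not SRiC's published code, and the extrapolation from chromosome 1 to the whole genome is omitted.

```python
# Sketch (assumed, not SRiC's code): correct raw SV call counts using
# sensitivity and positive predictive value (PPV) measured on simulated data.


def calibration_metrics(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (sensitivity, PPV) from counts obtained on simulated reads."""
    sensitivity = true_pos / (true_pos + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, ppv


def corrected_count(observed: int, sensitivity: float, ppv: float) -> float:
    """Estimate the true number of events behind `observed` calls.

    observed * ppv estimates how many calls are real; dividing by
    sensitivity scales up for real events the caller missed.
    """
    return observed * ppv / sensitivity


def genome_total(observed_by_class: dict[str, int],
                 calibration: dict[str, tuple[float, float]]) -> float:
    """Sum corrected counts over event classes (hypothetical length bins)."""
    return sum(corrected_count(n, *calibration[cls])
               for cls, n in observed_by_class.items())
```

For example, if simulations for one class give 60 true positives, 20 false positives and 40 false negatives, sensitivity is 0.6 and PPV is 0.75, so 1,000 observed calls in that class correspond to an estimated 1,250 true events.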
Poultney, Christopher S.; Goldberg, Arthur P.; Drapeau, Elodie; Kou, Yan; Harony-Nicolas, Hala; Kajiwara, Yuji; De Rubeis, Silvia; Durand, Simon; Stevens, Christine; Rehnström, Karola; Palotie, Aarno; Daly, Mark J.; Ma’ayan, Avi; Fromer, Menachem; Buxbaum, Joseph D.
2013-01-01
Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1–30 kb, making use of the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1–30 kb CNV, 1–30 kb deletions, and 1–10 kb deletions in ASD. CNV in the 1–30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1–30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes. PMID:24094742
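The abstract does not describe XHMM's internals, so the following is only a generic illustration of how a hidden Markov model can turn exome read depth into CNV calls. It assumes per-exon depths have already been normalized to z-scores and uses three hypothetical states (deletion, diploid, duplication) with Gaussian emissions and made-up transition probabilities; Viterbi decoding then returns the most likely state for each exon. None of the parameter values are XHMM's.

```python
import math

# Hypothetical three-state HMM over per-exon depth z-scores (not XHMM's parameters).
STATES = ("DEL", "DIP", "DUP")
MEANS = {"DEL": -3.0, "DIP": 0.0, "DUP": 3.0}  # assumed emission means
SD = 1.0              # assumed emission standard deviation
P_ENTER_CNV = 1e-4    # assumed P(diploid -> DEL) and P(diploid -> DUP)
P_STAY_CNV = 0.9      # assumed P(staying in a CNV state at the next exon)


def log_emission(z: float, state: str) -> float:
    """Gaussian log-density of z-score `z` under `state`."""
    mu = MEANS[state]
    return -0.5 * math.log(2 * math.pi * SD ** 2) - (z - mu) ** 2 / (2 * SD ** 2)


def log_transition(prev: str, cur: str) -> float:
    if prev == "DIP":
        return math.log(1 - 2 * P_ENTER_CNV) if cur == "DIP" else math.log(P_ENTER_CNV)
    if cur == prev:
        return math.log(P_STAY_CNV)
    if cur == "DIP":
        return math.log(1 - P_STAY_CNV)
    return float("-inf")  # no direct DEL <-> DUP jumps in this sketch


def viterbi(z_scores: list[float]) -> list[str]:
    """Return the most likely state for each exon."""
    # Assumed initial distribution: the same as leaving a diploid state.
    score = {s: log_transition("DIP", s) + log_emission(z_scores[0], s) for s in STATES}
    backpointers = []
    for z in z_scores[1:]:
        prev_score, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev_score[r] + log_transition(r, s))
            score[s] = prev_score[best] + log_transition(best, s) + log_emission(z, s)
            ptr[s] = best
        backpointers.append(ptr)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):  # walk back from the last exon
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With these toy parameters, a run of three strongly negative z-scores between near-zero values is decoded as a deletion spanning those three exons, which echoes the abstract's point that CNV covering as few as three exons can be called.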
Mining for class-specific motifs in protein sequence classification
2013-01-01
Background In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features actually contribute to the accuracy. We hypothesize that short subsequences (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function first harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution resulted in a large increase in the number of discriminatory n-grams harvested. Because the dataset is unbalanced, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to n-grams that appear in fewer classes and less weight to those that appear in more classes. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences.
Our method performed well compared with an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection and of the dampening factor to normalize the unbalanced datasets has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with potential application to detecting proteome-specific motifs of different organisms. PMID:23496846
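The harvesting and dampening steps can be sketched as follows. This is a minimal sketch under stated assumptions: the dampening factor is modeled as an IDF-like weight log(K / k), where K is the number of classes and k is the number of classes in which an n-gram occurs, and the BLOSUM62-based merging of similar n-grams is omitted. The paper's exact normalization may differ.

```python
import math
from collections import Counter


def ngrams(seq: str, n: int) -> list[str]:
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def class_ngram_counts(seqs_by_class: dict[str, list[str]], sizes=range(4, 9)) -> dict[str, Counter]:
    """Count every 4- to 8-gram occurring in each class's sequences."""
    counts = {}
    for cls, seqs in seqs_by_class.items():
        c = Counter()
        for s in seqs:
            for n in sizes:
                c.update(ngrams(s, n))
        counts[cls] = c
    return counts


def damped_scores(counts: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Normalized frequency times an assumed IDF-like dampening factor log(K / k)."""
    n_classes = len(counts)
    presence = Counter()  # number of classes in which each n-gram occurs
    for c in counts.values():
        presence.update(c.keys())
    scores = {}
    for cls, c in counts.items():
        total = sum(c.values())
        scores[cls] = {g: (f / total) * math.log(n_classes / presence[g])
                       for g, f in c.items()}
    return scores


def discriminative(scores: dict[str, dict[str, float]], cls: str, threshold: float) -> list[str]:
    """n-grams of `cls` whose damped score exceeds `threshold`."""
    return sorted(g for g, v in scores[cls].items() if v > threshold)
```

With this weighting, an n-gram shared by every class receives a score of exactly zero, so only n-grams concentrated in fewer classes can pass the selection threshold.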
Fault trees and sequence dependencies
NASA Technical Reports Server (NTRS)
Dugan, Joanne Bechta; Boyd, Mark A.; Bavuso, Salvatore J.
1990-01-01
This paper discusses one of the frequently cited shortcomings of fault-tree models: their inability to model so-called sequence dependencies. Several sources of such sequence dependencies are described, and new fault-tree gates to capture this behavior are defined. These complex behaviors can be included in present fault-tree models because those models use a Markov solution. The utility of the new gates is demonstrated by presenting several models of the fault-tolerant parallel processor, which include both hot and cold spares.
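To make the sequence-dependency idea concrete, the sketch below evaluates a priority-AND style gate, taken here as an assumed example of such a gate: its output fails only if component A fails before component B and both have failed by time t. Assuming independent exponential lifetimes, a small continuous-time Markov chain (integrated with a simple forward-Euler step) can be checked against the closed-form integral. The rates and step count are illustrative.

```python
import math


def pand_closed_form(lam_a: float, lam_b: float, t: float) -> float:
    """P(A fails first, then B fails, both by time t), exponential lifetimes.

    Integrates lam_a*exp(-lam_a*x) * P(x < T_B <= t) over x in [0, t].
    """
    s = lam_a + lam_b
    return lam_a / s * (1 - math.exp(-s * t)) - math.exp(-lam_b * t) * (1 - math.exp(-lam_a * t))


def pand_markov(lam_a: float, lam_b: float, t: float, steps: int = 100_000) -> float:
    """Same probability from a small Markov chain, integrated with forward Euler.

    States: p0 = both working, p1 = A failed first (B working),
    p_fail = A then B failed (gate output failed). The 'B failed first'
    state can never make the gate fail, so it is not tracked.
    """
    dt = t / steps
    p0, p1, p_fail = 1.0, 0.0, 0.0
    for _ in range(steps):
        d0 = -(lam_a + lam_b) * p0
        d1 = lam_a * p0 - lam_b * p1
        d_fail = lam_b * p1
        p0, p1, p_fail = p0 + d0 * dt, p1 + d1 * dt, p_fail + d_fail * dt
    return p_fail
```

Because the order of failures matters, the priority-AND probability is always lower than that of an ordinary AND gate over the same two components, which is exactly the behavior a static fault tree cannot express.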
ACTG: novel peptide mapping onto gene models.
Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok
2017-04-15
In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful because it helps us understand the origins of gene products. Existing software tools either take the genomic position of a peptide start site as input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In the case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these tools cannot be applied directly unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences while allowing certain genomic variations requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to the genome, assuming all possible single exon skipping, junction variation allowing an edit distance of up to three from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during the mapping phase if a user provides a VCF (variant call format) file as input. Available at http://prix.hanyang.ac.kr/ACTG/search.jsp . Contact: eunokpaek@hanyang.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
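As a much-simplified illustration of peptide-to-genome mapping, the sketch below translates a nucleotide sequence in the three forward reading frames using the standard genetic code and reports exact peptide matches; it also enumerates single-exon-skipping variants of a hypothetical gene model. It ignores the reverse strand, junction variation, exon extension, frame shifts and SNVs, and it is not ACTG's algorithm; the function names and the toy exons are assumptions.

```python
# Standard genetic code, built from the conventional TCAG-ordered codon table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def map_peptide(peptide: str, dna: str) -> list[int]:
    """0-based nucleotide start positions of exact matches in the three forward frames."""
    hits = []
    for frame in range(3):
        protein = translate(dna[frame:])
        start = protein.find(peptide)
        while start != -1:
            hits.append(frame + 3 * start)
            start = protein.find(peptide, start + 1)
    return sorted(hits)


def exon_skipping_models(exons: list[str]):
    """Yield (skipped_index, spliced_sequence) for each single internal exon skipped."""
    for skip in range(1, len(exons) - 1):
        yield skip, "".join(e for i, e in enumerate(exons) if i != skip)
```

In the toy model used in the checks, the peptide MAKG does not occur in the reference transcript but does occur once the middle exon is skipped, which is the kind of novel peptide the abstract describes.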
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...
2018-02-16
Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is the ability to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and in the field. However, genomes can have radically different GC content, so accurate sequence analysis can be challenging depending on the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systematic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.
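The per-position quality analysis can be approximated in a few lines. The sketch assumes FASTQ quality strings in the common Phred+33 encoding and computes the mean quality per read-position bin, plus the GC content of a sequence; the bin size and helper names are illustrative, and this is not the authors' analysis code.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def mean_quality_by_position(qualities: list[str], bin_size: int = 1) -> dict[int, float]:
    """Mean Phred score per read-position bin across all reads."""
    sums: dict[int, int] = {}
    counts: dict[int, int] = {}
    for q in qualities:
        for pos, score in enumerate(phred_scores(q)):
            b = pos // bin_size
            sums[b] = sums.get(b, 0) + score
            counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

Plotting the returned dictionary against position, separately for genomes of low and high GC content, is one simple way to look for the position effect the abstract reports.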
Microbe-ID: an open source toolbox for microbial genotyping and species identification.
Tabima, Javier F; Everhart, Sydney E; Larsen, Meredith M; Weisberg, Alexandra J; Kamvar, Zhian N; Tancos, Matthew A; Smart, Christine D; Chang, Jeff H; Grünwald, Niklaus J
2016-01-01
Development of tools to identify species, genotypes, or novel strains of invasive organisms is critical for monitoring emergence and implementing rapid response measures. Molecular markers, although critical to identifying species or genotypes, require bioinformatic tools for analysis. However, user-friendly analytical tools for fast identification are not readily available. To address this need, we created a web-based set of applications called Microbe-ID that allows users to customize a toolbox for rapid species identification and strain genotyping using any genetic markers of choice. Two components of Microbe-ID, named Sequence-ID and Genotype-ID, implement species and genotype identification, respectively. Sequence-ID allows identification of species by using BLAST to query sequences for any locus of interest against a custom reference sequence database. Genotype-ID allows placement of an unknown multilocus marker in either a minimum spanning network or dendrogram with bootstrap support from a user-created reference database. Microbe-ID can be used for identification of any organism based on nucleotide sequences or any molecular marker type, and several examples are provided. We created a public website for demonstration purposes called Microbe-ID (microbe-id.org) and provided a working implementation for the genus Phytophthora (phytophthora-id.org). In Phytophthora-ID, the Sequence-ID application allows identification based on ITS or cox spacer sequences. Genotype-ID groups individuals into clonal lineages based on simple sequence repeat (SSR) markers for the two invasive plant pathogen species P. infestans and P. ramorum. All code is open source and available on GitHub and CRAN. Instructions for installation and use are provided at https://github.com/grunwaldlab/Microbe-ID.
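The minimum spanning network idea behind Genotype-ID can be sketched with Prim's algorithm over pairwise distances between multilocus SSR genotypes. The distance used here, the fraction of loci whose allele sizes differ, is a simplified stand-in chosen for illustration; the tool's actual distance metric and bootstrap procedure are not reproduced, and the isolate names are hypothetical.

```python
def allele_distance(g1: list[tuple[int, int]], g2: list[tuple[int, int]]) -> float:
    """Fraction of loci whose allele sizes differ (order within a locus ignored).

    A simplified stand-in distance; not necessarily the metric the tool uses.
    """
    diffs = sum(1 for a, b in zip(g1, g2) if sorted(a) != sorted(b))
    return diffs / len(g1)


def minimum_spanning_tree(genotypes: dict[str, list[tuple[int, int]]]) -> list[tuple[str, str, float]]:
    """Prim's algorithm over all pairwise genotype distances."""
    names = list(genotypes)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge connecting the current tree to a node outside it.
        best = min(
            ((u, v, allele_distance(genotypes[u], genotypes[v]))
             for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

An unknown isolate that joins the tree at distance zero from a reference genotype would be assigned to that reference's clonal lineage in this simplified picture.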
Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R
2018-05-01
Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton, mainly T. rubrum complex (n = 184), T. interdigitale (n = 40), T. tonsurans (n = 26), and T. benhamiae (n = 5). Other genera included Microsporum (e.g., M. canis [n = 21] and M. audouinii [n = 10]), Nannizzia (N. gypsea [n = 3]), and Epidermophyton (n = 3). Species-level identification of T. rubrum complex isolates was problematic. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species, provided that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.
Ernesäter, Annica; Engström, Maria; Winblad, Ulrika; Holmström, Inger K
2014-10-03
The purpose of this study is to compare communication patterns in calls subjected to a malpractice claim with those in matched controls. In many countries, telephone advice nursing is patients' first contact with healthcare. Telenurses' assessment of callers' symptoms and needs is based on verbal communication only, and problems with over-triage and under-triage have been reported. A total sample of all reported medical errors (n=33) during the period 2003-2010 within Swedish Healthcare Direct was retrieved. The corresponding calls were then identified and collected as sound files from the manager in charge at the respective call centres. For technical reasons, calls from four of the cases could not be retrieved. For the present study, matched control calls (n=26) based on the patient's age, gender and main symptom presented by the caller were collected. Male patients were in the majority (n=16), and the most common reasons for calling were abdominal pain (n=10) and chest pain (n=5). There were statistically significant differences between the communication in the cases and controls: telenurses used fewer open-ended medical questions (p<0.001) in the cases than in the control calls; callers provided telenurses with more medical information in the control calls than in the cases (p=0.001); and telenurses used more facilitation and patient activation activities in the control calls (p=0.034), such as back-channel response (p=0.001), than in the cases. The present study shows that telenurses in calls subject to malpractice claims used more closed-ended questioning than those in control calls, who used more open-ended questioning and back-channel response, which provided them with richer medical descriptions and more information from the caller. Hence, these communicative techniques are important in addition to solid medical and nursing competence and sound decision aid systems. Published by the BMJ Publishing Group Limited.
(Pea)nuts and bolts of visual narrative: Structure and meaning in sequential image comprehension
Cohn, Neil; Paczynski, Martin; Jackendoff, Ray; Holcomb, Phillip J.; Kuperberg, Gina R.
2012-01-01
Just as syntax differentiates coherent sentences from scrambled word strings, the comprehension of sequential images must also use a cognitive system to distinguish coherent narrative sequences from random strings of images. We conducted experiments analogous to two classic studies of language processing to examine the contributions of narrative structure and semantic relatedness to processing sequential images. We compared four types of comic strips: 1) Normal sequences with both structure and meaning, 2) Semantic Only sequences (in which the panels were related to a common semantic theme, but had no narrative structure), 3) Structural Only sequences (narrative structure but no semantic relatedness), and 4) Scrambled sequences of randomly-ordered panels. In Experiment 1, participants monitored for target panels in sequences presented panel-by-panel. Reaction times were slowest to panels in Scrambled sequences, intermediate in both Structural Only and Semantic Only sequences, and fastest in Normal sequences. This suggests that both semantic relatedness and narrative structure offer advantages to processing. Experiment 2 measured ERPs to all panels across the whole sequence. The N300/N400 was largest to panels in both the Scrambled and Structural Only sequences, intermediate in Semantic Only sequences and smallest in the Normal sequences. This implies that a combination of narrative structure and semantic relatedness can facilitate semantic processing of upcoming panels (as reflected by the N300/N400). Also, panels in the Scrambled sequences evoked a larger left-lateralized anterior negativity than panels in the Structural Only sequences. This localized effect was distinct from the N300/N400, and appeared despite the fact that these two sequence types were matched on local semantic relatedness between individual panels. These findings suggest that sequential image comprehension uses a narrative structure that may be independent of semantic relatedness. 
Altogether, we argue that the comprehension of visual narrative is guided by an interaction between structure and meaning. PMID:22387723
ToTem: a tool for variant calling pipeline optimization.
Tom, Nikola; Tom, Ondrej; Malcikova, Jitka; Pavlova, Sarka; Kubesova, Blanka; Rausch, Tobias; Kolarik, Miroslav; Benes, Vladimir; Bystry, Vojtech; Pospisilova, Sarka
2018-06-26
High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process, with the possibility of plugging in almost any tool or code. To prevent over-fitting of pipeline parameters, ToTem ensures their reproducibility by using cross-validation techniques that penalize the final precision, recall and F-measure. The results are presented as interactive graphs and tables, allowing an optimal pipeline to be selected based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software .
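The benchmarking metrics can be sketched as follows. Precision, recall and F-measure are computed from sets of called and true variants; to mimic a cross-validation penalty, the sketch ranks each parameter setting by its mean F1 across folds minus the standard deviation across folds. That penalty, the variant tuple format and the setting names are assumptions for illustration, not ToTem's documented scoring.

```python
from statistics import mean, pstdev


def precision_recall_f1(called: set, truth: set) -> tuple[float, float, float]:
    """Compare called variants with a truth set (e.g. (chrom, pos, ref, alt) tuples)."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def select_setting(f1_by_setting: dict[str, list[float]]) -> str:
    """Pick the setting with the best mean F1 across folds, penalized by its spread (assumed penalty)."""
    return max(f1_by_setting, key=lambda s: mean(f1_by_setting[s]) - pstdev(f1_by_setting[s]))
```

Penalizing fold-to-fold spread favors a setting that performs consistently over one that scores slightly higher on average but fluctuates, which is one way to guard against over-fitting parameters to a single data split.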
Kwarciak, Kamil; Radom, Marcin; Formanowicz, Piotr
2016-04-01
Classical sequencing by hybridization uses only binary information about sequence composition: a given element from an oligonucleotide library either is or is not part of the target sequence. However, DNA chip technology has since advanced and makes it possible to obtain partial information about the multiplicity of each oligonucleotide that the analyzed sequence consists of. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are considered in this paper. The first one, called "one and many", assumes that it is possible to determine whether a given oligonucleotide occurs in the reconstructed sequence once or more than once. According to the second model, called "one, two and many", a biochemical experiment can reveal whether a given oligonucleotide is present in the analyzed sequence once, twice or at least three times. An ant colony optimization algorithm has been implemented to verify these models and to compare them with existing algorithms for sequencing by hybridization that utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization errors. Computational experiment results confirm that using even partial information about multiplicity leads to increased quality of reconstructed sequences. Moreover, they also show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip. Copyright © 2016 Elsevier Ltd. All rights reserved.
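The two multiplicity models can be illustrated by computing an idealized, error-free spectrum of a known sequence and then coarsening the exact counts. This only shows what information each model provides; it does not reconstruct sequences, and the ant colony optimization algorithm itself is not sketched. The function names are hypothetical.

```python
from collections import Counter


def spectrum(sequence: str, k: int) -> Counter:
    """Exact multiplicity of every k-mer (oligonucleotide) in `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def binary_spectrum(sequence: str, k: int) -> set[str]:
    """Classical SBH: only presence or absence is known."""
    return set(spectrum(sequence, k))


def one_and_many(sequence: str, k: int) -> dict[str, str]:
    """'one and many' model: once vs. more than once."""
    return {oligo: ("one" if n == 1 else "many") for oligo, n in spectrum(sequence, k).items()}


def one_two_many(sequence: str, k: int) -> dict[str, str]:
    """'one, two and many' model: once, twice, or at least three times."""
    labels = {1: "one", 2: "two"}
    return {oligo: labels.get(n, "many") for oligo, n in spectrum(sequence, k).items()}
```

For the sequence ACACACG with k = 3, the binary spectrum only says that ACA, CAC and ACG occur, whereas the multiplicity models also reveal that ACA and CAC each occur more than once (exactly twice under the finer model).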
Mukda, Ekchol; Trachoo, Objoon; Pasomsub, Ekawat; Tiyasirichokchai, Rawiphorn; Iemwimangsa, Nareenart; Sosothikul, Darintr; Chantratita, Wasun; Pakakasama, Samart
2017-08-01
In the present study, we used exome sequencing to analyze PRF1, UNC13D, STX11, and STXBP2, as well as genes associated with primary immunodeficiency disease (RAB27A, LYST, AP3B1, SH2D1A, ITK, CD27, XIAP, and MAGT1) in Thai children with hemophagocytic lymphohistiocytosis (HLH). We performed mutation analysis of HLH-associated genes in 25 Thai children using an exome sequencing method. Genetic variations found within these target genes were compared to exome sequencing data from 133 healthy individuals. Variants identified with minor allele frequencies <5% and novel mutations were confirmed using Sanger sequencing. Exome sequencing data revealed 101 non-synonymous single nucleotide polymorphisms (SNPs) in all subjects. These SNPs were classified as pathogenic (n = 1), likely pathogenic (n = 16), variant of unknown significance (n = 12), or benign variant (n = 72). Homozygous, compound heterozygous, and double-gene heterozygous variants, involving mutations in PRF1 (n = 3), UNC13D (n = 2), STXBP2 (n = 3), LYST (n = 3), XIAP (n = 2), AP3B1 (n = 1), RAB27A (n = 1), and MAGT1 (n = 1), were demonstrated in 12 patients. Novel mutations were found in most patients in this study. In conclusion, exome sequencing demonstrated the ability to identify rare genetic variants in HLH patients. This method is useful in the detection of mutations in multi-gene associated diseases.
Deep Sequencing to Identify the Causes of Viral Encephalitis
Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.
2014-01-01
Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
Steiner, S; Vogl, T J; Fischer, P; Steger, W; Neuhaus, P; Keck, H
1995-08-01
The aim of our study was to evaluate a T2-weighted turbo spin-echo (TSE) sequence in comparison with a T2-weighted spin-echo (SE) sequence in imaging focal liver lesions. In our study, 35 patients with suspected focal liver lesions were examined. The standardised imaging protocol included a conventional T2-weighted SE sequence (TR/TE = 2000/90/45, acquisition time = 10.20) as well as a T2-weighted TSE sequence (TR/TE = 4700/90, acquisition time = 6.33). Signal-to-noise (S/N) and contrast-to-noise (C/N) ratios were calculated by standard methods as the basis for quantitative evaluation. A diagnostic score was implemented to enable qualitative assessment. In 7% (n = 2), the TSE sequence enabled detection of further liver lesions less than 1 cm in diameter. In the depiction of anatomical detail, the TSE sequence was superior. S/N and C/N ratios of anatomic and pathologic structures were higher with the TSE sequence than with the SE sequence. Our results indicate that the T2-weighted turbo spin-echo sequence is well suited to imaging focal liver lesions and reduces imaging time.
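The abstract does not spell out its "standard methods", so the sketch below uses one common convention as an assumption: S/N as the mean signal intensity in a tissue region of interest (ROI) divided by the standard deviation of background noise, and C/N as the difference between lesion and liver mean intensities divided by the same noise standard deviation. The ROI values in the checks are made up.

```python
from statistics import mean, pstdev


def snr(tissue_roi: list[float], noise_roi: list[float]) -> float:
    """Signal-to-noise ratio: mean tissue signal / SD of background noise (assumed convention)."""
    return mean(tissue_roi) / pstdev(noise_roi)


def cnr(lesion_roi: list[float], liver_roi: list[float], noise_roi: list[float]) -> float:
    """Contrast-to-noise ratio: (lesion mean - liver mean) / SD of background noise."""
    return (mean(lesion_roi) - mean(liver_roi)) / pstdev(noise_roi)
```

Computing both ratios from the same ROIs on the SE and TSE images is what allows the two sequences to be compared quantitatively.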
Host Cell Virus Entry Mediated by Australian Bat Lyssavirus Envelope G glycoprotein
2013-10-24
Figure 7. Comparison of the amino acid sequences of Saccolaimus and Pteropus ABLV G mature protein... sequence analysis revealed that the PCR products were identical. Sequence comparisons of the ABLV N and other lyssavirus N proteins showed that ABLV...Saccolaimus flaviventris) (129). Nucleoprotein sequence comparisons revealed that the Saccolaimus N protein shared 96% amino acid homology with the Pteropus
USDA-ARS's Scientific Manuscript database
Quantitative PCR (Q-PCR) utilizing specific primer sequences and a fluorogenic, 5’-exonuclease linear hydrolysis probe is well established as a detection and identification method for Phakopsora pachyrhizi, the soybean rust pathogen. Because of the extreme sensitivity of Q-PCR, the DNA of a single u...
Comment on "Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry".
Pevzner, Pavel A; Kim, Sangtae; Ng, Julio
2008-08-22
Asara et al. (Reports, 13 April 2007, p. 280) reported sequencing of Tyrannosaurus rex proteins and used them to establish the evolutionary relationships between birds and dinosaurs. We argue that the reported T. rex peptides may represent statistical artifacts and call for complete data release to enable experimental and computational verification of their findings.
ERIC Educational Resources Information Center
American Association of Physics Teachers (NJ1), 2009
2009-01-01
Physics First represents an organizational alternative to the traditional high school science sequence. It calls for a re-sequencing of high school courses so that students study physics before chemistry and biology. The purpose of this pamphlet is to provide: (1) Basic information and rationale for the Physics First curriculum; (2) Strategies for…
How to Help Students Conceptualize the Rigorous Definition of the Limit of a Sequence
ERIC Educational Resources Information Center
Roh, Kyeong Hah
2010-01-01
This article suggests an activity, called the epsilon-strip activity, as an instructional method for conceptualization of the rigorous definition of the limit of a sequence via visualization. The article also describes the learning objectives of each instructional step of the activity, and then provides detailed instructional methods to guide…
Lindamood Phonemic Sequencing (LiPS)®. What Works Clearinghouse Intervention Report
ERIC Educational Resources Information Center
What Works Clearinghouse, 2008
2008-01-01
The Lindamood Phonemic Sequencing (LiPS)® program (formerly called the Auditory Discrimination in Depth® [ADD] program) is designed to teach students skills to decode words and to identify individual sounds and blends in words. The program is individualized to meet student needs and is often used with students who have learning disabilities or…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, H.
1999-03-31
The purpose of this research is to develop a multiplexed sample processing system in conjunction with multiplexed capillary electrophoresis for high-throughput DNA sequencing. The complete process, from DNA template to called bases, was first demonstrated with a manually operated single-capillary system. Later, an automated microfluidic system with 8 channels based on the same principle was successfully constructed. The instrument automatically processes 8 templates through reaction, purification, denaturation, pre-concentration, injection, separation and detection in parallel. A multiplexed freeze/thaw switching principle and a distribution network were implemented to manage flow direction and sample transportation. Dye-labeled terminator cycle-sequencing reactions are performed in an 8-capillary array in a hot air thermal cycler. Subsequently, the sequencing ladders are directly loaded into a corresponding size-exclusion chromatographic column operated at approximately 60 °C for purification. On-line denaturation and stacking injection for capillary electrophoresis are simultaneously accomplished at a cross assembly set at approximately 70 °C. Not only the separation capillary array but also the reaction capillary array and purification columns can be regenerated after every run. DNA sequencing data from this system allow base calling of up to 460 bases with 98% accuracy.
Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data
Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei
2013-01-01
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042
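The error-correction rule described above can be sketched for a single alignment column. Assuming that a non-reference base is treated as a sequencing error when its base quality is below a threshold and its allele frequency in the column is low, such bases are reset to the reference base while well-supported alleles are kept. The thresholds and function name are hypothetical, not Coval's actual parameters.

```python
def correct_column(ref_base: str, bases: list[str], quals: list[int],
                   min_qual: int = 20, max_allele_freq: float = 0.1) -> list[str]:
    """Reset low-quality, low-frequency non-reference bases to the reference base.

    min_qual and max_allele_freq are assumed example thresholds.
    """
    n = len(bases)
    freq = {b: bases.count(b) / n for b in set(bases)}  # allele frequency in this column
    return [
        ref_base if (b != ref_base and q < min_qual and freq[b] <= max_allele_freq) else b
        for b, q in zip(bases, quals)
    ]
```

Because the rule requires both low quality and low frequency, a genuine heterozygous or pooled-sample allele present in many reads survives even when some of its bases have low quality.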
Positional bias in variant calls against draft reference assemblies.
Briskine, Roman V; Shimizu, Kentaro K
2017-03-28
Whole-genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. Because variant calling pipelines are complex and involve many software packages, it is important to understand the inherent biases and limitations at each step of the analysis. In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at positions related to the length of either the k-mers or the reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it because of changes in the variants' relative positions and alterations in read alignments. The bias can be effectively reduced by filtering out variants that reside in repetitive elements. Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, potentially affecting many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. Filtering is recommended to reduce the bias, especially if a higher-quality genome assembly cannot be achieved.
Our findings can help other researchers to improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
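The per-position count described above (variants tallied by position counted from the contig ends) can be sketched as follows. This assumes variants are given as (contig, 1-based position) pairs and that contig lengths are known; the function name and input format are hypothetical, and the repeat-based filtering the authors recommend is not shown.

```python
from collections import Counter


def end_distance_histogram(contig_lengths, variants):
    """Count variants by their 1-based distance from the nearer contig end.

    contig_lengths: dict mapping contig name -> length in bp.
    variants: iterable of (contig name, 1-based position) pairs.
    """
    hist = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        # Position 1 and position `length` are both at distance 1.
        hist[min(pos, length - pos + 1)] += 1
    return hist
```

In practice one would compare the counts at offsets equal to the assembler's k-mer size or the read length against neighbouring offsets; a pronounced spike there is the kind of signal the abstract reports.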
Are the TTAGG and TTAGGG telomeric repeats phylogenetically conserved in aculeate Hymenoptera?
NASA Astrophysics Data System (ADS)
Menezes, Rodolpho S. T.; Bardella, Vanessa B.; Cabral-de-Mello, Diogo C.; Lucena, Daercio A. A.; Almeida, Eduardo A. B.
2017-10-01
Although the (TTAGG)n telomeric repeat is thought to be the ancestral DNA motif of telomeres in insects, it has been repeatedly lost within some insect orders. Notably, parasitoid hymenopterans and the social wasp Metapolybia decorata (Gribodo) lack the (TTAGG)n sequence, whereas the motif has been found in other representatives of Hymenoptera, such as several ant species and the honeybee. These findings raise the question of whether or not the insect telomeric repeat is phylogenetically predominant in Hymenoptera. We therefore evaluated the occurrence of both the (TTAGG)n sequence and the vertebrate telomere sequence (TTAGGG)n using dot-blot hybridization in 25 aculeate species of Hymenoptera. Our results revealed the absence of the (TTAGG)n sequence in all tested species, raising the number of hymenopteran families lacking this telomeric sequence to 13 of the 15 families tested so far. The (TTAGGG)n sequence was not observed in any tested species either. Based on our data and compiled information, we suggest that the (TTAGG)n sequence was putatively lost in the ancestor of Apocrita, with at least two subsequent independent regains (in Formicidae and Apidae).
Mukherjee, J J; Dekker, E E
1987-10-25
Starting with 100 g (wet weight) of a mutant of Escherichia coli K-12 forced to grow on L-threonine as its sole carbon source, we developed a 6-step procedure that provides 30-40 mg of homogeneous 2-amino-3-ketobutyrate CoA ligase (also called aminoacetone synthetase or synthase). This ligase, which catalyzes the cleavage/condensation reaction between 2-amino-3-ketobutyrate (the presumed product of the L-threonine dehydrogenase-catalyzed reaction) and glycine + acetyl-CoA, has an apparent molecular weight of approximately 85,000 and consists of two identical (or nearly identical) subunits with Mr = 42,000. Computer analysis of amino acid composition data, which gives the best-fit nearest-integer ratio for each residue, indicates a total of 387 amino acids per subunit with a calculated Mr of 42,093. Stepwise Edman degradation provided the N-terminal sequence of the first 21 amino acids. The enzyme is pyridoxal phosphate-dependent, since (a) several carbonyl reagents caused greater than 90% loss of activity, (b) dialysis against buffer containing hydroxylamine resulted in an 89% loss of activity coincident with an 86% decrease in absorptivity at 428 nm, (c) incubation of the apoenzyme with 20 microM pyridoxal phosphate showed a parallel recovery (greater than 90%) of activity and 428-nm absorptivity, and (d) reduction of the holoenzyme with NaBH4 resulted in complete inactivation and the appearance of a new absorption maximum at 333 nm. The enzyme is strictly specific for glycine, but acetyl-CoA (100%), n-propionyl-CoA (127%), and n-butyryl-CoA (16%) are all utilized in the condensation reaction. Apparent Km values for acetyl-CoA, n-propionyl-CoA, and glycine are 59 microM, 80 microM, and 12 mM, respectively; the pH optimum is 7.5. Added divalent metal ions or sulfhydryl compounds inhibited catalysis of the condensation reaction.
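To make the reported apparent Km values concrete, the sketch below evaluates the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]), for one substrate at a time. It assumes simple hyperbolic kinetics in the varied substrate with the co-substrate saturating, which the abstract does not state; Vmax is normalized to 1, so the function returns the fraction of Vmax.

```python
def michaelis_menten_rate(substrate_uM, vmax, km_uM):
    """Initial rate v = Vmax * [S] / (Km + [S]) for a single varied substrate."""
    return vmax * substrate_uM / (km_uM + substrate_uM)


# Apparent Km values from the abstract, in microM (12 mM glycine = 12,000 microM).
KM_UM = {"acetyl-CoA": 59.0, "n-propionyl-CoA": 80.0, "glycine": 12_000.0}


def fraction_of_vmax(substrate, concentration_uM):
    """Fraction of Vmax reached at a given concentration (Vmax normalized to 1)."""
    return michaelis_menten_rate(concentration_uM, 1.0, KM_UM[substrate])
```

By construction, each substrate gives half-maximal rate at its Km; for example, 100 microM acetyl-CoA would give about 0.63 of Vmax under these assumptions, while glycine needs millimolar concentrations for a comparable fraction.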
Miraoui, Hichem; Dwyer, Andrew A.; Sykiotis, Gerasimos P.; Plummer, Lacey; Chung, Wilson; Feng, Bihua; Beenken, Andrew; Clarke, Jeff; Pers, Tune H.; Dworzynski, Piotr; Keefe, Kimberley; Niedziela, Marek; Raivio, Taneli; Crowley, William F.; Seminara, Stephanie B.; Quinton, Richard; Hughes, Virginia A.; Kumanov, Philip; Young, Jacques; Yialamas, Maria A.; Hall, Janet E.; Van Vliet, Guy; Chanoine, Jean-Pierre; Rubenstein, John; Mohammadi, Moosa; Tsai, Pei-San; Sidis, Yisrael; Lage, Kasper; Pitteloud, Nelly
2013-01-01
Congenital hypogonadotropic hypogonadism (CHH) and its anosmia-associated form (Kallmann syndrome [KS]) are genetically heterogeneous. Among the >15 genes implicated in these conditions, mutations in FGF8 and FGFR1 account for ∼12% of cases; notably, KAL1 and HS6ST1 are also involved in FGFR1 signaling and can be mutated in CHH. We therefore hypothesized that mutations in genes encoding a broader range of modulators of the FGFR1 pathway might contribute to the genetics of CHH as causal or modifier mutations. Thus, we aimed to (1) investigate whether CHH individuals harbor mutations in members of the so-called “FGF8 synexpression” group and (2) validate the ability of a bioinformatics algorithm on the basis of protein-protein interactome data (interactome-based affiliation scoring [IBAS]) to identify high-quality candidate genes. On the basis of sequence homology, expression, and structural and functional data, seven genes were selected and sequenced in 386 unrelated CHH individuals and 155 controls. Except for FGF18 and SPRY2, all other genes were found to be mutated in CHH individuals: FGF17 (n = 3 individuals), IL17RD (n = 8), DUSP6 (n = 5), SPRY4 (n = 14), and FLRT3 (n = 3). Independently, IBAS predicted FGF17 and IL17RD as the two top candidates in the entire proteome on the basis of a statistical test of their protein-protein interaction patterns to proteins known to be altered in CHH. Most of the FGF17 and IL17RD mutations altered protein function in vitro. IL17RD mutations were found only in KS individuals and were strongly linked to hearing loss (6/8 individuals). Mutations in genes encoding components of the FGF pathway are associated with complex modes of CHH inheritance and act primarily as contributors to an oligogenic genetic architecture underlying CHH. PMID:23643382
Phlugis ocraceovittata and its ultrasonic calling song (Orthoptera, Tettigoniidae, Phlugidini).
Chamorro-Rengifo, Juliana; Braun, Holger
2016-05-03
Some observations on the small predatory katydid Phlugis ocraceovittata Piza 1960 from southern Brazil are presented. A male called both day and night, producing long, uniformly structured sequences with maximum energy between 40 and 60 kHz. Anecdotal and indirect evidence suggests that the species is not exclusively predacious and may also feed partly on plant material.
Wong, Frances Kam Yuet; So, Ching; Chau, June; Law, Antony Kwan Pui; Tam, Stanley Ku Fu; McGhee, Sarah
2015-01-01
Home visits and telephone calls are two often-used approaches in transitional care, but their differential economic effects are unknown. This study aimed to examine the differential economic benefits of home visits with telephone calls and of telephone calls only in transitional discharge support. A cost-effectiveness analysis was conducted alongside a randomised controlled trial (RCT). Patients discharged from medical units were randomly assigned to control (control, N = 210), home visits with calls (home, N = 196), or calls only (call, N = 204). Cost-effectiveness analyses were conducted from the societal perspective, comparing monetary benefits and quality-adjusted life years (QALYs) gained. The home arm was less costly but less effective at 28 days and was dominating (less costly and more effective) at 84 days. The call arm was dominating at both 28 and 84 days. The incremental QALY for the home arm was -0.0002/0.0008 (28/84 days), and for the call arm 0.0022/0.0104 (28/84 days). When the three groups were compared against the NICE threshold of £20,000, the call arm had a higher probability of being cost-effective at 84 days but not at 28 days (home: 53%, call: 35% at 28 days versus home: 22%, call: 73% at 84 days). The original RCT showed that the bundled intervention involving home visits and calls was more effective than calls only in reducing hospital readmissions. This study adds a cost perspective to inform policymakers that both home visits and calls only are cost-effective for transitional care support, but calls only have a higher chance of being cost-effective for a sustained period after the intervention. © The Author 2014. Published by Oxford University Press on behalf of the British Geriatrics Society.
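The dominance and threshold comparisons above can be expressed through the incremental net monetary benefit, NMB = λ·ΔQALY − ΔCost, with λ = £20,000 per QALY. This is a minimal sketch of that standard calculation, not the authors' analysis code; the abstract reports incremental QALYs but not incremental costs, so any cost figures passed in are hypothetical.

```python
def incremental_nmb(delta_qaly, delta_cost, threshold=20_000.0):
    """Incremental net monetary benefit (GBP): threshold * dQALY - dCost."""
    return threshold * delta_qaly - delta_cost


def classify(delta_qaly, delta_cost, threshold=20_000.0):
    """Label an intervention relative to its comparator."""
    if delta_qaly == 0 and delta_cost == 0:
        return "equivalent"
    if delta_cost <= 0 and delta_qaly >= 0:
        return "dominating"      # no more costly and no less effective
    if delta_cost >= 0 and delta_qaly <= 0:
        return "dominated"       # no less costly and no more effective
    # Remaining quadrants are decided by the sign of the net monetary benefit.
    if incremental_nmb(delta_qaly, delta_cost, threshold) > 0:
        return "cost-effective"
    return "not cost-effective"
```

For instance, `classify(0.0104, -50.0)` labels an arm that gains 0.0104 QALYs while saving a hypothetical £50 as "dominating"; a less costly but slightly less effective arm is judged by whether its savings exceed the value of the QALYs forgone.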
Estimating Lion Abundance using N-mixture Models for Social Species
Belant, Jerrold L.; Bled, Florent; Wilton, Clay M.; Fyumagwa, Robert; Mwampeta, Stanslaus B.; Beyer, Dean E.
2016-01-01
Declining populations of large carnivores worldwide, and the complexities of managing human-carnivore conflicts, require accurate population estimates of large carnivores to promote their long-term persistence through well-informed management. We used N-mixture models to estimate lion (Panthera leo) abundance from call-in and track surveys in southeastern Serengeti National Park, Tanzania. Because of potential habituation to broadcast calls and social behavior, we developed a hierarchical observation process within the N-mixture model, conditioning lion detectability on their group response to call-ins and on individual detection probabilities. We estimated 270 lions (95% credible interval = 170–551) using call-ins but were unable to estimate lion abundance from track data. We found a weak negative relationship between predicted track density and predicted lion abundance from the call-in surveys. Luminosity was negatively correlated with individual detection probability during call-in surveys. Lion abundance and track density were influenced by landcover, but the direction of the corresponding effects was undetermined. N-mixture models allowed us to incorporate multiple parameters (e.g., landcover, luminosity, observer effect) influencing lion abundance and probability of detection directly into abundance estimates. We suggest that N-mixture models employing a hierarchical observation process can be used to estimate the abundance of other social, herding, and grouping species. PMID:27786283
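For readers unfamiliar with N-mixture models, the sketch below computes the likelihood of the basic binomial N-mixture model: site abundance N ~ Poisson(λ), and each repeated count y_t ~ Binomial(N, p), with N summed out up to a truncation bound. This is deliberately the plain version with constant λ and p; the authors' model adds a hierarchical observation layer (group response to call-ins, then individual detection) and covariates, none of which is reproduced here. The truncation bound `upper` is an assumption.

```python
import math


def poisson_logpmf(n, lam):
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def binom_logpmf(y, n, p):
    if y > n:
        return float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))


def site_log_likelihood(counts, lam, p, upper=200):
    """log sum_N Pois(N; lam) * prod_t Binom(y_t; N, p), with N truncated at `upper`."""
    terms = [poisson_logpmf(n, lam) + sum(binom_logpmf(y, n, p) for y in counts)
             for n in range(max(counts), upper + 1)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_likelihood(survey, lam, p, upper=200):
    """Total log-likelihood; `survey` is a list of per-site lists of repeated counts."""
    return sum(site_log_likelihood(counts, lam, p, upper) for counts in survey)
```

Maximizing `log_likelihood` over (λ, p), for example with a simple grid search, gives abundance and detection estimates. With a single visit, λ and p are not separately identifiable (the counts are just Poisson(λp)), which is why repeated surveys are needed.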
Characteristics of fin whale vocalizations recorded on instruments in the northeast Pacific Ocean
NASA Astrophysics Data System (ADS)
Weirathmueller, Maria Michelle Josephine
This thesis focuses on fin whale vocalizations recorded on ocean bottom seismometers (OBSs) in the Northeast Pacific Ocean, using data collected between 2003 and 2013. OBSs are a valuable and largely untapped resource for the passive acoustic monitoring of large baleen whales. This dissertation is divided into three parts, each of which uses recordings of fin whale vocalizations to better understand their calling behaviors and distributions. The first study describes the development of a technique to extract source levels of fin whale vocalizations from OBS recordings. Source levels were estimated using data collected on a network of eight OBSs in the Northeast Pacific Ocean. The acoustic pressure levels measured at the instruments were adjusted for the propagation path between the calling whales and the instruments, using the call location and estimating losses along the acoustic travel path. A total of 1241 calls were used to estimate an average source level of 189 ± 5.8 dB re 1 μPa at 1 m. This variability is largely attributed to uncertainties in the horizontal and vertical position of the fin whale at the time of each call, and to the effect of these uncertainties on subsequent calculations. The second study describes a semi-automated method for obtaining horizontal ranges to vocalizing fin whales using the timing and relative amplitude of multipath arrivals. A matched filter is used to detect fin whale calls and to pick the relative times and amplitudes of multipath arrivals. Ray-based propagation models are used to predict multipath times and amplitudes as a function of range. Because the direct and first multiple arrivals are not always observed, three hypotheses for the paths of the observed arrivals are considered; the solution is the hypothesis and range that optimize the fit to the data. Ray-theoretical amplitudes are not accurate, so solutions are improved by determining amplitudes from the observations using a bootstrap method.
Data from ocean bottom seismometers at two locations are used to assess the method: one on the Juan de Fuca Ridge, a bathymetrically complex mid-ocean ridge environment, and the other at a flat sedimented location in the Cascadia Basin. At both sites, the method is reliable up to a range of 4 km, which is sufficient to enable estimates of call density. The third study explores spatial and temporal trends in fin whale calling patterns. The frequency and inter-pulse interval of fin whale 20 Hz vocalizations were observed over 10 years, from 2003 to 2013, on bottom-mounted hydrophones and OBSs in the northeast Pacific Ocean. The instrument locations extended from 40°N and 130°W to 125°W, with water depths ranging from 1500 to 4000 m. The inter-pulse interval (IPI) of fin whale song sequences was observed to increase at a rate of 0.59 seconds/year over the decade of observation. During the same period, peak frequency decreased at a rate of 0.16 Hz/year. Two primary call patterns were observed. In the earlier years, the more commonly observed pattern had a single frequency and a single IPI. In later years, a doublet pattern emerged, with two dominant frequencies and two IPIs. Many call sequences in the intervening years appeared to represent a transitional state between the two patterns. The overall trend was consistent across the entire geographical span, although some regional differences exist.
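Two of the calculations summarized in this thesis abstract can be illustrated briefly: correcting a received level back to a source level, and estimating a yearly trend such as the IPI increase. The sketch assumes pure spherical spreading, TL = 20·log10(R / 1 m), over the slant range R, with no absorption or bathymetric effects; the thesis estimates losses along the actual travel path, and its exact loss model is not given here. The trend is an ordinary least-squares slope.

```python
import math


def source_level_db(received_db, horizontal_m, depth_diff_m):
    """SL = RL + TL, assuming spherical spreading TL = 20*log10(R / 1 m)."""
    slant_range = math.hypot(horizontal_m, depth_diff_m)
    return received_db + 20.0 * math.log10(slant_range)


def linear_trend(years, values):
    """Ordinary least-squares slope, in units of `values` per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx
```

Under these assumptions, a received level of 129 dB at a slant range of 1 km corresponds to a source level of 189 dB re 1 μPa at 1 m. The sensitivity of the result to position error follows directly from the log term: misplacing the whale by a factor of two in range shifts the estimate by about 6 dB.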
The first genome sequences of human bocaviruses from Vietnam
Thanh, Tran Tan; Van, Hoang Minh Tu; Hong, Nguyen Thi Thu; Nhu, Le Nguyen Truc; Anh, Nguyen To; Tuan, Ha Manh; Hien, Ho Van; Tuong, Nguyen Manh; Kien, Trinh Trung; Khanh, Truong Huu; Nhan, Le Nguyen Thanh; Hung, Nguyen Thanh; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier; Tan, Le Van
2017-01-01
As part of an ongoing effort to generate complete genome sequences of hand, foot and mouth disease-causing enteroviruses directly from clinical specimens, two complete coding sequences and two partial genomic sequences of human bocavirus 1 (n=3) and 2 (n=1) were co-amplified and sequenced, representing the first genome sequences of human bocaviruses from Vietnam. The sequences may aid future studies aimed at understanding the evolution of the virus. PMID:28090592
Demczuk, W; Sidhu, S; Unemo, M; Whiley, D M; Allen, V G; Dillon, J R; Cole, M; Seah, C; Trembizki, E; Trees, D L; Kersh, E N; Abrams, A J; de Vries, H J C; van Dam, A P; Medina, I; Bharat, A; Mulvey, M R; Van Domselaar, G; Martin, I
2017-05-01
A curated Web-based user-friendly sequence typing tool based on antimicrobial resistance determinants in Neisseria gonorrhoeae was developed and is publicly accessible (https://ngstar.canada.ca). The N. gonorrhoeae Sequence Typing for Antimicrobial Resistance (NG-STAR) molecular typing scheme uses the DNA sequences of 7 genes (penA, mtrR, porB, ponA, gyrA, parC, and 23S rRNA) associated with resistance to β-lactam antimicrobials, macrolides, or fluoroquinolones. NG-STAR uses the entire penA sequence, combining the historical nomenclature for penA types I to XXXVIII with novel nucleotide sequence designations; the full mtrR sequence and a portion of its promoter region; portions of ponA, porB, gyrA, and parC; and 23S rRNA sequences. NG-STAR grouped 768 isolates into 139 sequence types (STs) (n = 660) consisting of 29 clonal complexes (CCs) having a maximum of a single-locus variation, and 76 NG-STAR STs (n = 109) were identified as unrelated singletons. NG-STAR had a high Simpson's diversity index value of 96.5% (95% confidence interval [CI] = 0.959 to 0.969). The most common STs were NG-STAR ST-90 (n = 100; 13.0%), ST-42 and ST-91 (n = 45; 5.9%), ST-64 (n = 44; 5.72%), and ST-139 (n = 42; 5.5%). Decreased susceptibility to azithromycin was associated with NG-STAR ST-58, ST-61, ST-64, ST-79, ST-91, and ST-139 (n = 156; 92.3%); decreased susceptibility to cephalosporins was associated with NG-STAR ST-90, ST-91, and ST-97 (n = 162; 94.2%); and ciprofloxacin resistance was associated with NG-STAR ST-26, ST-90, ST-91, ST-97, ST-150, and ST-158 (n = 196; 98.0%). All isolates of NG-STAR ST-42, ST-43, ST-63, ST-81, and ST-160 (n = 106) were susceptible to all four antimicrobials. The standardization of nomenclature associated with antimicrobial resistance determinants through an internationally available database will facilitate the monitoring of the global dissemination of antimicrobial-resistant N. gonorrhoeae strains. © Crown copyright 2017.
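The discriminatory power reported above is expressed as Simpson's index of diversity. A common form used in typing studies is D = 1 − Σ n_j(n_j − 1) / (N(N − 1)), where n_j is the number of isolates of sequence type j and N the total number of isolates; D is the probability that two isolates drawn without replacement have different types. The sketch below assumes that form; the abstract does not spell out the exact formula or confidence-interval method used.

```python
from collections import Counter


def simpsons_diversity(type_counts):
    """Simpson's index of diversity D = 1 - sum n_j(n_j - 1) / (N(N - 1))."""
    n_total = sum(type_counts)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(n * (n - 1) for n in type_counts)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


def st_counts(st_assignments):
    """Collapse a list of per-isolate ST labels into per-ST counts."""
    return list(Counter(st_assignments).values())
```

For example, `simpsons_diversity(st_counts(["ST-90", "ST-90", "ST-42", "ST-64"]))` evaluates to 1 − 2/12 ≈ 0.83 for these four hypothetical isolates.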
Image encryption using random sequence generated from generalized information domain
NASA Astrophysics Data System (ADS)
Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu
2016-05-01
A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
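The permutation-diffusion architecture mentioned above can be illustrated with a toy sketch: a key-dependent P-box shuffles pixel positions (confusion), and each permuted byte is then XORed with a keystream byte and the previous ciphertext byte (diffusion). Important assumptions: Python's `random.Random` (a Mersenne Twister, not cryptographically secure) stands in for the paper's generalized-information-domain generator, and neither the drift factor nor the trajectory-derived P-box is reproduced. `seed` and `iv` are hypothetical key material, and this code must not be used for real encryption.

```python
import random


def _pbox_and_keystream(seed, n):
    # Illustrative PRNG only; NOT cryptographically secure.
    rng = random.Random(seed)
    pbox = list(range(n))
    rng.shuffle(pbox)
    keystream = [rng.randrange(256) for _ in range(n)]
    return pbox, keystream


def encrypt(pixels, seed, iv=0):
    """pixels: flat list of 8-bit values (0..255)."""
    n = len(pixels)
    pbox, ks = _pbox_and_keystream(seed, n)
    permuted = [pixels[pbox[i]] for i in range(n)]  # confusion: shuffle positions
    cipher, prev = [], iv
    for i, value in enumerate(permuted):
        c = value ^ ks[i] ^ prev  # diffusion: chain on previous ciphertext byte
        cipher.append(c)
        prev = c
    return cipher


def decrypt(cipher, seed, iv=0):
    n = len(cipher)
    pbox, ks = _pbox_and_keystream(seed, n)
    permuted, prev = [], iv
    for i, c in enumerate(cipher):
        permuted.append(c ^ ks[i] ^ prev)
        prev = c
    pixels = [0] * n
    for i in range(n):
        pixels[pbox[i]] = permuted[i]  # undo the permutation
    return pixels
```

Chaining on the previous ciphertext byte means a change in one pixel alters every later ciphertext byte, which is the property diffusion stages are designed to provide.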
Do, Hongdo; Molania, Ramyar
2017-01-01
The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split-read, and read-pair evidence using probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis. PMID:29097403
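As background for the assembler mentioned above, the sketch below builds a plain de Bruijn graph (nodes are (k−1)-mers, each k-mer contributes an edge) and a simplified positional variant in which each node also carries a genomic offset, so identical sequence from different loci is kept apart. The positional version here is a hypothetical simplification that anchors k-mers by a given read start; GRIDSS's actual positional graph, which uses split-read and read-pair anchoring, is more involved and is not reproduced.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it (one entry per k-mer)."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def positional_de_bruijn_graph(anchored_reads, k):
    """Same idea, but nodes are ((k-1)-mer, offset) pairs.

    anchored_reads: iterable of (read, 0-based start) pairs (hypothetical input).
    """
    graph = defaultdict(set)
    for read, start in anchored_reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[(kmer[:-1], start + i)].add((kmer[1:], start + i + 1))
    return dict(graph)
```

In the plain graph, a repeat that occurs at two loci collapses into a single path, which is one way repeated sequence can mislead assembly and downstream calling; the position tag keeps such copies distinct.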
New Sequences with Low Correlation and Large Family Size
NASA Astrophysics Data System (ADS)
Zeng, Fanxin
In direct-sequence code-division multiple-access (DS-CDMA) communication systems and direct-sequence ultra-wideband (DS-UWB) radios, sequences with low correlation and large family size are important for reducing multiple access interference (MAI) and accepting more active users, respectively. In this paper, a new collection of families of sequences of length $p^n-1$, which includes three constructions, is proposed. The maximum number of cyclically distinct families without GMW sequences in each construction is $\frac{\varphi(p^n-1)}{n}\cdot\frac{\varphi(p^m-1)}{m}$, where $p$ is a prime number, $n$ is an even number, and $n=2m$; these sequences can be binary or polyphase depending on the choice of the parameter $p$. In Construction I, there are $p^n$ distinct sequences within each family, and the new sequences have at most $d+2$ nontrivial periodic correlation values $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, dp^m-1\}$. In Construction II, the new sequences have a large family size $p^{2n}$ and possibly take the nontrivial correlation values in $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (3d-4)p^m-1\}$. In Construction III, the new sequences possess the largest family size $p^{(d-1)n}$ and have at most $2d$ correlation levels $\{-p^m-1, -1, p^m-1, 2p^m-1, \ldots, (2d-2)p^m-1\}$. All three constructions are near-optimal with respect to the Welch bound because their Welch ratios are moderate: $WR \approx d$, $WR \approx 3d-4$, and $WR \approx 2d-2$, respectively. Each family in Constructions I, II and III contains a GMW sequence. In addition, Helleseth sequences and Niho sequences are special cases of Constructions I and III, and their restrictions on the integers $m$ and $n$, $p^m \not\equiv 2 \pmod{3}$ and $n \equiv 0 \pmod{4}$, respectively, are removed in our sequences. Our sequences in Construction III also include the sequences with Niho-type decimation $3\cdot 2^m-2$. Finally, some open questions are pointed out, and an example that illustrates the performance of these sequences is given.
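As a baseline for the correlation values above, the sketch below generates one period of a binary m-sequence (the case p = 2, length 2^4 − 1 = 15) from the primitive polynomial x^4 + x + 1 and computes its periodic autocorrelation. It illustrates the ideal two-valued autocorrelation of m-sequences, the building blocks of GMW-type families. It is not any of the paper's three constructions, and the choice of polynomial is an illustrative assumption.

```python
def m_sequence_x4_x_1():
    """One period (length 15) of the binary m-sequence defined by x^4 + x + 1.

    Recurrence s[t+4] = s[t+1] XOR s[t], seeded with a nonzero state.
    """
    s = [1, 0, 0, 0]
    while len(s) < 15:
        s.append(s[-3] ^ s[-4])
    return s


def periodic_autocorrelation(bits):
    """C(tau) = sum_t (-1)^(s_t + s_{t+tau}), indices taken modulo the period."""
    n = len(bits)
    signs = [1 - 2 * b for b in bits]  # map 0 -> +1, 1 -> -1
    return [sum(signs[t] * signs[(t + tau) % n] for t in range(n))
            for tau in range(n)]
```

The result is the period 15 at zero shift and −1 at every other shift. Families built on such sequences trade some of this ideal behaviour for more members, which is why the constructions above admit a few more correlation levels.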
Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya
2012-01-01
Protein backbone torsion angles Phi (φ) and Psi (ψ) are the two rotation angles around the Cα-N bond (Phi) and the Cα-C bond (Psi). Because the linked peptide bonds are rigid and planar, these two angles essentially determine the backbone geometry of proteins. Accordingly, accurate prediction of protein backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-valued torsion angle prediction using a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE's predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed-rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
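Because torsion angles are periodic, a mean absolute error such as the one reported above is normally computed with the difference wrapped onto the circle, so that predicting 170° for a true value of −170° counts as a 20° error rather than 340°. The abstract does not say how TANGLE computes its MAE. The sketch below shows this common convention as an assumption.

```python
def angular_error(pred_deg, true_deg):
    """Absolute difference between two angles on the circle, in [0, 180]."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)


def angular_mae(predicted, observed):
    """Mean absolute error over paired predicted/observed angles (degrees)."""
    pairs = list(zip(predicted, observed))
    return sum(angular_error(p, o) for p, o in pairs) / len(pairs)
```

Without the wrap-around, errors near the ±180° boundary, which is common for Psi, would be grossly inflated.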
Application of Logic to Integer Sequences: A Survey
NASA Astrophysics Data System (ADS)
Makowsky, Johann A.
Chomsky and Schützenberger showed in 1963 that the sequence $d_L(n)$, which counts the number of words of a given length $n$ in a regular language $L$, satisfies a linear recurrence relation with constant coefficients for $n$, or equivalently, that the generating function $g_L(x)=\sum_n d_L(n)\,x^n$ is a rational function. In this talk we survey results concerning sequences $a(n)$ of natural numbers which satisfy linear recurrence relations over $\mathbb{Z}$ or $\mathbb{Z}_m$, and
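The Chomsky–Schützenberger result can be seen through the transfer-matrix view of a deterministic finite automaton: the vector of path counts per state is updated linearly at each step, so $d_L(n)$ satisfies a linear recurrence with constant coefficients. The sketch below counts accepted words of each length; the example language (binary words with no two consecutive 1s) is an illustrative choice, and its counts follow the Fibonacci recurrence $d(n+2) = d(n+1) + d(n)$.

```python
def count_words(transitions, start, accepting, n):
    """Number of length-n words accepted by a DFA.

    transitions: dict state -> {symbol: next_state}; a missing symbol means rejection.
    """
    counts = {start: 1}
    for _ in range(n):
        nxt = {}
        for state, c in counts.items():
            for target in transitions[state].values():
                nxt[target] = nxt.get(target, 0) + c
        counts = nxt
    return sum(c for state, c in counts.items() if state in accepting)


# State "a": last symbol was not 1 (or start); state "b": last symbol was 1.
NO_11 = {"a": {"0": "a", "1": "b"}, "b": {"0": "a"}}
```

Calling `count_words(NO_11, "a", {"a", "b"}, n)` for n = 0, 1, 2, ... gives 1, 2, 3, 5, 8, 13, ....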
Evaluation of new spectral bands for multi-spectral imaging: SMIRR aircraft test results
Goetz, Alexander F.H.; Rowan, Lawrence C.; Barringer, Anthony R.
1980-01-01
A 10-channel radiometer called the Shuttle Multispectral Infrared Radiometer (SMIRR) is scheduled to take data from orbit on the second shuttle orbital flight test. As part of the instrument test sequence, a series of aircraft flights was carried out over 10 test areas in Utah and Nevada. Apart from vegetation, the materials exposed at the surface were volcanic sequences ranging from tuffs to basalts, areas of hydrothermally altered volcanic rocks, sedimentary sequences of sandstone and carbonate rocks, and alluvial cover.
A DNA sequence analysis package for the IBM personal computer.
Lagrimini, L M; Brentano, S T; Donelson, J E
1984-01-01
We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433
Innate Immune Complexity in the Purple Sea Urchin: Diversity of the Sp185/333 System
Smith, L. Courtney
2012-01-01
The California purple sea urchin, Strongylocentrotus purpuratus, is a long-lived echinoderm with a complex and sophisticated innate immune system. Several large gene families function in immunity in this species, including the Sp185/333 gene family, which has ∼50 (±10) members. The family shows intriguing sequence diversity and encodes a broad array of diverse yet similar proteins. The genes have two exons, of which the second encodes the mature protein and has repeats and blocks of sequence called elements. Mosaics of element patterns plus single nucleotide polymorphism-based variants of the elements result in significant sequence diversity among the genes, yet the members of the family maintain a similar structure. The sequence of a bacterial artificial chromosome insert shows a cluster of six tightly linked Sp185/333 genes that are flanked by GA microsatellites. The sequences between the GA microsatellites, in which the Sp185/333 genes and flanking regions are located, are much more similar to each other than are the sequences outside the microsatellites, suggesting processes such as gene conversion, recombination, or duplication. However, close linkage does not correspond with greater sequence similarity compared to randomly cloned and sequenced genes that are unlikely to be linked. There are three segmental duplications that are bounded by GAT microsatellites and include three almost identical genes plus flanking regions. RNA editing is detectable throughout the mRNAs based on comparisons to the genes, which, in combination with putative post-translational modifications to the proteins, results in broad arrays of Sp185/333 proteins that differ among individuals. The mature proteins have an N-terminal glycine-rich region, a central RGD motif, and a C-terminal histidine-rich region. The Sp185/333 proteins are localized to the cell surface and are found within vesicles in subsets of polygonal and small phagocytes.
The coelomocyte proteome shows full-length and truncated proteins, including some with missense sequence. Current results suggest that both native Sp185/333 proteins and a recombinant protein bind bacteria and are likely important in sea urchin innate immunity. PMID:22566951
Description of calls from private well owners to a national well water hotline, 2013.
Ridpath, Alison; Taylor, Ethel; Greenstreet, Charlene; Martens, Margaret; Wicke, Heather; Martin, Colleen
2016-02-15
Water Systems Council (WSC) is a national, non-profit organization providing education and resources to private household well owners. Since 2003, WSC has provided wellcare®, a toll-free telephone hotline to answer questions from the public regarding well stewardship. To identify knowledge gaps regarding well stewardship among private well owners, we obtained data from WSC and reviewed calls made to wellcare® during 2013. WSC records data from each wellcare® call, including caller information, primary reason for the call, main use of well water, and whether the caller was calling about a cistern, private well, shared well, or spring. We searched for calls with key words indicating specific contaminants of interest and reviewed the primary reasons for calls. Calls classified as primarily testing-related were further categorized depending on whether the caller asked how to test well water or how to interpret testing results. During 2013, wellcare® received 1100 calls from private well owners who were residents of 48 states. Among these calls, 87 (8%) mentioned radon, 83 (8%) coliform, 51 (5%) fracking-related chemical, 34 (3%) arsenic, and 32 (3%) nitrate key words. Only 38% of private well owners reported conducting any well maintenance activities, such as inspecting, cleaning, or repairing the well, or testing well water, during the previous 12 months. The primary reasons for calls were related to well water testing (n=403), general information relating to wells (n=249), contaminants (n=229), and well water treatment (n=97). Among calls related to testing, 319 had questions about how to test their well water, and 33 had questions about how to interpret testing results. Calls from private well owners to the wellcare® Hotline during 2013 identified key knowledge gaps regarding well stewardship: well owners are generally not testing or maintaining their wells, have questions about well water testing and treatment, and have concerns about well water contaminants.
Published by Elsevier B.V.
One Interesting Family of Diophantine Triplets
ERIC Educational Resources Information Center
Deshpande, M. N.
2002-01-01
In this note, properties of two sequences generated by the recurrence relation $G_{n+2} = 4G_{n+1} - G_n$ are studied. It is shown that one of the sequences leads to a family of Diophantine triplets. Some interesting properties of these sequences are also established.
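The abstract does not give the starting values of the two sequences, so the sketch below assumes the seeds (0, 1) and (1, 3) purely for illustration. It generates terms of $G_{n+2} = 4G_{n+1} - G_n$ and checks a standard invariant of this recurrence: substituting $G_{n+2} = 4G_{n+1} - G_n$ shows that $G_{n+1}^2 - 4G_nG_{n+1} + G_n^2$ takes the same value for every $n$. The function names are hypothetical, and the check is not the note's Diophantine-triplet construction, which the abstract does not describe.

```python
def recurrence_terms(g0: int, g1: int, count: int) -> list[int]:
    """Return the first `count` terms of G[n+2] = 4*G[n+1] - G[n]."""
    terms = [g0, g1]
    while len(terms) < count:
        terms.append(4 * terms[-1] - terms[-2])
    return terms[:count]


def invariant(terms: list[int]) -> set[int]:
    """Values of G[n+1]^2 - 4*G[n]*G[n+1] + G[n]^2 over consecutive pairs.

    For any sequence obeying the recurrence, this set has exactly one element.
    """
    return {b * b - 4 * a * b + a * a for a, b in zip(terms, terms[1:])}


seq = recurrence_terms(0, 1, 8)   # assumed seeds (0, 1)
print(seq)                        # [0, 1, 4, 15, 56, 209, 780, 2911]
print(invariant(seq))             # {1}
print(invariant(recurrence_terms(1, 3, 8)))  # {-2} for the assumed seeds (1, 3)
```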
Atomistic model of the spider silk nanostructure
NASA Astrophysics Data System (ADS)
Keten, Sinan; Buehler, Markus J.
2010-04-01
Spider silk is an ultrastrong and extensible self-assembling biopolymer whose mechanical characteristics outperform those of many synthetic materials, including steel. Here we report atomic-level structures that represent aggregates of MaSp1 proteins from the N. clavipes silk sequence, obtained with a bottom-up computational approach using replica exchange molecular dynamics. We find that poly-alanine regions predominantly form distinct and orderly beta-sheet crystal domains, while poly-glycine repeats form disorderly structures resembling 3₁-helices. These could be the molecular source of the large semicrystalline fraction observed in silks, and could also form the basis of the so-called "prestretched" molecular configuration. We validated our structures against experimental data using dihedral angle pairs (presented in Ramachandran plots), alpha-carbon atomic distances, and secondary structure content.
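Replica exchange molecular dynamics runs copies of the system at several temperatures and periodically proposes swapping configurations between temperatures. The sketch below shows only the standard Metropolis swap criterion, $\min\{1, \exp[(\beta_i - \beta_j)(E_i - E_j)]\}$ with $\beta = 1/(k_B T)$. It assumes energies in kcal/mol, and the temperatures and energies are made-up values, not settings from the study.

```python
import math
import random

# Gas constant in kcal/(mol*K); energies below are assumed to be in kcal/mol.
K_B = 0.0019872041


def swap_probability(temp_i: float, temp_j: float,
                     energy_i: float, energy_j: float) -> float:
    """Metropolis acceptance probability for exchanging the configurations
    held by replicas at temperatures temp_i and temp_j (in K)."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Only exponentiate a negative argument, so math.exp never overflows.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(temp_i: float, temp_j: float, energy_i: float,
                 energy_j: float, rng: random.Random) -> bool:
    """Accept or reject one proposed exchange."""
    return rng.random() < swap_probability(temp_i, temp_j, energy_i, energy_j)


rng = random.Random(42)
# The colder replica holds the higher-energy configuration, so the swap is always accepted.
print(attempt_swap(300.0, 350.0, -100.0, -120.0, rng))  # True
```

Swaps that move a lower-energy configuration to the colder replica are always accepted. The reverse move is accepted only with a Boltzmann-weighted probability, which keeps each temperature's ensemble correct.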
Weighted Distances in Scale-Free Configuration Models
NASA Astrophysics Data System (ADS)
Adriaans, Erwin; Komjáthy, Júlia
2018-01-01
In this paper we study first-passage percolation in the configuration model with an empirical degree distribution that follows a power law with exponent $\tau \in (2,3)$. We assign independent and identically distributed (i.i.d.) weights to the edges of the graph. We investigate the weighted distance (the length of the shortest weighted path) between two uniformly chosen vertices, called the typical distance. When the age-dependent branching process that approximates the local neighborhoods of vertices produces infinitely many individuals in finite time (an explosive branching process), Baroni, Hofstad and the second author showed in Baroni et al. (J Appl Probab 54(1):146-164, 2017) that typical distances converge in distribution to a bounded random variable. The order of magnitude of typical distances remained open for $\tau \in (2,3)$ when the underlying branching process is not explosive. We close this gap by determining the first order of magnitude of typical distances in this regime, for arbitrary, not necessarily continuous, edge-weight distributions that produce a non-explosive age-dependent branching process with infinite-mean power-law offspring distributions. This order of magnitude tends to infinity with the number of vertices and, by choosing an appropriate weight distribution, can be tuned to be any growing function that is $O(\log\log n)$, where $n$ is the number of vertices in the graph. We show that the result also remains valid for the erased configuration model, in which loops and any second and further edges between the same two vertices are deleted.
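The objects in this abstract can be simulated directly. The sketch below (all names hypothetical) first samples i.i.d. power-law degrees with exponent $\tau = 2.5$. It then builds the erased configuration model by pairing half-edges uniformly at random and deleting loops and multiple edges, assigns i.i.d. edge weights, and computes one weighted distance with Dijkstra's algorithm. The weight law $1 + \mathrm{Exp}(1)$ is our assumption. We chose it because weights bounded away from zero make each generation of the approximating branching process last at least one time unit, so the process cannot explode. The paper covers far more general weight distributions.

```python
import heapq
import random


def power_law_degrees(n, tau, rng):
    """i.i.d. degrees with P(D >= k) = k^-(tau - 1) for integers k >= 1
    (power-law exponent tau); the total degree is made even."""
    degrees = [int(rng.paretovariate(tau - 1)) for _ in range(n)]
    if sum(degrees) % 2:
        degrees[0] += 1  # half-edges must pair up
    return degrees


def erased_configuration_model(degrees, rng):
    """Pair half-edges uniformly at random, then erase self-loops and
    duplicate edges."""
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half_edges)
    adj = {v: set() for v in range(len(degrees))}
    for u, v in zip(half_edges[0::2], half_edges[1::2]):
        if u != v:          # drop self-loops
            adj[u].add(v)   # sets silently drop repeated edges
            adj[v].add(u)
    return adj


def edge_weights(adj, rng):
    """i.i.d. weights 1 + Exp(1), one per undirected edge (key: (min, max))."""
    return {(u, v): 1.0 + rng.expovariate(1.0)
            for u in adj for v in adj[u] if u < v}


def weighted_distance(adj, weights, source, target):
    """Dijkstra's algorithm: length of the shortest weighted path."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weights[(min(u, v), max(u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # target lies in a different component


rng = random.Random(7)
degrees = power_law_degrees(5_000, tau=2.5, rng=rng)
adj = erased_configuration_model(degrees, rng)
weights = edge_weights(adj, rng)
u, v = rng.sample(range(len(degrees)), 2)
print(weighted_distance(adj, weights, u, v))
```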
Sorci, Mirco; Dassa, Bareket; Liu, Hongwei; Anand, Gaurav; Dutta, Amit K; Pietrokovski, Shmuel; Belfort, Marlene; Belfort, Georges
2013-06-18
In order to measure the intermolecular binding forces between the two halves (or partners) of naturally split protein splicing elements called inteins, a novel thiol-hydrazide linker was designed and used to orient immobilized antibodies specific for each partner. Activation of the surfaces was achieved in one step, allowing direct measurement of the intermolecular binding force between the two partners of the split intein (called protein trans-splicing). Through this binding process, a whole functional intein is formed, resulting in subsequent splicing. Atomic force microscopy (AFM) was used to directly measure split intein partner binding at 1 μm/s between native (wild-type) and mixed pairs of C- and N-terminal partners of naturally occurring split inteins from three cyanobacteria. Native and mixed pairs exhibit similar binding forces within the error of the measurement technique (~52 pN). Bioinformatic sequence analysis and computational structural analysis revealed a zipper-like contact between the two partners, with electrostatic and nonpolar attraction between multiple aligned ion pairs and hydrophobic residues. We also tested Jarzynski's equality and demonstrated, as expected, that the nonequilibrium dissipative measurements obtained here gave larger interaction energies than those at equilibrium. Hence, AFM coupled with our immobilization strategy and computational studies provides a useful analytical tool for the direct measurement of intermolecular association of split inteins and could be extended to any interacting protein pair.
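Jarzynski's equality, $\langle e^{-W/k_BT} \rangle = e^{-\Delta F/k_BT}$, turns repeated nonequilibrium work measurements into an equilibrium free-energy estimate. The minimal sketch below implements the estimator. It assumes work values in pN·nm (with $k_BT \approx 4.1$ pN·nm near room temperature) and uses made-up numbers. It also shows why dissipative pulls overestimate the interaction energy: by Jensen's inequality, the mean work is never smaller than $\Delta F$.

```python
import math
import statistics


def jarzynski_free_energy(work_values, kT):
    """Estimate dF = -kT * ln(mean(exp(-W / kT))) from repeated nonequilibrium
    work measurements, using a log-sum-exp shift to avoid overflow/underflow."""
    if not work_values:
        raise ValueError("need at least one work measurement")
    exponents = [-w / kT for w in work_values]
    shift = max(exponents)
    log_mean = shift + math.log(
        sum(math.exp(x - shift) for x in exponents) / len(exponents))
    return -kT * log_mean


# Hypothetical work values in pN*nm from five pulls.
works = [40.0, 46.0, 52.0, 61.0, 75.0]
delta_f = jarzynski_free_energy(works, kT=4.1)
# Dissipation makes the mean work exceed the free-energy estimate.
print(delta_f <= statistics.mean(works))  # True
```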
Products purchased from family farming for school meals in the cities of Rio Grande do Sul
Ferigollo, Daniele; Kirsten, Vanessa Ramos; Heckler, Dienifer; Figueredo, Oscar Agustín Torres; Perez-Cassarino, Julian; Triches, Rozane Márcia
2017-01-01
OBJECTIVE This study aims to assess the adequacy of the cities of the State of Rio Grande do Sul, Brazil, with respect to the purchase of family-farming products by the Programa Nacional de Alimentação Escolar (PNAE - National Program of School Meals). METHODS This is a quantitative descriptive study with secondary data analysis (public calls-to-bid). The sample consisted of approximately 10% (n = 52) of the cities in the State, selected to represent each mesoregion and population size. We assessed the percentage of food purchased from family farming, as well as the type of product, frequency requirements, delivery points, and presence of prices in 114 public call-to-bid notices from 2013. RESULTS Of the cities analyzed, 71.2% (n = 37) reached the 30% threshold of food purchased from family farming. Most public calls-to-bid requested products of both plant (90.4%; n = 103) and animal origin (79.8%; n = 91). Regarding the degree of processing, fresh products appeared in 92.1% (n = 105) of the public calls-to-bid. For delivery, centralized (49.1%; n = 56) and weekly deliveries (47.4%; n = 54) were the most frequently specified. Only 60% (n = 68) of the public calls-to-bid stated product prices. CONCLUSIONS Most of the cities analyzed fulfilled the requirements of the PNAE legislation. The public calls-to-bid listed a wide variety of foods of both plant and animal origin, most of them fresh. For delivery, the centralized and weekly options prevailed. PMID:28225910
Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort
Gambin, Tomasz; Akdemir, Zeynep C.; Yuan, Bo; Gu, Shen; Chiang, Theodore; Carvalho, Claudia M.B.; Shaw, Chad; Jhangiani, Shalini; Boone, Philip M.; Eldomery, Mohammad K.; Karaca, Ender; Bayram, Yavuz; Stray-Pedersen, Asbjørg; Muzny, Donna; Charng, Wu-Lin; Bahrambeigi, Vahid; Belmont, John W.; Boerwinkle, Eric; Beaudet, Arthur L.; Gibbs, Richard A.
2017-01-01
We developed an algorithm, HMZDelFinder, that uses whole exome sequencing (WES) data to identify rare and intragenic homozygous and hemizygous (HMZ) deletions that may represent complete loss-of-function of the indicated gene. HMZDelFinder was applied to 4866 samples in the Baylor–Hopkins Center for Mendelian Genomics (BHCMG) cohort and detected 773 HMZ deletion calls (567 homozygous and 206 hemizygous) with an estimated sensitivity of 86.5% (82% for single-exonic and 88% for multi-exonic calls) and precision of 78% (53% for single-exonic and 96% for multi-exonic calls). Of the 773 HMZDelFinder-detected deletion calls, 82 were subjected to array comparative genomic hybridization (aCGH) and/or breakpoint PCR, and 64 were confirmed. These include 18 single-exon deletions, of which 8 were detected exclusively by HMZDelFinder and by none of seven other CNV detection tools examined. Further investigation of the 64 validated deletion calls revealed at least 15 pathogenic HMZ deletions. Of these, 7 accounted for 17–50% of pathogenic CNVs in different disease cohorts in which 7.1–11% of molecular diagnoses were attributed to CNVs. In summary, we present an algorithm to detect rare, intragenic, single-exon deletion CNVs using WES data; this tool can be useful for disease gene discovery efforts and clinical WES analyses. PMID:27980096
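The abstract does not spell out HMZDelFinder's calling rules. As a rough, hypothetical illustration of the underlying idea, the sketch below flags exons whose normalized coverage (RPKM) is near zero in one sample while such low coverage is rare in the rest of the cohort. The threshold values and the function name are assumptions, not the tool's actual parameters.

```python
def candidate_hmz_deletions(rpkm, low=0.5, max_cohort_frac=0.005):
    """Flag (sample, exon) pairs whose coverage looks like a homozygous or
    hemizygous deletion: RPKM below `low` in that sample, while such low
    coverage is rare across the cohort (filtering out exons that are poorly
    captured in everyone). Thresholds are illustrative assumptions.

    rpkm: dict mapping exon id -> {sample id: RPKM}.
    """
    calls = []
    for exon, by_sample in rpkm.items():
        low_samples = [s for s, value in by_sample.items() if value < low]
        if low_samples and len(low_samples) / len(by_sample) <= max_cohort_frac:
            calls.extend((sample, exon) for sample in low_samples)
    return sorted(calls)


# Toy cohort of 10 samples (hypothetical values).
samples = [f"S{i}" for i in range(10)]
rpkm = {
    "EX1": {s: 20.0 for s in samples},
    "EX2": {s: (0.1 if i < 5 else 15.0) for i, s in enumerate(samples)},
}
rpkm["EX1"]["S3"] = 0.1  # a private zero-coverage exon
print(candidate_hmz_deletions(rpkm, max_cohort_frac=0.2))  # [('S3', 'EX1')]
```

EX2 is not called because half the cohort shares its low coverage, which points to a capture problem rather than a rare deletion.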
Ridley, R G; Patel, H V; Gerber, G E; Morton, R C; Freeman, K B
1986-01-01
A cDNA clone spanning the entire amino acid sequence of the nuclear-encoded uncoupling protein of rat brown adipose tissue mitochondria has been isolated and sequenced. With the exception of the N-terminal methionine the deduced N-terminus of the newly synthesized uncoupling protein is identical to the N-terminal 30 amino acids of the native uncoupling protein as determined by protein sequencing. This proves that the protein contains no N-terminal mitochondrial targeting prepiece and that a targeting region must reside within the amino acid sequence of the mature protein. PMID:3012461
Jang, Yikweon; Hahm, Eun Hye; Lee, Hyun-Jung; Park, Soyeon; Won, Yong-Jin; Choe, Jae C.
2011-01-01
Background In a species with a large distribution relative to its dispersal capacity, geographic variation in traits may be explained by gene flow, selection, or the combined effects of both. Studies of genetic diversity using neutral molecular markers show that patterns of isolation by distance (IBD) or barrier effects may be evident for geographic variation at the molecular level in amphibian species. However, selective factors such as habitat, predators, or interspecific interactions may be critical for geographic variation in sexual traits. We studied geographic variation in advertisement calls in the tree frog Hyla japonica to understand patterns of variation in these traits across Korea and provide clues about the underlying forces for variation. Methodology We recorded calls of H. japonica in three breeding seasons at 17 localities, including localities on remote Jeju Island. Call characters analyzed were note repetition rate (NRR), note duration (ND), and dominant frequency (DF), along with snout-to-vent length. Results A barrier effect on DF and a longitudinal variation in NRR suggest that the open sea between the mainland and Jeju Island, and mountain ranges dominated by the north-south Taebaek Mountains, were related to geographic variation in call characters. Furthermore, there was a pattern of IBD in mitochondrial DNA sequences. However, no comparable pattern of IBD was found between geographic distance and call characters. We also failed to detect any effects of habitat or interspecific interaction on call characters. Conclusions Geographic variation in call characters, as well as in mitochondrial DNA sequences, was largely stratified by geographic factors such as distance and barriers in Korean populations of H. japonica.
Although we did not detect effects of habitat or interspecific interaction, some other selective factors such as sexual selection might still be operating on call characters in conjunction with restricted gene flow. PMID:21858061
Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K P; Woo, Patrick C Y
2015-10-22
Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10–49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n = 2), Pichia (Candida) norvegensis (n = 2), Candida tropicalis (n = 1) and Saccharomyces cerevisiae (n = 1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study. PMID:26506340
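Locating intra-genomic ITS polymorphic sites amounts to scanning an alignment of the cloned copies column by column. The sketch below assumes the clones are already aligned to equal length. It treats gaps as a state and skips 'N' as an ambiguous call. The sequences are hypothetical, and column indices are 0-based.

```python
from collections import Counter


def polymorphic_sites(aligned, ignore="N"):
    """Return (column, base counts) for every alignment column in which the
    cloned copies carry more than one state. Gaps ('-') count as a state;
    characters in `ignore` (ambiguous calls) are skipped."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in aligned
                         if seq[col].upper() not in ignore)
        if len(counts) > 1:
            sites.append((col, dict(counts)))
    return sites


# Hypothetical aligned ITS clones from one isolate.
clones = ["ACGTTAGC", "ACGTTAGC", "ACATTAGC", "AC-TTAGN"]
print(polymorphic_sites(clones))  # [(2, {'G': 2, 'A': 1, '-': 1})]
```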
Condon, David E; Tran, Phu V; Lien, Yu-Chin; Schug, Jonathan; Georgieff, Michael K; Simmons, Rebecca A; Won, Kyoung-Jae
2018-02-05
Identification of differentially methylated regions (DMRs) is the initial step towards the study of DNA methylation-mediated gene regulation. Previous approaches to calling DMRs suffer from false predictions, require extensive computational resources, and/or require library installation and input conversion. We developed a new approach, called Defiant, to identify DMRs. Employing Weighted Welch Expansion (WWE), Defiant showed superior performance to other predictors in a series of benchmarking tests on artificial and real data. Defiant was subsequently used to investigate DNA methylation changes in the iron-deficient rat hippocampus. Defiant identified DMRs close to genes associated with neuronal development and plasticity, which were not identified by its competitor. Importantly, Defiant runs 5 to 479 times faster than currently available software packages. Defiant also accepts 10 different input formats widely used for DNA methylation data. Defiant effectively identifies DMRs for whole-genome bisulfite sequencing (WGBS), reduced-representation bisulfite sequencing (RRBS), Tet-assisted bisulfite sequencing (TAB-seq), and HpaII tiny fragment enrichment by ligation-mediated PCR-tag (HELP) assays.
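Weighted Welch Expansion is not defined in the abstract, although its name suggests that it builds on Welch's unequal-variance t-test. The sketch below shows only the standard, unweighted Welch statistic applied to methylation fractions at a single CpG site, with the Welch–Satterthwaite degrees of freedom. The data are hypothetical, and Defiant's weighting and region-expansion steps are not reproduced.

```python
import math
import statistics


def welch_t_test(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
    samples with possibly unequal variances (each group needs >= 2 values
    and the combined variance must be nonzero)."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = statistics.variance(group_a) / n_a  # squared standard error, group A
    se2_b = statistics.variance(group_b) / n_b  # squared standard error, group B
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical methylation fractions at one CpG site.
control = [0.80, 0.85, 0.78]
iron_deficient = [0.55, 0.60, 0.52]
print(welch_t_test(control, iron_deficient))
```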
Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma
Wrzeszczynski, Kazimierz O.; Frank, Mayu O.; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A.; Moore Vogel, Julia L.; Bruce, Jeffrey N.; Lassman, Andrew B.; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V.; Zody, Michael C.; Jobanputra, Vaidehi; Royyuru, Ajay K.
2017-01-01
Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare the potentially actionable calls from each. Methods: Tumor DNA was analyzed with a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. Results: More variants were identified by WGS/RNA analysis than by the targeted panel. WGA completed a comparable analysis in a fraction of the time required by the human analysts. Conclusions: An effective human-machine interface for analyzing deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than is currently possible. ClinicalTrials.gov identifier: NCT02725684. PMID:28740869
Cai, Na; Bigdeli, Tim B; Kretzschmar, Warren W; Li, Yihan; Liang, Jieqin; Hu, Jingchu; Peterson, Roseann E; Bacanu, Silviu; Webb, Bradley Todd; Riley, Brien; Li, Qibin; Marchini, Jonathan; Mott, Richard; Kendler, Kenneth S; Flint, Jonathan
2017-02-14
The China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) project on Major Depressive Disorder (MDD) sequenced 11,670 female Han Chinese at low coverage (1.7X), providing the first large-scale whole genome sequencing resource representative of the largest ethnic group in the world. Samples were collected from 58 hospitals in 23 provinces around China. We called 22 million high-quality single nucleotide polymorphisms (SNPs) from the nuclear genome, representing the largest SNP call set from an East Asian population to date. We used these variants to impute genotypes across all samples, which allowed us to perform a successful genome-wide association study (GWAS) on MDD. The utility of these data can be extended to studies of genetic ancestry in the Han Chinese and, when integrated with data from other populations, to evolutionary genetics. Molecular phenotypes, such as copy number variations and structural variations, can be detected, quantified and analysed in similar ways.
Chanhome, Lawan; Tan, Nget Hong
2017-01-01
Background The monocled cobra (Naja kaouthia) is a medically important venomous snake in Southeast Asia. Its venom has been shown to vary geographically in composition and neurotoxic activity, indicating a vast diversity of toxin genes within the species. To investigate the polygenic trait of the venom and its locale-specific variation, we profiled and compared the venom gland transcriptomes of N. kaouthia from Malaysia (NK-M) and Thailand (NK-T) using next-generation sequencing (NGS) technology. Methods The transcriptomes were sequenced on the Illumina HiSeq platform and assembled, followed by transcript clustering and annotation for gene expression and function. Pairwise or multiple sequence alignments were conducted on the expressed toxin genes. Substitution rates were studied for the major toxins co-expressed in NK-M and NK-T. Results and discussion The toxin transcripts showed high redundancy (41–82% of the total mRNA expression) and comprised 23 gene families expressed in NK-M and NK-T, respectively (22 gene families were co-expressed). Among the venom genes, three-finger toxins (3FTxs) predominated in expression, with multiple sequences noted. Comparative analysis and selection study revealed that 3FTxs are genetically conserved between the geographical specimens while demonstrating distinct differential expression patterns, implying up-regulation of selected principal toxins or, alternatively, enhanced transcript degradation or lack of transcription of certain traits. One of the striking features that explains the inter-geographical venom variation is the up-regulation of α-neurotoxins (constituting ∼80.0% of toxin fragments per kilobase of exon model per million mapped reads (FPKM)), particularly the long-chain α-elapitoxin-Nk2a, which accounted for 48.3% in NK-T but only 1.7% in NK-M. Instead, short neurotoxin isoforms were up-regulated in NK-M (46.4%).
Another distinct transcriptional pattern is the exclusive and abundant expression of cytotoxin CTX-3 in NK-T. The findings suggest a correlation with the geographical variation in the proteome and toxicity of the venom, and support the call for optimising antivenom production and use in the region. In addition, the current study uncovered full and partial sequences of numerous toxin genes from N. kaouthia that had not been reported previously; these include N. kaouthia-specific L-amino acid oxidase (LAAO), snake venom serine protease (SVSP), cystatin, acetylcholinesterase (AChE), hyaluronidase (HYA), waprin, phospholipase B (PLB), aminopeptidase (AP), neprilysin, etc. Taken together, the findings further enrich the snake toxin database and provide deeper insights into the genetic diversity of cobra venom toxins. PMID:28392982
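The expression shares quoted above are fractions of FPKM. The sketch below uses the standard definition, FPKM = fragments × 10⁹ / (transcript length in bp × total mapped fragments), to compute FPKM and to convert a set of values into percentages of their sum. The counts and transcript names are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def percent_of_total(values: dict) -> dict:
    """Express each transcript's FPKM as a percentage of the summed FPKM."""
    total = sum(values.values())
    return {name: 100.0 * value / total for name, value in values.items()}


# Hypothetical: 1,000 fragments on a 2 kb transcript, 10 million mapped fragments.
print(fpkm(1_000, 2_000, 10_000_000))  # 50.0
print(percent_of_total({"toxin_a": 30.0, "toxin_b": 10.0}))  # {'toxin_a': 75.0, 'toxin_b': 25.0}
```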
ERIC Educational Resources Information Center
Duffy, Ryan D.; Bott, Elizabeth M.; Allan, Blake A.; Torrey, Carrie L.; Dik, Bryan J.
2012-01-01
The current study examined the relation between perceiving a calling, living a calling, and job satisfaction among a diverse group of employed adults who completed an online survey (N = 201). Perceiving a calling and living a calling were positively correlated with career commitment, work meaning, and job satisfaction. Living a calling moderated…
Using comparative genome analysis to identify problems in annotated microbial genomes.
Poptsova, Maria S; Gogarten, J Peter
2010-07-01
Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes affect different types of analyses differently. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than first-generation annotation tools. Because old errors may propagate to newly sequenced genomes, we emphasize that continuously updating popular public databases is an urgent and unresolved problem. Given the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and should consider additional quality control for their results.
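The abstract does not say how missing orthologues are located. One common heuristic is the reciprocal best hit: gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring match. The sketch below assumes precomputed alignment scores (for example, BLAST bit scores) stored in nested dictionaries. Genes left without a partner are candidates for closer inspection, such as a missed or mis-called gene in one annotation. All gene names and scores are hypothetical.

```python
def best_hits(scores):
    """scores: {query gene: {subject gene: alignment score}} -> {query: best subject}."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab, best_ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores between genes of two related genomes.
a_vs_b = {"a1": {"b1": 250.0, "b2": 40.0}, "a2": {"b2": 180.0}, "a3": {"b1": 90.0}}
b_vs_a = {"b1": {"a1": 245.0, "a3": 88.0}, "b2": {"a2": 175.0}}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
unmatched = sorted(set(a_vs_b) - {a for a, _ in pairs})
print(unmatched)  # ['a3']
```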
De la Sen, Manuel; Abbas, Mujahid; Saleem, Naeem
2016-01-01
This paper discusses some convergence properties in fuzzy ordered proximal approaches defined by [Formula: see text]-sequences of pairs, where [Formula: see text] is a surjective self-mapping and [Formula: see text], where A and B are nonempty subsets of an abstract nonempty set X, and [Formula: see text] is a partially ordered non-Archimedean fuzzy metric space endowed with a fuzzy metric M, a triangular norm * and an ordering [Formula: see text]. The fuzzy set M takes values in a sequence or set [Formula: see text], where the elements of the so-called switching rule [Formula: see text] are defined from [Formula: see text] to a subset of [Formula: see text]. Such a switching rule selects a particular realization of M at the nth iteration and is parameterized by a growth evolution sequence [Formula: see text] and a sequence or set [Formula: see text] belonging to the so-called [Formula: see text]-lower-bounding mappings, which are defined from [0, 1] to [0, 1]. Some application examples concerning discrete systems under switching rules and best approximation solvability of algebraic equations are discussed.
Nakato, Ryuichiro; Itoh, Tahehiko; Shirahige, Katsuhiko
2013-07-01
Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although next-generation sequencers offer the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. In addition, DROMPA outputs a protein-binding profile map in PDF or PNG format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors. Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
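The abstract describes a two-step design: parse2wig preprocesses read-map files, and DROMPA then calls peaks. DROMPA's actual statistics are not given here, so the sketch below is only a hypothetical simplification of that design. Reads are first binned into fixed-width windows, and bins are then reported where ChIP reads exceed the depth-normalized input by a chosen ratio. All thresholds and function names are assumptions.

```python
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Step 1: count reads per fixed-width bin (bin index = position // bin_size)."""
    return Counter(pos // bin_size for pos in read_starts)


def enriched_bins(chip_starts, input_starts, bin_size=100,
                  min_reads=5, min_ratio=3.0, pseudocount=1.0):
    """Step 2: return (start, end, chip_reads) for bins where ChIP reads exceed
    the library-size-scaled input by at least `min_ratio`."""
    chip = bin_counts(chip_starts, bin_size)
    control = bin_counts(input_starts, bin_size)
    scale = len(chip_starts) / len(input_starts)  # normalize for sequencing depth
    peaks = []
    for b, n in sorted(chip.items()):
        expected = control[b] * scale + pseudocount
        if n >= min_reads and n / expected >= min_ratio:
            peaks.append((b * bin_size, (b + 1) * bin_size, n))
    return peaks


# Hypothetical read start positions: one enriched bin over a flat background.
chip_reads = [500 + i for i in range(20)] + [b * 100 + 10 for b in range(5) for _ in range(2)]
control_reads = [b * 100 + 50 for b in range(10) for _ in range(2)]
print(enriched_bins(chip_reads, control_reads))  # [(500, 600, 20)]
```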
VARiD: a variation detection framework for color-space and letter-space platforms.
Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael
2010-06-15
High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color-space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color-space data, and none that can analyze color-space and regular (letter-space) data together. We present VARiD, a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
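In SOLiD color space, each color encodes a pair of adjacent bases. With the 2-bit codes A=0, C=1, G=2, T=3, the color of a dinucleotide is the XOR of the two codes, so decoding requires the known first base. The sketch below (function names hypothetical) encodes and decodes reads. It shows that a true SNP changes two adjacent colors, whereas a single color miscall corrupts every downstream base. This difference is what makes color-space data informative for separating variants from errors.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def to_color_space(seq: str) -> str:
    """Encode a letter-space read as colors: XOR of adjacent 2-bit base codes."""
    codes = [BASE_TO_CODE[b] for b in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def from_color_space(first_base: str, colors: str) -> str:
    """Decode colors back to bases, starting from a known first base."""
    code = BASE_TO_CODE[first_base.upper()]
    bases = [CODE_TO_BASE[code]]
    for color in colors:
        code ^= int(color)
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


print(to_color_space("ATGGA"))        # 3102
print(to_color_space("ATCGA"))        # 3232: the G->C SNP alters two adjacent colors
print(from_color_space("A", "3102"))  # ATGGA
print(from_color_space("A", "3302"))  # ATAAG: one miscalled color shifts all later bases
```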
Glew, Michelle D.; Marenda, Marc; Rosengarten, Renate; Citti, Christine
2002-01-01
The ruminant pathogen Mycoplasma agalactiae possesses a family of abundantly expressed variable surface lipoproteins called Vpmas. Phenotypic switches between Vpma members have previously been correlated with DNA rearrangements within a locus of vpma genes and are proposed to play an important role in disease pathogenesis. In this study, six vpma genes were characterized in the M. agalactiae type strain PG2. All vpma genes clustered within an 8-kb region and shared highly conserved 5′ untranslated regions, lipoprotein signal sequences, and short N-terminal sequences. Analyses of the vpma loci from consecutive clonal isolates showed that vpma DNA rearrangements were site specific and that cleavage and strand exchange occurred within a minimal region of 21 bp located within the 5′ untranslated region of all vpma genes. This process controlled expression of vpma genes by effectively linking the open reading frame (ORF) of a silent gene to a unique active promoter sequence within the locus. An ORF (xer1) immediately adjacent to one end of the vpma locus did not undergo rearrangement and had significant homology to a distinct subset of genes belonging to the λ integrase family of site-specific xer recombinases. It is proposed that xer1 codes for a site-specific recombinase that is not involved in chromosome dimer resolution but rather is responsible for the observed vpma-specific recombination in M. agalactiae. PMID:12374833
Microbe-ID: an open source toolbox for microbial genotyping and species identification
Tabima, Javier F.; Everhart, Sydney E.; Larsen, Meredith M.; Weisberg, Alexandra J.; Kamvar, Zhian N.; Tancos, Matthew A.; Smart, Christine D.; Chang, Jeff H.
2016-01-01
Development of tools to identify species, genotypes, or novel strains of invasive organisms is critical for monitoring emergence and implementing rapid response measures. Molecular markers, although critical to identifying species or genotypes, require bioinformatic tools for analysis. However, user-friendly analytical tools for fast identification are not readily available. To address this need, we created a web-based set of applications called Microbe-ID that allows users to customize a toolbox for rapid species identification and strain genotyping using any genetic markers of choice. Two components of Microbe-ID, named Sequence-ID and Genotype-ID, implement species and genotype identification, respectively. Sequence-ID allows identification of species by using BLAST to query sequences for any locus of interest against a custom reference sequence database. Genotype-ID allows placement of an unknown multilocus marker in either a minimum spanning network or a dendrogram with bootstrap support from a user-created reference database. Microbe-ID can be used to identify any organism based on nucleotide sequences or any molecular marker type, and several examples are provided. We created a public website for demonstration purposes called Microbe-ID (microbe-id.org) and provided a working implementation for the genus Phytophthora (phytophthora-id.org). In Phytophthora-ID, the Sequence-ID application allows identification based on ITS or cox spacer sequences. Genotype-ID groups individuals into clonal lineages based on simple sequence repeat (SSR) markers for the two invasive plant pathogen species P. infestans and P. ramorum. All code is open source and available on GitHub and CRAN. Instructions for installation and use are provided at https://github.com/grunwaldlab/Microbe-ID. PMID:27602267
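Genotype-ID places an unknown genotype in a minimum spanning network or a bootstrapped dendrogram. The sketch below swaps in a much simpler technique to illustrate how an unknown isolate can be grouped with a clonal lineage: a nearest-reference assignment that uses the fraction of SSR loci whose alleles differ. The lineage names, allele sizes, and distance are hypothetical and are not Microbe-ID's implementation.

```python
def locus_mismatch_distance(g1, g2):
    """Fraction of SSR loci whose allele sets differ between two genotypes.
    A genotype is a tuple with one tuple of allele sizes per locus."""
    if len(g1) != len(g2):
        raise ValueError("genotypes must cover the same loci")
    differing = sum(sorted(a) != sorted(b) for a, b in zip(g1, g2))
    return differing / len(g1)


def nearest_lineage(query, references):
    """references: {lineage name: [genotype, ...]}. Returns (name, distance)
    of the closest reference genotype; ties are broken by lineage name."""
    return min(
        ((name, locus_mismatch_distance(query, g))
         for name, genotypes in references.items() for g in genotypes),
        key=lambda item: (item[1], item[0]),
    )


# Hypothetical three-locus SSR reference database.
references = {
    "lineage_A": [((160, 164), (210, 210), (180, 184))],
    "lineage_B": [((156, 160), (214, 218), (176, 180))],
}
query = ((164, 160), (210, 214), (180, 184))
print(nearest_lineage(query, references))
```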
Mutation Scanning in Wheat by Exon Capture and Next-Generation Sequencing.
King, Robert; Bird, Nicholas; Ramirez-Gonzalez, Ricardo; Coghill, Jane A; Patil, Archana; Hassani-Pak, Keywan; Uauy, Cristobal; Phillips, Andrew L
2015-01-01
Targeted Induced Local Lesions in Genomes (TILLING) is a reverse genetics approach to identify novel sequence variation in genomes, with the aims of investigating gene function and/or developing useful alleles for breeding. Despite recent advances in wheat genomics, most current TILLING methods are low to medium in throughput, being based on PCR amplification of the target genes. We performed a pilot-scale evaluation of TILLING in wheat by next-generation sequencing through exon capture. An oligonucleotide-based enrichment array covering ~2 Mbp of wheat coding sequence was used to carry out exon capture and sequencing on three mutagenised lines of wheat containing previously identified mutations in the TaGA20ox1 homoeologous genes. After we tested different mapping algorithms and settings, candidate SNPs were identified by mapping to the IWGSC wheat Chromosome Survey Sequences. Where sequence data for all three homoeologues were present in the reference, mutant calls were unambiguous; however, where the reference lacked one or two of the homoeologues, captured reads from these genes were mis-mapped to other homoeologues, resulting either in dilution of the variant allele frequency or in assignment of mutations to the wrong homoeologue. Competitive PCR assays were used to validate the putative SNPs and estimate cut-off levels for SNP filtering. At least 464 high-confidence SNPs were detected across the three mutagenised lines, including the three known alleles in TaGA20ox1, indicating a mutation rate of ~35 SNPs per Mb, similar to that estimated by PCR-based TILLING. This demonstrates the feasibility of using exon capture for genome re-sequencing as a method of mutation detection in polyploid wheat, but accurate mutation calling will require an improved genomic reference with more comprehensive coverage of homoeologues.
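A little arithmetic makes the dilution effect concrete. Assume each homoeologue is captured equally. A mutation carried by one homoeologue is then expected to reach a VAF equal to its mutant fraction divided by the number of homoeologues whose reads pile onto the same reference copy. The sketch below computes that expectation alongside the observed VAF, using hypothetical function names and read counts.

```python
def variant_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the alternative allele."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else 0.0


def expected_vaf(copies_mapped_together: int, mutant_fraction: float = 1.0) -> float:
    """Expected VAF for a mutation carried by one homoeologue when reads from
    `copies_mapped_together` equally captured homoeologues pile onto a single
    reference copy; mutant_fraction is 1.0 if homozygous, 0.5 if heterozygous."""
    return mutant_fraction / copies_mapped_together


print(expected_vaf(1))  # 1.0: reference holds all homoeologues, no dilution
print(expected_vaf(3))  # about 0.333: reads from two other homoeologues mis-map here
print(variant_allele_frequency(ref_reads=20, alt_reads=10))  # about 0.333
```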
Molecular characterization of an ependymin precursor from goldfish brain.
Königstorfer, A; Sterrer, S; Eckerskorn, C; Lottspeich, F; Schmidt, R; Hoffmann, W
1989-01-01
Ependymins are thought to be implicated in fundamental processes involved in plasticity of the goldfish CNS. Gas-phase sequencing of purified ependymins beta and gamma revealed that they share the same N-terminal sequence. Each sequence displays microheterogeneities at several positions. Based on the protein sequences obtained, we constructed synthetic oligonucleotides and used them as hybridization probes for screening cDNA libraries of goldfish brain. In this article we describe the full-length sequence of a mRNA encoding a precursor of ependymins. A cleavable signal sequence characteristic of secretory proteins is located at the N-terminal end, followed directly by the ependymin sequence. Also, two potential N-glycosylation sites were detected. A computer search revealed that ependymins form a novel family of unique proteins.
The influence of tightening sequence and method on screw preload in implant superstructures.
Al-Sahan, Maha M; Al Maflehi, Nassr S; Akeel, Riyadh F
2014-01-01
This study evaluated the effect of six screw-tightening sequences and two tightening methods on the screw preload in implant-supported superstructures. The preload was measured using strain gauges following the screw tightening of a metal framework connected to four implants. The experiment included six sequences ([1] 1-2-3-4, [2] 4-2-3-1, [3] 4-3-1-2, [4] 1-4-2-3, [5] 2-3-4-1, and [6] 3-2-4-1), two methods (one-step, three-step), and five replications. Significant differences were found between tightening sequences and methods. In the three-step method, a higher total preload was found in sequences 2 (312 ± 85 N), 3 (246 ± 54 N), and 4 (310 ± 96 N). In the one-step method, a higher total preload was found in sequences 1 (286 ± 94 N), 5 (764 ± 142 N), and 6 (350 ± 69 N). It is concluded that the highest total screw preload was achieved when the anterior implants of the superstructure were tightened first in one step, followed by the posterior implants.
ERIC Educational Resources Information Center
Shute, Valerie J.; Hansen, Eric G.; Almond, Russell G.
2007-01-01
This paper reports on a 3-year, NSF-funded research and development project called ACED: Adaptive Content with Evidence-based Diagnosis. The purpose of the project was to design, develop, and evaluate an assessment for learning (AfL) system for diverse students, using Algebra I content related to geometric sequences (i.e., successive numbers…
Characterization of Animal Exposure Calls Captured by the National Poison Data System, 2000–2010
Buttke, Danielle E.; Schier, Joshua G.; Bronstein, Alvin C.; Chang, Arthur
2015-01-01
Objective: Our objective was to characterize the data captured in all animal exposure calls reported to the National Poison Data System (NPDS), a national poison center reporting database, from 1 January 2000 through 31 December 2010 and to identify poison center (PC) usage and needs in animal exposure calls. Design: We calculated descriptive statistics characterizing animal type, exposure substance, medical outcome, year and month of call, caller location, and specific state for all animal exposure call data in NPDS from 1 January 2000 to 31 December 2010. SAS version 9.2 was used for the analysis. Results: There were 1,371,095 animal exposure calls out of 28,925,496 (4.7%) total human and animal exposure calls in NPDS during the study period. The majority involved companion animal exposures, with 88.0% canine exposures and 10.4% feline exposures. Pesticides were the most common exposure substance (n=360,375; 26.3%), followed by prescription drugs (n=261,543; 18.6%). The most common outcome reported was ‘Not followed, judged as nontoxic exposure or minimal clinical effects possible’ (n=803,491; 58.6%), followed by ‘Not followed, judged potentially toxic exposure’ (n=263,153; 19.2%). There were 5,388 deaths reported. Pesticide exposures were responsible for the greatest number of deaths (n=1,643; 30.4%). Conclusions and clinical relevance: Approximately 1 in 20 calls to PCs concerns potentially toxic exposures to animals, suggesting a need for veterinary expertise and resources at PCs. Pesticides are one of the greatest toxic exposure threats to animals, in both numbers of exposures and severity of clinical outcomes, and are an important area for education, prevention, and treatment. PMID:26346434
Caminer, Marcel A.; Ron, Santiago R.
2014-01-01
We review the systematics of the Hypsiboas calcaratus species complex, a group of widely distributed Amazonian hylid frogs. A comprehensive analysis of genetic, morphological, and bioacoustic datasets uncovered the existence of eleven candidate species, six of which are confirmed. Two of them correspond to Hypsiboas fasciatus and Hypsiboas calcaratus and the remaining four are new species that we describe here. Hypsiboas fasciatus sensu stricto has a geographic range restricted to the eastern Andean foothills of southern Ecuador while Hypsiboas calcaratus sensu stricto has a wide distribution in the Amazon basin. Hypsiboas almendarizae sp. n. occurs at elevations between 500 and 1950 m in central and northern Ecuador; the other new species (H. maculateralis sp. n., H. alfaroi sp. n., and H. tetete sp. n.) occur at elevations below 500 m in Amazonian Ecuador and Peru. The new species differ from H. calcaratus and H. fasciatus in morphology, advertisement calls, and mitochondrial and nuclear DNA sequences. Five candidate species from the Guianan region, Peru, and Bolivia are left as unconfirmed. Examination of the type material of Hyla steinbachi, from Bolivia, shows that it is not conspecific with H. fasciatus and thus is removed from its synonymy. PMID:24478591
Mai, Uyen; Mirarab, Siavash
2018-05-08
Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
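To make the k-shrink objective concrete, the sketch below solves it by brute force on a tiny tree: it tries every set of k leaves and keeps the set whose removal leaves the smallest diameter (the largest leaf-to-leaf path length). This is only an illustration of the objective under stated assumptions, not the polynomial-time algorithm or the statistical outlier tests of TreeShrink. The adjacency-dict tree representation and all function names are hypothetical. It relies on the fact that deleting a leaf from a tree does not change path lengths between the remaining leaves.

```python
from itertools import combinations


def leaf_distances(adj):
    """All-pairs path lengths between leaves of a weighted, unrooted tree.

    adj (hypothetical format) maps node -> list of (neighbor, branch_length).
    """
    leaves = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    dist = {}
    for src in leaves:
        # Traverse the tree from src, accumulating branch lengths.
        seen = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nbr, w in adj[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + w
                    stack.append(nbr)
        for dst in leaves:
            dist[(src, dst)] = seen[dst]
    return leaves, dist


def diameter(leaves, dist, removed=frozenset()):
    """Largest leaf-to-leaf distance after dropping the leaves in `removed`."""
    kept = [x for x in leaves if x not in removed]
    return max((dist[(a, b)] for a, b in combinations(kept, 2)), default=0.0)


def k_shrink_bruteforce(adj, k):
    """Exhaustively find the k leaves whose removal minimizes the diameter."""
    leaves, dist = leaf_distances(adj)
    best = None
    for subset in combinations(leaves, k):
        d = diameter(leaves, dist, frozenset(subset))
        if best is None or d < best[1]:
            best = (set(subset), d)
    return best


# Toy tree: leaf D sits on an unexpectedly long branch (length 10).
example_tree = {
    "A": [("x", 1)], "B": [("x", 1)],
    "C": [("y", 1)], "D": [("y", 10)],
    "x": [("A", 1), ("B", 1), ("y", 1)],
    "y": [("x", 1), ("C", 1), ("D", 10)],
}
print(k_shrink_bruteforce(example_tree, 1))  # ({'D'}, 3.0)
```

Removing D shrinks the diameter from 12 to 3, which is the kind of disproportionate impact the abstract describes. The exhaustive search grows combinatorially with k and is only suitable for toy trees.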
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tomasic, Ivan B.; Metcalf, Matthew C.; Guce, Abigail I.
2010-09-03
The human lysosomal enzymes α-galactosidase (α-GAL, EC 3.2.1.22) and α-N-acetylgalactosaminidase (α-NAGAL, EC 3.2.1.49) share 46% amino acid sequence identity and have similar folds. The active sites of the two enzymes share 11 of 13 amino acids, differing only where they interact with the 2-position of the substrates. Using a rational protein engineering approach, we interconverted the enzymatic specificity of α-GAL and α-NAGAL. The engineered α-GAL (which we call α-GALSA) retains the antigenicity of α-GAL but has acquired the enzymatic specificity of α-NAGAL. Conversely, the engineered α-NAGAL (which we call α-NAGALEL) retains the antigenicity of α-NAGAL but has acquired the enzymatic specificity of the α-GAL enzyme. Comparison of the crystal structure of the designed enzyme α-GALSA with those of the wild-type enzymes shows that the active sites of α-GALSA and α-NAGAL superimpose well, indicating the success of the rational design. The designed enzymes might be useful as non-immunogenic alternatives in enzyme replacement therapy for treatment of lysosomal storage disorders such as Fabry disease.
Xu, Li; Ji, Jin-Jun; Le, Wangping; Xu, Yan S; Dou, Dandan; Pan, Jieli; Jiao, Yifeng; Zhong, Tianfei; Wu, Dehong; Wang, Yumei; Wen, Chengping; Xie, Guan-Qun; Yao, Feng; Zhao, Heng; Fan, Yong-Sheng; Chin, Y Eugene
2015-10-15
Cytokine or growth factor activated STAT3 undergoes multiple post-translational modifications, dimerization and translocation into nuclei, where it binds to serum-inducible element (SIE, 'TTC(N3)GAA')-bearing promoters to activate transcription. The STAT3 DNA binding domain (DBD, 320-494) mutation in hyper immunoglobulin E syndrome (HIES), called the HIES mutation (R382Q, R382W or V463Δ), which elevates IgE synthesis, inhibits SIE binding activity and sensitizes genes such as TNF-α for expression. However, the mechanism by which the HIES mutation sensitizes STAT3 in gene induction remains elusive. Here, we report that STAT3 binds directly to the AGG-element with the consensus sequence 'AGG(N3)AGG'. Surprisingly, the helical N-terminal region (1-355), rather than the canonical STAT3 DBD, is responsible for AGG-element binding. The HIES mutation markedly enhances STAT3 AGG-element binding and AGG-promoter activation activity. Thus, STAT3 is a dual specificity transcription factor that promotes gene expression not only via SIE- but also AGG-promoter activity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
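The two consensus elements quoted in the abstract, SIE 'TTC(N3)GAA' and the AGG-element 'AGG(N3)AGG', are degenerate patterns in which N3 stands for any three bases. The sketch below scans a DNA string for exact matches to such a consensus using a regular expression with a lookahead, so overlapping sites are reported. It is an assumption-laden illustration only: it says nothing about binding affinity, and site prediction in practice usually relies on position weight matrices rather than exact consensus matches. The function names and the example sequence are hypothetical. Because TTCNNNGAA is its own reverse complement but AGGNNNAGG is not, a reverse-complement helper is included for scanning the minus strand.

```python
import re

# Consensus elements as written in the abstract; N matches any base.
SIE = "TTCNNNGAA"
AGG = "AGGNNNAGG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus_to_regex(consensus):
    """Expand each N into a character class matching A, C, G or T."""
    return "".join("[ACGT]" if base == "N" else base for base in consensus)


def find_sites(seq, consensus):
    """0-based start positions of (possibly overlapping) matches on this strand."""
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def reverse_complement(seq):
    """Reverse complement, used to scan the opposite strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical test sequence containing one SIE and one AGG-element.
example_seq = "CC" + "TTCCGGGAA" + "TT" + "AGGTCAAGG" + "CC"
print(find_sites(example_seq, SIE))  # [2]
print(find_sites(example_seq, AGG))  # [13]
```

To scan both strands for the non-palindromic AGG-element, one would also call `find_sites(reverse_complement(seq), AGG)` and map positions back to forward-strand coordinates.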
Systematics of the Osteocephalus buckleyi species complex (Anura, Hylidae) from Ecuador and Peru
Ron, Santiago R.; Venegas, Pablo J.; Toral, Eduardo; Read, Morley; Ortiz, Diego A.; Manzano, Andrea L.
2012-01-01
We present a new phylogeny, based on DNA sequences of mitochondrial and nuclear genes, for frogs of the genus Osteocephalus, with emphasis on the Osteocephalus buckleyi species complex. Genetic, morphologic, and advertisement call data are combined to define species boundaries and describe new species. The phylogeny shows strong support for: (1) a basal position of Osteocephalus taurinus + Osteocephalus oophagus, (2) a clade containing phytotelmata breeding species, and (3) a clade that corresponds to the Osteocephalus buckleyi species complex. Our results document a large proportion of hidden diversity within a set of populations that were previously treated as a single, widely distributed species, Osteocephalus buckleyi. Individuals assignable to Osteocephalus buckleyi formed a paraphyletic group relative to Osteocephalus verruciger and Osteocephalus cabrerai and contained four species, one of which is Osteocephalus buckleyi sensu stricto and three of which are new. Two of the new species are shared between Ecuador and Peru (Osteocephalus vilmae sp. n. and Osteocephalus cannatellai sp. n.) and one is distributed in the Amazon region of southern Peru (Osteocephalus germani sp. n.). We discuss the difficulties of using morphological characters to define species boundaries and propose a hypothesis to explain them. PMID:23166473
Dynamic variable selection in SNP genotype autocalling from APEX microarray data.
Podder, Mohua; Welch, William J; Zamar, Ruben H; Tebbutt, Scott J
2006-11-30
Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide--adenine (A), thymine (T), cytosine (C) or guanine (G)--is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay consisting of multiple probes with signals from multiple channels for a single SNP, based on arrayed primer extension (APEX). This mini-sequencing method is a powerful combination of a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. Using this microarray platform, our current genotype calling system (known as SNP Chart) calls single SNP genotypes by manual inspection of the APEX data, which is time-consuming and prone to user subjectivity bias. Using a set of 32 Coriell DNA samples plus three negative PCR controls as a training data set, we have developed a fully automated genotyping algorithm based on simple linear discriminant analysis (LDA) using dynamic variable selection. The algorithm combines separate analyses based on the multiple probe sets to give a final posterior probability for each candidate genotype. We have tested our algorithm on a completely independent data set of 270 DNA samples, with validated genotypes, from patients admitted to the intensive care unit (ICU) of St. Paul's Hospital (plus one negative PCR control sample). Our method achieves a concordance rate of 98.9% with a 99.6% call rate for a set of 96 SNPs. By adjusting the threshold value for the final posterior probability of the called genotype, the call rate reduces to 94.9% with a higher concordance rate of 99.6%. We also reversed the two independent data sets in their training and testing roles, achieving a concordance rate of up to 99.8%. The strength of this APEX chemistry-based platform is its unique redundancy, with multiple probes for a single SNP.
Our model-based genotype calling algorithm captures the redundancy in the system by considering all the underlying probe features of a particular SNP, automatically down-weighting any 'bad data' corresponding to image artifacts on the microarray slide or failure of a specific chemistry. In this way, our method automatically selects the probes that work well and reduces the effect of poorly performing probes in a sample-specific manner, for any number of SNPs.
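The trade-off reported above, where raising the posterior-probability threshold lowers the call rate but raises concordance, can be illustrated with a short sketch. It assumes each sample already has a posterior probability per candidate genotype (the output of some classifier; the LDA itself is not reproduced here), calls the most probable genotype only when its posterior reaches the threshold, and otherwise records a no-call. Call rate is the fraction of samples called; concordance is the fraction of called samples that match the validated genotype. The function names and the posterior values are hypothetical.

```python
def call_genotypes(posteriors, threshold):
    """Call the most probable genotype, or None if its posterior is below threshold."""
    calls = []
    for probs in posteriors:
        genotype, p = max(probs.items(), key=lambda kv: kv[1])
        calls.append(genotype if p >= threshold else None)
    return calls


def call_and_concordance_rates(calls, truth):
    """Call rate over all samples; concordance over called samples only."""
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]
    call_rate = len(called) / len(calls)
    concordance = (sum(c == t for c, t in called) / len(called)
                   if called else float("nan"))
    return call_rate, concordance


# Hypothetical posteriors for four samples at one SNP, with validated genotypes.
posteriors = [
    {"AA": 0.98, "AB": 0.01, "BB": 0.01},
    {"AA": 0.10, "AB": 0.85, "BB": 0.05},
    {"AA": 0.55, "AB": 0.40, "BB": 0.05},
    {"AA": 0.02, "AB": 0.03, "BB": 0.95},
]
truth = ["AA", "AB", "AB", "BB"]

for t in (0.5, 0.9):
    print(t, call_and_concordance_rates(call_genotypes(posteriors, t), truth))
# 0.5 (1.0, 0.75)
# 0.9 (0.5, 1.0)
```

The uncertain third sample is miscalled at the lower threshold and becomes a no-call at the higher one, which is the mechanism by which a stricter threshold buys concordance at the cost of call rate.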
Drakatos, Panagis; Kosky, Christopher A; Higgins, Sean E; Muza, Rexford T; Williams, Adrian J; Leschziner, Guy D
2013-09-01
Discrimination between narcolepsy, idiopathic hypersomnia, and behavior-induced inadequate sleep syndrome (BIISS) is based on clinical features and on specific nocturnal polysomnography (NPSG) and multiple sleep latency test (MSLT) results. However, previous studies have cast doubt on the specificity and sensitivity of these diagnostic tools. Eleven variables of the NPSG were analyzed in 101 patients who were retrospectively diagnosed with narcolepsy with cataplexy (N+C) (n=24), narcolepsy without cataplexy (N-C) (n=38), idiopathic hypersomnia with long sleep period (IHL) (n=21), and BIISS (n=18). Fifteen out of 24 N+C and 8 out of 38 N-C entered the first rapid eye movement (REM) sleep period (FREMP) from sleep stage 1 (N1) or wake (W), though this sleep-stage sequence did not arise in the other patient groups. FREMP stage sequence was a function of REM sleep latency (REML) for both N+C and N-C groups. FREMP stage sequence was not associated with mean sleep latency (MSL) in N+C but was associated in N-C, which implies heterogeneity within the N-C group. REML also was a useful discriminator. Depending on the cutoff period, REML had a sensitivity and specificity of up to 85.5% and 97.4%, respectively. The FREMP stage sequence may be a useful tool in the diagnosis of narcolepsy, particularly in conjunction with sleep-stage sequence analysis of sleep-onset REM periods (SOREMPs) in the MSLT; it also may provide a helpful intermediate phenotype in the clarification of heterogeneity in the N-C diagnostic group. However, larger prospective studies are necessary to confirm these findings. Copyright © 2013 Elsevier B.V. All rights reserved.
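The abstract reports that, depending on the cutoff, REM sleep latency (REML) reached a sensitivity of up to 85.5% and a specificity of up to 97.4%. The sketch below shows how sensitivity and specificity change with the cutoff for a single continuous measure. It assumes that a latency at or below the cutoff counts as a positive test (short REML suggesting narcolepsy); the latencies, labels and function name are hypothetical and are not the study's data or cutoffs.

```python
def sensitivity_specificity(values, has_condition, cutoff):
    """Treat value <= cutoff as test-positive; return (sensitivity, specificity)."""
    pairs = list(zip(values, has_condition))
    tp = sum(1 for v, y in pairs if v <= cutoff and y)
    fn = sum(1 for v, y in pairs if v > cutoff and y)
    tn = sum(1 for v, y in pairs if v > cutoff and not y)
    fp = sum(1 for v, y in pairs if v <= cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical REM latencies (minutes): four patients with narcolepsy, four without.
reml_minutes = [5, 10, 40, 90, 60, 100, 120, 8]
is_narcolepsy = [True, True, True, True, False, False, False, False]

for cutoff in (15, 45):
    print(cutoff, sensitivity_specificity(reml_minutes, is_narcolepsy, cutoff))
# 15 (0.5, 0.75)
# 45 (0.75, 0.75)
```

Widening the cutoff from 15 to 45 minutes captures one more true case here without adding false positives; with real data, moving the cutoff generally trades sensitivity against specificity.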
de Souza, Gustavo A.; Arntzen, Magnus Ø.; Fortuin, Suereta; Schürch, Anita C.; Målen, Hiwa; McEvoy, Christopher R. E.; van Soolingen, Dick; Thiede, Bernd; Warren, Robin M.; Wiker, Harald G.
2011-01-01
Precise annotation of genes or open reading frames is still a difficult task that results in divergence even for data generated from the same genomic sequence. This affects subsequent proteomic studies and also compromises the characterization of clinical isolates with many specific genetic variations that may not be represented in the selected database. We recently developed software called multistrain mass spectrometry prokaryotic database builder (MSMSpdbb) that can merge protein databases from several sources and be applied to any prokaryotic organism, in a proteomic-friendly approach. We generated a database for the Mycobacterium tuberculosis complex (using three strains of Mycobacterium bovis and five of M. tuberculosis), and analyzed data collected from two laboratory strains and two clinical isolates of M. tuberculosis. We identified 2561 proteins, of which 24 were present in M. tuberculosis H37Rv samples but not annotated in the M. tuberculosis H37Rv genome. We were also able to identify 280 nonsynonymous single amino acid polymorphisms and confirm 367 translational start sites. As a proof of concept, we applied the database to whole-genome DNA sequencing data of one of the clinical isolates, which allowed the validation of 116 predicted single amino acid polymorphisms and the annotation of 131 N-terminal start sites. Moreover, we identified regions not present in the original M. tuberculosis H37Rv sequence, indicating strain divergence or errors in the reference sequence. In conclusion, we demonstrated the potential of using a merged database to better characterize laboratory or clinical bacterial strains. PMID:21030493
Gene and translation initiation site prediction in metagenomic sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John
2012-01-01
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually shorter than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novelty of the method lies in enhanced translation initiation site identification, the ability to identify sequences that use alternate genetic codes, and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
2012-01-01
Background: While safer than their viral counterparts, conventional non-viral gene delivery DNA vectors offer a limited safety profile. They often result in the delivery of unwanted prokaryotic sequences, antibiotic resistance genes, and bacterial origins of replication to the target, which may stimulate unwanted immunological responses due to their chimeric DNA composition. Such vectors may also carry the potential for chromosomal integration, thus potentiating oncogenesis. We sought to engineer an in vivo system for the quick and simple production of safer DNA vector alternatives that were devoid of non-transgene bacterial sequences and would lethally disrupt the host chromosome in the event of an unwanted vector integration event. Results: We constructed a parent eukaryotic expression vector possessing a specialized manufactured multi-target site called “Super Sequence”, and engineered E. coli cells (R-cells) that conditionally produce the phage-derived recombinase Tel (PY54), TelN (N15), or Cre (P1). Passage of the parent plasmid vector through R-cells under optimized conditions resulted in rapid, efficient, one-step in vivo generation of mini lcc (linear covalently closed; Tel/TelN-cell) or mini ccc (circular covalently closed; Cre-cell) DNA constructs, separated from the backbone plasmid DNA. Site-specific integration of lcc plasmids into the host chromosome resulted in chromosomal disruption and 10^5-fold lower viability than that seen with the ccc counterpart. Conclusion: We offer a high-efficiency mini DNA vector production system that confers simple, rapid and scalable in vivo production of mini lcc DNA vectors that possess all the benefits of “minicircle” DNA vectors and virtually eliminate the potential for undesirable vector integration events. PMID:23216697
Global population genetic dynamics of a highly migratory, apex predator shark.
Bernard, Andrea M; Feldheim, Kevin A; Heithaus, Michael R; Wintner, Sabine P; Wetherbee, Bradley M; Shivji, Mahmood S
2016-11-01
Knowledge of genetic connectivity dynamics in the world's large-bodied, highly migratory, apex predator sharks across their global ranges is limited. One such species, the tiger shark (Galeocerdo cuvier), occurs worldwide in warm temperate and tropical waters, uses remarkably diverse habitats (nearshore to pelagic) and has a generalist diet that can structure marine ecosystems through top-down processes. We investigated the phylogeography and global population structure of this exploited, phylogenetically enigmatic shark using 10 nuclear microsatellites (n = 380) and sequences from the mitochondrial control region (CR, n = 340) and cytochrome oxidase I gene (n = 100). All three marker classes showed genetic differentiation between tiger sharks from the western Atlantic and Indo-Pacific ocean basins (microsatellite FST > 0.129; CR ΦST > 0.497), differentiation between the North and southwestern Atlantic, and the isolation of tiger sharks sampled from Hawaii from other surveyed locations. Furthermore, mitochondrial DNA revealed high levels of intraocean basin matrilineal population structure, suggesting female philopatry and sex-biased gene flow. Coalescent- and genetic distance-based estimates of divergence from CR sequences were largely congruent (dcorr = 0.0015-0.0050), indicating a separation of Indo-Pacific and western Atlantic tiger sharks <1 million years ago. Mitochondrial haplotype relationships suggested that the western South Atlantic Ocean was likely a historical connection for interocean basin linkages via dispersal around South Africa. Together, the results reveal unexpectedly high levels of population structure in a highly migratory, behaviourally generalist, cosmopolitan ocean predator, calling for management and conservation on smaller-than-anticipated spatial scales. © 2016 John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Nillsen, C.; Earl, J. K.; Elizondo, F.; Wadlington, P. L.
2014-01-01
This study explored whether congruence, calling, job characteristics or personality were better predictors of job satisfaction and tenure. The sample consisted of 1968 employees across four different job roles: sales engineers (N = 309), graphic designers (N = 383), teachers (N = 481) and clergy (N = 795). Data was collected as part of a selection…
Nair, Sethu C; Pattaradilokrat, Sittiporn; Zilversmit, Martine M; Dommer, Jennifer; Nagarajan, Vijayaraj; Stephens, Melissa T; Xiao, Wenming; Tan, John C; Su, Xin-Zhuan
2014-01-01
The rodent malaria parasite Plasmodium yoelii is an important model for studying malaria immunity and pathogenesis. One approach for studying malaria disease phenotypes is genetic mapping, which requires typing a large number of genetic markers from multiple parasite strains and/or progeny from genetic crosses. Hundreds of microsatellite (MS) markers have been developed to genotype the P. yoelii genome; however, typing a large number of MS markers can be labor intensive, time consuming, and expensive. Thus, development of high-throughput genotyping tools such as DNA microarrays that enable rapid and accurate large-scale genotyping of the malaria parasite would be highly desirable. In this study, we sequenced the genomes of two P. yoelii strains (33X and N67) and obtained a large number of single nucleotide polymorphisms (SNPs). Based on the SNPs obtained, we designed sets of oligonucleotide probes to develop a microarray that could interrogate ∼11,000 SNPs across the 14 chromosomes of the parasite in a single hybridization. Results from hybridizations of DNA samples of five P. yoelii strains or cloned lines (17XNL, YM, 33X, N67 and N67C) and two progeny from a genetic cross (N67×17XNL) to the microarray showed that the array had a high call rate (∼97%) and accuracy (99.9%) in calling SNPs, providing a simple and reliable tool for typing the P. yoelii genome. Our data show that the P. yoelii genome is highly polymorphic, although isogenic pairs of parasites were also detected. Additionally, our results indicate that the 33X parasite is a progeny of 17XNL (or YM) and an unknown parasite. The highly accurate and reliable microarray developed in this study will greatly facilitate our ability to study the genetic basis of important traits and the disease it causes. Published by Elsevier B.V.
Bunawan, Hamidun; Yen, Choong Chee; Yaakop, Salmah; Noor, Normah Mohd
2017-01-26
The chloroplastic trnL intron and the nuclear internal transcribed spacer (ITS) region were sequenced for 11 Nepenthes species recorded in Peninsular Malaysia to examine their phylogenetic relationships and to evaluate the use of trnL intron and ITS sequences for phylogenetic reconstruction of this genus. Phylogeny reconstruction was carried out using neighbor-joining, maximum parsimony and Bayesian analyses. All the trees revealed two major clusters, a lowland group consisting of N. ampullaria, N. mirabilis, N. gracilis and N. rafflesiana, and another containing both intermediately distributed species (N. albomarginata and N. benstonei) and four highland species (N. sanguinea, N. macfarlanei, N. ramispina and N. alba). The trnL intron and ITS sequences provided phylogenetically informative characters for deriving a phylogeny of Nepenthes species in Peninsular Malaysia. To our knowledge, this is the first molecular phylogenetic study of Nepenthes species occurring along an altitudinal gradient in Peninsular Malaysia.
What can we learn about lyssavirus genomes using 454 sequencing?
Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin
2012-01-01
The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat", is to provide high-quality complete genome sequences of lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations, in terms of both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses, and will be the basis for the design of novel nucleic acid-based diagnostics. The first results presented here indicate not only that high-quality full-length lyssavirus genome sequences can be generated, but also that efficient analysis of the viral population becomes feasible.
Prinz, A; Bolz, M; Findl, O
2005-11-01
Owing to the complex topographical aspects of ophthalmic surgery, teaching with conventional surgical videos has led to a poor understanding among medical students. A novel multimedia three-dimensional (3D) computer-animated program, called "Ophthalmic Operation Vienna", has been developed, in which surgical videos are accompanied by 3D animated sequences of all surgical steps for five operations. The aim of the study was to assess the effect of 3D animations on the understanding of cataract and glaucoma surgery among medical students. At the Medical University of Vienna, Department of Ophthalmology, 172 students were randomised into two groups: a 3D group (n=90), which saw the 3D animations and video sequences, and a control group (n=82), which saw only the surgical videos. The narrated text was identical for both groups. After the presentation, students were questioned and tested using multiple choice questions. Students in the 3D group found the interactive multimedia teaching methods to be a valuable supplement to the conventional surgical videos. The 3D group outperformed the control group not only in topographical understanding by 16% (p<0.0001), but also in theoretical understanding by 7% (p<0.003). Women in the 3D group showed the largest gain, 19% over the control group (p<0.0001). The use of 3D animations led to a better understanding of difficult surgical topics among medical students, especially female students. Gender-related benefits of using multimedia should be further explored.
Lu, S; Halberg, R; Kroos, L
1990-01-01
During sporulation of the Gram-positive bacterium Bacillus subtilis, transcription of genes encoding spore coat proteins in the mother-cell compartment of the sporangium is controlled by RNA polymerase containing the sigma subunit called sigma K. Based on comparison of the N-terminal amino acid sequence of sigma K with the nucleotide sequence of the gene encoding sigma K (sigK), the primary product of sigK was inferred to be a pro-protein (pro-sigma K) with 20 extra amino acids at the N terminus. Using antibodies generated against pro-sigma K, we have detected pro-sigma K beginning at the third hour of sporulation and sigma K beginning about 1 hr later. Even when pro-sigma K is expressed artificially during growth and throughout sporulation, sigma K appears at the normal time and expression of a sigma K-controlled gene occurs normally. These results suggest that pro-sigma K is an inactive precursor that is proteolytically processed to active sigma K in a developmentally regulated fashion. Mutations that block forespore gene expression block accumulation of sigma K but not accumulation of pro-sigma K, suggesting that pro-sigma K processing is a regulatory device that couples the programs of gene expression in the two compartments of the sporangium. We propose that this regulatory device ensures completion of forespore morphogenesis prior to the synthesis in the mother-cell of spore coat proteins that will encase the forespore. PMID:2124700
New Stopping Criteria for Segmenting DNA Sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Wentian
2001-06-18
We propose a solution to the problem of the stopping criterion in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian information criterion in the model selection framework. When this criterion was applied to the telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength, which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
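A minimal sketch of recursive segmentation with a BIC-style stopping rule follows, under explicitly stated assumptions. Each segment is modeled as independent draws from its own base composition (an i.i.d. multinomial model), the best single cut is the one that maximizes the gain in maximized log-likelihood, and the cut is accepted only if twice that gain exceeds `n_params * ln(N)`, where N is the current segment length. The value `n_params = 4` (three extra composition parameters plus one for the boundary position) is an assumption for illustration and may not match the exact penalty derived in the paper; segmentation strength is not implemented. All function names are hypothetical, and the quadratic rescanning is for readability, not efficiency.

```python
import math
from collections import Counter


def log_likelihood(seq):
    """Maximized log-likelihood of seq under an i.i.d. multinomial base model."""
    n = len(seq)
    return sum(c * math.log(c / n) for c in Counter(seq).values())


def best_split(seq):
    """Return (position, likelihood gain) of the best single cut, or (None, 0.0)."""
    whole = log_likelihood(seq)
    best_pos, best_gain = None, 0.0
    for i in range(1, len(seq)):
        gain = log_likelihood(seq[:i]) + log_likelihood(seq[i:]) - whole
        if gain > best_gain:
            best_pos, best_gain = i, gain
    return best_pos, best_gain


def segment(seq, offset=0, n_params=4):
    """Recursively split while the BIC-style test favors two segments over one.

    Returns half-open (start, end) intervals in the coordinates of the input.
    n_params is an assumed count of extra free parameters in the split model.
    """
    pos, gain = best_split(seq)
    if pos is None or 2 * gain <= n_params * math.log(len(seq)):
        return [(offset, offset + len(seq))]
    return (segment(seq[:pos], offset, n_params)
            + segment(seq[pos:], offset + pos, n_params))


print(segment("A" * 50 + "C" * 50))  # [(0, 50), (50, 100)]
print(segment("ACGT" * 25))          # [(0, 100)]
```

The penalty grows with ln(N), so long sequences need a larger likelihood gain before a cut is accepted; this is what stops the recursion from splitting statistically homogeneous stretches into ever smaller pieces.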
LongISLND: in silico sequencing of lengthy and noisy datatypes
Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C.; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y. K.
2016-01-01
Summary: LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. Availability and Implementation: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd Contact: hugo.lam@roche.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27667791
Pryce, Todd M; Palladino, Silvano; Price, Diane M; Gardam, Dianne J; Campbell, Peter B; Christiansen, Keryn J; Murray, Ronan J
2006-04-01
We report a direct polymerase chain reaction/sequence (d-PCRS)-based method for the rapid identification of clinically significant fungi from 5 different types of commercial broth enrichment media inoculated with clinical specimens. Media including BacT/ALERT FA (BioMérieux, Marcy l'Etoile, France) (n = 87), BACTEC Plus Aerobic/F (Becton Dickinson, Microbiology Systems, Sparks, MD) (n = 16), BACTEC Peds Plus/F (Becton Dickinson) (n = 15), BACTEC Lytic/10 Anaerobic/F (Becton Dickinson) (n = 11) bottles, and BBL MGIT (Becton Dickinson) (n = 11) were inoculated with specimens from 138 patients. A universal DNA extraction method was used combining a novel pretreatment step to remove PCR inhibitors with a column-based DNA extraction kit. Target sequences in the noncoding internal transcribed spacer regions of the rRNA gene were amplified by PCR and sequenced using a rapid (24 h) automated capillary electrophoresis system. Using sequence alignment software, fungi were identified by sequence similarity with sequences derived from isolates identified by upper-level reference laboratories or isolates defined as ex-type strains. We identified Candida albicans (n = 14), Candida parapsilosis (n = 8), Candida glabrata (n = 7), Candida krusei (n = 2), Scedosporium prolificans (n = 4), and 1 each of Candida orthopsilosis, Candida dubliniensis, Candida kefyr, Candida tropicalis, Candida guilliermondii, Saccharomyces cerevisiae, Cryptococcus neoformans, Aspergillus fumigatus, Histoplasma capsulatum, and Malassezia pachydermatis by d-PCRS analysis. All d-PCRS identifications from positive broths were in agreement with the final species identification of the isolates grown from subculture. Earlier identification of fungi using d-PCRS may facilitate prompt and more appropriate antifungal therapy.
Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.
Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow
2014-10-01
Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although BAV is highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing of a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV, n = 3; other cardiovascular malformations, n = 3). Post-variant-calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among 54 674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies, plus extended family linkage. No putative rare, high-effect variant was present in all affected individuals and absent from all unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. In contrast, extended family linkage identified 3 synonymous variants, each of which was also identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume that causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.
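The Mendelian inheritance error rate used above as an error control metric can be sketched for parent-offspring trios at biallelic sites, with genotypes coded as alternate-allele counts (0, 1, 2). This is a simplified illustration with hypothetical function names; the study worked with an extended pedigree, whereas this sketch checks trios only.

```python
def mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one allele of each parent."""
    def transmissible(g):
        # alt alleles a parent with genotype g can pass on
        return {0} if g == 0 else {1} if g == 2 else {0, 1}
    return any(m + f == child for m in transmissible(mother) for f in transmissible(father))


def mendelian_error_rate(trios):
    """trios: iterable of (child, mother, father); tuples with a missing call (None) are skipped."""
    called = [t for t in trios if None not in t]
    if not called:
        return 0.0
    errors = sum(not mendelian_consistent(*t) for t in called)
    return errors / len(called)
```

Computing this rate separately within bins of coverage, quality score, or call rate is one way to choose per-variant filtering thresholds.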
Liu, Bin; Wang, Shanyi; Dong, Qiwen; Li, Shumin; Liu, Xuan
2016-04-20
DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. With the rapid development of next-generation sequencing techniques, the number of protein sequences is increasing at an unprecedented rate. It is therefore necessary to develop computational methods that identify DNA-binding proteins based only on protein sequence information. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) and the auto-cross covariance transformation. The protein sequences are first converted into a profile-based protein representation, and then converted into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. This scheme effectively captures the sequence-order effect. These vectors are then fed into the SVM to discriminate DNA-binding proteins from non-DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and a Matthews correlation coefficient of 0.5 in a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset show that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA-binding protein identification.
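The auto-cross covariance idea can be sketched as follows, assuming a profile given as an L x n matrix (one row per residue, one column per property). Each lag contributes n x n averaged covariance terms (auto covariance when the two columns coincide, cross covariance otherwise), so the output length does not depend on L. This is a generic ACC formulation under stated assumptions; it omits the Kmer composition step and the SVM, and `acc_features` is a hypothetical name.

```python
def acc_features(profile, max_lag):
    """Auto-cross covariance features of an L x n profile for lags 1..max_lag."""
    L = len(profile)
    if L <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n = len(profile[0])
    means = [sum(row[i] for row in profile) / L for i in range(n)]
    centered = [[row[i] - means[i] for i in range(n)] for row in profile]
    features = []
    for lag in range(1, max_lag + 1):
        for i1 in range(n):
            for i2 in range(n):
                # i1 == i2 gives auto covariance; i1 != i2 gives cross covariance
                s = sum(centered[j][i1] * centered[j + lag][i2] for j in range(L - lag))
                features.append(s / (L - lag))
    return features
```

With a 20-column profile and a maximum lag of LG, the vector has 400 x LG entries regardless of protein length, which is what allows sequences of different lengths to be fed to the same classifier.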
Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K.; Strug, Lisa J.
2014-01-01
Motivation: Sufficiently powered case–control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. Results: We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the ‘gold standard’ analysis with the true underlying genotypes for both common and rare variants. Availability and implementation: An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Contact: lisa.strug@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24733292
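The core substitution in RVS, replacing a hard genotype call by its expected value given the reads, can be sketched with a simple binomial read model and a Hardy-Weinberg prior. The per-read error model and function names here are assumptions for illustration; in practice, genotype likelihoods would come from the calling pipeline rather than this toy formula.

```python
def genotype_likelihoods(alt_reads, depth, error=0.01):
    """P(reads | G = g) for g = 0, 1, 2 alt alleles under a simple binomial error model.

    The binomial coefficient is omitted because it cancels in the posterior.
    """
    likelihoods = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(p_alt ** alt_reads * (1 - p_alt) ** (depth - alt_reads))
    return likelihoods


def expected_genotype(alt_reads, depth, maf, error=0.01):
    """E[G | reads] with a Hardy-Weinberg prior at minor allele frequency maf."""
    prior = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)
    joint = [lik * pr for lik, pr in zip(genotype_likelihoods(alt_reads, depth, error), prior)]
    return sum(g * w for g, w in enumerate(joint)) / sum(joint)
```

At zero depth the function returns the prior mean 2 x maf instead of a hard call, and at high depth it approaches the true genotype. This suggests why expected values are less sensitive to read-depth differences between cases and controls.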
PEA: an integrated R toolkit for plant epitranscriptome analysis.
Zhai, Jingjing; Song, Jie; Cheng, Qian; Tang, Yunjia; Ma, Chuang
2018-05-29
The epitranscriptome, also known as chemical modifications of RNA (CMRs), is a newly discovered layer of gene regulation, the biological importance of which emerged through analysis of only a small fraction of CMRs detected by high-throughput sequencing technologies. Understanding of the epitranscriptome is hampered by the absence of computational tools for the systematic analysis of epitranscriptome sequencing data. In addition, no tools have yet been designed for accurate prediction of CMRs in plants, or to extend epitranscriptome analysis from a fraction of the transcriptome to its entirety. Here, we introduce PEA, an integrated R toolkit to facilitate the analysis of plant epitranscriptome data. The PEA toolkit contains a comprehensive collection of functions required for read mapping, CMR calling, motif scanning and discovery, and gene functional enrichment analysis. PEA also takes advantage of machine learning technologies for transcriptome-scale CMR prediction, with high prediction accuracy, using the Positive Samples Only Learning algorithm, which addresses the two-class classification problem by using only positive samples (CMRs), in the absence of negative samples (non-CMRs). Hence PEA is a versatile epitranscriptome analysis pipeline covering CMR calling, prediction, and annotation, and we describe its application to predict N6-methyladenosine (m6A) modifications in Arabidopsis thaliana. Experimental results demonstrate that the toolkit achieved 71.6% sensitivity and 73.7% specificity, which is superior to existing m6A predictors. PEA is potentially broadly applicable to the in-depth study of epitranscriptomics. The PEA Docker image is available at https://hub.docker.com/r/malab/pea; source code and user manual are available at https://github.com/cma2015/PEA. Contact: chuangma2006@gmail.com. Supplementary data are available at Bioinformatics online.
Investigating expectation effects using multiple physiological measures
Siller, Alexander; Ambach, Wolfgang; Vaitl, Dieter
2015-01-01
The study aimed to investigate experimentally whether the human body can anticipate future events under improved methodological conditions. Previous studies have reported contradictory results for the phenomenon typically called presentiment. If the positive findings are accurate, they call into doubt our views about human perception; if they are inaccurate, a plausible conventional explanation might lie in the experimental design of the previous studies, in which expectation due to item sequences was misinterpreted as presentiment. To address these points, we opted to collect several physiological variables, to test different randomization types, and to manipulate subjective significance individually. For the latter, we combined a mock crime scenario, in which participants had to steal specific items, with a concealed information test (CIT), in which the participants had to conceal their knowledge when interrogated about items they had stolen or not stolen. We measured electrodermal activity, respiration, finger pulse, heart rate (HR), and reaction times. The participants (n = 154) were assigned randomly to four different groups. Items presented in the CIT were either drawn with replacement (full) or without replacement (pseudo) and were either presented category-wise (cat) or regardless of categories (nocat). To understand how these item sequences influence expectation and modulate physiological reactions, we compared the groups with respect to effect sizes for stolen vs. not stolen items. Group pseudo_cat yielded the highest effect sizes, and pseudo_nocat yielded the lowest. We could not find any evidence of presentiment but did find evidence of physiological correlates of expectation. Because the design differs fundamentally from that of previous studies, these findings do not allow conclusions on whether the expectation bias is being confounded with presentiment. PMID:26500600
Poultney, Christopher S; Goldberg, Arthur P; Drapeau, Elodie; Kou, Yan; Harony-Nicolas, Hala; Kajiwara, Yuji; De Rubeis, Silvia; Durand, Simon; Stevens, Christine; Rehnström, Karola; Palotie, Aarno; Daly, Mark J; Ma'ayan, Avi; Fromer, Menachem; Buxbaum, Joseph D
2013-10-03
Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1-30 kb, using the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1-30 kb CNV, 1-30 kb deletions, and 1-10 kb deletions in ASD. CNV in the 1-30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1-30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
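The HMM idea behind exome CNV calling can be sketched as a three-state (deletion, diploid, duplication) Viterbi decoder over per-exon read-depth z-scores. XHMM applies its own depth normalization and parameters; the Gaussian emission means of -3/0/3, the transition probabilities, and the function names below are illustrative assumptions only.

```python
import math

STATES = ("DEL", "DIP", "DUP")
MEANS = {"DEL": -3.0, "DIP": 0.0, "DUP": 3.0}  # assumed emission means for z-scores


def log_normal(x, mean, sd=1.0):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi_cnv(zscores, p_cnv=1e-4, p_exit=0.2):
    """Most likely state per exon; CNVs start rarely (p_cnv) and end with prob p_exit."""
    if not zscores:
        return []
    trans = {
        "DIP": {"DIP": 1 - 2 * p_cnv, "DEL": p_cnv, "DUP": p_cnv},
        "DEL": {"DEL": 1 - p_exit, "DIP": p_exit, "DUP": 0.0},
        "DUP": {"DUP": 1 - p_exit, "DIP": p_exit, "DEL": 0.0},
    }

    def log_t(a, b):
        p = trans[a][b]
        return math.log(p) if p > 0 else float("-inf")

    score = {s: log_t("DIP", s) + log_normal(zscores[0], MEANS[s]) for s in STATES}
    back = []
    for z in zscores[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log_t(r, s))
            score[s] = prev[best] + log_t(best, s) + log_normal(z, MEANS[s])
            ptr[s] = best
        back.append(ptr)
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):  # trace back pointers from the last exon
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Because entering a CNV state is penalized, a single low z-score is usually absorbed as noise, while a run of consistently low values across a few exons yields a deletion call. This matches the observation that CNV spanning as few as three exons can be called.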
On the joint spectral density of bivariate random sequences. Thesis Technical Report No. 21
NASA Technical Reports Server (NTRS)
Aalfs, David D.
1995-01-01
For univariate random sequences, the power spectral density acts like a probability density function of the frequencies present in the sequence. This dissertation extends that concept to bivariate random sequences. For this purpose, a function called the joint spectral density is defined that represents a joint probability weighting of the frequency content of pairs of random sequences. Given a pair of random sequences, the joint spectral density is not uniquely determined in the absence of any constraints. Two approaches to constraining the sequences are suggested: (1) assume the sequences are the margins of some stationary random field, (2) assume the sequences conform to a particular model that is linked to the joint spectral density. For both approaches, the properties of the resulting sequences are investigated in some detail, and simulation is used to corroborate theoretical results. It is concluded that under either of these two constraints, the joint spectral density can be computed from the non-stationary cross-correlation.
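The univariate starting point, reading a normalized power spectrum as a probability weighting over frequencies, can be sketched with a one-sided periodogram computed by a direct DFT and rescaled to sum to 1. This covers only the univariate case; it does not implement the thesis's joint spectral density, and `normalized_periodogram` is a hypothetical name.

```python
import cmath
import math


def normalized_periodogram(x):
    """One-sided periodogram of a real sequence, rescaled to sum to 1 over frequencies k/n."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
        freqs.append(k / n)
        power.append(abs(coef) ** 2 / n)
    total = sum(power)
    if total == 0:
        raise ValueError("constant sequence has no spectral content")
    return freqs, [p / total for p in power]
```

For a pure cosine at 5 cycles per 64 samples, nearly all of the weight falls on frequency 5/64, as a "frequency density" interpretation would predict.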
NASA Astrophysics Data System (ADS)
Weigt, Martin
In recent years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods that unveil the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNAs show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in their sequences. We have developed a statistical-mechanics-inspired inference approach, called Direct-Coupling Analysis, to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues, and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes, and thus to predict the phenotypic effect of mutations.
References
[1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon, M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211.
[2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt, "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932.
[3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, "Direct-coupling analysis of residue co-evolution captures native contacts across many protein families", Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
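As a point of reference for the coevolution signal that Direct-Coupling Analysis exploits, here is a sketch of the simpler local measure it improves upon: mutual information between two alignment columns. DCA instead fits a global maximum-entropy (Potts) model to separate direct from indirect correlations. The sketch omits pseudocounts and sequence reweighting, and `column_mutual_information` is a hypothetical name.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (nats) between columns i and j of equal-length aligned sequences."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    m = len(pairs)
    f_ij = Counter(pairs)
    f_i = Counter(a for a, _ in pairs)
    f_j = Counter(b for _, b in pairs)
    return sum(
        (c / m) * math.log((c / m) / ((f_i[a] / m) * (f_j[b] / m)))
        for (a, b), c in f_ij.items()
    )
```

Perfectly covarying columns (for example, a compensatory base pair) score log 2 for two states, while independent columns score 0. Transitive chains of such correlations are what make raw mutual information a noisy contact predictor.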
Pms2 Suppresses Large Expansions of the (GAA·TTC)n Sequence in Neuronal Tissues
Bourn, Rebecka L.; De Biase, Irene; Pinto, Ricardo Mouro; Sandi, Chiranjeevi; Al-Mahdawi, Sahar; Pook, Mark A.; Bidichandani, Sanjay I.
2012-01-01
Expanded trinucleotide repeat sequences are the cause of several inherited neurodegenerative diseases. Disease pathogenesis is correlated with several features of somatic instability of these sequences, including further large expansions in postmitotic tissues. The presence of somatic expansions in postmitotic tissues is consistent with DNA repair being a major determinant of somatic instability. Indeed, proteins in the mismatch repair (MMR) pathway are required for instability of the expanded (CAG·CTG)n sequence, likely via recognition of intrastrand hairpins by MutSβ. It is not clear if or how MMR would affect instability of disease-causing expanded trinucleotide repeat sequences that adopt secondary structures other than hairpins, such as the triplex/R-loop forming (GAA·TTC)n sequence that causes Friedreich ataxia. We analyzed somatic instability in transgenic mice that carry an expanded (GAA·TTC)n sequence in the context of the human FXN locus and lack the individual MMR proteins Msh2, Msh6 or Pms2. The absence of Msh2 or Msh6 resulted in a dramatic reduction in somatic mutations, indicating that mammalian MMR promotes instability of the (GAA·TTC)n sequence via MutSα. The absence of Pms2 resulted in increased accumulation of large expansions in the nervous system (cerebellum, cerebrum, and dorsal root ganglia) but not in non-neuronal tissues (heart and kidney), without affecting the prevalence of contractions. Pms2 suppressed large expansions specifically in tissues showing MutSα-dependent somatic instability, suggesting that they may act on the same lesion or structure associated with the expanded (GAA·TTC)n sequence. We conclude that Pms2 specifically suppresses large expansions of a pathogenic trinucleotide repeat sequence in neuronal tissues, possibly acting independently of the canonical MMR pathway. PMID:23071719
Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.
Göke, Jonathan; Schulz, Marcel H; Lasserre, Julia; Vingron, Martin
2012-03-01
The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets. We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences. The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html. Supplementary data are available at Bioinformatics online.
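A simplified sketch of alignment-free comparison in the spirit described above: k-mer counts are pooled with their reverse complements (so strand orientation does not matter) and two sequences are compared by the correlation of their count vectors. N2 itself standardizes counts against a background model and can include mismatch neighbourhoods; the plain centering used here is an assumption for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word):
    return word.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count k-mers, adding each word's count to its reverse complement as well."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    both = Counter(counts)
    for word, c in counts.items():
        both[revcomp(word)] += c
    return both


def kmer_similarity(a, b, k=4):
    """Pearson correlation of reverse-complement-aware k-mer count vectors."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    va = [ca[w] for w in words]
    vb = [cb[w] for w in words]
    ma, mb = sum(va) / len(va), sum(vb) / len(vb)
    da = [x - ma for x in va]
    db = [y - mb for y in vb]
    norm = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / norm if norm else 0.0
```

Including reverse-complement words makes an enhancer and its reverse complement score as identical, which matters because transcription factor binding sites can occur on either strand.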
Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.
Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc
2016-01-01
Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing is gaining interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user-friendly open source software which can be directly applied in everyday proteomic workflows.
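The basic principle of de novo sequencing can be sketched with an idealized, noise-free spectrum: each residue is read from the mass difference between consecutive b-ions. Real spectra have missing and noisy peaks, and de novo tools typically score paths through a spectrum graph that uses both b- and y-ions; the complete ladder, tolerance, and function names below are assumptions for illustration.

```python
# Monoisotopic residue masses (Da); "L" stands for both Leu and Ile, which are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def b_ions(peptide):
    """Singly charged b-ion m/z values b1..b(n-1) for a peptide."""
    masses, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def read_ladder(peaks, tol=0.02):
    """Infer residues from mass differences between consecutive b-ion peaks."""
    peaks = sorted(peaks)
    residues = []
    for lo, hi in zip([PROTON] + peaks[:-1], peaks):
        diff = hi - lo
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
        residues.append("/".join(matches) if matches else "?")
    return residues
```

The tolerance matters: Gln and Lys differ by only about 0.036 Da, so a coarser tolerance would report ambiguous calls such as "Q/K" at those positions.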
Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine
2009-01-01
Clustered regularly interspaced short palindromic repeats (CRISPRs) are DNA sequences composed of a succession of repeats (23- to 47-bp long) separated by unique sequences called spacers. Polymorphism can be observed in different strains of a species and may be used for genotyping. We describe protocols and bioinformatics tools that allow the identification of CRISPRs from sequenced genomes, their comparison, and their component determination (the direct repeats and the spacers). A schematic representation of the spacer organization can be produced, allowing an easy comparison between strains.
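To show the repeat/spacer structure concretely, here is a toy detector that looks for an exact repeat of fixed length occurring at least three times, with every gap between consecutive copies falling in a spacer-sized range. It is not the authors' tool: real CRISPR arrays can have degenerate repeats, the repeat length must be scanned over the 23- to 47-bp range, and the spacer bounds and function name here are assumptions.

```python
from collections import defaultdict


def find_crispr(seq, repeat_len=23, min_copies=3, min_spacer=20, max_spacer=60):
    """Return (repeat, spacers) for exact repeats whose consecutive copies are spacer-separated."""
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)
    hits = []
    for repeat, pos in positions.items():
        if len(pos) < min_copies:
            continue
        spacers = []
        for a, b in zip(pos, pos[1:]):
            gap = b - (a + repeat_len)  # spacer length between two repeat copies
            if not min_spacer <= gap <= max_spacer:
                break
            spacers.append(seq[a + repeat_len:b])
        else:
            hits.append((repeat, spacers))
    return hits
```

Comparing the ordered spacer lists returned for different strains is the basis of spacer-based genotyping, since strains gain and lose spacers over time.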
Primer on Computer Graphics Programming. Revision
1982-04-01
...
      CALL USET ('TEXT')
      CALL UPRINT (-1.0,-1.05,'SIDES;')
      CALL USET ('INTEGER')
      CALL UPRINT (0.9,-1.05,SIDES)
    1 CONTINUE
      CALL UEND
      STOP
Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye
2016-07-01
Recently, alternative approaches to Edman sequencing have been investigated. To this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015 on bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins, with the aim of determining Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach to protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and that it is very sensitive compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; the results confirm that the two methods agree on the identity of the Nt amino acid sequence.
Sequence specificity of the human mRNA N6-adenosine methylase in vitro.
Harper, J E; Miceli, S M; Roberts, R J; Manley, J L
1990-01-01
N6-adenosine methylation is a frequent modification of mRNAs and their precursors, but little is known about the mechanism of the reaction or the function of the modification. To explore these questions, we developed conditions to examine N6-adenosine methylase activity in HeLa cell nuclear extracts. Transfer of the methyl group from S-[methyl-3H]adenosylmethionine to unlabeled random copolymer RNA substrates of varying ribonucleotide composition revealed a substrate specificity consistent with a previously deduced consensus sequence, Pu[G>A]AC[A/C/U]. 32P-labeled RNA substrates of defined sequence were used to examine the minimum sequence requirements for methylation. Each RNA was 20 nucleotides long and contained either the core consensus sequence GGACU or some variation of this sequence. RNAs containing GGACU, in either single or multiple copies, were good substrates for methylation, whereas RNAs containing single base substitutions within the GGACU sequence gave dramatically reduced methylation. These results demonstrate that the N6-adenosine methylase has a strict sequence specificity and that there is no requirement for extended sequences or secondary structures for methylation. Recognition of this sequence does not require an RNA component, as micrococcal nuclease pretreatment of nuclear extracts actually increased methylation efficiency. PMID:2216767
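The consensus Pu[G>A]AC[A/C/U] translates naturally into a regular expression for scanning RNA sequences. A regex can only allow or forbid a base, so the sketch below accepts A and G equally at the second position even though G is preferred; the pattern and function name are illustrative, not from the paper.

```python
import re

# Pu = purine (A/G); second position G or A; then A, C; last position A, C, or U.
# The lookahead makes overlapping sites visible to finditer.
CONSENSUS = re.compile(r"(?=([AG][GA]AC[ACU]))")


def find_sites(rna):
    """Return (0-based start, site) for each consensus match; DNA input (T) is converted to U."""
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(rna)]
```

In each reported site, the adenosine immediately preceding C (offset 2 from the start) is the methylation target; single substitutions in GGACU, such as GGUCU, no longer match, consistent with the strict specificity reported above.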
Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.
Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo
2016-08-30
Influenza is an acute respiratory illness and has become a serious public health problem worldwide. Studying the HA and NA genes of influenza A virus is essential because these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain the complete coding sequences of the Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on worldwide influenza A/H3N2 sequences from the Global Initiative on Sharing All Influenza Data (GISAID) and further tested using archived Indonesian influenza A/H3N2 samples from influenza-like illness (ILI) surveillance from 2008 to 2009. Optimal RT-PCR conditions were established for all HA and NA fragments designed to cover the complete coding sequences of the HA and NA genes. Of 145 influenza A/H3N2 samples tested, 71 were successfully sequenced for the complete coding sequences of both the HA and NA genes. The developed primer sets were suitable for obtaining complete coding sequences of the HA and NA genes of Indonesian samples from 2008 to 2009.
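A few routine in silico checks for a candidate primer pair can be sketched as follows: GC content, a rough melting temperature from the Wallace rule (2 °C per A/T plus 4 °C per G/C, only a rule of thumb for short oligonucleotides), and extraction of the expected amplicon from a template. These are generic checks, not the authors' design procedure; the function names are hypothetical, and exact matching ignores the degenerate bases often needed for variable viral genes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(primer):
    return sum(b in "GC" for b in primer) / len(primer)


def wallace_tm(primer):
    """Rough Tm (deg C) = 2 * (A + T) + 4 * (G + C); crude beyond short oligos."""
    at = sum(b in "AT" for b in primer)
    gc = sum(b in "GC" for b in primer)
    return 2 * at + 4 * gc


def amplicon(template, fwd, rev):
    """Exact-match amplicon: fwd on the template strand, rev binding the opposite strand."""
    start = template.find(fwd)
    end = template.find(revcomp(rev))
    if start < 0 or end < start:
        return None
    return template[start:end + len(rev)]
```

Running `amplicon` for each overlapping primer pair against reference HA and NA sequences is one way to confirm that the fragments jointly span the complete coding sequence.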
Bidard, J N; de Nadai, F; Rovere, C; Moinier, D; Laur, J; Martinez, J; Cuber, J C; Kitabgi, P
1993-01-01
Neurotensin (NT) and neuromedin N (NN) are two related biologically active peptides that are encoded in the same precursor molecule. In the rat, the precursor consists of a 169-residue polypeptide starting with an N-terminal signal peptide and containing in its C-terminal region one copy each of NT and NN. NN precedes NT and is separated from it by a Lys-Arg sequence. Two other Lys-Arg sequences flank the N-terminus of NN and the C-terminus of NT. A fourth Lys-Arg sequence occurs near the middle of the precursor and is followed by an NN-like sequence. Finally, an Arg-Arg pair is present within the NT moiety. The four Lys-Arg doublets represent putative processing sites in the precursor molecule. The present study was designed to investigate the post-translational processing of the NT/NN precursor in the rat medullary thyroid carcinoma (rMTC) 6-23 cell line, which synthesizes large amounts of NT upon dexamethasone treatment. Five region-specific antisera recognizing the free N- or C-termini of sequences adjacent to the basic doublets were produced, characterized and used for immunoblotting and radioimmunoassay studies in combination with gel filtration, reverse-phase h.p.l.c. and trypsin digestion of rMTC 6-23 cell extracts. Because two of the antigenic sequences, i.e. NN and the NN-like sequence, start with a lysine residue that is essential for recognition by their respective antisera, a micromethod by which trypsin specifically cleaves at arginine residues was developed. The results show that dexamethasone-treated rMTC 6-23 cells produced comparable amounts of NT, NN and a peptide corresponding to a large N-terminal precursor fragment lacking the NN and NT moieties. This large fragment was purified. N-Terminal sequencing revealed that it started at residue Ser23 of the prepro-NT/NN sequence, and thus established the Cys22-Ser23 bond as the cleavage site of the signal peptide. 
Two other large N-terminal fragments bearing respectively the NN and NT sequences at their C-termini were present in lower amounts. The NN-like sequence was internal to all the large fragments. There was no evidence for the presence of peptides with the NN-like sequence at their N-termini. This shows that, in rMTC 6-23 cells, the precursor is readily processed at the three Lys-Arg doublets that flank and separate the NT and NN sequences. In contrast, the Lys-Arg doublet that precedes the NN-like sequence is not processed in this system.(ABSTRACT TRUNCATED AT 400 WORDS) PMID:8471039
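The two cleavage patterns discussed above, processing after Lys-Arg doublets and arginine-restricted tryptic cleavage, can be illustrated with a small in silico digest on a made-up sequence (not the rat NT/NN precursor). The sketch cuts after every match of a pattern and ignores protease context rules such as trypsin's reduced cleavage before proline. It also leaves the basic residues attached, whereas in vivo processing typically trims them. `cleave_after` is a hypothetical name.

```python
import re


def cleave_after(seq, pattern):
    """Split a protein sequence after each non-overlapping match of a regex pattern."""
    cuts = [m.end() for m in re.finditer(pattern, seq)]
    pieces, prev = [], 0
    for cut in cuts:
        pieces.append(seq[prev:cut])
        prev = cut
    pieces.append(seq[prev:])
    return [p for p in pieces if p]
```

With `"KR"` the toy sequence below yields fragments ending at each Lys-Arg doublet, while `"R"` also cuts at the isolated arginine. Comparing the two kinds of digest is what lets region-specific antisera distinguish processed from unprocessed sites.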
GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.
Schulz, Tizian; Stoye, Jens; Doerr, Daniel
2018-05-08
Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the subject of further experimental investigation.
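A simplified check in the spirit of δ-teams with families: for one candidate set of gene-family labels, keep only vertices carrying those labels, link two vertices when their distance is at most δ, and report connected groups whose labels cover the whole candidate set. This only tests a given candidate and does not enumerate teams the way GraphTeams does. It also assumes a precomputed symmetric distance matrix (for example, derived from Hi-C contacts, with smaller values meaning closer), and `delta_teams` is a hypothetical name.

```python
from itertools import combinations


def delta_teams(labels, dist, candidate, delta):
    """Groups of candidate-labelled vertices, delta-connected, whose label set equals candidate."""
    members = [v for v, lab in enumerate(labels) if lab in candidate]
    parent = {v: v for v in members}

    def find(v):  # union-find root with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(members, 2):
        if dist[u][v] <= delta:
            parent[find(u)] = find(v)
    groups = {}
    for v in members:
        groups.setdefault(find(v), []).append(v)
    return [sorted(g) for g in groups.values() if {labels[v] for v in g} == set(candidate)]
```

Because several vertices may share a label, duplicated genes are handled naturally: a group only needs at least one member of each family. This is the motivation for the "with families" extension.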
Isolation and characterization of the pea cytochrome c oxidase Vb gene.
Kubo, Nakao; Arimura, Shin-Ichi; Tsutsumi, Nobuhiro; Kadowaki, Koh-Ichi; Hirai, Masashi
2006-11-01
Three copies of the gene that encodes cytochrome c oxidase subunit Vb were isolated from pea (PscoxVb-1, PscoxVb-2, and PscoxVb-3). Northern blot and reverse transcriptase-PCR analyses suggest that all 3 genes are transcribed in pea. Each pea coxVb gene has an N-terminal extended sequence that can encode a mitochondrial targeting signal, called a presequence. The localization of green fluorescent protein fused with the presequence strongly suggests that pea COXVb proteins are targeted to mitochondria. Each pea coxVb gene has 5 intron sites within the coding region. These sites are similar to those in Arabidopsis and rice, although the intron lengths vary greatly. A phylogenetic analysis of coxVb suggests the occurrence of gene duplication events during angiosperm evolution. In particular, 2 duplication events might have occurred in legumes, grasses, and Solanaceae. A comparison of amino acid sequences in COXVb or its counterparts shows the conservation of several amino acids within a zinc finger motif. Interestingly, a homology search showed that the bacterial protein COG4391 and a mitochondrial complex I 13 kDa subunit also have similar amino acid compositions around this motif. Such similarity might reflect evolutionary relationships among the 3 proteins.
Cai, Sheng; Tian, Xueke; Sun, Lianli; Hu, Haihong; Zheng, Shirui; Jiang, Huidi; Yu, Lushan; Zeng, Su
2015-10-20
The wide use of platinum-based chemotherapeutic regimens for the treatment of carcinoma calls for simple and selective detection of platinum compounds in biological samples. On the basis of platinum(II)-base pair coordination, a novel type of aptameric platform for platinum detection has been introduced. This chemiluminescence (CL) aptasensor consists of a designed streptavidin (SA) aptamer sequence in which several base pairs were replaced by G-G mismatches. Only in the presence of platinum does coordination occur between platinum and the G-G base pairs, in place of the hydrogen-bonded G-C base pairs; this activates the SA aptamer sequence, which then binds to SA-coated magnetic beads. These Pt-DNA coordination events were monitored by a simple and direct luminol-peroxide CL reaction through horseradish peroxidase (HRP) catalysis, with strong chemiluminescence emission. The validated range of quantification was 0.12-240 μM, with a limit of detection of 60 nM and selectivity over other metal ions. The assay was also successfully applied to urine samples. It is a promising candidate for the detection of platinum in biomedical and environmental samples.
KungFQ: a simple and powerful approach to compress fastq files.
Grassi, Elena; Di Gregorio, Federico; Molineris, Ivan
2012-01-01
Storing data derived from deep sequencing experiments has become pivotal, yet standard compression algorithms do not satisfactorily exploit the structure of these data. A number of reference-based compression algorithms have been developed, but they are less adequate for new species without fully sequenced genomes or for nongenomic data. We developed a tool that takes advantage of fastq characteristics and encodes them in a binary format optimized for further compression with standard tools (such as gzip or lzma). The algorithm is straightforward and does not need any external reference file; it scans the fastq only once and has a constant memory requirement. Moreover, we added the possibility to perform lossy compression, losing some of the original information (IDs and/or qualities) but resulting in smaller files; it is also possible to define a quality cutoff below which the corresponding base calls are converted to N. We achieve compression ratios of 2.82 to 7.77 on various fastq files without losing information, and 5.37 to 8.77 when discarding IDs, which are often not used in common analysis pipelines. In this paper, we compare the algorithm's performance with that of known tools, usually obtaining higher compression levels.
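The optional quality cutoff described above (base calls below a chosen quality converted to N) is easy to illustrate in isolation. The following is a minimal sketch, not KungFQ's code: it assumes Phred+33 quality encoding (the usual Sanger/Illumina 1.8+ convention), and the function name `mask_low_quality` is hypothetical. KungFQ's binary encoding step is not reproduced here.

```python
def mask_low_quality(seq: str, qual: str, cutoff: int, offset: int = 33) -> str:
    """Replace base calls whose Phred score is below `cutoff` with 'N'.

    seq: nucleotide string from a fastq record.
    qual: quality string of the same length (ASCII-encoded Phred scores).
    offset: ASCII offset of the encoding; 33 assumes Phred+33.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    return "".join(
        base if (ord(q) - offset) >= cutoff else "N"
        for base, q in zip(seq, qual)
    )


# 'I' = Q40, '#' = Q2, '5' = Q20, '!' = Q0 under Phred+33.
print(mask_low_quality("ACGT", "I#5!", cutoff=20))  # ANGN
```

Masking before compression replaces many distinct low-quality symbols with a single character, which is one reason such a cutoff can shrink the compressed output at the cost of information.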
A phylogenetic delimitation of the "Sphagnum subsecundum complex" (Sphagnaceae, Bryophyta).
Shaw, A Jonathan; Boles, Sandra; Shaw, Blanka
2008-06-01
A seemingly obvious but sometimes overlooked premise of any evolutionary analysis is delineating the group of taxa under study. This is especially problematic in some bryophyte groups because of morphological simplicity and convergence. This research applies information from nucleotide sequences for eight plastid and nuclear loci to delineate a group of northern hemisphere peat moss species, the so-called Sphagnum subsecundum complex, which includes species known to be gametophytically haploid or diploid (i.e., sporophytically diploid-tetraploid). Despite the fact that S. subsecundum and several species in the complex have been attributed disjunct ranges that include all major continents, phylogenetic analyses suggest that the group is actually restricted to Europe and eastern North America. Plants from western North America, from California to Alaska, which are morphologically similar to species of the S. subsecundum complex in eastern N. America and Europe, actually belong to a different deep clade within Sphagnum section Subsecunda. One species often considered part of the S. subsecundum complex, S. contortum, likely has a reticulate history involving species in the two deepest clades within section Subsecunda. Nucleotide sequences have a strong geographic structure across the section Subsecunda, but shallow tip clades suggest repeated long-distance dispersal in the section as well.
A transmission imaging spectrograph and microfabricated channel system for DNA analysis.
Simpson, J W; Ruiz-Martinez, M C; Mulhern, G T; Berka, J; Latimer, D R; Ball, J A; Rothberg, J M; Went, G T
2000-01-01
In this paper we present the development of a DNA analysis system using a microfabricated channel device and a novel transmission imaging spectrograph which can be efficiently incorporated into a high throughput genomics facility for both sizing and sequencing of DNA fragments. The device contains 48 channels etched on a glass substrate. The channels are sealed with a flat glass plate which also provides a series of apertures for sample loading and contact with buffer reservoirs. Samples can be easily loaded in volumes up to 640 nL without band broadening because of an efficient electrokinetic stacking at the electrophoresis channel entrance. The system uses a dual laser excitation source and a highly sensitive charge-coupled device (CCD) detector allowing for simultaneous detection of many fluorescent dyes. The sieving matrices for the separation of single-stranded DNA fragments are polymerized in situ in denaturing buffer systems. Examples of separation of single-stranded DNA fragments up to 500 bases in length are shown, including accurate sizing of GeneCalling fragments, and sequencing samples prepared with a reduced amount of dye terminators. An increase in sample throughput has been achieved by color multiplexing.
Winnowing sequences from a database search.
Berman, P; Zhang, Z; Wolf, Y I; Koonin, E V; Miller, W
2000-01-01
In database searches for sequence similarity, matches to a distinct sequence region (e.g., protein domain) are frequently obscured by numerous matches to another region of the same sequence. In order to cope with this problem, algorithms are developed to discard redundant matches. One model for this problem begins with a list of intervals, each with an associated score; each interval gives the range of positions in the query sequence that align to a database sequence, and the score is that of the alignment. If interval I is contained in interval J, and I's score is less than J's, then I is said to be dominated by J. The problem is then to identify each interval that is dominated by at least K other intervals, where K is a given level of "tolerable redundancy." An algorithm is developed to solve the problem in O(N log N) time and O(N*) space, where N is the number of intervals and N* is a precisely defined value that never exceeds N and is frequently much smaller. This criterion for discarding database hits has been implemented in the Blast program, as illustrated herein with examples. Several variations and extensions of this approach are also described.
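The domination criterion can be checked directly with a naive quadratic scan, which is useful for understanding the definition even though it does not achieve the O(N log N) time of the published algorithm. The following is a minimal sketch under that caveat; the `Hit` type and `winnow` function names are hypothetical, and it keeps the intervals that are dominated by fewer than K others.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    start: int    # first query position covered by the alignment
    end: int      # last query position covered by the alignment
    score: float  # alignment score


def winnow(hits: List[Hit], k: int) -> List[Hit]:
    """Return hits dominated by fewer than k other hits (naive O(N^2) scan).

    Hit I is dominated by hit J if I's interval lies inside J's interval
    and I's score is strictly lower than J's.
    """
    kept = []
    for i in hits:
        dominators = sum(
            1
            for j in hits
            if j.start <= i.start and i.end <= j.end and i.score < j.score
        )
        if dominators < k:
            kept.append(i)
    return kept


hits = [Hit(0, 100, 50), Hit(10, 50, 20), Hit(20, 40, 10), Hit(200, 300, 5)]
print(winnow(hits, k=2))  # Hit(20, 40, 10) is dropped: it has two dominators
```

With K = 2, the short hit at positions 20-40 is discarded because two higher-scoring hits contain it, while the separate region at 200-300 survives even though its score is low, which is exactly the behavior that lets matches to a distinct region stay visible.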
Wakefield, Andrew; Stone, Emma L.; Jones, Gareth; Harris, Stephen
2015-01-01
The light-emitting diode (LED) street light market is expanding globally, and it is important to understand how LED lights affect wildlife populations. We compared evasive flight responses of moths to bat echolocation calls experimentally under LED-lit and unlit conditions. Significantly fewer moths performed ‘powerdive’ flight manoeuvres in response to bat calls (feeding buzz sequences from Nyctalus spp.) under an LED street light than in the dark. LED street lights reduce the anti-predator behaviour of moths, shifting the balance in favour of their predators, aerial hawking bats. PMID:26361558
Effects of pre- and pro-sequence of thaumatin on the secretion by Pichia pastoris.
Ide, Nobuyuki; Masuda, Tetsuya; Kitabatake, Naofumi
2007-11-23
Thaumatin is a 22-kDa sweet-tasting protein containing eight disulfide bonds. When thaumatin is expressed in Pichia pastoris using the thaumatin cDNA fused with both the alpha-factor signal sequence and the Kex2 protease cleavage site from Saccharomyces cerevisiae, the N-terminal sequence of the secreted thaumatin molecule is not processed correctly. To examine the role of the thaumatin cDNA-encoded N-terminal pre-sequence and C-terminal pro-sequence in the processing of thaumatin and the efficiency of thaumatin production in P. pastoris, four expression plasmids with different pre- and pro-sequences were constructed and transformed into P. pastoris. Transformants containing the pre-thaumatin gene, which has the native plant signal, secreted thaumatin molecules into the medium. The N-terminal amino acid sequence of the secreted thaumatin molecule was processed correctly. The production yield of thaumatin was not affected by the C-terminal pro-sequence, and the pro-sequence was not processed in P. pastoris, indicating that the pro-sequence is not necessary for thaumatin synthesis.
Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort.
Gambin, Tomasz; Akdemir, Zeynep C; Yuan, Bo; Gu, Shen; Chiang, Theodore; Carvalho, Claudia M B; Shaw, Chad; Jhangiani, Shalini; Boone, Philip M; Eldomery, Mohammad K; Karaca, Ender; Bayram, Yavuz; Stray-Pedersen, Asbjørg; Muzny, Donna; Charng, Wu-Lin; Bahrambeigi, Vahid; Belmont, John W; Boerwinkle, Eric; Beaudet, Arthur L; Gibbs, Richard A; Lupski, James R
2017-02-28
We developed an algorithm, HMZDelFinder, that uses whole exome sequencing (WES) data to identify rare and intragenic homozygous and hemizygous (HMZ) deletions that may represent complete loss-of-function of the indicated gene. HMZDelFinder was applied to 4866 samples in the Baylor-Hopkins Center for Mendelian Genomics (BHCMG) cohort and detected 773 HMZ deletion calls (567 homozygous and 206 hemizygous) with an estimated sensitivity of 86.5% (82% for single-exonic and 88% for multi-exonic calls) and precision of 78% (53% for single-exonic and 96% for multi-exonic calls). Of the 773 HMZDelFinder-detected deletion calls, 82 were subjected to array comparative genomic hybridization (aCGH) and/or breakpoint PCR, and 64 were confirmed. These include 18 single-exon deletions, of which 8 were detected exclusively by HMZDelFinder and not by any of seven other CNV detection tools examined. Further investigation of the 64 validated deletion calls revealed at least 15 pathogenic HMZ deletions. Of those, 7 accounted for 17-50% of pathogenic CNVs in different disease cohorts, in which CNVs accounted for 7.1-11% of molecular diagnoses. In summary, we present an algorithm to detect rare, intragenic, single-exon deletion CNVs using WES data; this tool can be useful for disease gene discovery efforts and clinical WES analyses. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Luft, F; Klaes, R; Nees, M; Dürst, M; Heilmann, V; Melsheimer, P; von Knebel Doeberitz, M
2001-04-01
Human papillomavirus (HPV) genomes usually persist as episomal molecules in HPV-associated preneoplastic lesions, whereas they are frequently integrated into the host cell genome in HPV-related cancer cells. This suggests that malignant conversion of HPV-infected epithelia is linked to recombination of cellular and viral sequences. Due to technical limitations, precise sequence information on viral-cellular junctions has been obtained for only a few cell lines and primary lesions. To facilitate the molecular analysis of genomic HPV integration, we established a ligation-mediated PCR assay for the detection of integrated papillomavirus sequences (DIPS-PCR). DIPS-PCR was initially used to amplify genomic viral-cellular junctions from HPV-associated cervical cancer cell lines (C4-I, C4-II, SW756, and HeLa) and HPV-immortalized keratinocyte lines (HPKIA, HPKII). In addition to junctions already reported in public databases, various new fusion fragments were identified. Subsequently, 22 different viral-cellular junctions were amplified from 17 cervical carcinomas and 1 vulval intraepithelial neoplasia (VIN III). Sequence analysis of each junction revealed that the viral E1 open reading frame (ORF) was fused to cellular sequences in 20 of 22 (91%) cases. Chromosomal integration loci mapped to chromosomes 1 (n = 2), 2 (n = 3), 7 (n = 2), 8 (n = 3), 10 (n = 1), 14 (n = 5), 16 (n = 1), 17 (n = 2), and mitochondrial DNA (n = 1), suggesting a random distribution of chromosomal integration sites. Precise sequence information obtained by DIPS-PCR was further used to monitor the monoclonal origin of 4 cervical cancers, 1 case of recurrent premalignant lesions, and 1 lymph node metastasis. Therefore, DIPS-PCR might allow efficient therapy control and prediction of relapse in patients with HPV-associated anogenital cancers. Copyright 2001 Wiley-Liss, Inc.
Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François
2015-01-01
Genotyping-by-sequencing (GBS) is a relatively low-cost, high-throughput genotyping technology based on next-generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa subsp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486
Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A
2017-05-23
In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer with an amplicon panel and optimized bioinformatics pipelines. Using 43 samples previously validated by Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced it on the Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with the Torrent Variant Caller plug-in (Thermo Fisher Scientific), and the commercial NextGENe software (Softgenetics, State College, PA, USA). All 681 expected single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant on the primer-binding site. The sensitivity, specificity, false positive rate, and accuracy of S5 XL sequencing for detecting single nucleotide variants and small indels were 99.85%, 100%, 0%, and 99.99% with the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% with NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency was also good, with coefficients of variation between 0.32 and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.
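The four performance figures above are all ratios over a confusion matrix of variant calls. The following minimal sketch uses the standard textbook definitions; it is not the authors' calculation. Note one assumption explicitly: with a specificity of 99.99%, a false positive rate of 1 - specificity would be about 0.01%, so the reported 0.14% may use a different denominator, such as the false discovery rate FP / (TP + FP), which is included here for comparison. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for variant calling (textbook definitions)."""
    return {
        "sensitivity": tp / (tp + fn),                  # true variants that were called
        "specificity": tn / (tn + fp),                  # non-variant positions called reference
        "false_positive_rate": fp / (fp + tn),          # equals 1 - specificity
        "false_discovery_rate": fp / (tp + fp),         # share of calls that are false
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative counts only (not data from the study).
for name, value in diagnostic_metrics(tp=90, fp=10, tn=80, fn=20).items():
    print(f"{name}: {value:.3f}")
```

Because true negatives (reference positions correctly left uncalled) vastly outnumber variants in a sequencing panel, specificity and accuracy tend to sit near 100% regardless of caller quality, which is why sensitivity and false-call measures are usually the more informative comparisons.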
Sequence capture of ultraconserved elements from bird museum specimens.
McCormack, John E; Tsai, Whitney L E; Faircloth, Brant C
2016-09-01
New DNA sequencing technologies are allowing researchers to explore the genomes of the millions of natural history specimens collected prior to the molecular era. Yet, we know little about how well specific next-generation sequencing (NGS) techniques work with the degraded DNA typically extracted from museum specimens. Here, we use one type of NGS approach, sequence capture of ultraconserved elements (UCEs), to collect data from bird museum specimens as old as 120 years. We targeted 5060 UCE loci in 27 western scrub-jays (Aphelocoma californica) representing three evolutionary lineages that could be species, and we collected an average of 3749 UCE loci containing 4460 single nucleotide polymorphisms (SNPs). Despite older specimens producing fewer and shorter loci in general, we collected thousands of markers from even the oldest specimens. More sequencing reads per individual helped to boost the number of UCE loci we recovered from older specimens, but more sequencing was not as successful at increasing the length of loci. We detected contamination in some samples and determined that contamination was more prevalent in older samples that were subject to less sequencing. For the phylogeny generated from concatenated UCE loci, contamination led to incorrect placement of some individuals. In contrast, a species tree constructed from SNPs called within UCE loci correctly placed individuals into three monophyletic groups, perhaps because of the stricter analytical procedures used for SNP calling. This study and other recent studies on the genomics of museum specimens have profound implications for natural history collections, where millions of older specimens should now be considered genomic resources. © 2015 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.
Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R
2015-04-01
Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence, and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data. © 2015 WILEY PERIODICALS, INC.
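The grouping step described above (hash identical reads, drop low-abundance sequences, assign the remainder to amplicons by their flanking sequences) can be sketched with a counter and a flank table. This is a minimal illustration under stated assumptions, not AmpliVar's implementation: it assumes each read begins and ends exactly with its amplicon's flanks (no adapters or errors in the flanks), uses a hypothetical function name `group_reads_by_amplicon`, and omits the subsequent group-wise alignment.

```python
from collections import Counter, defaultdict


def group_reads_by_amplicon(reads, flanks, min_count):
    """Group unique reads by amplicon after removing low-abundance sequences.

    reads: iterable of read sequences.
    flanks: dict amplicon_name -> (left_flank, right_flank).
    min_count: reads seen fewer times than this are discarded.
    Returns dict amplicon_name -> {unique_sequence: count}.
    """
    counts = Counter(reads)  # key-value hash: unique sequence -> abundance
    groups = defaultdict(dict)
    for seq, n in counts.items():
        if n < min_count:
            continue  # below the selectable abundance threshold
        for name, (left, right) in flanks.items():
            # First matching flank pair wins (dict insertion order).
            if seq.startswith(left) and seq.endswith(right):
                groups[name][seq] = n
                break
    return dict(groups)


reads = ["AAACCCTTT"] * 5 + ["AAACGCTTT"] + ["GGGTTTCCC"] * 3
flanks = {"amp1": ("AAA", "TTT"), "amp2": ("GGG", "CCC")}
print(group_reads_by_amplicon(reads, flanks, min_count=2))
```

Collapsing identical reads first means each distinct sequence is handled once, so downstream alignment cost scales with the number of unique sequences rather than the number of reads; the singleton `AAACGCTTT` is removed here by the threshold.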
Evidence of automatic processing in sequence learning using process-dissociation
Mong, Heather M.; McCabe, David P.; Clegg, Benjamin A.
2012-01-01
This paper proposes a way to apply process-dissociation to sequence learning as an addition to, and extension of, the approach used by Destrebecqz and Cleeremans (2001). Participants were trained on two sequences separated from each other by a short break. Following training, participants self-reported their knowledge of the sequences. A recognition test was then performed, which required discrimination of the two trained sequences either under instructions to call any sequence encountered in the experiment “old” (the inclusion condition) or to call only sequence fragments from one half of the experiment “old” (the exclusion condition). The recognition test yielded estimates of automatic and controlled processes using the process-dissociation procedure, and these estimates suggested that both processes were involved. Examining the underlying processes supporting performance may provide more information on the fundamental aspects of the implicit and explicit constructs than has been attainable through awareness testing. PMID:22679465
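The process-dissociation procedure turns inclusion and exclusion performance into separate estimates of controlled (C) and automatic (A) processes. The sketch below assumes the standard Jacoby-style equations, Inclusion = C + A(1 - C) and Exclusion = A(1 - C), which give C = Inclusion - Exclusion and A = Exclusion / (1 - C); the paper's exact scoring may differ, and the function name is hypothetical.

```python
def process_dissociation(p_inclusion: float, p_exclusion: float):
    """Estimate controlled (C) and automatic (A) contributions.

    p_inclusion: proportion of "old" responses under inclusion instructions.
    p_exclusion: proportion of "old" responses to to-be-excluded items under
                 exclusion instructions.
    Assumes Inclusion = C + A(1 - C) and Exclusion = A(1 - C).
    """
    controlled = p_inclusion - p_exclusion
    if controlled >= 1:
        raise ValueError("automatic estimate is undefined when C = 1")
    automatic = p_exclusion / (1 - controlled)
    return controlled, automatic


c, a = process_dissociation(0.8, 0.3)
print(f"C = {c:.2f}, A = {a:.2f}")  # C = 0.50, A = 0.60
```

The key step is the subtraction: responses driven by automatic familiarity count as "old" in both conditions, whereas controlled recollection raises inclusion responses but suppresses exclusion errors, so their difference isolates C.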
Matsutani, Sachiko
2004-08-09
In eukaryotes, RNA polymerase III (RNAP III) transcribes the genes for small RNAs, such as tRNAs, 5S rRNA, and several viral RNAs, as well as short interspersed repetitive elements (SINEs). The genes for these RNAs and SINEs have internal promoters that consist of two regions, called the A and B blocks. The multisubunit transcription factor TFIIIC is required for transcription initiation by RNAP III; in transcription of tRNAs, the B-block binding subunit of TFIIIC recognizes the promoter. Although internal promoter sequences are conserved in eukaryotes, no evidence of homology between the B-block binding subunits of vertebrates and yeasts has been reported previously. Here, I report the results of PSI-BLAST searches using the B-block binding subunits of human and Schizosaccharomyces pombe as queries, showing that the same Arabidopsis proteins were hit with low E-values in both searches. Comparison of the convergent iterative alignments obtained by these PSI-BLAST searches revealed that the vertebrate, yeast, and Arabidopsis proteins have similarities in their N-terminal one-third regions. In these regions, there were three domains with conserved sequence similarities, one located in the N-terminal end region. The N-terminal end region of the B-block binding subunit of Saccharomyces cerevisiae is tentatively identified as an HMG box, which is a DNA-binding motif. I compared the alignment of the N-terminal end regions of the B-block binding subunits and their homologs with that of the HMG boxes; however, it remains unclear whether they are related. Molecular phylogenetic analyses using small subunit rRNA and ubiquitous proteins such as actin and alpha-tubulin show that fungi are more closely related to animals than either is to plants. Interestingly, the results obtained in this study show that, with respect to the B-block binding subunits of TFIIIC, animals appear to be evolutionarily closer to plants than to fungi.
dupA as a risk determinant in Helicobacter pylori infection.
Douraghi, Masoumeh; Mohammadi, Marjan; Oghalaie, Akbar; Abdirad, Afshin; Mohagheghi, Mohammad Ali; Hosseini, Mahmoud Eshagh; Zeraati, Hojat; Ghasemi, Amir; Esmaieli, Maryam; Mohajerani, Nazanin
2008-05-01
The Helicobacter pylori duodenal ulcer promoting (dupA) gene has been previously described as a risk marker for duodenal ulcer (DU) development and a protective factor against gastric cancer (GC). Recent studies that have assessed the use of dupA in predicting clinical outcomes have yielded conflicting results. In the current study, the association of dupA with the clinical outcomes and histopathological changes following H. pylori infection was evaluated in Iranian patients. A total of 157 H. pylori-infected patients with DU (n=30), gastric ulcer (n=23), gastritis (n=68) or GC (n=36) were assessed. The presence of the jhp0917 and jhp0918 genes was determined by gene-specific PCR. Gastric histopathological changes were recorded according to the updated Sydney system. Seventy-eight (49.7%) and 71 (45.2%) of the 157 tested strains were, respectively, positive and negative for both genes. The remaining 8 (5.09%) of the 157 strains were jhp0917-positive/jhp0918-negative. Univariate analysis showed inverse associations between dupA and histological features, including dysplasia as the penultimate stage of GC and lymphoid follicles as a consequence of relatively long-standing H. pylori-associated gastritis. The nucleotide sequence identity of Iranian strains to Colombian, Brazilian and Indian strains ranged from 86.1 to 100% for the aligned regions of jhp0917, from 88 to 98.8% for jhp0918 and from 93.4 to 99.5% for the partial sequences of the dupA gene. Although possession of the dupA gene showed no association with any disease category in our population, as has been reported in several other countries, the association of dupA-negative strains of H. pylori with pre-malignant lesions calls for additional studies to evaluate the role of this gene as a protective marker against GC.
Evolutionary neural networks for anomaly detection based on the behavior of a program.
Han, Sang-Jun; Cho, Sung-Bae
2006-06-01
Learning the behavior of a given program with machine-learning techniques (based on system-call audit data) is an effective way to detect intrusions. Rule learning, neural networks, statistics, and hidden Markov models (HMMs) are representative methods for intrusion detection. Among them, neural networks are known for good performance in learning system-call sequences. To apply this knowledge successfully to real-world problems, it is important to determine the structures and weights of these neural networks. However, finding appropriate structures is very time-consuming because there are no suitable analytical solutions. In this paper, a novel intrusion-detection technique based on evolutionary neural networks (ENNs) is proposed. One advantage of ENNs is that they take less time to obtain superior neural networks than conventional approaches, because they discover the structures and weights of the neural networks simultaneously. Experimental results with the 1999 Defense Advanced Research Projects Agency (DARPA) Intrusion Detection Evaluation (IDEVAL) data confirm that ENNs are promising tools for intrusion detection.
Tang, Hua; Chen, Wei; Lin, Hao
2016-04-01
Immunoglobulins, also called antibodies, are a group of cell surface proteins that are produced by the immune system in response to the presence of a foreign substance (called an antigen). They play key roles in many medical, diagnostic and biotechnological applications. Correct identification of immunoglobulins is crucial to the comprehension of humoral immune function. With the avalanche of protein sequences identified in the postgenomic age, it is highly desirable to develop computational methods that identify immunoglobulins in a timely manner. In view of this, we designed a predictor called "IGPred" by formulating protein sequences with the pseudo amino acid composition, into which nine physicochemical properties of amino acids were incorporated. Jackknife cross-validation showed that 96.3% of immunoglobulins and 97.5% of non-immunoglobulins can be correctly predicted, indicating that IGPred holds very high potential to become a useful tool for antibody analysis. For the convenience of most experimental scientists, a web server for IGPred was established at http://lin.uestc.edu.cn/server/IGPred. We believe that the web server will become a powerful tool to study immunoglobulins and to guide related experimental validations.
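Pseudo amino acid composition (PseAAC) extends the 20 amino acid frequencies with λ sequence-order correlation terms computed from physicochemical properties. The following is a minimal sketch of the common type-I formulation using a single property, Kyte-Doolittle hydropathy, standardized over the 20 amino acids; IGPred incorporated nine properties, and its exact feature set, λ and weight w are not given here, so `lam=3` and `w=0.05` are assumptions. The function name `pseaac` is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (one example physicochemical property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(prop):
    """Rescale a property to zero mean and unit variance over the 20 amino acids."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, lam=3, w=0.05, prop=KYTE_DOOLITTLE):
    """Type-I PseAAC vector of length 20 + lam (single-property sketch).

    Raises KeyError for non-standard residues.
    """
    seq = seq.upper()
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lambda")
    h = _standardize(prop)
    n = len(seq)
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart.
    thetas = [
        sum((h[seq[i + k]] - h[seq[i]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vector = pseaac("ACDEFGHIKLMNPQRSTVWY")
print(len(vector), round(sum(vector), 6))  # 23 1.0
```

Because every component shares the same denominator, the vector sums to 1; the weight w controls how much the sequence-order terms contribute relative to plain composition. A classifier would then be trained on such vectors.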
Channing, Alan; Baptista, Ninda
2013-01-01
A study combining DNA sequences of the mitochondrial 16S rRNA gene, advertisement calls and morphology of some southern African river frogs confirms Amietia vandijki (Visser & Channing, 1997) as a good species. The form presently referred to as Amietia angolensis in southern Africa is shown to comprise two species: Amietia angolensis (Bocage, 1866) known from Angola, and Amietia quecketti (Boulenger, 1895) known from South Africa, Zimbabwe and Lesotho. Junior synonyms of A. quecketti include Rana theileri Mocquard, 1906 and Afrana dracomontana Channing, 1978. The form presently known as Amietia fuscigula is shown to consist of two distantly related taxa: Amietia fuscigula (Duméril & Bibron, 1841) from the south-western Cape and an undescribed species that we here name Amietia poyntoni sp. nov. Channing & Baptista, known from the rest of South Africa and Namibia. These five species have large differences in 16S sequences, as well as differences in morphology and advertisement call. Call and molecular data are both diagnostic, while morphology shows some overlap between taxa. An extended study of the genus across Africa is in preparation.
From seconds to months: an overview of multi-scale dynamics of mobile telephone calls
NASA Astrophysics Data System (ADS)
Saramäki, Jari; Moro, Esteban
2015-06-01
Big Data on electronic records of social interactions allow human behaviour and sociality to be approached quantitatively with unforeseen statistical power. Mobile telephone Call Detail Records (CDRs), automatically collected by telecom operators for billing purposes, have proven especially fruitful for understanding one-to-one communication patterns as well as the dynamics of the social networks reflected in such patterns. We present an overview of empirical results on the multi-scale dynamics of social interactions and networks inferred from mobile telephone calls. We begin with the shortest timescales and fastest dynamics, such as the burstiness of call sequences between individuals, and "zoom out" towards longer temporal and larger structural scales, from temporal motifs formed by correlated calls between multiple individuals to the long-term dynamics of social groups. We conclude this overview with a future outlook.
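Burstiness of a call sequence is often quantified from the inter-event times τ between consecutive calls. One widely used measure, assumed here because the overview above does not define one, is the coefficient B = (σ - μ) / (σ + μ), where μ and σ are the mean and standard deviation of τ: B = -1 for perfectly regular calls, B ≈ 0 for a Poisson process, and B approaches 1 for highly bursty activity. The function name below is hypothetical.

```python
import statistics


def burstiness(event_times):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu) of inter-event times.

    event_times: timestamps of calls (any unit); at least three are required.
    """
    times = sorted(event_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)  # population standard deviation
    if mu + sigma == 0:
        raise ValueError("all events occur at the same time")
    return (sigma - mu) / (sigma + mu)


print(burstiness([0, 10, 20, 30]))            # -1.0: perfectly regular
print(round(burstiness([0, 1, 2, 3, 100]), 3))  # positive: a burst then a long gap
```

The measure depends only on the spread of waiting times relative to their mean, so it can be compared across individuals with very different overall call rates.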
NMR-based diffusion lattice imaging
Laun, Frederik Bernd; Müller, Lars; Kuder, Tristan Anselm
2016-03-01
Nuclear magnetic resonance (NMR) diffusion experiments are widely employed as they yield information about structures hindering the diffusion process, e.g., about cell membranes. While it has been shown in recent articles that these experiments can be used to determine the shape of closed pores averaged over a volume of interest, it is still an open question how much information can be gained in open well-connected systems. In this theoretical work, it is shown that the full structure information of connected periodic systems is accessible. To this end, the so-called "SEquential Rephasing by Pulsed field-gradient Encoding N Time intervals" (SERPENT) sequence is used, which employs several diffusion encoding gradient pulses with different amplitudes. Two two-dimensional solid matrices that are surrounded by an NMR-visible medium are considered: a hexagonal lattice of cylinders and a rectangular lattice of isosceles triangles.
Secreted fungal aspartic proteases: A review.
Mandujano-González, Virginia; Villa-Tanaca, Lourdes; Anducho-Reyes, Miguel Angel; Mercado-Flores, Yuridia
2016-01-01
The aspartic proteases, also called aspartyl and aspartate proteases or acid proteases (EC 3.4.23), belong to the endopeptidase family and are characterized by the conserved sequence Asp-Gly-Thr at the active site. These enzymes are found in a wide variety of microorganisms in which they perform important functions related to nutrition and pathogenesis. In addition, their high activity and stability at acid pH make them attractive for industrial application in the food industry; specifically, they are used as milk-coagulating agents in cheese production or serve to improve the taste of some foods. This review presents an analysis of the characteristics and properties of secreted microbial aspartic proteases and their potential for commercial application. Copyright © 2016 Asociación Española de Micología. Published by Elsevier Espana. All rights reserved.
Verstappen, Koen M; Huijbregts, Loes; Spaninks, Mirlin; Wagenaar, Jaap A; Fluit, Ad C; Duim, Birgitta
2017-01-01
Staphylococcus pseudintermedius is an opportunistic pathogen in dogs and cats and occasionally causes infections in humans. S. pseudintermedius is often resistant to multiple classes of antimicrobials. Reliable detection is required so that it is not misidentified as S. aureus. Phenotypic and currently used molecular diagnostic assays lack specificity or are labour-intensive, relying on multiplex PCR or nucleic acid sequencing. The aim of this study was to identify a specific target for real-time PCR by comparing whole genome sequences of S. pseudintermedius and non-pseudintermedius genomes. Genome sequences were downloaded from public repositories and supplemented by isolates sequenced in this study. A Perl script was written that analysed 300-nt fragments from a reference genome sequence of S. pseudintermedius and checked whether each fragment was present in other S. pseudintermedius genomes (n = 74) and in non-pseudintermedius genomes (n = 138). Six sequences specific for S. pseudintermedius were identified (length 300 to 500 nt). One sequence, located in the spsJ gene, was used to develop primers and a probe. The real-time PCR showed 100% specificity when testing S. pseudintermedius isolates (n = 54) and eight other staphylococcal species (n = 43). In conclusion, a novel approach comparing whole genome sequences identified a sequence that is specific for S. pseudintermedius and provided a real-time PCR target for its rapid and reliable detection.
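The fragment screen described above can be sketched as follows. This is a Python illustration, not the authors' Perl script: it assumes non-overlapping windows, exact substring matching on one strand only, and genomes held as plain strings; the function name is hypothetical.

```python
def specific_fragments(reference, targets, non_targets, size=300):
    """Return (offset, fragment) pairs from `reference` that occur in every
    target genome and in no non-target genome.

    Windows are non-overlapping and matching is exact and single-stranded,
    a simplification of whatever comparison the original script performed.
    """
    hits = []
    for start in range(0, len(reference) - size + 1, size):
        fragment = reference[start:start + size]
        in_all_targets = all(fragment in genome for genome in targets)
        in_any_non_target = any(fragment in genome for genome in non_targets)
        if in_all_targets and not in_any_non_target:
            hits.append((start, fragment))
    return hits


# Toy example with 4-nt windows instead of 300-nt windows.
print(specific_fragments("AAAACCCCGGGGTTTT",
                         targets=["AAAACCCCGGGG", "CCCCGGGGTT"],
                         non_targets=["GGGG"],
                         size=4))  # [(4, 'CCCC')]
```

A real screen over 212 genomes would use an index (e.g. k-mer sets) rather than repeated substring searches, and would also check the reverse complement.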
Inverse statistical physics of protein sequences: a key issues review.
Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin
2018-03-01
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions and how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
Digital signal processing methods for biosequence comparison.
Benson, D C
1990-01-01
A method is discussed for DNA or protein sequence comparison using a finite field fast Fourier transform, a digital signal processing technique, together with statistical methods for analyzing the output of this algorithm. This method compares two sequences of length N in computing time proportional to N log N, compared with N² for methods currently used. This makes it feasible to compare very long sequences. An example is given to show that the method correctly identifies sites of known homology. PMID:2349096
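The core idea, counting matching residues at every relative offset of two sequences with a transform over a finite field, can be sketched with a number-theoretic transform (NTT). The modulus 998244353 (a prime of the form 119·2²³ + 1 with primitive root 3) is an assumption chosen for convenience; Benson's actual field and the statistical analysis of the output are not reproduced here.

```python
MOD = 998244353  # prime 119 * 2**23 + 1; 3 is a primitive root
ROOT = 3


def ntt(values, invert=False):
    """Iterative number-theoretic transform over GF(MOD); len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # inverse root of unity
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k], a[k + half] = (u + v) % MOD, (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def match_counts(a, b, alphabet="ACGT"):
    """Entry k counts positions j with a[j + k - (len(b) - 1)] == b[j].

    One correlation per letter (indicator vectors), summed in the transform
    domain, so the total cost is O(|alphabet| * N log N).
    """
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:
        size <<= 1
    total = [0] * size
    for letter in alphabet:
        x = [1 if ch == letter else 0 for ch in a] + [0] * (size - len(a))
        y = [1 if ch == letter else 0 for ch in reversed(b)] + [0] * (size - len(b))
        total = [(t + u * v) % MOD for t, u, v in zip(total, ntt(x), ntt(y))]
    return ntt(total, invert=True)[:out_len]


a, b = "ACGTACGGT", "CGTA"
counts = match_counts(a, b)
best_k = max(range(len(counts)), key=counts.__getitem__)
print(best_k - (len(b) - 1))  # 1: b aligns with a[1:5]
```

Because match counts never exceed the shorter sequence length, they are far below the modulus and the finite-field result equals the integer result exactly, avoiding floating-point rounding.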
Governor Bush makes first phone call to KSC using new area code
NASA Technical Reports Server (NTRS)
1999-01-01
At 8 a.m. in the videoconference room at Headquarters, Deputy Director for Business Operations Jim Jennings (center) makes the connection for a phone call from Florida Governor Jeb Bush and Center Director Roy Bridges in Tallahassee, Fla. The call is to inaugurate the change of KSC's area code from 407 to 321, effective today. Key representatives of KSC contractors, along with KSC directorates, fill the room where the phone call is being received. Seated next to Jennings are Robert Osband (left), Florida Space Institute, and Col. Stephan Duresky (right), vice commander, 45th Space Wing. Osband is the one who suggested the 3-2-1 sequence to reflect the importance of the space industry to Florida's space coast.
Governor Bush makes first phone call to KSC using new area code
NASA Technical Reports Server (NTRS)
1999-01-01
At 8 a.m. in the videoconference room at Headquarters, Deputy Director for Business Operations Jim Jennings (center) waits for a phone call from Florida Governor Jeb Bush and Center Director Roy Bridges in Tallahassee, Fla. The call is to inaugurate the change of KSC's area code from 407 to 321, effective today. Key representatives of KSC contractors, along with KSC directorates, fill the room where the phone call is being received. Seated next to Jennings are Robert Osband (left), Florida Space Institute, and Col. Stephan Duresky (right), vice commander, 45th Space Wing. Osband is the one who suggested the 3-2-1 sequence, to reflect the importance of the space industry to Florida's space coast.
Randomized Trial of Nicotine Lozenges and Phone Counseling for Smokeless Tobacco Cessation
Danaher, Brian G.; Ebbert, Jon O.; van Meter, Nora; Lichtenstein, Edward; Widdop, Chris; Crowley, Ryann; Akers, Laura; Seeley, John R.
2015-01-01
Introduction: Relatively few treatment programs have been developed specifically for smokeless tobacco (ST) users who want to quit. Their results suggest that self-help materials, telephone counseling, and nicotine lozenges are efficacious. This study provides the first direct examination of the separate and combined effects of telephone counseling and lozenges. Methods: We recruited ST users online (N = 1067) and randomly assigned them to 1 of 3 conditions: (a) a lozenge group (n = 356), who were mailed 4-mg nicotine lozenges; (b) a coach calls group (n = 354), who were offered 3 coaching phone calls; or (c) a lozenge + coach calls group (n = 357), who received both lozenges and coaching calls. Additionally, all participants were mailed self-help materials. Self-reported tobacco abstinence was assessed at 3 and 6 months after randomization. Results: Complete-case and intention-to-treat (ITT) analyses for all tobacco abstinence were performed at 3 months, 6 months, and both 3 and 6 months (repeated point prevalence). ITT analyses revealed a highly similar result: the lozenge + coach calls condition was significantly more successful in encouraging tobacco abstinence than either the lozenge group or the coach calls group, which did not differ. Conclusions: Combining nicotine lozenges and phone counseling significantly increased tobacco abstinence rates compared with either intervention alone, whereas coach calls and lozenges were equivalent. The study confirms the high tobacco abstinence rates for self-help ST cessation interventions and offers guidance to providing tobacco treatment to ST users. PMID:25168034
Nanopore sequencing of drug-resistance-associated genes in malaria parasites, Plasmodium falciparum.
Runtuwene, Lucky R; Tuda, Josef S B; Mongan, Arthur E; Makalowski, Wojciech; Frith, Martin C; Imwong, Mallika; Srisutham, Suttipat; Nguyen Thi, Lan Anh; Tuan, Nghia Nguyen; Eshita, Yuki; Maeda, Ryuichiro; Yamagishi, Junya; Suzuki, Yutaka
2018-05-29
Here, we report the application of a portable sequencer, MinION, for genotyping the malaria parasite Plasmodium falciparum. In the present study, an amplicon mixture of nine representative genes conferring resistance to antimalarial drugs is analyzed. First, we developed the procedure for four laboratory strains (3D7, Dd2, 7G8, and K1), and then applied it to ten clinical samples. We sequenced and re-sequenced the samples using the obsolete flow cell R7.3 and the most recent flow cell R9.4. Although the average base-call accuracy of the MinION sequencer was 74.3%, obtaining >50 reads at a given position improves the accuracy of the SNP call, yielding a precision and recall of 0.92 and 0.8, respectively, with flow cell R7.3. These numbers increased significantly with flow cell R9.4, with which the precision and recall are 1 and 0.97, respectively. Based on the SNP information, the drug resistance status of the ten clinical samples was inferred. We also analyzed K13 gene mutations from 54 additional clinical samples as a proof of concept. We found that a novel amino-acid-changing variation is dominant in this area. In addition, we performed a small population-based analysis using 3 and 5 cases (K13) and 10 and 5 cases (PfCRT) from Thailand and Vietnam, respectively. We identified distinct genotypes from the respective regions. This approach will change the standard methodology for the sequencing diagnosis of malaria parasites, especially in developing countries.
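Why depth helps despite a per-read accuracy near 74%: random errors are spread over the other bases, so the majority base at a deeply covered position is usually correct. The sketch below is a hypothetical majority-vote caller with a minimum depth of 50, plus the precision and recall definitions used to score calls; the study's actual calling pipeline is not described in the abstract, so this is an illustration only.

```python
from collections import Counter


def call_base(pileup, min_depth=50):
    """Majority-vote base at one position; None when depth is below min_depth."""
    if len(pileup) < min_depth:
        return None
    base, _count = Counter(pileup).most_common(1)[0]
    return base


def precision_recall(called, truth):
    """called, truth: sets of (position, alt_base) SNPs.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    """
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


print(call_base("A" * 40 + "G" * 10))  # A
print(precision_recall({(1, "A"), (2, "G")}, {(1, "A"), (3, "T")}))  # (0.5, 0.5)
```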
Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai
2014-01-01
The application of next generation sequencing technology has greatly facilitated high-throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated on the Illumina HiSeq 2000 sequencing platform. One transcriptome of L. vannamei was obtained by sequencing RNA from larvae at the mysis stage, and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI, and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 high-quality SNPs were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of the two transcriptomes together, and a total of 96,040 high-quality SNPs were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs, respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio of transitions to transversions was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for high-density linkage map construction and genome-wide association studies. PMID:24498047
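The two summary statistics quoted above can be reproduced from their definitions: transitions are purine-purine (A↔G) or pyrimidine-pyrimidine (C↔T) substitutions, everything else is a transversion, and one SNP per 476 bp corresponds to a frequency of 1/476. The SNP list here is a hypothetical toy input.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref, alt):
    """A<->G and C<->T substitutions are transitions; all others are transversions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """snps: list of (ref, alt) base pairs with ref != alt; needs >= 1 transversion."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    return transitions / transversions


print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2.0
print(f"{1 / 476:.2%}")  # 0.21%, i.e. one SNP per 476 bp
```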
A protein secretion system linked to bacteroidete gliding motility and pathogenesis
Sato, Keiko; Naito, Mariko; Yukitake, Hideharu; Hirakawa, Hideki; Shoji, Mikio; McBride, Mark J.; Rhodes, Ryan G.; Nakayama, Koji
2009-01-01
Porphyromonas gingivalis secretes strong proteases called gingipains that are implicated in periodontal pathogenesis. Protein secretion systems common to other Gram-negative bacteria are lacking in P. gingivalis, but several proteins, including PorT, have been linked to gingipain secretion. Comparative genome analysis and genetic experiments revealed 11 additional proteins involved in gingipain secretion. Six of these (PorK, PorL, PorM, PorN, PorW, and Sov) were similar in sequence to Flavobacterium johnsoniae gliding motility proteins, and two others (PorX and PorY) were putative two-component system regulatory proteins. Real-time RT-PCR analysis revealed that porK, porL, porM, porN, porP, porT, and sov were down-regulated in P. gingivalis porX and porY mutants. Disruption of the F. johnsoniae porT ortholog resulted in defects in motility, chitinase secretion, and translocation of a gliding motility protein, SprB adhesin, to the cell surface, providing a link between a unique protein translocation system and a motility apparatus in members of the Bacteroidetes phylum. PMID:19966289
NASA Astrophysics Data System (ADS)
Lee, Y.; Bescond, M.; Logoteta, D.; Cavassilas, N.; Lannoo, M.; Luisier, M.
2018-05-01
We propose an efficient method to quantum mechanically treat anharmonic interactions in the atomistic nonequilibrium Green's function simulation of phonon transport. We demonstrate that the so-called lowest-order approximation, implemented through a rescaling technique and analytically continued by means of Padé approximants, can be used to accurately model third-order anharmonic effects. Although the paper focuses on a specific self-energy, the method is applicable to a very wide class of physical interactions. We apply this approach to the simulation of anharmonic phonon transport in realistic Si and Ge nanowires with uniform or discontinuous cross sections. The effect of increasing the temperature above 300 K is also investigated. In all the considered cases, we obtain good agreement with the routinely adopted self-consistent Born approximation at a remarkably lower computational cost. In the more complicated case of high temperatures (≫300 K), we find that first-order Richardson extrapolation applied to the sequence of [N−1/N] Padé approximants significantly accelerates convergence.
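First-order Richardson extrapolation of a sequence can be sketched generically. Assuming the error behaves like S_N ≈ S + c/N, the combination R_N = (N + 1)·S_{N+1} − N·S_N cancels the c/N term exactly. The construction of the Padé approximants themselves is not shown, and the test sequences are illustrative, not from the paper.

```python
def richardson_first_order(values, first_n=1):
    """values[k] is S_N with N = first_n + k; assumes S_N ~ S + c / N.

    Returns R_N = (N + 1) * S_{N+1} - N * S_N for each consecutive pair,
    which removes the leading c / N error term.
    """
    return [
        (n + 1) * values[k + 1] - n * values[k]
        for k, n in enumerate(range(first_n, first_n + len(values) - 1))
    ]


# Partial sums of 1 / (k (k + 1)) converge to 1 with error 1 / (N + 1).
partial = [1 - 1 / (n + 1) for n in range(1, 11)]
extrapolated = richardson_first_order(partial)
print(abs(partial[-1] - 1))       # about 0.091 (error 1/11)
print(abs(extrapolated[-1] - 1))  # about 0.0091 (error 1/110)
```

For this sequence the extrapolated error is 1/((N + 1)(N + 2)), one order faster than the raw 1/(N + 1) decay.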
NASA Astrophysics Data System (ADS)
Prihandini, Rafiantika M.; Agustin, I. H.; Dafik
2018-04-01
In this paper we consider simple, nontrivial graphs. If there exists a bijective function $f: V(G) \cup E(G) \to \{1, 2, \ldots, |V(G)|+|E(G)|\}$ such that, over all subgraphs $P_2 \vartriangleright H$ of $G$ isomorphic to $H$, the total $P_2 \vartriangleright H$-weights $W(P_2 \vartriangleright H) = \sum_{v \in V(P_2 \vartriangleright H)} f(v) + \sum_{e \in E(P_2 \vartriangleright H)} f(e)$ form an arithmetic sequence $\{a, a+d, a+2d, \ldots, a+(n-1)d\}$, where $a$ and $d$ are positive integers and $n$ is the number of all subgraphs isomorphic to $H$, then $G$ is called an $(a, b)$-$P_2 \vartriangleright H$-antimagic total graph. Our paper describes the existence of super $(a, b)$-$P_2 \vartriangleright H$-antimagic total labelings for the comb product $G = L \vartriangleright H$, where $L$ is a $(b, d^*)$-edge antimagic vertex labeling graph and $H$ is a connected graph.
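The definition can be made concrete with a small checker: given a labeling of all vertices and edges and a list of the subgraphs to be weighed, it confirms the labeling is a bijection onto {1, ..., |V|+|E|} and that the total weights form a, a+d, ..., a+(n−1)d with positive a and d. Enumerating the subgraphs isomorphic to H, and the extra "super" condition, are left to the caller; the function name and toy graph are hypothetical.

```python
def antimagic_parameters(labels, subgraphs):
    """labels: dict mapping every vertex and edge of G to an integer label.
    subgraphs: list of (vertices, edges) pairs, one per subgraph being weighed.

    Returns (a, d) if the labeling is a bijection onto 1..|V|+|E| and the
    sorted total weights are a, a + d, ..., a + (n - 1) d with a, d > 0;
    otherwise None.
    """
    if sorted(labels.values()) != list(range(1, len(labels) + 1)):
        return None  # not a bijection onto {1, ..., |V(G)| + |E(G)|}
    weights = sorted(
        sum(labels[v] for v in vertices) + sum(labels[e] for e in edges)
        for vertices, edges in subgraphs
    )
    if len(weights) < 2:
        return None  # d is undetermined with fewer than two subgraphs
    a, d = weights[0], weights[1] - weights[0]
    if a <= 0 or d <= 0 or any(w != a + i * d for i, w in enumerate(weights)):
        return None
    return a, d


# Toy path u - v - w, weighing each edge together with its two endpoints.
labels = {"u": 1, "v": 2, "w": 3, "uv": 4, "vw": 5}
print(antimagic_parameters(labels, [(["u", "v"], ["uv"]), (["v", "w"], ["vw"])]))  # (7, 3)
```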
The Douglas-fir genome sequence reveals specialization of the photosynthetic apparatus in Pinaceae
David B. Neale; Patrick E. McGuire; Nicholas C. Wheeler; Kristian A. Stevens; Marc W. Crepeau; Charis Cardeno; Aleksey V. Zimin; Daniela Puiu; Geo M. Pertea; U. Uzay Sezen; Claudio Casola; Tomasz E. Koralewski; Robin Paul; Daniel Gonzalez-Ibeas; Sumaira Zaman; Richard Cronn; Mark Yandell; Carson Holt; Charles H. Langley; James A. Yorke; Steven L. Salzberg; Jill L. Wegrzyn
2017-01-01
A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50...
Hirata, Satoshi; Kojima, Kaname; Misawa, Kazuharu; Gervais, Olivier; Kawai, Yosuke; Nagasaki, Masao
2018-05-01
Forensic DNA typing is widely used to identify missing persons and plays a central role in forensic profiling. DNA typing usually uses capillary electrophoresis fragment analysis of PCR amplification products to detect the length of short tandem repeat (STR) markers. Here, we analyzed whole genome data from 1,070 Japanese individuals generated by massively parallel short-read sequencing with 162-base paired-end reads. We analyzed 843,473 STR loci with two- to six-basepair repeat units and cataloged highly polymorphic STR loci in the Japanese population. To evaluate the performance of the cataloged STR loci, we compared 23 STR loci widely used in forensic DNA typing with capillary electrophoresis-based STR genotyping results in the Japanese population. Seventeen loci had high correlations and high call rates. The other six loci had low call rates or low correlations due to the limitations of short-read sequencing technology, the bioinformatics tool used, or the complexity of repeat patterns. With these analyses, we also selected 218 STR loci with four-basepair repeat units and 53 loci with five-basepair repeat units that are suitable for both short-read sequencing and PCR-based technologies, and which are candidates for practical forensic DNA typing in the Japanese population.
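A minimal way to locate perfect STRs with two- to six-basepair repeat units is a backreference regular expression per unit length. This sketch is an illustration under stated assumptions (perfect repeats only, at least three copies, units that are themselves repeats of a shorter motif discarded); the study's genotyping tool handles imperfect repeats and read-level evidence, which is not shown here.

```python
import re


def is_primitive(unit):
    """True unless `unit` is a repeat of a shorter motif (e.g. 'ATAT' or 'AA')."""
    return not any(
        len(unit) % p == 0 and unit == unit[:p] * (len(unit) // p)
        for p in range(1, len(unit))
    )


def find_strs(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`."""
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # Group 2 captures one unit; \2{k,} requires at least k further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq.upper()):
            unit = match.group(2)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(1)) // unit_len))
    return hits


print(find_strs("GGCATCATCATCATGG"))  # [(2, 'CAT', 4)]
```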
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.
Sim, Mikang; Kim, Jaebum
2015-02-01
The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be understood by investigating metagenomes. However, the complexity of metagenomes, such as the mixture of multiple microbes and differing species abundances, makes metagenome assembly more challenging. In this paper, we developed a new metagenome assembly method that uses protein sequences in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. Using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences as a guide for the assembly is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Characterization of a Fluorescent Protein Reporter System
2008-03-01
pathways are initiated with the binding of a small molecule to a catalytic ribonucleic acid molecule (RNA), called a ribozyme (Thodima et al., 2006). The... ribozyme is part of a larger RNA construct, called a riboswitch, which initiates translation of a specific genetic sequence on a plasmid (circular...protein gene. Yen et al. (2004) reported insertion of a self-cleaving ribozyme upstream of the reporter gene. In the absence of a regulator (“off
ɛ-connectedness, finite approximations, shape theory and coarse graining in hyperspaces
NASA Astrophysics Data System (ADS)
Alonso-Morón, Manuel; Cuchillo-Ibanez, Eduardo; Luzón, Ana
2008-12-01
We use upper semifinite hyperspaces of compacta to describe ε-connectedness and to compute homology from finite approximations. We find a new connection between ε-connectedness and so-called Shape Theory. We construct a geodesically complete R-tree, by means of ε-components at different resolutions, whose behavior at infinity captures the topological structure of the space of components of a given compact metric space. We also construct inverse sequences of finite spaces using internal finite approximations of compact metric spaces. These sequences can be converted into inverse sequences of polyhedra and simplicial maps by means of what we call the Alexandroff-McCord correspondence. This correspondence allows us to relate upper semifinite hyperspaces of finite approximations with the Vietoris-Rips complexes of such approximations at different resolutions. Two motivating examples are included in the introduction. We propose this procedure as a different mathematical foundation for problems in data analysis. This process is intrinsically related to the methodology of shape theory. This paper reinforces Robins’s idea of using methods from shape theory to compute homology from finite approximations.
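For a finite metric space, ε-components are the classes of points joined by ε-chains (sequences of points with consecutive distances below ε), which a union-find over all pairs computes directly. Strict inequality is the convention assumed here, and Euclidean points stand in for a general compact metric space; the R-tree and hyperspace constructions of the paper are not reproduced.

```python
import math


def epsilon_components(points, eps):
    """Partition a finite set of points into epsilon-components.

    Two points share a component when an eps-chain joins them, i.e. a
    sequence of points whose consecutive distances are all < eps.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


pts = [(0, 0), (0.5, 0), (1.0, 0), (5, 0)]
print(epsilon_components(pts, 0.6))  # [[0, 1, 2], [3]]
print(epsilon_components(pts, 0.4))  # [[0], [1], [2], [3]]
```

Evaluating this at a decreasing sequence of ε values gives the nested partitions ("different resolutions") from which a tree of components can be assembled.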
Sequence Segmentation with changeptGUI.
Tasker, Edward; Keith, Jonathan M
2017-01-01
Many biological sequences have a segmental structure that can provide valuable clues to their content, structure, and function. The program changept is a tool for investigating the segmental structure of a sequence, and can also be applied to multiple sequences in parallel to identify a common segmental structure, thus providing a method for integrating multiple data types to identify functional elements in genomes. In the previous edition of this book, a command line interface for changept was described. Here we present a graphical user interface for this package, called changeptGUI. This interface also includes tools for pre- and post-processing of data and results to facilitate investigation of the number and characteristics of segment classes.
Leakey, Tatiana I; Zielinski, Jerzy; Siegfried, Rachel N; Siegel, Eric R; Fan, Chun-Yang; Cooney, Craig A
2008-06-01
DNA methylation at cytosines is a widely studied epigenetic modification. Methylation is commonly detected using bisulfite modification of DNA followed by PCR and additional techniques such as restriction digestion or sequencing. These additional techniques are either laborious, require specialized equipment, or are not quantitative. Here we describe a simple algorithm that yields quantitative results from analysis of conventional four-dye-trace sequencing. We call this method Mquant and we compare it with the established laboratory method of combined bisulfite restriction assay (COBRA). This analysis of sequencing electropherograms provides a simple, easily applied method to quantify DNA methylation at specific CpG sites.
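The quantity being estimated follows from bisulfite chemistry: unmethylated cytosines are converted and read as T, while methylated cytosines at CpG sites remain C, so the relative C and T peak heights at a CpG give a methylation estimate. The plain ratio below is an assumption for illustration; Mquant's actual algorithm (for example any trace normalization) is not given in the abstract, and reverse-strand reads would use G and A peaks instead.

```python
def methylation_fraction(c_peak, t_peak):
    """Estimated fraction methylated at one CpG: C / (C + T) peak heights."""
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return c_peak / total


def mean_methylation(sites):
    """sites: list of (c_peak, t_peak) tuples, one per CpG in the amplicon."""
    return sum(methylation_fraction(c, t) for c, t in sites) / len(sites)


print(methylation_fraction(300, 100))              # 0.75
print(mean_methylation([(300, 100), (100, 300)]))  # 0.5
```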
LongISLND: in silico sequencing of lengthy and noisy datatypes.
Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y K
2016-12-15
LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd. CONTACT: hugo.lam@roche.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Proteolytic processing of the vitellogenin precursor in the boll weevil, Anthonomus grandis.
Heilmann, L J; Trewitt, P M; Kumaran, A K
1993-01-01
The soluble proteins of the eggs of the coleopteran insect Anthonomus grandis Boheman, the cotton boll weevil, consist almost entirely of two vitellin types with M(r)s of 160,000 and 47,000. We sequenced their N-terminal ends and one internal cyanogen bromide fragment of the large vitellin and compared these sequences with the deduced amino acid sequence from the vitellogenin gene. The results suggest that both boll weevil vitellin proteins are products of the proteolytic cleavage of a single precursor protein. The smaller 47,000 M(r) vitellin protein is derived from the N-terminal portion of the precursor adjacent to an 18 amino acid signal peptide. The cleavage site between the large and small vitellins at amino acid 362 is adjacent to a pentapeptide sequence containing two pairs of arginine residues. Comparison of the boll weevil sequences with limited known sequences from the single 180,000 M(r) honey bee protein shows that the honey bee vitellin N-terminal exhibits sequence homology to the N-terminal of the 47,000 M(r) boll weevil vitellin. Treatment of the vitellins with an N-glycosidase results in a decrease in molecular weight of both proteins, from 47,000 to 39,000 and from 160,000 to 145,000, indicating that about 10-15% of the molecular weight of each vitellin consists of N-linked carbohydrate. The molecular weight of the deglycosylated large vitellin is smaller than that predicted from the gene sequence, indicating possible further proteolytic processing at the C-terminal of that protein.
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of the computing resources typically available to most research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high-throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages a hybrid computing infrastructure consisting of AWS cloud, supercomputers and local high-performance computing infrastructure. We present a novel binning approach for large-scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and transferred only a total of 6 TB of data across the platforms.
Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
Sastre-Garau, X; Favre, M; Couturier, J; Orth, G
2000-08-01
We previously described two genital carcinomas (IC2, IC4) containing human papillomavirus type 16 (HPV-16)- or HPV-18-related sequences integrated in chromosomal bands containing the c-myc (8q24) or N-myc (2p24) gene, respectively. The c-myc gene was rearranged and amplified in IC2 cells without evidence of overexpression. The N-myc gene was amplified and highly transcribed in IC4 cells. Here, the sequence of an 8039 bp IC4 DNA fragment containing the integrated viral sequences and the cellular junctions is reported. A 3948 bp segment of the genome of HPV-45 encompassing the upstream regulatory region and the E6 and E7 ORFs was integrated into the untranslated part of N-myc exon 3, upstream of the N-myc polyadenylation signal. Both N-myc and HPV-45 sequences were amplified 10- to 20-fold. The 3' ends of the major N-myc transcript were mapped upstream of the 5' junction. A minor N-myc/HPV-45 fusion transcript was also identified, as well as two abundant transcripts from the HPV-45 E6-E7 region. Large amounts of N-myc protein were detected in IC4 cells. A major alteration of c-myc sequences in IC2 cells involved the insertion of a non-coding sequence into the second intron and their co-amplification with the third exon, without any evidence for the integration of HPV-16 sequences within or close to the gene. Different patterns of myc gene alterations may thus be associated with integration of HPV DNA in genital tumours, including the activation of the protooncogene via a mechanism of insertional mutagenesis and/or gene amplification.
Tillage and cropping sequence impacts on nitrogen cycling in dryland farming in eastern Montana, USA
USDA-ARS's Scientific Manuscript database
Information on N cycling in dryland crops and soils as influenced by long-term tillage and cropping sequence is needed to quantify soil N sequestration, mineralization, and N balance to reduce N fertilization rate and N losses through soil processes. We evaluated the 21-yr effects of combinations of...
NASA Astrophysics Data System (ADS)
Slyusarchuk, Vasilii E.
2009-02-01
Necessary and sufficient conditions are found for the invertibility of the nonlinear difference operator $(\mathscr{R}x)(n) = H(x(n), x(n+1))$, $n \in \mathbb{Z}$, in the space of bounded two-sided number sequences. Here $H\colon \mathbb{R}^2 \to \mathbb{R}$ is a continuous function. Bibliography: 29 titles.
PipeOnline 2.0: automated EST processing and functional data sorting.
Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A
2002-11-01
Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
Advances in DNA sequencing technologies for high resolution HLA typing.
Cereb, Nezih; Kim, Hwa Ran; Ryu, Jaejun; Yang, Soo Young
2015-12-01
This communication describes our experience in large-scale G group-level high resolution HLA typing using three different DNA sequencing platforms: ABI 3730 xl, Illumina MiSeq and PacBio RS II. Recent advances in DNA sequencing technologies, so-called next generation sequencing (NGS), have brought breakthroughs in deciphering the genetic information of all living species at large scale and at affordable cost. The NGS DNA indexing system allows multiple genes to be sequenced for a large number of individuals in a single run. Our laboratory has adopted these technologies for HLA molecular testing services. We found that each sequencing technology has its own strengths and weaknesses, and that their sequencing performances complement each other. HLA genes are highly complex, and genotyping them is quite challenging. Using these three sequencing platforms, we were able to meet all requirements for G group-level high resolution and high volume HLA typing. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
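The indexing idea mentioned above (tagging each individual's library with a short index sequence so that many samples can share one run) can be illustrated with a minimal demultiplexing sketch. The barcode sequences and sample names are hypothetical, and tolerating one mismatch is an assumed policy; the laboratory's actual pipeline is not described in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(
    reads: List[Tuple[str, str]],
    barcodes: Dict[str, str],
    max_mismatches: int = 1,
) -> Dict[str, List[str]]:
    """Group (index_read, insert_read) pairs by sample.

    A read is assigned only when exactly one barcode lies within
    max_mismatches; otherwise it goes to 'undetermined'.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for index_seq, insert in reads:
        hits = [
            sample
            for bc, sample in barcodes.items()
            if len(bc) == len(index_seq) and hamming(bc, index_seq) <= max_mismatches
        ]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(insert)
    return dict(bins)


if __name__ == "__main__":
    barcodes = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}  # hypothetical
    reads = [("ACGTAC", "AAAA"), ("ACGTAA", "CCCC"), ("TGCATG", "GGGG"), ("GGGGGG", "TTTT")]
    print(demultiplex(reads, barcodes))
```

Barcodes in a real design are chosen far enough apart that a single sequencing error cannot make one index look like another.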
Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.
Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney
2015-01-01
We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.
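Several of the parameters listed above reduce to simple computations over per-base read depths and allele counts. The sketch below uses common definitions (breadth at depth of at least 1, coefficient of variation of depth as a uniformity measure, and allele dropout at sites known to be heterozygous from bulk DNA); the paper's exact definitions and thresholds may differ, and the example numbers are made up.

```python
import statistics
from typing import List, Sequence, Tuple


def coverage_breadth(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of positions covered by at least min_depth reads."""
    return sum(d >= min_depth for d in depths) / len(depths)


def depth_cv(depths: Sequence[int]) -> float:
    """Coefficient of variation of depth; lower means more uniform amplification."""
    return statistics.pstdev(depths) / statistics.fmean(depths)


def allele_dropout_rate(het_counts: List[Tuple[int, int]], min_reads: int = 1) -> float:
    """Fraction of covered heterozygous sites where only one allele is observed.

    het_counts holds (ref_reads, alt_reads) at sites known to be heterozygous.
    """
    covered = [(r, a) for r, a in het_counts if r + a >= min_reads]
    dropped = sum(1 for r, a in covered if r == 0 or a == 0)
    return dropped / len(covered)


if __name__ == "__main__":
    depths = [0, 2, 4, 6, 8]
    print(coverage_breadth(depths), round(depth_cv(depths), 3))
    print(allele_dropout_rate([(5, 4), (7, 0), (0, 3), (0, 0)]))
```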
Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K; Strug, Lisa J
2014-08-01
Sufficiently powered case-control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the 'gold standard' analysis with the true underlying genotypes for both common and rare variants. An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. lisa.strug@utoronto.ca Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
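The central substitution, replacing a hard genotype call by its expected value given the reads, can be sketched under simple assumptions: a binomial read model with a fixed per-read error rate and a Hardy-Weinberg prior built from the minor allele frequency. This only illustrates E[G | reads]; it is not the RVS score test itself or the authors' R script, and `error = 0.01` is an arbitrary choice.

```python
from math import comb
from typing import List


def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> List[float]:
    """P(reads | G = g) for g = 0, 1, 2 alternate alleles under a binomial read model."""
    n = n_ref + n_alt
    likelihoods = []
    for g in (0, 1, 2):
        # Chance that a single read shows the alternate allele given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(comb(n, n_alt) * p_alt**n_alt * (1 - p_alt) ** n_ref)
    return likelihoods


def expected_genotype(n_ref: int, n_alt: int, maf: float, error: float = 0.01) -> float:
    """E[G | reads] using a Hardy-Weinberg prior from the minor allele frequency."""
    prior = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf**2]
    posterior = [lik * p for lik, p in zip(genotype_likelihoods(n_ref, n_alt, error), prior)]
    return sum(g * w for g, w in zip((0, 1, 2), posterior)) / sum(posterior)


if __name__ == "__main__":
    for reads in [(0, 0), (3, 1), (15, 15), (0, 30)]:
        print(reads, round(expected_genotype(*reads, maf=0.1), 3))
```

With no reads the expected genotype falls back to the prior mean of 2 × MAF, while deep coverage pulls it toward 0, 1 or 2, which is why a low-depth sample contributes a shrunken value rather than a possibly wrong hard call.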
Systematic Evaluation of the Dependence of Deoxyribozyme Catalysis on Random Region Length
Velez, Tania E.; Singh, Jaydeep; Xiao, Ying; Allen, Emily C.; Wong, On Yi; Chandra, Madhavaiah; Kwon, Sarah C.; Silverman, Scott K.
2012-01-01
Functional nucleic acids are DNA and RNA aptamers that bind targets, or they are deoxyribozymes and ribozymes that have catalytic activity. These functional DNA and RNA sequences can be identified from random-sequence pools by in vitro selection, which requires choosing the length of the random region. Shorter random regions allow more complete coverage of sequence space but may not permit the structural complexity necessary for binding or catalysis. In contrast, longer random regions are sampled incompletely but may allow adoption of more complicated structures that enable function. In this study, we systematically examined random region length (N20 through N60) for two particular deoxyribozyme catalytic activities, DNA cleavage and tyrosine-RNA nucleopeptide linkage formation. For both activities, we previously identified deoxyribozymes using only N40 regions. In the case of DNA cleavage, here we found that shorter N20 and N30 regions allowed robust catalytic function, either by DNA hydrolysis or by DNA deglycosylation and strand scission via β-elimination, whereas longer N50 and N60 regions did not lead to catalytically active DNA sequences. Follow-up selections with N20, N30, and N40 regions revealed an interesting interplay of metal ion cofactors and random region length. Separately, for Tyr-RNA linkage formation, N30 and N60 regions provided catalytically active sequences, whereas N20 was unsuccessful, and the N40 deoxyribozymes were functionally superior (in terms of rate and yield) to N30 and N60. Collectively, the results indicate that with future in vitro selection experiments for DNA and RNA catalysts, and by extension for aptamers, random region length should be an important experimental variable. PMID:23088677
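The coverage side of the trade-off can be quantified: a random region of length N has 4^N possible sequences. The sketch assumes a hypothetical pool of 10^14 molecules with uniform, unbiased base composition (neither assumption comes from the paper) and estimates the expected fraction of sequence space the pool actually contains.

```python
import math


def sequence_space(n: int) -> int:
    """Number of distinct DNA sequences of length n (four bases per position)."""
    return 4**n


def expected_fraction_sampled(n: int, pool_size: float) -> float:
    """Expected fraction of all n-mers present in a pool of uniformly random molecules.

    Each specific sequence is missed with probability (1 - 4^-n)^pool,
    which is approximately exp(-pool / 4^n).
    """
    ratio = pool_size / sequence_space(n)
    return -math.expm1(-ratio)


if __name__ == "__main__":
    pool = 1e14  # hypothetical number of molecules in the starting pool
    for n in (20, 30, 40, 50, 60):
        print(f"N{n}: {sequence_space(n):.2e} sequences, "
              f"expected fraction sampled {expected_fraction_sampled(n, pool):.2e}")
```

Under these assumptions an N20 pool can contain essentially every N20 sequence, while only a vanishing fraction of N40 or longer sequence space is sampled.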
Structure of a Burkholderia pseudomallei Trimeric Autotransporter Adhesin Head
Edwards, Thomas E.; Phan, Isabelle; Abendroth, Jan; Dieterich, Shellie H.; Masoudi, Amir; Guo, Wenjin; Hewitt, Stephen N.; Kelley, Angela; Leibly, David; Brittnacher, Mitch J.; Staker, Bart L.; Miller, Samuel I.; Van Voorhis, Wesley C.; Myler, Peter J.; Stewart, Lance J.
2010-01-01
Background Pathogenic bacteria adhere to the host cell surface using a family of outer membrane proteins called Trimeric Autotransporter Adhesins (TAAs). Although TAAs are highly divergent in sequence and domain structure, they are all conceptually comprised of a C-terminal membrane anchoring domain and an N-terminal passenger domain. Passenger domains consist of a secretion sequence, a head region that facilitates binding to the host cell surface, and a stalk region. Methodology/Principal Findings Pathogenic species of Burkholderia contain an overabundance of TAAs, some of which have been shown to elicit an immune response in the host. To understand the structural basis for host cell adhesion, we solved a 1.35 Å resolution crystal structure of a BpaA TAA head domain from Burkholderia pseudomallei, the pathogen that causes melioidosis. The structure reveals a novel fold of an intricately intertwined trimer. The BpaA head is composed of structural elements that have been observed in other TAA head structures as well as several elements of previously unknown structure predicted from low sequence homology between TAAs. These elements are typically up to 40 amino acids long and are not domains, but rather modular structural elements that may be duplicated or omitted through evolution, creating molecular diversity among TAAs. Conclusions/Significance The modular nature of BpaA, as demonstrated by its head domain crystal structure, and of TAAs in general provides insights into evolution of pathogen-host adhesion and may provide an avenue for diagnostics. PMID:20862217
Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E
2012-07-01
Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes have been shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full-length mitochondrial sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondrial sequences) revealed sequence variation in 10 of the 13 protein-coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein-coding sequences represent non-synonymous mutations. Sequence variation was also observed in 8 of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and partial cytochrome b sequences, along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lambs (n = 1,286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lamb perinatal survival rate, birth weight, or daily growth rate of lambs up to 150 d, which averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for ewe prolificacy, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lambs born per ewe lambing for the HA, HB, and HC haplogroups, respectively. Our results highlight the genetic variation of the ovine mitogenome in protein- and tRNA-coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
Accurate and exact CNV identification from targeted high-throughput sequence data.
Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom
2011-04-12
Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
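The first step described above, comparing normalized coverage between samples, can be sketched as follows. The abstract does not specify the normalization or thresholds: here each sample's depths are divided by its total, compared with the cross-sample median per target, and labelled with arbitrary illustrative cut-offs (0.65 and 1.35) placed between the ratios expected for one copy (0.5), two copies (1.0) and three copies (1.5) of a diploid target. The breakpoint-spanning read confirmation step is not shown, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List


def coverage_ratios(coverage: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Per-target ratio of each sample's normalized depth to the cross-sample median.

    Assumes every target has nonzero median coverage across samples.
    """
    normalized = {}
    for sample, depths in coverage.items():
        total = sum(depths)
        normalized[sample] = [d / total for d in depths]
    n_targets = len(next(iter(normalized.values())))
    medians = [
        statistics.median([vals[i] for vals in normalized.values()])
        for i in range(n_targets)
    ]
    return {s: [v / m for v, m in zip(vals, medians)] for s, vals in normalized.items()}


def call_gains_losses(
    ratios: Dict[str, List[float]], loss: float = 0.65, gain: float = 1.35
) -> Dict[str, List[str]]:
    """Label each target as 'loss', 'gain' or 'normal' using fixed ratio cut-offs."""
    return {
        s: ["loss" if r < loss else "gain" if r > gain else "normal" for r in vals]
        for s, vals in ratios.items()
    }


if __name__ == "__main__":
    coverage = {"A": [100, 100, 100], "B": [100, 100, 100],
                "C": [100, 100, 100], "D": [100, 50, 100]}
    print(call_gains_losses(coverage_ratios(coverage)))
```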
2016 Annual Report of the University of Kansas Health System Poison Control Center.
Thornton, Stephen L; Oller, Lisa; Coons, Doyle M
2018-05-01
This is the 2016 Annual Report of the University of Kansas Health System Poison Control Center (PCC). The PCC is one of 55 certified poison control centers in the United States and serves the state of Kansas 24 hours a day, 365 days a year, with certified specialists in poison information and medical toxicologists. The PCC receives calls from the public, law enforcement, health care professionals, and public health agencies. All calls to the PCC are recorded electronically in the Toxicall® data management system and uploaded in near real-time to the National Poison Data System (NPDS), which is the data repository for all poison control centers in the United States. All encounters reported to the PCC from January 1, 2016 to December 31, 2016 were analyzed. Data recorded for each exposure include caller location, age, weight, gender, substance exposed to, nature of exposure, route of exposure, interventions, medical outcome, disposition and location of care. Encounters were further classified as human exposure, animal exposure, confirmed non-exposure, or information call (no exposure reported). The PCC logged 21,965 total encounters in 2016, including 20,713 human exposure cases. The PCC received calls from every county in Kansas. The majority of human exposure cases (50.4%, n = 10,174) were female. Approximately 67% (n = 13,903) of human exposures involved a child (defined as age 19 years or less). Most encounters occurred at a residence (94.0%, n = 19,476) and most calls (72.3%, n = 14,964) originated from a residence. The majority of human exposures (n = 18,233) were acute cases (exposures occurring over eight hours or less). Ingestion was the most commonly documented route of exposure (86.3%, n = 17,882). The most commonly reported substance in pediatric encounters was cosmetics/personal care products (n = 1,362), followed by household cleaning products (n = 1,301). 
For adult encounters, sedatives/hypnotics/antipsychotics (n = 1,130) and analgesics (n = 1,103) were the most frequently involved substances. Unintentional exposures were the most common reason for exposure (81.3%, n = 16,836). Most encounters (71.1%, n = 14,732) were managed in a non-healthcare facility (i.e., a residence). Among human exposures, 14,679 involved exposures to pharmaceutical agents while 10,176 involved exposure to non-pharmaceuticals. Medical outcomes were 32% (n = 6,582) no effect, 19% (n = 3,911) minor effect, 8% (n = 1,623) moderate effect, and 2% (n = 348) major effect. There were 15 deaths in 2016 reported to the PCC. The number of exposures, calls from healthcare facilities, cases with moderate or major medical outcomes, and deaths all increased in 2016 compared to 2015. The results of the 2016 University of Kansas Health System Poison Control Center annual report demonstrate that the center receives calls from the entire state of Kansas, totaling over 20,000 human exposures per year. While pediatric exposures remain the most common, there is an increasing number of calls from healthcare facilities and for cases with serious outcomes. The experience of the PCC is similar to national data. This report supports the continued value of the PCC to both public and acute health care in the state of Kansas.
2016 Annual Report of the University of Kansas Health System Poison Control Center
Thornton, Stephen L.; Oller, Lisa; Coons, Doyle M.
2018-01-01
Introduction This is the 2016 Annual Report of the University of Kansas Health System Poison Control Center (PCC). The PCC is one of 55 certified poison control centers in the United States and serves the state of Kansas 24 hours a day, 365 days a year, with certified specialists in poison information and medical toxicologists. The PCC receives calls from the public, law enforcement, health care professionals, and public health agencies. All calls to the PCC are recorded electronically in the Toxicall® data management system and uploaded in near real-time to the National Poison Data System (NPDS), which is the data repository for all poison control centers in the United States. Methods All encounters reported to the PCC from January 1, 2016 to December 31, 2016 were analyzed. Data recorded for each exposure include caller location, age, weight, gender, substance exposed to, nature of exposure, route of exposure, interventions, medical outcome, disposition and location of care. Encounters were further classified as human exposure, animal exposure, confirmed non-exposure, or information call (no exposure reported). Results The PCC logged 21,965 total encounters in 2016, including 20,713 human exposure cases. The PCC received calls from every county in Kansas. The majority of human exposure cases (50.4%, n = 10,174) were female. Approximately 67% (n = 13,903) of human exposures involved a child (defined as age 19 years or less). Most encounters occurred at a residence (94.0%, n = 19,476) and most calls (72.3%, n = 14,964) originated from a residence. The majority of human exposures (n = 18,233) were acute cases (exposures occurring over eight hours or less). Ingestion was the most commonly documented route of exposure (86.3%, n = 17,882). The most commonly reported substance in pediatric encounters was cosmetics/personal care products (n = 1,362), followed by household cleaning products (n = 1,301). 
For adult encounters, sedatives/hypnotics/antipsychotics (n = 1,130) and analgesics (n = 1,103) were the most frequently involved substances. Unintentional exposures were the most common reason for exposure (81.3%, n = 16,836). Most encounters (71.1%, n = 14,732) were managed in a non-healthcare facility (i.e., a residence). Among human exposures, 14,679 involved exposures to pharmaceutical agents while 10,176 involved exposure to non-pharmaceuticals. Medical outcomes were 32% (n = 6,582) no effect, 19% (n = 3,911) minor effect, 8% (n = 1,623) moderate effect, and 2% (n = 348) major effect. There were 15 deaths in 2016 reported to the PCC. The number of exposures, calls from healthcare facilities, cases with moderate or major medical outcomes, and deaths all increased in 2016 compared to 2015. Conclusion The results of the 2016 University of Kansas Health System Poison Control Center annual report demonstrate that the center receives calls from the entire state of Kansas, totaling over 20,000 human exposures per year. While pediatric exposures remain the most common, there is an increasing number of calls from healthcare facilities and for cases with serious outcomes. The experience of the PCC is similar to national data. This report supports the continued value of the PCC to both public and acute health care in the state of Kansas. PMID:29796151
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S
2013-06-25
A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.
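A minimal sketch of the segmenting idea, assuming a simple tiling in which consecutive segments of a user-chosen odd length share a fixed overlap. It works on one strand only and ignores hybridization thermodynamics, alternate reading frames, and the temporal ordering of additions, all of which the method leaves to computational predictions; `split_with_overlap` and `assemble` are hypothetical helpers, not part of the described method.

```python
from typing import List


def split_with_overlap(seq: str, seg_len: int, overlap: int) -> List[str]:
    """Tile seq with segments of seg_len bases; neighbours share `overlap` bases."""
    if not 0 < overlap < seg_len:
        raise ValueError("need 0 < overlap < seg_len")
    step = seg_len - overlap
    segments = []
    start = 0
    while True:
        segments.append(seq[start:start + seg_len])
        if start + seg_len >= len(seq):
            break
        start += step
    return segments


def assemble(segments: List[str], overlap: int) -> str:
    """Join segments in order, checking that each overlap agrees with the previous one."""
    out = segments[0]
    for seg in segments[1:]:
        if out[-overlap:] != seg[:overlap]:
            raise ValueError("adjacent segments do not share the expected overlap")
        out += seg[overlap:]
    return out


if __name__ == "__main__":
    target = "ATGCGTACGTTAGC"
    segs = split_with_overlap(target, seg_len=5, overlap=2)  # odd segment length
    print(segs)  # ['ATGCG', 'CGTAC', 'ACGTT', 'TTAGC']
    print(assemble(segs, 2) == target)  # True
```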
UniNovo: a universal tool for de novo peptide sequencing.
Jeong, Kyowon; Kim, Sangtae; Pevzner, Pavel A
2013-08-15
Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but de novo peptide sequencing algorithms for analyzing tandem mass (MS/MS) spectra are lagging behind. Although existing de novo sequencing tools perform well on certain types of spectra [e.g. Collision Induced Dissociation (CID) spectra of tryptic peptides], their performance often deteriorates on other types of spectra, such as Electron Transfer Dissociation (ETD) spectra, Higher-energy Collisional Dissociation (HCD) spectra, or spectra of non-tryptic digests. Thus, rather than developing a new algorithm for each type of spectra, we develop a universal de novo sequencing algorithm called UniNovo that works well for all types of spectra and even for spectral pairs (e.g. CID/ETD spectral pairs). UniNovo uses an improved scoring function that captures the dependencies between different ion types, where such dependencies are learned automatically using a modified offset frequency function. The performance of UniNovo is compared with PepNovo+, PEAKS and pNovo using various types of spectra. The results show that the performance of UniNovo is superior to that of other tools for ETD spectra and superior or comparable to others for CID and HCD spectra. UniNovo also estimates the probability that each reported reconstruction is correct, using simple statistics that are readily obtained from a small training dataset. We demonstrate that the estimation is accurate for all tested types of spectra (including CID, HCD, ETD, CID/ETD and HCD/ETD spectra of trypsin, LysC or AspN digested peptides). UniNovo is implemented in Java and tested on Windows, Ubuntu and OS X machines. UniNovo is available at http://proteomics.ucsd.edu/Software/UniNovo.html along with the manual.
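The basic offset frequency idea can be illustrated in its simplest form: for spectra of known peptides, histogram the differences between observed peak m/z values and theoretical prefix masses; offsets that recur across many spectra reveal ion types. The sketch assumes singly charged fragment ions, monoisotopic residue masses and 1 Da bins. It is not UniNovo's modified function, which additionally learns dependencies between ion types, and the synthetic spectrum in the usage example is made up.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def prefix_masses(peptide: str) -> List[float]:
    """Cumulative residue masses of every proper N-terminal prefix."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def offset_frequency(
    spectra: Iterable[Tuple[str, List[float]]], window: float = 20.0, bin_width: float = 1.0
) -> Counter:
    """Count binned (peak m/z - prefix mass) offsets that fall within +/- window Da."""
    counts: Counter = Counter()
    for peptide, peaks in spectra:
        for prefix in prefix_masses(peptide):
            for mz in peaks:
                offset = mz - prefix
                if -window <= offset <= window:
                    counts[round(offset / bin_width) * bin_width] += 1
    return counts


if __name__ == "__main__":
    peptide = "PEPTIDE"
    # Synthetic spectrum: singly charged b-ions plus one unrelated noise peak.
    peaks = [m + PROTON for m in prefix_masses(peptide)] + [350.0]
    print(offset_frequency([(peptide, peaks)]).most_common(1))  # [(1.0, 6)]
```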
First genome report on novel sequence types of Neisseria meningitidis: ST12777 and ST12778.
Veeraraghavan, Balaji; Lal, Binesh; Devanga Ragupathi, Naveen Kumar; Neeravi, Iyyan Raj; Jeyaraman, Ranjith; Varghese, Rosemol; Paul, Miracle Magdalene; Baskaran, Ashtawarthani; Ranjan, Ranjini
2018-03-01
Neisseria meningitidis is an important causative agent of meningitis and/or sepsis with high morbidity and mortality. Baseline genome data on N. meningitidis, especially from developing countries such as India, are lacking. This study aimed to investigate the whole genome sequences of N. meningitidis isolates from a tertiary care centre in India. Whole-genome sequencing was performed using an Ion Torrent™ Personal Genome Machine™ (PGM) with 400-bp chemistry. Data were assembled de novo using SPAdes Genome Assembler v.5.0.0.0. Sequence annotation was performed through PATRIC, RAST and the NCBI PGAAP server. Downstream analysis of the isolates was performed using the Center for Genomic Epidemiology databases for antimicrobial resistance genes and sequence types. Virulence factors and CRISPR were analysed using the PubMLST database and CRISPRFinder, respectively. This study reports the whole genome shotgun sequences of eight N. meningitidis isolates from bloodstream infections. The genome data revealed two novel sequence types (ST12777 and ST12778), along with ST11, ST437 and ST6928. The virulence profile of the isolates matched their sequence types. All isolates were negative for plasmid-mediated resistance genes. To the best of our knowledge, this is the first report of ST11 and ST437 N. meningitidis isolates in India along with two novel sequence types (ST12777 and ST12778). These results indicate that the sequence types circulating in India are diverse and require continuous monitoring. Further studies strengthening the genome data on N. meningitidis are required to understand the prevalence, spread, exact resistance and virulence mechanisms along with serotypes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Indexing a sequence for mapping reads with a single mismatch.
Crochemore, Maxime; Langiu, Alessio; Rahman, M Sohel
2014-05-28
Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem in which the length of the pattern is given beforehand, during construction of the data structure. This version of the problem is interesting in its own right in the context of next generation sequencing. We then show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in O(n log^{1+ε} n) time and space and can answer subsequent queries in O(m log log n + K) time. Here, n is the length of the sequence, m is the length of the read, 0 < ε < 1, and K is the optimal output size.
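For readers who want something concrete, the sketch below handles the simpler fixed-read-length variant with a different and much simpler technique than the paper's: a pigeonhole hash index. If a window matches a read with at most one mismatch, at least one of the read's two halves matches exactly, so looking up both halves yields every candidate, which is then verified. It does not achieve the query bound proved in the paper, and `OneMismatchIndex` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


class OneMismatchIndex:
    """Hash index over all length-m windows of a text, keyed by their two halves."""

    def __init__(self, text: str, read_len: int) -> None:
        if read_len < 2:
            raise ValueError("read_len must be at least 2")
        self.text = text
        self.m = read_len
        self.half = read_len // 2
        self.left: Dict[str, List[int]] = defaultdict(list)
        self.right: Dict[str, List[int]] = defaultdict(list)
        for i in range(len(text) - read_len + 1):
            # Both dictionaries map a half-sequence to window start positions.
            self.left[text[i:i + self.half]].append(i)
            self.right[text[i + self.half:i + read_len]].append(i)

    def query(self, read: str) -> List[int]:
        """Start positions where the read occurs with at most one mismatch."""
        if len(read) != self.m:
            raise ValueError("read length differs from the indexed length")
        candidates = set(self.left.get(read[:self.half], []))
        candidates |= set(self.right.get(read[self.half:], []))
        hits = []
        for i in sorted(candidates):
            window = self.text[i:i + self.m]
            if sum(a != b for a, b in zip(window, read)) <= 1:
                hits.append(i)
        return hits


if __name__ == "__main__":
    index = OneMismatchIndex("ACGTACGTTACG", 4)
    print(index.query("ACGA"))  # [0, 4]
```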
NASA Astrophysics Data System (ADS)
Rusyaman, E.; Parmikanti, K.; Chaerani, D.; Asefan; Irianingsih, I.
2018-03-01
One of the applications of fractional ordinary differential equations is viscoelasticity, i.e., the relation between the viscosity of fluids and the elasticity of solids. If the solution function depends on two or more variables, the differential equation must be replaced by a fractional partial differential equation. As a preliminary study of the two-variable viscoelasticity problem, this paper analyzes the convergence of a sequence of functions that solve a homogeneous fractional partial differential equation. The problem is solved with the Homotopy Analysis Method. The results show that if two real number sequences (α_n) and (β_n) converge to α and β, respectively, then the sequence of solutions of the fractional partial differential equations of order (α_n, β_n) also converges to the solution of the fractional partial differential equation of order (α, β).
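The Homotopy Analysis Method and the two-variable equation are not reproduced here. As a much simpler one-variable analogue (swapped in for illustration, not the paper's problem), the Caputo relaxation equation D^α y = -y with y(0) = 1 and 0 < α ≤ 1 has the known solution y(t) = E_α(-t^α), where E_α is the Mittag-Leffler function. The sketch evaluates a truncated series and shows numerically that solutions of order α_n → α approach the solution of order α.

```python
import math


def mittag_leffler(alpha: float, z: float, n_terms: int = 100) -> float:
    """Truncated series E_alpha(z) = sum_k z^k / Gamma(alpha*k + 1).

    Adequate for moderate |z| and 0.5 <= alpha <= 1; large |z| needs other methods.
    """
    return sum(z**k / math.gamma(alpha * k + 1) for k in range(n_terms))


def relaxation_solution(alpha: float, t: float) -> float:
    """y(t) = E_alpha(-t^alpha), the solution of D^alpha y = -y with y(0) = 1."""
    return mittag_leffler(alpha, -(t**alpha))


if __name__ == "__main__":
    target = relaxation_solution(0.5, 1.0)
    for n in (10, 100, 1000):
        alpha_n = 0.5 + 1 / n  # a sequence of orders converging to 0.5
        print(n, abs(relaxation_solution(alpha_n, 1.0) - target))
```

For α = 1 the series reduces to the exponential, and for α = 1/2 it matches the closed form E_{1/2}(z) = exp(z²) erfc(-z), which gives two independent checks on the truncation.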
A novel glucan-binding protein with lipase activity from the oral pathogen Streptococcus mutans.
Shah, Deepan S H; Russell, Roy R B
2004-06-01
Streptococcus mutans produces extracellular glucosyltransferases (GTFs) that synthesize glucans from sucrose. These glucans are important in determining the permeability properties and adhesiveness of dental plaque. GTFs and the GbpA glucan-binding protein are characterized by a binding domain containing a series of 33-amino-acid repeats, called 'A' repeats. The S. mutans genome sequence was searched for ORFs containing 'A' repeats, and one novel gene, gbpD, which appears to be unique to the mutans group of streptococci, was identified. The GbpD sequence revealed the presence of three 'A' repeats in the middle of the protein, and a novel glucan-binding assay showed that GbpD binds to dextran with a K_D of 2-3 nM. Construction of truncated derivatives of GbpD confirmed that the 'A' repeat region was essential for binding. Furthermore, a gbpD knockout mutant showed an altered extent of aggregation induced by polymers derived from sucrose. The N-terminus of GbpD has a signal sequence, followed by a region with no homologues in the public databases, while the C-terminus has homology to the alpha/beta hydrolase family (including lipases and carboxylesterases). GbpD contains the two regions typical of these enzymes: a GxSxG active-site 'lipase box' and an 'oxyanion hole'. GbpD released free fatty acids (FFAs) from a range of triglycerides in the presence of calcium, indicating lipase activity. The glucan-binding/lipase bifunctionality suggested that the natural substrate for the enzyme may be a surface macromolecule consisting of carbohydrate linked to lipid. The gbpD mutant was less hydrophobic than the wild type, and pure recombinant GbpD reduced the hydrophobicity of S. mutans and another plaque bacterium, Streptococcus sanguinis. GbpD bound to and released FFAs from lipoteichoic acid (LTA) of S. sanguinis, but had no effect on LTA from S. mutans. 
These results raise the intriguing possibility that GbpD may be involved in direct interspecies competition within the plaque biofilm.
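A search for the GxSxG 'lipase box' mentioned above can be sketched with a regular expression; the lookahead makes overlapping matches visible. The example sequences are hypothetical, and a motif hit alone does not establish lipase activity (the oxyanion hole and the 'A' repeats are not searched).

```python
import re
from typing import List, Tuple

# G, any residue, S, any residue, G; the lookahead also reports overlapping hits.
LIPASE_BOX = re.compile(r"(?=(G.S.G))")


def find_lipase_boxes(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched pentapeptide) for every GxSxG occurrence."""
    return [(m.start(), m.group(1)) for m in LIPASE_BOX.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_lipase_boxes("MKGHSLGAGASAGW"))  # [(2, 'GHSLG'), (8, 'GASAG')]
```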
Preparation of 13C/15N-labeled oligomers using the polymerase chain reaction
Chen, Xian; Gupta, Goutam; Bradbury, E. Morton
2001-01-01
Preparation of 13C/15N-labeled DNA oligomers using the polymerase chain reaction (PCR). A PCR-based method for uniform 13C/15N labeling of DNA duplexes is described. Multiple copies of a blunt-ended duplex are cloned into a plasmid, each copy containing the sequence of interest and HincII restriction sequences at both the 5' and 3' ends. PCR using bi-directional primers and uniformly 13C/15N-labeled dNTP precursors generates labeled DNA duplexes containing multiple copies of the sequence of interest. Twenty-four cycles of PCR, followed by restriction and purification, gave the uniformly 13C/15N-labeled duplex sequence in 30% yield. Such labeled duplexes find significant applications in multinuclear magnetic resonance spectroscopy.
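The release of the sequence of interest from a tandem-repeat construct can be checked in silico before any labeling run. The sketch assumes the HincII recognition site GTYRAC (Y = C or T, R = A or G) with a blunt cut between Y and R, and it digests a single strand only; the flanking bases and the 12-bp insert are hypothetical, not the construct used in this work.

```python
import re
from typing import List

# HincII recognizes GTYRAC (Y = C or T, R = A or G) and cuts bluntly after GTY.
HINCII_SITE = re.compile(r"GT[CT][AG]AC")


def hincii_digest(dna: str) -> List[str]:
    """Return the fragments produced by cutting one strand at every HincII site."""
    dna = dna.upper()
    cuts = [m.start() + 3 for m in HINCII_SITE.finditer(dna)]
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(dna[prev:cut])
        prev = cut
    fragments.append(dna[prev:])
    return [f for f in fragments if f]


if __name__ == "__main__":
    insert = "CGCGAATTCGCG"  # hypothetical sequence of interest
    site = "GTTAAC"          # one HincII site (GTYRAC with Y = T, R = A)
    construct = "GGG" + site + (insert + site) * 3 + "GGG"
    for fragment in hincii_digest(construct):
        print(fragment)
```

Each internal fragment carries the insert plus the half-sites left by the blunt cuts (AAC...GTT here), which is worth keeping in mind when designing the labeled duplex.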
Wustman, Brandon A; Morse, Daniel E; Evans, John Spencer
2004-08-05
The AP7 and AP24 proteins represent a class of mineral-interaction polypeptides found in the aragonite-containing nacre layer of mollusk shell (H. rufescens). These proteins have been shown to preferentially interfere with calcium carbonate mineral growth in vitro. Both proteins are believed to play an important role in aragonite polymorph selection in the mollusk shell. Previously, we demonstrated that the N-terminal sequences (amino acids (AA) 1-30) of AP7 and AP24 represent mineral interaction/modification domains in both proteins, as evidenced by their ability to frustrate calcium carbonate crystal growth at step edge regions. In the present report, using NMR spectroscopy and synthetic polypeptides with a free N-terminus and a Cα-amide "capped" C-terminus representing the 1-30 AA regions of AP7 (AP7-1 polypeptide) and AP24 (AP24-1 polypeptide), we confirm that both N-terminal sequences possess putative Ca(II) interaction polyanionic sequence regions (2 x -DD- in AP7-1, -DDDED- in AP24-1) that are random coil-like in structure. However, in the remaining sequence regions, each polypeptide features unique structural differences. AP7-1 possesses an extended beta-strand or polyproline type II-like structure within the A11-M10, S12-V13, and S28-I27 sequence regions, with the remaining sequence regions adopting a random coil-like structure, a trait common to other polyelectrolyte mineral-associated polypeptide sequences. Conversely, AP24-1 possesses random coil-like structure within the A1-S9 and Q14-N16 sequence regions, and evidence for turn-like, bend, or loop conformation within the G10-N13, Q17-N24, and M29-F30 sequence regions, similar to the structures identified within the putative elastomeric proteins Lustrin A and sea urchin spicule matrix proteins. The similarities and differences in AP7 and AP24 N-terminal domain structure are discussed with regard to joint AP7-AP24 protein modification of calcium carbonate growth. Copyright 2004 Wiley Periodicals, Inc.
Yarimizu, Tohru; Nakamura, Mikiko; Hoshida, Hisashi; Akada, Rinji
2015-02-14
Targeting of cellular proteins to the extracellular environment is directed by a secretory signal sequence located at the N-terminus of a secretory protein. These signal sequences usually contain an N-terminal basic amino acid followed by a stretch of hydrophobic residues, although no consensus signal sequence has been identified. In this study, simple modeling of signal sequences was attempted using Gaussia princeps secretory luciferase (GLuc) in the yeast Kluyveromyces marxianus, which allowed comprehensive recombinant gene construction to substitute synthetic signal sequences. Mutational analysis of the GLuc signal sequence revealed that the length of the GLuc hydrophobic peptide was at the lower limit for effective secretion and that the N-terminal basic residue was indispensable. Deletion of the 16th residue, Glu, enhanced the level of secreted protein, suggesting that this hydrophilic residue defines the boundary of the hydrophobic peptide stretch. Consequently, we redesigned this domain as a repeat of a single hydrophobic amino acid between the N-terminal Lys and the C-terminal Glu. Stretches consisting of Phe, Leu, Ile, or Met were effective for secretion, but the number of residues affected secretory activity. A stretch containing sixteen consecutive methionine residues (M16) showed the highest activity; the M16 sequence was therefore used for the secretory production of human leukemia inhibitory factor protein in yeast, resulting in enhanced secreted protein yield. We present a new concept for providing secretory signal sequence ability in the yeast K. marxianus, determined by the number of residues of a single hydrophobic amino acid located between N-terminal basic and C-terminal acidic amino acid boundaries.
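The redesigned core (an N-terminal Lys, a run of one hydrophobic residue, and a C-terminal Glu) is easy to enumerate programmatically. The sketch scores each variant with the Kyte-Doolittle hydropathy scale, a standard scale swapped in here for illustration; the study measured secretion experimentally, and the abstract does not say that any hydropathy score was used. `designed_core` is a hypothetical helper, and the initiator Met and the rest of the signal peptide are not modeled.

```python
from typing import Dict

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def designed_core(residue: str, n: int) -> str:
    """N-terminal Lys, n copies of one hydrophobic residue, C-terminal Glu."""
    return "K" + residue * n + "E"


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle value (GRAVY) of a sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    for residue in "FLIM":
        core = designed_core(residue, 16)
        print(core, round(mean_hydropathy(core), 2))
```

On this scale the M16 core is the least hydrophobic of the four variants, yet the abstract reports that M16 gave the highest secretory activity, so a single hydropathy number clearly does not predict secretion here.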
Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine
2007-01-01
Background In Archaea and Bacteria, the repeated elements called CRISPRs, for "clustered regularly interspaced short palindromic repeats", are believed to participate in the defence against viruses. Short sequences called spacers are stored between repeated elements. In the current model, motifs comprising spacers and repeats may target an invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows that new motifs (one spacer and one repeated element) are added in a polarised fashion. Although their principal characteristics have been described, much remains to be discovered about how CRISPRs are created and evolve. As new genome sequences become available, it appears necessary to develop automated scanning tools to make CRISPR-related information available and to facilitate additional investigations. Description We have produced a program, CRISPRFinder, which identifies CRISPRs and extracts the repeated and unique sequences. Using this software, a database is constructed which is automatically updated monthly from newly released genome sequences. Additional tools were created to allow the alignment of flanking sequences in search of similarities between different loci and to build dictionaries of unique sequences. To date, almost six hundred CRISPRs have been identified in 475 published genomes. Two of thirty-seven Archaea and about half of Bacteria do not possess a CRISPR. Fine analysis of repeated sequences strongly supports the current view that new motifs are added at one end of the CRISPR, adjacent to the putative promoter. Conclusion It is hoped that the availability of a public database, regularly updated and queryable on the web, will help in further dissecting and understanding CRISPR structure and the evolution of flanking sequences. Subsequent analyses of intra-species CRISPR polymorphism will be facilitated by CRISPRFinder and the dictionary creator. 
CRISPRdb is accessible at PMID:17521438
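To make the repeat/spacer structure concrete, here is a minimal sketch that scans a sequence for exact k-mers recurring at least three times with bounded spacer lengths between consecutive copies. It is not CRISPRFinder's algorithm: the function name and the defaults (`k=23`, spacers of 20 to 60 nt) are illustrative assumptions, only exact repeats of length k are found, and degenerate repeats or repeats of other lengths would need a more tolerant search.

```python
from collections import defaultdict


def find_crispr_like_arrays(seq, k=23, min_spacer=20, max_spacer=60, min_repeats=3):
    """Hypothetical sketch: exact k-mer repeats separated by spacers of bounded length."""
    seq = seq.upper()
    positions = defaultdict(list)  # k-mer -> sorted start positions
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    candidates = []
    for repeat, starts in positions.items():
        if len(starts) < min_repeats:
            continue
        chain = [starts[0]]
        for p in starts[1:]:
            spacer_len = p - (chain[-1] + k)  # gap between consecutive repeat copies
            if min_spacer <= spacer_len <= max_spacer:
                chain.append(p)
                continue
            if len(chain) >= min_repeats:
                candidates.append((repeat, chain))
            chain = [p]
        if len(chain) >= min_repeats:
            candidates.append((repeat, chain))

    # Prefer arrays with more copies; drop any candidate overlapping an array already
    # kept (e.g. k-mers shifted by a few bases against the true repeat).
    candidates.sort(key=lambda rc: (-len(rc[1]), rc[1][0]))
    kept = []
    for repeat, chain in candidates:
        s0, e0 = chain[0], chain[-1] + k
        if any(s0 < e and s < e0 for s, e, _ in kept):
            continue
        spacers = [seq[a + k:b] for a, b in zip(chain, chain[1:])]
        kept.append((s0, e0, {"repeat": repeat, "starts": chain, "spacers": spacers}))
    return [array for _, _, array in sorted(kept, key=lambda t: t[0])]
```

Extracting the spacers per array is the step that would feed a spacer dictionary like the one the abstract describes.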
Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc
2012-01-01
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correcting SSEs require a data set used to calculate the association of SSEs with various features in the reads and sequence context. This data set is typically either part of the data set being "recalibrated" (Genome Analysis ToolKit, or GATK) or a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA and using GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
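Phred scores relate to error probability by Q = -10 log10(p), so a 5-unit correction corresponds to a roughly 3.2-fold (10^0.5) change in the estimated error rate. The sketch below illustrates the core idea of recalibrating against reads mapped to a known spike-in: bin bases by reported quality and dinucleotide context, count mismatches to the known sequence, and convert the empirical error rate back to a Phred score. It is a simplified illustration, not GATK's implementation; the add-one smoothing and the `(reported_q, context, is_error)` input format are assumptions.

```python
import math
from collections import defaultdict


def phred(p_error: float) -> float:
    """Phred-scaled quality for an error probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p_error)


def empirical_quality(observations):
    """observations: iterable of (reported_q, context, is_error) for bases aligned
    to a spike-in of known sequence. Returns {(reported_q, context): empirical Q}."""
    counts = defaultdict(lambda: [0, 0])  # [mismatches, total bases] per bin
    for reported_q, context, is_error in observations:
        c = counts[(reported_q, context)]
        c[0] += int(is_error)
        c[1] += 1
    # Add-one smoothing (an assumption) keeps bins with zero observed errors finite.
    return {key: phred((err + 1) / (tot + 2)) for key, (err, tot) in counts.items()}
```

With hypothetical counts where reported Q30 bases in a "GG" context show no mismatches in 998 observations, the bin stays near Q30, while an "AC" bin with 99 mismatches in 998 drops to about Q10, the kind of context-specific overestimate the abstract reports.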
Beccari, T; Hoade, J; Orlacchio, A; Stirling, J L
1992-01-01
cDNAs encoding the mouse beta-N-acetylhexosaminidase alpha-subunit were isolated from a mouse testis library. The longest of these (1.7 kb) was sequenced and showed 83% similarity with the human alpha-subunit cDNA sequence. The 5' end of the coding sequence was obtained from a genomic DNA clone. Alignment of the human and mouse sequences showed that all three putative N-glycosylation sites are conserved, but that the mouse alpha-subunit has an additional site towards the C-terminus. All eight cysteines in the human sequence are conserved in the mouse. There are an additional two cysteines in the mouse alpha-subunit signal peptide. All amino acids affected in Tay-Sachs-disease mutations are conserved in the mouse. PMID:1379046
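Putative N-glycosylation sites are conventionally predicted from the sequon N-X-S/T, where X is any residue except proline. A minimal sketch of that scan follows; it uses a lookahead so overlapping sequons are both reported. The function name and the test sequence are hypothetical, and comparing sites between species, as in the abstract, would additionally require mapping positions through the human/mouse alignment.

```python
import re

# Lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def glycosylation_sequons(protein: str):
    """Return (1-based position of N, sequon) for each N-X-S/T motif with X != P."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


print(glycosylation_sequons("ANVSNPTQNKT"))  # [(2, 'NVS'), (9, 'NKT')]
```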
The effect of call libraries and acoustic filters on the identification of bat echolocation.
Clement, Matthew J; Murray, Kevin L; Solick, Donald I; Gruver, Jeffrey C
2014-09-01
Quantitative methods for species identification are commonly used in acoustic surveys for animals. While various identification models have been studied extensively, there has been little study of methods for selecting calls prior to modeling or methods for validating results after modeling. We obtained two call libraries with a combined 1556 pulse sequences from 11 North American bat species. We used four acoustic filters to automatically select and quantify bat calls from the combined library. For each filter, we trained a species identification model (a quadratic discriminant function analysis) and compared the classification ability of the models. In a separate analysis, we trained a classification model using just one call library. We then compared a conventional model assessment that used the training library against an alternative approach that used the second library. We found that filters differed in the share of known pulse sequences that were selected (68 to 96%), the share of non-bat noises that were excluded (37 to 100%), their measurement of various pulse parameters, and their overall correct classification rate (41% to 85%). Although the top two filters did not differ significantly in overall correct classification rate (85% and 83%), rates differed significantly for some bat species. In our assessment of call libraries, overall correct classification rates were significantly lower (15% to 23% lower) when tested on the second call library instead of the training library. Well-designed filters obviated the need for subjective and time-consuming manual selection of pulses. Accordingly, researchers should carefully design and test filters and include adequate descriptions in publications. Our results also indicate that it may not be possible to extend inferences about model accuracy beyond the training library. 
If so, the accuracy of acoustic-only surveys may be lower than commonly reported, which could affect ecological understanding or management decisions based on acoustic surveys. PMID:25535563
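The species model in this study is a quadratic discriminant function analysis. As a minimal sketch, assuming each call is summarized as a numeric vector of pulse parameters, QDA fits a mean, covariance, and prior per species and assigns a call to the species with the highest Gaussian log-posterior. The small ridge term added to each covariance is an assumption for numerical stability, and this hand-rolled class is illustrative rather than the software the authors used.

```python
import numpy as np


class QDA:
    """Quadratic discriminant analysis: one Gaussian (mean, covariance) per class."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small ridge (assumption) keeps the covariance invertible.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((mean, np.linalg.inv(cov), logdet, np.log(len(Xc) / len(X))))
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, inv_cov, logdet, log_prior in self.params_:
            d = X - mean
            maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)  # squared Mahalanobis distance
            scores.append(-0.5 * (logdet + maha) + log_prior)
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


def accuracy(model, X, y):
    """Overall correct classification rate."""
    return float(np.mean(model.predict(X) == np.asarray(y)))
```

The abstract's validation point maps onto how `accuracy` is called: fitting on one call library and scoring on that same library gives the conventional estimate, while scoring the fitted model on the second library gives the independent estimate that the study found to be 15% to 23% lower.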
Jiang, Jiming
2015-04-01
Sequencing of complete plant genomes has become increasingly routine since the advent of next-generation sequencing technology. Identification and annotation of large amounts of noncoding but functional DNA sequences, including cis-regulatory DNA elements (CREs), have become a new frontier in plant genome research. Genomic regions containing active CREs bound to regulatory proteins are hypersensitive to DNase I digestion and are called DNase I hypersensitive sites (DHSs). Several recent DHS studies in plants illustrate that DHS datasets produced by DNase I digestion followed by next-generation sequencing (DNase-seq) are highly valuable for identifying and characterizing CREs associated with plant development and responses to environmental cues. DHS-based genomic profiling has opened a door to identifying and annotating the 'dark matter' in sequenced plant genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
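In DNase-seq, each read end marks a DNase I cut, so hypersensitive sites appear as local excesses of cut sites. The sketch below is a deliberately simplified caller, assuming cut positions on one chromosome are already known: it counts cuts in fixed windows and flags windows whose count is improbable under a Poisson model with the chromosome-wide mean rate. The window size, threshold, and function names are hypothetical; published DHS pipelines typically use local background, mappability correction, and multiple-testing control instead.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def call_hypersensitive_windows(cut_sites, chrom_len, window=150, alpha=1e-5):
    """Return (start, end, count) for windows with significantly many DNase I cuts."""
    n_windows = math.ceil(chrom_len / window)
    counts = [0] * n_windows
    for pos in cut_sites:  # 0-based cut positions, assumed < chrom_len
        counts[pos // window] += 1
    lam = len(cut_sites) / n_windows  # genome-wide background rate per window
    return [(i * window, min((i + 1) * window, chrom_len), c)
            for i, c in enumerate(counts) if poisson_sf(c, lam) < alpha]
```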
Berger, C; Berger, B; Parson, W
2012-01-01
In recent years, evidence from domestic dogs has increasingly been analyzed by forensic DNA testing. Canine hairs in particular have proved highly suitable and practical because of the high rate of hair transfer between dogs and humans. Starting with a description of a contamination-free sample handling procedure, we give a detailed workflow for sequencing the hypervariable segments (HVS) of the mtDNA control region from canine evidence. After the hair material is lysed and the DNA is extracted with phenol/chloroform, the amplification and sequencing strategy covers HVS I and II of the canine control region and is optimized for DNA of medium-to-low quality and quantity. The sequencing procedure is based on the Sanger BigDye dideoxy-terminator method, and the sequencing reaction products are separated on a conventional multicolor fluorescence detection capillary electrophoresis platform. Finally, software-aided base calling and sequence interpretation are illustrated with examples.
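Software-aided base calling from four-color capillary traces starts from the fluorescence intensity of each dye at every peak. A minimal sketch, assuming peaks have already been located and their per-channel heights measured, calls the strongest channel, reports an IUPAC ambiguity code when the second channel is comparably strong (as with mixtures or mtDNA heteroplasmy), and reports N for weak signal. The thresholds and function name are hypothetical; production base callers also model peak spacing, shape, and dye mobility.

```python
# IUPAC two-base ambiguity codes.
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def call_bases(peaks, mixed_ratio=0.33, min_signal=50.0):
    """peaks: list of dicts mapping 'A', 'C', 'G', 'T' to peak heights at one position."""
    calls = []
    for intensities in peaks:
        ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
        (b1, s1), (b2, s2) = ranked[0], ranked[1]
        if s1 < min_signal:
            calls.append("N")  # too weak to call confidently
        elif s2 / s1 >= mixed_ratio:
            calls.append(IUPAC[frozenset((b1, b2))])  # possible mixture
        else:
            calls.append(b1)
    return "".join(calls)
```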
Batts, W.N.; Arakawa, C.K.; Bernard, J.; Winton, J.R.
1993-01-01
Biotinylated DNA probes were constructed to hybridize with specific sequences within the messenger RNA (mRNA) of the nucleoprotein (N) gene of viral hemorrhagic septicemia virus (VHSV) reference strains from Europe (07-71) and North America (Makah). Probes were synthesized that were complementary to (1) a 29-nucleotide sequence near the center of the N gene common to both the 07-71 and Makah reference strains of the virus, (2) a unique 28-nucleotide sequence that followed the open reading frame of the Makah N gene mRNA, most of which was absent in the 07-71 strain, and (3) a 22-nucleotide sequence within the 07-71 N gene that had 6 mismatches