NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
A DNA sequence can be defined as a succession of letters representing the order of nucleotides within a DNA molecule, drawn from the four base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become highly advanced, high-throughput sequencing technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results in which it is difficult to determine whether a base is A, T, G, or C. To address this problem, we introduce an alternative representation of DNA codes, the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, and PC are the probabilities that base A, T, G, or C appears in Q, with PA + PT + PG + PC = 1. Using the Quaternion representation, we construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces higher-quality match and mismatch scores between two DNA base codes. We implemented the Needleman-Wunsch global sequence alignment algorithm in Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representation improves the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
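The dot-product scoring idea can be illustrated with a minimal Python sketch (the study itself used Octave). Each position is a probability vector in the (PA, PT, PG, PC) order used above: an unambiguous base is a one-hot vector and an "N" call is uniform. How the dot product becomes a score is an assumption here: the column score interpolates between a match score of +1 and a mismatch score of -1 using the dot product, and the linear gap penalty of -2 is likewise hypothetical, as are the helper names.

```python
# Sketch: Needleman-Wunsch global alignment over probabilistic base calls.
# Each position is (pA, pT, pG, pC) summing to 1. The dot-product-to-score
# mapping (interpolating between `match` and `mismatch`) is an assumption.
from typing import Dict, List, Sequence, Tuple

Prob = Tuple[float, float, float, float]  # (pA, pT, pG, pC)

ONE_HOT: Dict[str, Prob] = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
    "N": (0.25, 0.25, 0.25, 0.25),  # completely ambiguous call
}


def to_probs(seq: str) -> List[Prob]:
    """Convert a base string into probability vectors."""
    return [ONE_HOT[b] for b in seq.upper()]


def column_score(p: Prob, q: Prob, match: float = 1.0, mismatch: float = -1.0) -> float:
    """Dot product = probability that both positions carry the same base."""
    same = sum(x * y for x, y in zip(p, q))
    return match * same + mismatch * (1.0 - same)


def needleman_wunsch(a: Sequence[Prob], b: Sequence[Prob], gap: float = -2.0) -> float:
    """Return the optimal global alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]  # F[i][j]: best score of a[:i] vs b[:j]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + column_score(a[i - 1], b[j - 1]),
                F[i - 1][j] + gap,  # gap in b
                F[i][j - 1] + gap,  # gap in a
            )
    return F[n][m]
```

Under these assumptions, aligning ACNT against ACGT scores 2.5: the ambiguous N against G contributes 0.25 × 1 + 0.75 × (-1) = -0.5 instead of a full mismatch penalty.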
Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices.
Li, Guang; Wang, Yadong; Su, Xiaohong
2012-10-01
When developing personal DNA databases, there must be an appropriate guarantee of anonymity, which means that the data cannot be related back to individuals. DNA lattice anonymization (DNALA) is a successful method for making personal DNA sequences anonymous. However, it uses time-consuming multiple sequence alignment and a low-accuracy greedy clustering algorithm. Furthermore, DNALA is not an online algorithm, so it cannot quickly return results when the database is updated. This study improves the DNALA method. Specifically, we replaced the multiple sequence alignment in DNALA with global pairwise sequence alignment to save time, and we designed a hybrid clustering algorithm composed of a maximum weight matching (MWM)-based algorithm and an online algorithm. The MWM-based algorithm is more accurate than the greedy algorithm in DNALA and has the same time complexity. The online algorithm can process data quickly when the database is updated. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
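The generalization and pairing steps can be sketched in Python for the simplest case of pairs (2-anonymity). This is not DNALA or the authors' hybrid algorithm: the lattice is assumed to be base, then two-base IUPAC ambiguity code, then "*"; the cost of a merge is simply the number of generalized columns; and an exhaustive minimum-cost perfect matching stands in for the MWM-based clustering (minimizing cost is equivalent to maximizing weight when weight = -cost). All function names are hypothetical.

```python
# Hypothetical sketch: lattice-style generalization for 2-anonymity.
# Assumes sequences are already globally aligned (equal length, '-' = gap)
# and contain only A, C, G, T and '-'.
from math import inf
from typing import Dict, FrozenSet, List, Tuple

# Two-base IUPAC ambiguity codes act as the middle level of the assumed lattice.
IUPAC: Dict[FrozenSet[str], str] = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def generalize_column(x: str, y: str) -> str:
    if x == y:
        return x
    if x == "-" or y == "-":
        return "*"  # assumed top of the lattice: any base or a gap
    return IUPAC[frozenset(x + y)]


def generalize_pair(a: str, b: str) -> Tuple[str, int]:
    """Merge two aligned sequences; cost = number of generalized columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    merged = "".join(generalize_column(x, y) for x, y in zip(a, b))
    cost = sum(1 for x, y in zip(a, b) if x != y)
    return merged, cost


def best_pairing(seqs: List[str]) -> Tuple[float, List[Tuple[int, int]]]:
    """Exhaustive minimum-cost perfect matching; small, even-sized inputs only."""
    def solve(rest: Tuple[int, ...]) -> Tuple[float, List[Tuple[int, int]]]:
        if not rest:
            return 0, []
        first, others = rest[0], rest[1:]
        best_cost, best_pairs = inf, []
        for k, partner in enumerate(others):
            sub_cost, sub_pairs = solve(others[:k] + others[k + 1:])
            cost = generalize_pair(seqs[first], seqs[partner])[1] + sub_cost
            if cost < best_cost:
                best_cost, best_pairs = cost, [(first, partner)] + sub_pairs
        return best_cost, best_pairs

    return solve(tuple(range(len(seqs))))
```

For example, the sequences AAAA, CCCC, AAAT, CCCG are grouped as (0, 2) and (1, 3) at a total cost of 2. The exhaustive search examines (n-1)!! pairings, so a polynomial-time matching algorithm is needed for realistic database sizes.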
Efficient alignment-free DNA barcode analytics.
Kuksa, Pavel; Pavlovic, Vladimir
2009-11-10
In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations that model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectra) of barcode sequences to measure similarities or dissimilarities between sequences from the same or different species. The spectrum-based representation not only allows accurate and computationally efficient species classification, but also opens the possibility of accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. The new alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species, with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on species classification and identification using barcodes, which are important analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared with prior results. We also observe significant running time improvements over state-of-the-art methods. Our results show that the newly developed alignment-free methods for DNA barcoding can identify specimens efficiently and with high accuracy by examining only a few barcode features, increasing the scalability and interpretability of current computational approaches to barcoding.
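A spectrum representation can be sketched in a few lines of Python: each barcode becomes a sparse vector of k-mer counts, and a query receives the label of its most similar reference. The choice of k = 3, cosine similarity, and 1-nearest-neighbour labelling are assumptions for illustration; the paper's actual classifiers and feature settings may differ.

```python
# Sketch: alignment-free barcode classification with k-mer spectrum features.
# k = 3, cosine similarity and 1-nearest-neighbour labelling are assumptions.
from collections import Counter
from math import sqrt
from typing import List, Tuple


def spectrum(seq: str, k: int = 3) -> Counter:
    """Sparse k-mer count vector (the 'spectrum' of the sequence)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(c * v[kmer] for kmer, c in u.items())  # missing k-mers count as 0
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(query: str, references: List[Tuple[str, str]], k: int = 3) -> str:
    """Return the label of the most similar (label, barcode) reference."""
    q = spectrum(query, k)
    return max(references, key=lambda ref: cosine(q, spectrum(ref[1], k)))[0]
```

Because every barcode maps to a vector over the same k-mer alphabet, no alignment is needed, and sequences of different lengths remain directly comparable.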
Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data
Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei
2013-01-01
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042
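The correct-then-filter idea can be sketched as follows. This is a hypothetical simplification, not Coval's algorithm. Reads are assumed to be ungapped and already placed on the reference. A mismatch is treated as a sequencing error when its allele frequency at that position is below `min_af` and its base quality is below `min_q`, and reads that still carry more than `max_mismatches` mismatches are discarded as likely mis-mapped. All thresholds and the `Read` type are illustrative.

```python
# Hypothetical sketch of mismatch error correction followed by read filtering.
# Assumes ungapped reads already placed on the reference (no indels).
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Read:
    start: int          # 0-based reference coordinate of the first base
    bases: str
    quals: List[int]    # Phred base qualities


def correct_and_filter(reads: List[Read], ref: str, min_af: float = 0.2,
                       min_q: int = 20, max_mismatches: int = 1) -> List[Read]:
    # 1. Pile up the bases observed at each reference position.
    pileup: Dict[int, Counter] = {}
    for r in reads:
        for i, b in enumerate(r.bases):
            pileup.setdefault(r.start + i, Counter())[b] += 1

    kept = []
    for r in reads:
        bases = list(r.bases)
        for i, b in enumerate(bases):
            pos = r.start + i
            if b != ref[pos]:
                af = pileup[pos][b] / sum(pileup[pos].values())
                # 2. Rare, low-quality mismatches are treated as sequencing errors.
                if af < min_af and r.quals[i] < min_q:
                    bases[i] = ref[pos]
        corrected = Read(r.start, "".join(bases), r.quals)
        mism = sum(1 for i, b in enumerate(corrected.bases) if b != ref[r.start + i])
        # 3. Reads that still carry many mismatches are likely mis-mapped.
        if mism <= max_mismatches:
            kept.append(corrected)
    return kept
```

Correcting before filtering matters: a read with a single low-quality error is rescued rather than discarded, while a read with several confident mismatches is removed.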
Efficient alignment-free DNA barcode analytics
Kuksa, Pavel; Pavlovic, Vladimir
2009-01-01
Background In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations that model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectra) of barcode sequences to measure similarities or dissimilarities between sequences from the same or different species. The spectrum-based representation not only allows accurate and computationally efficient species classification, but also opens the possibility of accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. Results The new alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species, with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on species classification and identification using barcodes, which are important analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.). On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, the proposed alignment-free methods considerably improve prediction accuracy compared with prior results. We also observe significant running time improvements over state-of-the-art methods. Conclusion Our results show that the newly developed alignment-free methods for DNA barcoding can identify specimens efficiently and with high accuracy by examining only a few barcode features, increasing the scalability and interpretability of current computational approaches to barcoding. PMID:19900305
2012-01-01
Background Hawthorn is the common name of all plant species in the genus Crataegus, which belongs to the Rosaceae family. Crataegus are considered useful medicinal plants because of their high content of proanthocyanidins (PAs) and other related compounds. To improve PAs production in Crataegus tissues, the sequences of genes encoding PAs biosynthetic enzymes are required. Findings Different bioinformatics tools, including BLAST, multiple sequence alignment and alignment PCR analysis were used to design primers suitable for the amplification of DNA fragments from 10 candidate genes encoding enzymes involved in PAs biosynthesis in C. aronia. DNA sequencing results proved the utility of the designed primers. The primers were used successfully to amplify DNA fragments of different PAs biosynthesis genes in different Rosaceae plants. Conclusion To the best of our knowledge, this is the first use of the alignment PCR approach to isolate DNA sequences encoding PAs biosynthetic enzymes in Rosaceae plants. PMID:22883984
Zuiter, Afnan Saeid; Sawwan, Jammal; Al Abdallat, Ayed
2012-08-10
Hawthorn is the common name of all plant species in the genus Crataegus, which belongs to the Rosaceae family. Crataegus are considered useful medicinal plants because of their high content of proanthocyanidins (PAs) and other related compounds. To improve PAs production in Crataegus tissues, the sequences of genes encoding PAs biosynthetic enzymes are required. Different bioinformatics tools, including BLAST, multiple sequence alignment and alignment PCR analysis were used to design primers suitable for the amplification of DNA fragments from 10 candidate genes encoding enzymes involved in PAs biosynthesis in C. aronia. DNA sequencing results proved the utility of the designed primers. The primers were used successfully to amplify DNA fragments of different PAs biosynthesis genes in different Rosaceae plants. To the best of our knowledge, this is the first use of the alignment PCR approach to isolate DNA sequences encoding PAs biosynthetic enzymes in Rosaceae plants.
Fayazfar, H; Afshar, A; Dolati, M; Dolati, A
2014-07-11
For the first time, a new platform based on electrochemical growth of Au nanoparticles on aligned multi-walled carbon nanotubes (A-MWCNT) was developed for sensitive label-free DNA detection of the TP53 gene mutation, one of the most widely studied genes in cancer research. Electrochemical impedance spectroscopy (EIS) was used to monitor sequence-specific DNA hybridization events related to the TP53 gene. Compared with bare Ta or MWCNT/Ta electrodes, the synergistic interaction of the vertically aligned MWCNT array and gold nanoparticles at the modified electrode could greatly improve the density of probe DNA attachment and, as a result, the sensitivity of the DNA sensor. Using EIS, the change in charge transfer resistance was found to be linearly related to the logarithm of the complementary oligonucleotide concentration over the wide range of 1.0×10^-15 to 1.0×10^-7 M, with a detection limit of 1.0×10^-17 M (S/N = 3). The prepared sensor also showed good stability (14 days) and reproducibility (RSD = 2.1%), and could be conveniently regenerated by dehybridization in hot water. The significant improvement in sensitivity indicates that combining gold nanoparticles with the on-site fabricated aligned MWCNT array is a promising platform for sensitive biosensors for fast screening of mutations related to most human cancer types. Copyright © 2014. Published by Elsevier B.V.
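The calibration relationship described above (the change in charge transfer resistance, ΔRct, is linear in the logarithm of target concentration) can be fitted by ordinary least squares and then inverted to estimate an unknown concentration. This minimal Python sketch assumes ΔRct in ohms and uses no data from the study. The helper names are hypothetical, and the detection-limit calculation (S/N = 3) is not reproduced.

```python
# Sketch: log-linear calibration curve for an impedimetric DNA sensor.
# delta_rct = intercept + slope * log10(concentration); units are assumptions.
from math import log10
from typing import List, Tuple


def fit_log_linear(conc_molar: List[float], delta_rct: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (intercept, slope)."""
    xs = [log10(c) for c in conc_molar]
    n = len(xs)
    mx, my = sum(xs) / n, sum(delta_rct) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, delta_rct))
    slope = sxy / sxx
    return my - slope * mx, slope


def estimate_conc(delta: float, intercept: float, slope: float) -> float:
    """Invert the calibration line to estimate a target concentration (M)."""
    return 10 ** ((delta - intercept) / slope)
```

Estimates are only meaningful inside the calibrated range (here, 1.0×10^-15 to 1.0×10^-7 M); extrapolating the line beyond it is not justified by the fit.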
USDA-ARS's Scientific Manuscript database
BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain u...
Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E
2014-06-10
Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.
StructAlign, a Program for Alignment of Structures of DNA-Protein Complexes.
Popov, Ya V; Galitsyna, A A; Alexeevski, A V; Karyagina, A S; Spirin, S A
2015-11-01
Comparative analysis of structures of complexes of homologous proteins with DNA is important in the analysis of DNA-protein recognition. Alignment is a necessary stage of the analysis. An alignment is a matching of amino acid residues and nucleotides of one complex to residues and nucleotides of the other. Currently, there are no programs available for aligning structures of DNA-protein complexes. We present the program StructAlign, which should fill this gap. The program inputs a pair of complexes of DNA double helix with proteins and outputs an alignment of DNA chains corresponding to the best spatial fit of the protein chains.
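StructAlign derives a DNA alignment from the best spatial fit of the protein chains. A standard way to compute such a rigid-body fit is the Kabsch algorithm, sketched below with NumPy. Whether StructAlign uses Kabsch, and how it chooses corresponding residues, is not stated here, so the sketch assumes two already-paired N × 3 coordinate arrays (for example, matched Cα atoms); the function name is hypothetical.

```python
# Sketch: optimal rigid superposition (Kabsch algorithm) and the resulting RMSD.
# Assumes P and Q are N x 3 arrays of already-corresponding atoms.
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two paired coordinate sets after the best rotation + translation."""
    P = P - P.mean(axis=0)             # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                         # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotations
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                  # rotation that maps P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Applying the same rotation and translation to the DNA chains would bring the two double helices into a common frame, from which spatially close nucleotides can be paired.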
Raimann, Jochen G; Ficociello, Linda H; Usvyat, Len A; Zhang, Hanjie; Pacelli, Lisa; Moore, Sandi; Sheppard, Penny; Xiao, Qingqing; Wang, Yuedong; Mullon, Claudy; Balter, Paul; Sullivan, Terry; Kotanko, Peter
2018-04-02
Evidence indicates favorable effects of aligning dialysate sodium (DNa+) to serum sodium concentration (SNa+); however, results from larger populations are needed. We therefore conducted a retrospective propensity score-matched cohort study of a quality improvement project to investigate the effects of alignment in a population of maintenance hemodialysis patients. At four participating hemodialysis (HD) clinics, patients with SNa+ lower than the standard DNa+ of 137 mEq/L who received HD with DNa+ aligned to the average of their last four SNa+ measurements were evaluated (clinicaltrials.gov # NCT01825590). In this retrospective data analysis, an intention-to-treat (primary) and an as-treated "intervention" (secondary) cohort were created. "Aligned" patients from both cohorts (N = 163 for the primary and N = 137 for the secondary) were then propensity score-matched 1:1 to "unaligned" patients from the Renal Research Institute database. The propensity score was based on age, gender, white race, Hispanic ethnicity, presence or absence of diabetes, hemodialysis vintage, interdialytic weight gain (IDWG; as a percentage of postdialysis body weight), catheter as primary dialysis access, predialysis systolic blood pressure, serum sodium concentration, and hospitalization count during baseline. T-tests were used for group comparisons of changes in the primary (volume-related and hemodynamic parameters) and tertiary outcomes. All-cause and fluid overload-related hospitalization admission rates were compared using the Wilcoxon rank-sum test and Cox regression analysis for repeated events. In the primary analysis, aligned and unaligned subjects showed comparable demographics at baseline. Treatment effects were significant for IDWG [-0.12 (95% CI -0.24 to 0) L], and pre-dialysis hemodynamic parameters showed non-significant decreasing trends.
Count comparison and Cox regression analysis showed no clear advantage of alignment in terms of all-cause or fluid overload-related hospitalization. Results from the largest sodium alignment program to date suggest positive treatment effects on volume-related and hemodynamic parameters, but no clear effect on the risk of hospitalization. Well-matched control patients minimized confounding effects. The small effects and lack of significant differences may be explained by a low baseline DNa+ limiting the size of the interventional change.
Yoon, Hyejin; Leitner, Thomas
2014-12-17
Analyses of entire viral genomes or mtDNA require the comprehensive design of many primers across the genome. In addition, simultaneous optimization of several DNA primer design criteria may improve overall experimental efficiency and downstream bioinformatic processing. To achieve these goals, we developed PrimerDesign-M. It includes several options for multiple-primer design, allowing researchers to efficiently design walking primers that cover long DNA targets, such as entire HIV-1 genomes, and that are optimized simultaneously using the genetic diversity in multiple alignments and the experimental design constraints given by the user. PrimerDesign-M can also design primers that include DNA barcodes and minimize primer dimerization. PrimerDesign-M finds optimal primers for highly variable DNA targets and facilitates design flexibility by suggesting alternative designs to adapt to experimental conditions.
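Several generic primer criteria of the kind mentioned above can be sketched in Python: GC content, a rough Wallace-rule melting temperature, a simple 3'-end complementarity check as a proxy for primer dimerization, and a naive scan that places one forward walking primer per window. These rules and all parameter values are common rules of thumb chosen for illustration, not PrimerDesign-M's scoring, which also draws on genetic diversity in multiple alignments; all function names are hypothetical.

```python
# Sketch of generic primer checks (rules of thumb, not PrimerDesign-M's scoring).
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(_COMPLEMENT)[::-1]


def gc_content(p: str) -> float:
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(p: str) -> int:
    """Rough melting temperature in deg C for short oligos: 2(A+T) + 4(G+C)."""
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def three_prime_dimer(p1: str, p2: str, n: int = 4) -> bool:
    """True if the last n bases of p1 can pair (antiparallel) with any n-mer of p2."""
    return p1[-n:] in revcomp(p2)


def walking_primers(target: str, length: int = 20, step: int = 400,
                    gc_range: Tuple[float, float] = (0.4, 0.6)) -> List[Tuple[int, str]]:
    """Place one forward primer per `step`-base window: the first candidate
    in the window whose GC content lies within gc_range."""
    picks = []
    last_start = len(target) - length
    for anchor in range(0, last_start + 1, step):
        for start in range(anchor, min(anchor + step, last_start + 1)):
            cand = target[start:start + length]
            if gc_range[0] <= gc_content(cand) <= gc_range[1]:
                picks.append((start, cand))
                break
    return picks
```

A fuller design would also run `three_prime_dimer` over every pair of primers in the set, including each primer against itself, and would score candidates against all sequences in the alignment rather than a single target.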
An improved model for whole genome phylogenetic analysis by Fourier transform.
Yin, Changchuan; Yau, Stephen S-T
2015-10-07
DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. Closely related DNA sequences and genomes are usually compared using multiple sequence alignment (MSA). While MSA is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity when comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information content of DNA sequences via the Discrete Fourier Transform (DFT), but it still had some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply the DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determining factor, and the mapping reduces nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences of different lengths, we propose an improved even scaling algorithm that extends shorter DFT power spectra to the length of the longest sequence. Once the power spectra are evenly scaled, they have the same dimensionality in Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as dissimilarity metrics. The improved DFT method, whose computational performance is increased by the 2D numerical representation, is applicable to DNA sequences of any length range. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences, including simulated and real datasets.
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
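The pipeline described above (numerical mapping, DFT power spectrum, even scaling, Euclidean distance) can be sketched with NumPy. The 2D mapping here, which stores each base as a complex number x + iy, is a hypothetical choice and not necessarily the authors' mapping. Linear interpolation stands in for their even scaling algorithm, and the function names are illustrative.

```python
# Sketch: alignment-free DNA distance from DFT power spectra.
# The 2D mapping and the interpolation-based scaling are assumptions.
import numpy as np

# Hypothetical 2D mapping: each base is a point (x, y) stored as x + iy.
MAP_2D = {"A": 1 + 1j, "T": 1 - 1j, "G": -1 + 1j, "C": -1 - 1j}


def power_spectrum(seq: str) -> np.ndarray:
    signal = np.array([MAP_2D[b] for b in seq.upper()])
    return np.abs(np.fft.fft(signal)) ** 2


def even_scale(ps: np.ndarray, m: int) -> np.ndarray:
    """Stretch a length-n spectrum to length m by linear interpolation
    (a stand-in for the paper's even scaling algorithm)."""
    n = len(ps)
    old_x = np.arange(n) / n
    new_x = np.arange(m) / m
    return np.interp(new_x, old_x, ps)


def dft_distance(s1: str, s2: str) -> float:
    """Euclidean distance between evenly scaled power spectra."""
    p1, p2 = power_spectrum(s1), power_spectrum(s2)
    m = max(len(p1), len(p2))
    return float(np.linalg.norm(even_scale(p1, m) - even_scale(p2, m)))
```

Because the total spectral power grows with sequence length (by Parseval's theorem it equals 2N² for this mapping), a practical comparison of sequences of very different lengths would also need a normalization step, which is omitted here.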
The number of reduced alignments between two DNA sequences
2014-01-01
Background In this study we consider DNA sequences as mathematical strings. Total and reduced alignments between two DNA sequences have been considered in the literature to measure their similarity. Results for explicit representations of some alignments have already been obtained. Results We present exact, explicit and computable formulas for the number of different possible alignments between two DNA sequences and a new formula for a class of reduced alignments. Conclusions A unified approach for a wide class of alignments between two DNA sequences has been provided. The formula is computable and, if complemented by software development, will provide a deeper insight into the theory of sequence alignment and give rise to new comparison methods. AMS Subject Classification Primary 92B05, 33C20, secondary 39A14, 65Q30 PMID:24684679
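For the simplest notion of alignment, where each column is a substitution (match or mismatch), a deletion, or an insertion, the number of alignments of sequences of lengths m and n is the Delannoy number D(m, n) = Σ_k C(m, k) C(n, k) 2^k. The Python sketch below computes it both from that closed form and from the recurrence on the last column, as a cross-check. Whether this matches the paper's definition of total alignments is an assumption, and the paper's reduced alignments are not computed here.

```python
# Sketch: counting pairwise alignments (Delannoy numbers) two ways.
from functools import lru_cache
from math import comb


def alignments_closed_form(m: int, n: int) -> int:
    """D(m, n) = sum_k C(m, k) * C(n, k) * 2**k."""
    return sum(comb(m, k) * comb(n, k) * 2 ** k for k in range(min(m, n) + 1))


@lru_cache(maxsize=None)
def alignments_dp(m: int, n: int) -> int:
    # The last column is a substitution, a deletion, or an insertion.
    if m == 0 or n == 0:
        return 1  # only one way: all gaps on one side
    return alignments_dp(m - 1, n - 1) + alignments_dp(m - 1, n) + alignments_dp(m, n - 1)
```

Even short sequences admit many alignments: two sequences of length 3 already have 63, which is why alignment algorithms search this space by dynamic programming rather than enumeration.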
Noise reduction in single time frame optical DNA maps
Müller, Vilhelm; Westerlund, Fredrik
2017-01-01
In optical DNA mapping technologies, sequence-specific intensity variations (DNA barcodes) are produced along stretched and stained DNA molecules. These "fingerprints" of the underlying DNA sequence have a resolution on the order of one kilobase pair, and the DNA molecules are stretched by surface adsorption or in nanochannel setups. A post-processing challenge for nanochannel-based methods, due to local and global random movement of the DNA molecule during imaging, is how to align different time frames to produce reproducible time-averaged DNA barcodes. Current solutions to this challenge are computationally rather slow. With high-throughput applications in mind, we introduce a parameter-free method for filtering a single-time-frame noisy barcode (snapshot optical map), measured in a fraction of a second. By using only a single time-frame barcode we circumvent the need for post-processing alignment. We demonstrate that our method provides filtered barcodes that are less noisy and more similar to time-averaged barcodes. The method applies a low-pass filter to a single noisy barcode, using the width of the system's point spread function as the single, known filtering parameter. After applying our method, the Pearson correlation coefficient (a real number between -1 and 1) between the single time-frame barcode and the time average of the aligned kymograph increases significantly, by roughly 0.2 on average. By comparing against a database of more than 3000 theoretical plasmid barcodes, we show that filtering single time-frame barcodes improves plasmid identification compared with the unfiltered barcodes. Since both snapshot experiments and computation with our method take less than a second, this study opens the way to high-throughput optical DNA mapping with improved reproducibility. PMID:28640821
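The single-frame filtering step can be sketched with NumPy as a Gaussian low-pass filter whose standard deviation, in pixels, is set from the point spread function (PSF). The Gaussian kernel shape, the 4σ truncation, and reflective edge handling are assumptions, not details from the study, and the helper names are hypothetical. The Pearson correlation helper mirrors the similarity measure reported above.

```python
# Sketch: low-pass filtering a single-frame optical DNA barcode.
# Gaussian kernel with sigma = PSF width in pixels is an assumption.
import numpy as np


def gaussian_lowpass(barcode: np.ndarray, psf_sigma_px: float) -> np.ndarray:
    """Smooth a 1D intensity profile with a Gaussian of the PSF's width."""
    half = int(np.ceil(4 * psf_sigma_px))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / psf_sigma_px) ** 2)
    kernel /= kernel.sum()
    # Reflect at the ends so the output keeps the input length and level.
    padded = np.pad(barcode, half, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])
```

Tying the filter width to the PSF is what keeps the method free of tunable parameters: features narrower than the optical resolution cannot be real signal, so removing them discards mostly noise.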
Genetically improved BarraCUDA.
Langdon, W B; Lam, Brian Yee Hong
2017-01-01
BarraCUDA is an open source C program that uses the BWA algorithm in parallel with NVIDIA CUDA to align short next-generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". The genetically improved (GI) code is up to three times faster on short paired-end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired-end next-generation sequences up to ten times faster than bwa on a 12-core server. The speedup was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.
Indel detection from DNA and RNA sequencing data with transIndel.
Yang, Rendong; Van Etten, Jamie L; Dehm, Scott M
2018-04-19
Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data, but their transcriptional consequences remain unexplored because of the difficulty of discriminating medium-sized and large indels from splicing events in RNA-seq data. Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short-read aligner and reconstructs mid-sized insertions and large deletions from the linear alignments of split reads in DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance compared with eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand Up To Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Our study demonstrates that transIndel is a robust tool for elucidating medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts significantly improves sensitivity for medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.
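The core geometric step, turning two linear pieces of a split read into an indel call, can be sketched in Python. The `Segment` type, 0-based half-open coordinates, and the assumption that read coordinates are given in reference orientation are all hypothetical. transIndel's handling of splice junctions, which also produce reference gaps in RNA-seq reads, is not reproduced.

```python
# Hypothetical sketch: inferring an indel from two linear alignments of one split read.
# Coordinates are 0-based, half-open; read coordinates are in reference orientation.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    read_start: int
    read_end: int
    ref_start: int
    ref_end: int
    chrom: str
    strand: str


def infer_indel(a: Segment, b: Segment) -> Optional[Tuple[str, int, int]]:
    """Return (type, reference position, size) or None if no simple indel fits."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return None
    if b.read_start < a.read_start:
        a, b = b, a  # order the segments along the read
    read_gap = b.read_start - a.read_end
    ref_gap = b.ref_start - a.ref_end
    if read_gap == 0 and ref_gap > 0:
        return ("DEL", a.ref_end, ref_gap)   # reference bases skipped by the read
    if ref_gap == 0 and read_gap > 0:
        return ("INS", a.ref_end, read_gap)  # extra read bases absent from reference
    return None
```

For example, a read whose first 60 bases map to 1000-1060 and whose remaining 40 bases map to 1560-1600 on the same strand yields a 500 bp deletion at position 1060; in RNA-seq, the same pattern could instead be an intron, which is why splice awareness matters.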
Finding Protein and Nucleotide Similarities with FASTA
Pearson, William R.
2016-01-01
The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local and global similarity searches (ssearch36, ggsearch36) and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity (Unit 3.5). The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases (Unit 9.4). The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. PMID:27010337
Finding Protein and Nucleotide Similarities with FASTA.
Pearson, William R
2016-03-24
The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. Copyright © 2016 John Wiley & Sons, Inc.
Wright, Imogen A.; Travers, Simon A.
2014-01-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.
Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R
2009-07-01
The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/
DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.
Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano
2018-01-01
Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments achieve higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding-domain order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to use custom protein profiles, chosen according to the user's interest, that are sometimes not present in the PFAM database. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .
The twilight zone of cis element alignments.
Sebastian, Alvaro; Contreras-Moreira, Bruno
2013-02-01
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
The twilight zone of cis element alignments
Sebastian, Alvaro; Contreras-Moreira, Bruno
2013-01-01
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments. PMID:23268451
The Dfam database of repetitive DNA families.
Hubley, Robert; Finn, Robert D; Clements, Jody; Eddy, Sean R; Jones, Thomas A; Bao, Weidong; Smit, Arian F A; Wheeler, Travis J
2016-01-04
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Wright, Imogen A; Travers, Simon A
2014-07-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Statistical Significance of Optical Map Alignments
Sarkar, Deepayan; Goldstein, Steve; Schwartz, David C.
2012-01-01
The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities. PMID:22506568
Fornander, Louise H.; Renodon-Cornière, Axelle; Kuwabara, Naoyuki; Ito, Kentaro; Tsutsui, Yasuhiro; Shimizu, Toshiyuki; Iwasaki, Hiroshi; Nordén, Bengt; Takahashi, Masayuki
2014-01-01
The Swi5-Sfr1 heterodimer protein stimulates the Rad51-promoted DNA strand exchange reaction, a crucial step in homologous recombination. To clarify how this accessory protein acts on the strand exchange reaction, we have analyzed how the structure of the primary reaction intermediate, the Rad51/single-stranded DNA (ssDNA) complex filament formed in the presence of ATP, is affected by Swi5-Sfr1. Using flow linear dichroism spectroscopy, we observe that the nucleobases of the ssDNA are more perpendicularly aligned to the filament axis in the presence of Swi5-Sfr1, whereas the bases are more randomly oriented in the absence of Swi5-Sfr1. When using a modified version of the natural protein in which the N-terminal part of Sfr1 is deleted, which has no affinity for DNA but retains the ability to stimulate the strand exchange reaction, we still observe the improved perpendicular DNA base orientation. This indicates that Swi5-Sfr1 exerts its activating effect mainly through interaction with the Rad51 filament and not with the DNA. We propose that the role of a coplanar alignment of nucleobases induced by Swi5-Sfr1 in the presynaptic Rad51/ssDNA complex is to facilitate the critical matching with an invading double-stranded DNA, hence stimulating the strand exchange reaction. PMID:24304898
Fornander, Louise H; Renodon-Cornière, Axelle; Kuwabara, Naoyuki; Ito, Kentaro; Tsutsui, Yasuhiro; Shimizu, Toshiyuki; Iwasaki, Hiroshi; Nordén, Bengt; Takahashi, Masayuki
2014-02-01
The Swi5-Sfr1 heterodimer protein stimulates the Rad51-promoted DNA strand exchange reaction, a crucial step in homologous recombination. To clarify how this accessory protein acts on the strand exchange reaction, we have analyzed how the structure of the primary reaction intermediate, the Rad51/single-stranded DNA (ssDNA) complex filament formed in the presence of ATP, is affected by Swi5-Sfr1. Using flow linear dichroism spectroscopy, we observe that the nucleobases of the ssDNA are more perpendicularly aligned to the filament axis in the presence of Swi5-Sfr1, whereas the bases are more randomly oriented in the absence of Swi5-Sfr1. When using a modified version of the natural protein in which the N-terminal part of Sfr1 is deleted, which has no affinity for DNA but retains the ability to stimulate the strand exchange reaction, we still observe the improved perpendicular DNA base orientation. This indicates that Swi5-Sfr1 exerts its activating effect mainly through interaction with the Rad51 filament and not with the DNA. We propose that the role of a coplanar alignment of nucleobases induced by Swi5-Sfr1 in the presynaptic Rad51/ssDNA complex is to facilitate the critical matching with an invading double-stranded DNA, hence stimulating the strand exchange reaction.
Minimap2: pairwise alignment for nucleotide sequences.
Li, Heng
2018-05-10
Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are either unable to process such data at scale or do so inefficiently, which calls for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs a concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.
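To make the "concave gap cost" concrete, here is a minimal Python sketch of a two-piece affine gap cost, the form minimap2 documents for its `-O`/`-E` options (a gap of length k costs min{O1 + k*E1, O2 + k*E2}). The default values used below (4,24 and 2,1) are an assumption based on the minimap2 usage text and may differ between versions and presets; the function names are hypothetical.

```python
def affine_gap_cost(length: int, q: int = 4, e: int = 2) -> int:
    """Single affine cost: one open penalty plus a per-base extension penalty."""
    return 0 if length <= 0 else q + length * e


def concave_gap_cost(length: int, q: int = 4, e: int = 2,
                     q2: int = 24, e2: int = 1) -> int:
    """Two-piece affine cost: the lower envelope of two affine lines.

    Short gaps follow q + length*e (cheap to open, steep to extend); long gaps
    follow q2 + length*e2 (costly to open, shallow to extend). The minimum of
    two lines is concave, so each extra base of a long INDEL costs no more
    than the previous one did.
    """
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


if __name__ == "__main__":
    for gap_len in (1, 10, 20, 100, 1000):
        print(gap_len, affine_gap_cost(gap_len), concave_gap_cost(gap_len))
```

With these values the two lines cross at a gap length of 20; beyond that point a 1 kb deletion costs 1024 instead of 2004, so long INDELs are penalized far less than under a single affine cost.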
Local alignment of two-base encoded DNA sequence
Homer, Nils; Merriman, Barry; Nelson, Stanley F
2009-01-01
Background: DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results: We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion: The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
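The decoding step described as deterministic can be illustrated with a small Python sketch. It assumes the common color-space convention (as used by SOLiD-style two-base encoding) in which A, C, G, T are coded as 0-3 and each color is the XOR of adjacent base codes; the abstract does not specify the mapping, so treat it as an assumption. The example shows why naive decoding is fragile and why decoding and aligning jointly is attractive.

```python
from __future__ import annotations

# Assumed color-space convention: code each base in 2 bits and emit the XOR of
# each adjacent pair, giving 4 "colors" (0-3) for the 16 dinucleotides.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq: str) -> tuple[str, list[int]]:
    """Return the leading (known) base and one color per adjacent base pair."""
    bits = [BASE_TO_BITS[base] for base in seq]
    colors = [left ^ right for left, right in zip(bits, bits[1:])]
    return seq[0], colors


def decode(first_base: str, colors: list[int]) -> str:
    """Deterministic decoding: each base is the previous base XOR the color."""
    current = BASE_TO_BITS[first_base]
    bases = [first_base]
    for color in colors:
        current ^= color
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


if __name__ == "__main__":
    first, colors = encode("ACGTTA")
    print(colors)                          # [1, 3, 1, 0, 3]
    print(decode(first, colors))           # ACGTTA
    # A single color measurement error corrupts every downstream base ...
    print(decode(first, [1, 2, 1, 0, 3]))  # ACTGGC
    # ... whereas a true SNP (G->A at position 2) changes two adjacent colors.
    print(encode("ACATTA")[1])             # [1, 1, 3, 0, 3]
```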
Teschome, Bezu; Facsko, Stefan; Gothelf, Kurt V; Keller, Adrian
2015-11-24
DNA origami has become an established technique for designing well-defined nanostructures with any desired shape and for the controlled arrangement of functional nanostructures with few-nanometer resolution. These unique features make DNA origami nanostructures promising candidates for use as scaffolds in nanoelectronics and nanophotonics device fabrication. Consequently, a number of studies have shown the precise organization of metallic nanoparticles on various DNA origami shapes. In this work, we fabricated large arrays of aligned DNA origami decorated with a high density of gold nanoparticles (AuNPs). To this end, we first demonstrate the high-yield assembly of high-density AuNP arrangements on DNA origami adsorbed to Si surfaces with few unbound background nanoparticles by carefully controlling the concentrations of MgCl2 and AuNPs in the hybridization buffer and the hybridization time. Then, we evaluate two methods, i.e., hybridization to prealigned DNA origami and molecular combing in a receding meniscus, with respect to their potential to yield large arrays of aligned AuNP-decorated DNA origami nanotubes. Because of the comparatively low MgCl2 concentration required for the efficient immobilization of the AuNPs, the prealigned DNA origami become mobile and are displaced from their original positions, thereby decreasing the alignment yield. This increased mobility, on the other hand, makes the adsorbed origami susceptible to molecular combing, and a total alignment yield of 86% is obtained in this way.
How effective are DNA barcodes in the identification of African rainforest trees?
Parmentier, Ingrid; Duminil, Jérôme; Kuzmina, Maria; Philippe, Morgane; Thomas, Duncan W; Kenfack, David; Chuyong, George B; Cruaud, Corinne; Hardy, Olivier J
2013-01-01
DNA barcoding of rain forest trees could potentially help biologists identify species and discover new ones. However, DNA barcodes cannot always distinguish between closely related species, and the size and completeness of barcode databases are key parameters for their successful application. We test the ability of rbcL, matK and trnH-psbA plastid DNA markers to identify rain forest trees at two sites in Atlantic central Africa under the assumption that a database is exhaustive in terms of species content, but not necessarily in terms of haplotype diversity within species. We assess the accuracy of identification to species or genus using a genetic distance matrix between samples either based on a global multiple sequence alignment (GD) or on a basic local alignment search tool (BLAST). Where a local database is available (within a 50 ha plot), barcoding was generally reliable for genus identification (95-100% success), but less for species identification (71-88%). Using a single marker, best results for species identification were obtained with trnH-psbA. There was a significant decrease of barcoding success in species-rich clades. When the local database was used to identify the genus of trees from another region and did include all genera from the query individuals but not all species, genus identification success decreased to 84-90%. The GD method performed best but a global multiple sequence alignment is not applicable to trnH-psbA. Barcoding is a useful tool to assign unidentified African rain forest trees to a genus, but identification to a species is less reliable, especially in species-rich clades, even using an exhaustive local database. Combining two markers improves the accuracy of species identification but it would only marginally improve genus identification. Finally, we highlight some limitations of the BLAST algorithm as currently implemented and suggest possible improvements for barcoding applications.
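The distance-based (GD) identification can be sketched in Python as a nearest-neighbour search over an existing alignment. The use of uncorrected p-distance, the gap handling, and the rule that ties make an identification ambiguous are our assumptions, not details taken from the study; the reference sequences are hypothetical toy data.

```python
from __future__ import annotations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two sequences from the same
    alignment, skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    # Assumption: sequences with no comparable sites are treated as maximally distant.
    return differing / compared if compared else 1.0


def identify(query: str, references: list[tuple[str, str, str]]):
    """references holds (genus, species, aligned_sequence) tuples.

    Returns the minimum distance plus every genus and species tied at that
    distance; an identification is unambiguous only if the set has one element.
    """
    scored = [(p_distance(query, seq), genus, species)
              for genus, species, seq in references]
    best = min(d for d, _, _ in scored)
    genera = {g for d, g, _ in scored if d == best}
    species = {(g, s) for d, g, s in scored if d == best}
    return best, genera, species


if __name__ == "__main__":
    refs = [("GenusA", "sp1", "ACGTACGTAC"),   # hypothetical references
            ("GenusA", "sp2", "ACGTACGTTC"),
            ("GenusB", "sp3", "TCGAACGGAC")]
    print(identify("ACGTACGTGC", refs))
```

Here the query is equidistant from two species of the same genus, so the genus is identified but the species is not, a miniature version of the gap between genus-level and species-level success reported above.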
How Effective Are DNA Barcodes in the Identification of African Rainforest Trees?
Parmentier, Ingrid; Duminil, Jérôme; Kuzmina, Maria; Philippe, Morgane; Thomas, Duncan W.; Kenfack, David; Chuyong, George B.; Cruaud, Corinne; Hardy, Olivier J.
2013-01-01
Background: DNA barcoding of rain forest trees could potentially help biologists identify species and discover new ones. However, DNA barcodes cannot always distinguish between closely related species, and the size and completeness of barcode databases are key parameters for their successful application. We test the ability of rbcL, matK and trnH-psbA plastid DNA markers to identify rain forest trees at two sites in Atlantic central Africa under the assumption that a database is exhaustive in terms of species content, but not necessarily in terms of haplotype diversity within species. Methodology/Principal Findings: We assess the accuracy of identification to species or genus using a genetic distance matrix between samples either based on a global multiple sequence alignment (GD) or on a basic local alignment search tool (BLAST). Where a local database is available (within a 50 ha plot), barcoding was generally reliable for genus identification (95–100% success), but less for species identification (71–88%). Using a single marker, best results for species identification were obtained with trnH-psbA. There was a significant decrease of barcoding success in species-rich clades. When the local database was used to identify the genus of trees from another region and did include all genera from the query individuals but not all species, genus identification success decreased to 84–90%. The GD method performed best but a global multiple sequence alignment is not applicable to trnH-psbA. Conclusions/Significance: Barcoding is a useful tool to assign unidentified African rain forest trees to a genus, but identification to a species is less reliable, especially in species-rich clades, even using an exhaustive local database. Combining two markers improves the accuracy of species identification but it would only marginally improve genus identification. Finally, we highlight some limitations of the BLAST algorithm as currently implemented and suggest possible improvements for barcoding applications. PMID:23565134
Searching for SNPs with cloud computing
2009-01-01
As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from http://bowtie-bio.sourceforge.net/crossbow/. PMID:19930550
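As a back-of-the-envelope reading of the cost figure, assuming the quoted $85 covers the full three-hour run on all 320 CPUs and nothing else (for example, no data-transfer or storage charges):

```latex
\[
320~\text{CPUs} \times 3~\text{h} = 960~\text{CPU-hours},
\qquad
\frac{\$85}{960~\text{CPU-hours}} \approx \$0.089~\text{per CPU-hour}.
\]
```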
Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra
2017-07-01
This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
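For readers more familiar with electronic aligners, the quantity a correlator evaluates can be sketched digitally: the number of matching bases at every ungapped offset, whose peak marks the best placement. This Python sketch assumes plain A/C/G/T strings and ungapped matching; it is not the paper's optical design, its moiré technique, or its neural network.

```python
from __future__ import annotations


def match_correlation(query: str, reference: str) -> list[int]:
    """Number of matching bases between query and reference at every ungapped shift.

    Summing the cross-correlations of the four one-hot base channels
    (A, C, G, T) gives exactly this count, which is the quantity a
    correlator can evaluate for all shifts at once.
    """
    if len(query) > len(reference):
        raise ValueError("query must not be longer than reference")
    scores = []
    for shift in range(len(reference) - len(query) + 1):
        window = reference[shift:shift + len(query)]
        scores.append(sum(q == r for q, r in zip(query, window)))
    return scores


def best_offset(query: str, reference: str) -> tuple[int, int]:
    """Shift with the highest match count (the correlation peak) and that count."""
    scores = match_correlation(query, reference)
    shift = max(range(len(scores)), key=scores.__getitem__)
    return shift, scores[shift]


if __name__ == "__main__":
    print(best_offset("ACGTA", "TTGACGTACGAT"))  # (3, 5)
```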
Multiple DNA and protein sequence alignment on a workstation and a supercomputer.
Tajima, K
1988-11-01
This paper describes a multiple alignment method using a workstation and a supercomputer. The method aligns a new sequence to a set of already aligned sequences and applies this alignment step recursively. The alignment runs in reasonable computation time on platforms ranging from a workstation to a supercomputer, and is evaluated in terms of both the alignment results and the computational speed gained by parallel processing. The application of the algorithm is illustrated by several examples of multiple alignment of 12 amino acid and DNA sequences of HIV (human immunodeficiency virus) env genes. Colour graphic programs on a workstation and parallel processing on a supercomputer are discussed.
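The strategy of aligning a new sequence to an already aligned set, and repeating, can be sketched in Python as Needleman-Wunsch against the columns of the existing alignment. The column score (average match/mismatch against the column's residues) and the linear gap penalty are simplifying assumptions, not Tajima's scoring scheme.

```python
from __future__ import annotations


def column_score(column: tuple[str, ...], base: str,
                 match: float = 1.0, mismatch: float = -1.0) -> float:
    """Average score of `base` against the non-gap characters of one column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    return sum(match if c == base else mismatch for c in residues) / len(residues)


def align_to_profile(alignment: list[str], seq: str, gap: float = -1.0) -> list[str]:
    """Needleman-Wunsch alignment of a new sequence against an existing alignment.

    Columns of the existing alignment stay intact; gaps go either into the new
    sequence or, as whole gap columns, into the old alignment.
    """
    cols = list(zip(*alignment))
    n, m = len(cols), len(seq)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols[i - 1], seq[j - 1]),
                score[i - 1][j] + gap,  # old column aligned to a gap in seq
                score[i][j - 1] + gap,  # seq base aligned to a new gap column
            )
    # Trace back from the bottom-right corner, rebuilding columns in reverse.
    i, j, new_cols = n, m, []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + column_score(cols[i - 1], seq[j - 1]):
            new_cols.append(cols[i - 1] + (seq[j - 1],))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            new_cols.append(cols[i - 1] + ("-",))
            i -= 1
        else:
            new_cols.append(("-",) * len(alignment) + (seq[j - 1],))
            j -= 1
    new_cols.reverse()
    return ["".join(row) for row in zip(*new_cols)]


if __name__ == "__main__":
    msa = ["ACGT"]
    for new_seq in ["ACGT", "ACT"]:   # add sequences one at a time
        msa = align_to_profile(msa, new_seq)
    print(msa)  # ['ACGT', 'ACGT', 'AC-T']
```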
Nanofabricated Racks of Aligned and Anchored DNA Substrates for Single-Molecule Imaging
2009-01-01
Single-molecule studies of biological macromolecules can benefit from new experimental platforms that facilitate experimental design and data acquisition. Here we develop new strategies to construct curtains of DNA in which the molecules are aligned with respect to one another and maintained in an extended configuration by anchoring both ends of the DNA to the surface of a microfluidic sample chamber that is otherwise coated with an inert lipid bilayer. This “double-tethered” DNA substrate configuration is established through the use of nanofabricated rack patterns comprised of two distinct functional elements: linear barriers to lipid diffusion that align DNA molecules anchored by one end to the bilayer and antibody-coated pentagons that provide immobile anchor points for the opposite ends of the DNA. These devices enable the alignment and anchoring of thousands of individual DNA molecules, which can then be visualized using total internal reflection fluorescence microscopy under conditions that do not require continuous application of buffer flow to stretch the DNA. This unique strategy offers the potential for studying protein-DNA interactions on large DNA substrates without compromising measurements through application of hydrodynamic force. We provide a proof-of-principle demonstration that double-tethered DNA curtains made with nanofabricated rack patterns can be used in a one-dimensional diffusion assay that monitors the motion of quantum dot-tagged proteins along DNA. PMID:19736980
Nanofabricated racks of aligned and anchored DNA substrates for single-molecule imaging.
Gorman, Jason; Fazio, Teresa; Wang, Feng; Wind, Shalom; Greene, Eric C
2010-01-19
Single-molecule studies of biological macromolecules can benefit from new experimental platforms that facilitate experimental design and data acquisition. Here we develop new strategies to construct curtains of DNA in which the molecules are aligned with respect to one another and maintained in an extended configuration by anchoring both ends of the DNA to the surface of a microfluidic sample chamber that is otherwise coated with an inert lipid bilayer. This "double-tethered" DNA substrate configuration is established through the use of nanofabricated rack patterns comprised of two distinct functional elements: linear barriers to lipid diffusion that align DNA molecules anchored by one end to the bilayer and antibody-coated pentagons that provide immobile anchor points for the opposite ends of the DNA. These devices enable the alignment and anchoring of thousands of individual DNA molecules, which can then be visualized using total internal reflection fluorescence microscopy under conditions that do not require continuous application of buffer flow to stretch the DNA. This unique strategy offers the potential for studying protein-DNA interactions on large DNA substrates without compromising measurements through application of hydrodynamic force. We provide a proof-of-principle demonstration that double-tethered DNA curtains made with nanofabricated rack patterns can be used in a one-dimensional diffusion assay that monitors the motion of quantum dot-tagged proteins along DNA.
Spreadsheet-based program for alignment of overlapping DNA sequences.
Anbazhagan, R; Gabrielson, E
1999-06-01
Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.
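The core overlap search, including the reverse-complement check that makes it orientation-independent, can be sketched in Python. This is an exact-match illustration under our own assumptions (minimum overlap of 3 bases, no mismatches); the spreadsheet program's actual matching rules are not described in the abstract.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest exact match between a suffix of left and a prefix of right."""
    for k in range(min(len(left), len(right)), max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def best_overlap(a: str, b: str, min_len: int = 3):
    """Try b as given and reverse-complemented, on either side of a.

    Returns (overlap_length, orientation_of_b, merged_sequence), or None
    when no overlap of at least min_len bases exists.
    """
    best = None
    for orientation, other in (("forward", b), ("reverse-complement", reverse_complement(b))):
        for left, right in ((a, other), (other, a)):
            k = suffix_prefix_overlap(left, right, min_len)
            if k and (best is None or k > best[0]):
                best = (k, orientation, left + right[k:])
    return best


if __name__ == "__main__":
    # The second fragment was read from the opposite strand.
    print(best_overlap("ATGCGTACCT", "TCCAGGTA"))
    # (5, 'reverse-complement', 'ATGCGTACCTGGA')
```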
DNAAlignEditor: DNA alignment editor tool
Sanchez-Villeda, Hector; Schroeder, Steven; Flint-Garcia, Sherry; Guill, Katherine E; Yamasaki, Masanori; McMullen, Michael D
2008-01-01
Background: With advances in DNA re-sequencing methods and Next-Generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that aids the biologist, often with limited programming experience, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results: We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality scores aid in the manual alignment editing process. DNAAlignEditor works as a client/server tool having two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users concurrently connected. Conclusion: We anticipate that this software will be of general interest to biologists and population geneticists in editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism. PMID:18366684
Walder, Robert; Van Patten, William J; Adhikari, Ayush; Perkins, Thomas T
2018-01-23
Single-molecule force spectroscopy (SMFS) is a powerful technique to characterize the energy landscape of individual proteins, the mechanical properties of nucleic acids, and the strength of receptor-ligand interactions. Atomic force microscopy (AFM)-based SMFS benefits from ongoing progress in improving the precision and stability of cantilevers and the AFM itself. Underappreciated is that the accuracy of such AFM studies remains hindered by inadvertently stretching molecules at an angle while measuring only the vertical component of the force and extension, degrading both measurements. This inaccuracy is particularly problematic in AFM studies using double-stranded DNA and RNA due to their large persistence length (p ≈ 50 nm), often limiting such studies to other SMFS platforms (e.g., custom-built optical and magnetic tweezers). Here, we developed an automated algorithm that aligns the AFM tip above the DNA's attachment point to a coverslip. Importantly, this algorithm was performed at low force (10-20 pN) and relatively fast (15-25 s), preserving the connection between the tip and the target molecule. Our data revealed large uncorrected lateral offsets for 100 and 650 nm DNA molecules [24 ± 18 nm (mean ± standard deviation) and 180 ± 110 nm, respectively]. Correcting this offset yielded a 3-fold improvement in accuracy and precision when characterizing DNA's overstretching transition. We also demonstrated high throughput by acquiring 88 geometrically corrected force-extension curves of a single individual 100 nm DNA molecule in ∼40 min and versatility by aligning polyprotein- and PEG-based protein-ligand assays. Importantly, our software-based algorithm was implemented on a commercial AFM, so it can be broadly adopted. More generally, this work illustrates how to enhance AFM-based SMFS by developing more sophisticated data-acquisition protocols.
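Why an uncorrected lateral offset biases the measurement follows from simple geometry, sketched here in Python. The straight-line model and the example readings are assumptions for illustration; the paper's contribution is an algorithm that physically recenters the tip, which this does not reproduce.

```python
import math


def correct_for_lateral_offset(z_nm: float, f_z_pN: float, offset_nm: float):
    """Convert vertical-only AFM readings into values along the molecule.

    Assumes the molecule runs in a straight line from its surface attachment
    point to the tip, which is displaced laterally by offset_nm.
    """
    if z_nm <= 0:
        raise ValueError("vertical extension must be positive")
    extension = math.hypot(z_nm, offset_nm)  # true end-to-end distance L
    tension = f_z_pN * extension / z_nm      # F_z = F cos(theta), cos(theta) = z / L
    return extension, tension


if __name__ == "__main__":
    ext, force = correct_for_lateral_offset(z_nm=100.0, f_z_pN=60.0, offset_nm=24.0)
    print(f"extension {ext:.1f} nm, tension {force:.1f} pN")
```

With a 24 nm offset (the mean reported above for 100 nm constructs) and illustrative readings of 100 nm and 60 pN, the vertical-only values understate both extension and tension by roughly 3%.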
An optimized and low-cost FPGA-based DNA sequence alignment--a step towards personal genomics.
Shah, Hurmat Ali; Hasan, Laiq; Ahmad, Nasir
2013-01-01
DNA sequence alignment is a cardinal process in computational biology but is also computationally expensive when performed on traditional computational platforms such as CPUs. Of the many off-the-shelf platforms explored for speeding up the computation, the FPGA stands out as the best candidate because of its performance per dollar spent and its performance per watt. These two advantages make the FPGA the most appropriate choice for realizing the aim of personal genomics. The previous implementation of DNA sequence alignment did not take into consideration the price of the device on which the optimization was performed. This paper presents optimizations over the previous FPGA implementation that increase the overall speed-up achieved while also taking into account the price of the platform being optimized. The optimizations are: (1) the array of processing elements is made to run on a change in input value rather than on the clock, eliminating the need for tight clock synchronization; (2) the implementation is not restricted by the size of the sequences to be aligned; (3) the waiting time required for the sequences to load onto the FPGA is reduced to the minimum possible; and (4) an efficient method is devised to store the output matrix, which makes it possible to save the diagonal elements to be used in the next pass in parallel with the computation of the output matrix. Implemented on a Spartan3 FPGA, this implementation achieved a 20-fold performance improvement in terms of CUPS over a GPP implementation.
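To make the CUPS (cell updates per second) metric and the role of the stored diagonal elements concrete, here is a minimal Python sketch. It assumes a Smith-Waterman local alignment with a linear gap penalty (the abstract does not name the recurrence) and hypothetical scoring values; pure Python will report CUPS figures far below those of any FPGA.

```python
from __future__ import annotations

import time


def smith_waterman_antidiagonal(a: str, b: str, match: int = 2,
                                mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score, filling the DP matrix one anti-diagonal at a time.

    Every cell with i + j == d depends only on anti-diagonals d-1 and d-2, so
    all cells of one anti-diagonal are independent. That independence is what
    a linear array of hardware processing elements exploits.
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


def cups(a: str, b: str) -> tuple[int, float]:
    """Score plus cell updates per second: matrix cells / elapsed seconds."""
    start = time.perf_counter()
    score = smith_waterman_antidiagonal(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)
    return score, len(a) * len(b) / elapsed


if __name__ == "__main__":
    score, rate = cups("ACGT" * 50, "TGCA" * 50)
    print(score, f"{rate:.3g} CUPS")
```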
Swain, Timothy D
2018-01-01
The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.
DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.
Eernisse, D J
1992-04-01
DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator can convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences can be extracted by clicking on the corresponding map button or by selection from a scrolling list. The provided gene maps, complete with extractable sequences, comprise nine metazoan, one yeast and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 selected organism-organelle combinations. Codon usage data are compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous FTP and is free for non-commercial use.
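Translation with alternate genetic codes and ambiguous nucleotide symbols can be illustrated with a short Python sketch. It is an independent illustration, not the HyperCard implementation, and its names are hypothetical. It builds the standard code (NCBI table 1) and the vertebrate mitochondrial code (NCBI table 2), expands IUPAC ambiguity symbols, and emits an amino acid only when every expansion of a codon agrees, otherwise `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code (NCBI table 2) differs from table 1 at four codons.
VERT_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

# IUPAC nucleotide symbols and the bases each one stands for (U treated as T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
         "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def translate(seq: str, code: dict = STANDARD) -> str:
    """Translate reading frame 1; an ambiguous codon yields a residue only if all
    of its expansions encode the same amino acid, otherwise 'X'."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        choices = {code["".join(p)] for p in product(*(IUPAC[b] for b in codon))}
        protein.append(choices.pop() if len(choices) == 1 else "X")
    return "".join(protein)
```

For example, `ATH` translates to `I` under the standard code but to `X` under the vertebrate mitochondrial code, because ATA encodes methionine there.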
Training alignment parameters for arbitrary sequencers with LAST-TRAIN.
Hamada, Michiaki; Ono, Yukiteru; Asai, Kiyoshi; Frith, Martin C
2017-03-15
LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. The source code is freely available at http://last.cbrc.jp/. mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
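The core relation behind fitted substitution scores is the log-odds score s(x, y) = (1/λ) ln(p_xy / (p_x q_y)), where p_xy is the probability of reference base x aligned to read base y and p_x, q_y are the marginal frequencies. The Python sketch below computes such scores from a table of aligned-pair counts. It is not LAST-TRAIN, which also fits gap scores and iterates between aligning and re-estimating; the scale factor (standing in for 1/λ), the pseudocount and the function name are assumptions.

```python
import math
from collections import defaultdict


def log_odds_scores(counts: dict, scale: float = 1.0, pseudocount: float = 1.0) -> dict:
    """Substitution scores s(x, y) = scale * ln(p_xy / (p_x * q_y)) from counts
    of aligned (reference base x, read base y) pairs."""
    alphabet_x = sorted({x for x, _ in counts})
    alphabet_y = sorted({y for _, y in counts})
    # Pseudocounts keep unseen pairs from producing log(0).
    smoothed = {(x, y): counts.get((x, y), 0) + pseudocount
                for x in alphabet_x for y in alphabet_y}
    total = sum(smoothed.values())
    # Separate reference and read marginals: read errors need not be symmetric.
    p_x = defaultdict(float)
    q_y = defaultdict(float)
    for (x, y), c in smoothed.items():
        p_x[x] += c / total
        q_y[y] += c / total
    return {(x, y): scale * math.log((c / total) / (p_x[x] * q_y[y]))
            for (x, y), c in smoothed.items()}
```

Pairs aligned more often than chance predicts receive positive scores and rarer pairs receive negative scores, which is what lets scores "fit" the observed error profile of a given sequencer.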
Johnson, LeeAnn K; Brown, Mary B; Carruthers, Ethan A; Ferguson, John A; Dombek, Priscilla E; Sadowsky, Michael J
2004-08-01
A horizontal, fluorophore-enhanced, repetitive extragenic palindromic-PCR (rep-PCR) DNA fingerprinting technique (HFERP) was developed and evaluated as a means to differentiate human from animal sources of Escherichia coli. Box A1R primers and PCR were used to generate 2,466 rep-PCR and 1,531 HFERP DNA fingerprints from E. coli strains isolated from fecal material from known human and 12 animal sources: dogs, cats, horses, deer, geese, ducks, chickens, turkeys, cows, pigs, goats, and sheep. Relative to rep-PCR DNA fingerprinting, HFERP DNA fingerprinting reduced within-gel grouping of DNA fingerprints and improved alignment of DNA fingerprints between gels. Jackknife analysis of the complete rep-PCR DNA fingerprint library, done using Pearson's product-moment correlation coefficient, indicated that animal and human isolates were assigned to the correct source groups with an 82.2% average rate of correct classification. However, when only unique isolates were examined (isolates from a single animal having a unique DNA fingerprint), Jackknife analysis showed that isolates were assigned to the correct source groups with a 60.5% average rate of correct classification. The percentages of correctly classified isolates were about 15 and 17% greater for rep-PCR and HFERP, respectively, when analyses were done using the curve-based Pearson's product-moment correlation coefficient rather than the band-based Jaccard algorithm. Rarefaction analysis indicated that, despite the relatively large size of the known-source database, genetic diversity in E. coli was very high, which most likely accounts for our inability to correctly classify many environmental E. coli isolates. Our data indicate that removal of duplicate genotypes within DNA fingerprint libraries, increased database size, proper methods of statistical analysis, and correct alignment of band data within and between gels improve the accuracy of microbial source tracking methods.
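The jackknife comparison of curve-based and band-based similarity can be sketched as leave-one-out nearest-neighbour classification. In this minimal Python version each isolate is assigned the source of its most similar other isolate, and the average rate of correct classification is the mean of the per-source rates. The nearest-neighbour rule, the data layout (densitometric curves as equal-length lists, bands as sets of positions) and the function names are assumptions, not the study's exact procedure or software.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length densitometric curves."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def jaccard(bands_a: set, bands_b: set) -> float:
    """Band-based similarity: shared bands divided by all bands."""
    union = bands_a | bands_b
    return len(bands_a & bands_b) / len(union) if union else 1.0


def jackknife_arcc(profiles, labels, similarity):
    """Leave-one-out: give each isolate the source of its most similar other
    isolate, then average the per-source correct-classification rates."""
    hits, totals = defaultdict(int), defaultdict(int)
    for i, p in enumerate(profiles):
        best = max((j for j in range(len(profiles)) if j != i),
                   key=lambda j: similarity(p, profiles[j]))
        totals[labels[i]] += 1
        hits[labels[i]] += labels[best] == labels[i]
    return sum(hits[s] / totals[s] for s in totals) / len(totals)
```

The same `jackknife_arcc` routine accepts either `pearson` on curves or `jaccard` on band sets, which mirrors the curve-based versus band-based comparison in the abstract.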
Potential benefits from using a new reference map in genomic prediction
USDA-ARS's Scientific Manuscript database
Many genomic studies in cattle have used the 2009 reference assembly from the University of Maryland (UMD3.1). A new USDA Agricultural Research Service-University of California, Davis (ARS-UCD) assembly based on longer DNA reads from the same cow (Dominette) should improve sequence alignment, imputa...
Parson, W; Gusmão, L; Hares, D R; Irwin, J A; Mayr, W R; Morling, N; Pokorak, E; Prinz, M; Salas, A; Schneider, P M; Parsons, T J
2014-11-01
The DNA Commission of the International Society of Forensic Genetics (ISFG) regularly publishes guidelines and recommendations concerning the application of DNA polymorphisms to the question of human identification. Previous recommendations published in 2000 addressed the analysis and interpretation of mitochondrial DNA (mtDNA) in forensic casework. While the foundations set forth in the earlier recommendations still apply, new approaches to the quality control, alignment and nomenclature of mitochondrial sequences, as well as the establishment of mtDNA reference population databases, have been developed. Here, we describe these developments and discuss their application to both mtDNA casework and mtDNA reference population databasing applications. While the generation of mtDNA for forensic casework has always been guided by specific standards, it is now well-established that data of the same quality are required for the mtDNA reference population data used to assess the statistical weight of the evidence. As a result, we introduce guidelines regarding sequence generation, as well as quality control measures based on the known worldwide mtDNA phylogeny, that can be applied to ensure the highest quality population data possible. For both casework and reference population databasing applications, the alignment and nomenclature of haplotypes is revised here and the phylogenetic alignment is proffered as the acceptable standard. In addition, the interpretation of heteroplasmy in the forensic context is updated, and the utility of alignment-free database searches for unbiased probability estimates is highlighted. Finally, we discuss statistical issues and define minimal standards for mtDNA database searches. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.
Ma, Wenxiu; Yang, Lin; Rohs, Remo; Noble, William Stafford
2017-10-01
Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values. The software is available at https://bitbucket.org/wenxiu/sequence-shape.git. rohs@usc.edu or william-noble@uw.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
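The classical k-spectrum kernel, which the sequence + shape kernel extends, compares two sequences by the inner product of their k-mer count vectors. A minimal Python sketch follows; the function names are hypothetical, and the DNA shape features and the di-mismatch variant described in the abstract are not included.

```python
import math
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Feature map of the k-spectrum kernel: counts of every length-k substring."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the two k-mer count vectors (missing k-mers count as 0)."""
    fa, fb = spectrum(a, k), spectrum(b, k)
    return sum(count * fb[kmer] for kmer, count in fa.items())


def normalized_kernel(a: str, b: str, k: int) -> float:
    """Cosine-normalized kernel, so K(s, s) == 1 for any sequence with a k-mer."""
    return spectrum_kernel(a, b, k) / math.sqrt(
        spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
```

Because the kernel needs only k-mer counts, no alignment of binding sites is required, which is the limitation of earlier sequence + shape methods that the abstract addresses.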
DNA sequence alignment by microhomology sampling during homologous recombination
Qi, Zhi; Redding, Sy; Lee, Ja Yil; Gibb, Bryan; Kwon, YoungHo; Niu, Hengyao; Gaines, William A.; Sung, Patrick
2015-01-01
Summary Homologous recombination (HR) mediates the exchange of genetic information between sister or homologous chromatids. During HR, members of the RecA/Rad51 family of recombinases must somehow search through vast quantities of DNA sequence to align and pair ssDNA with a homologous dsDNA template. Here we use single-molecule imaging to visualize Rad51 as it aligns and pairs homologous DNA sequences in real-time. We show that Rad51 uses a length-based recognition mechanism while interrogating dsDNA, enabling robust kinetic selection of 8-nucleotide (nt) tracts of microhomology, which kinetically confines the search to sites with a high probability of being a homologous target. Successful pairing with a 9th nucleotide coincides with an additional reduction in binding free energy and subsequent strand exchange occurs in precise 3-nt steps, reflecting the base triplet organization of the presynaptic complex. These findings provide crucial new insights into the physical and evolutionary underpinnings of DNA recombination. PMID:25684365
Elasticity-mediated nematiclike bacterial organization in model extracellular DNA matrix.
Smalyukh, Ivan I; Butler, John; Shrout, Joshua D; Parsek, Matthew R; Wong, Gerard C L
2008-09-01
DNA is a common extracellular matrix component of bacterial biofilms. We find that bacteria can spontaneously order in a matrix of aligned concentrated DNA, in which rod-shaped cells of Pseudomonas aeruginosa follow the orientation of extended DNA chains. The alignment of bacteria is ensured by elasticity and liquid crystalline properties of the DNA matrix. These findings show how behavior of planktonic bacteria may be modified in extracellular polymeric substances of biofilms and illustrate the potential of using complex fluids to manipulate embedded nanosized and microsized active particles.
2010-01-01
Background The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Because of the problems caused by uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment cannot be directly applied to whole-genome comparison and phylogenomic studies of viruses. There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has been successfully applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets that differ greatly in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also offers some insights into the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses, which have a much smaller genome size. PMID:20565983
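The abstract does not describe how the DL method works, so it is not reproduced here. For comparison, the CV Tree approach it mentions can be sketched in Python, assuming the usual composition-vector definition: each observed k-string frequency is compared with a prediction from a (k-2)-order Markov model, and two genomes are compared by a correlation-based distance. The function names are hypothetical, and this is neither the authors' code nor the CVTree program.

```python
import math
from collections import Counter


def _freqs(seq: str, k: int) -> dict:
    """Relative frequencies of all k-strings occurring in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {w: c / total for w, c in counts.items()}


def composition_vector(seq: str, k: int) -> dict:
    """Components (p - p0) / p0 for every k-string whose Markov prediction p0 > 0."""
    if k < 3:
        raise ValueError("k must be at least 3")
    pk, pk1, pk2 = _freqs(seq, k), _freqs(seq, k - 1), _freqs(seq, k - 2)
    alphabet = sorted(set(seq))
    vec = {}
    for prefix in pk1:
        for c in alphabet:
            w = prefix + c
            suffix = w[1:]
            if suffix in pk1:  # otherwise p0 == 0 and the component is taken as 0
                p0 = pk1[prefix] * pk1[suffix] / pk2[w[1:-1]]
                vec[w] = (pk.get(w, 0.0) - p0) / p0  # absent k-strings give -1
    return vec


def cv_distance(a: str, b: str, k: int) -> float:
    """D = (1 - C) / 2, where C is the cosine correlation of the two vectors."""
    va, vb = composition_vector(a, k), composition_vector(b, k)
    dot = sum(x * vb.get(w, 0.0) for w, x in va.items())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    if na == 0 or nb == 0:
        return 0.5  # correlation undefined; treated as C = 0
    return (1 - dot / (na * nb)) / 2
```

Subtracting the Markov prediction is meant to suppress background composition so that the remaining signal reflects selection; very short genomes yield sparse, noisy vectors, which is consistent with the abstract's remark that CV-based trees are less suitable for small parvovirus genomes.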
Fan, Long; Hui, Jerome H L; Yu, Zu Guo; Chu, Ka Hou
2014-07-01
Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern, and it has been criticized for its local optimization. However, the more accurate software currently available requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. Therefore, it is imperative to develop a practical program for both accurate and scalable species identification for DNA barcoding. In this context, we present VIP Barcoding: user-friendly software with a graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is used to reduce the search space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform can accurately handle both large-scale and multilocus barcoding data and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/. © 2014 John Wiley & Sons Ltd.
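The second stage, K2P nearest-neighbour identification, can be sketched in a few lines of Python. This is a minimal illustration, not VIP Barcoding's code: it assumes the query and references are already aligned to equal length, skips sites with gaps or ambiguity codes (an assumption), and omits the CV screening stage. `k2p_distance` applies the Kimura two-parameter formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions; the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.
    Sites with gaps or ambiguity codes in either sequence are skipped."""
    transitions = transversions = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x != y:
            if (x in PURINES) == (y in PURINES):
                transitions += 1      # A<->G or C<->T
            else:
                transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    P, Q = transitions / sites, transversions / sites
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        return math.inf  # too divergent for the K2P correction
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def nearest_species(query: str, references: dict) -> str:
    """Label of the reference barcode with the smallest K2P distance to the query."""
    return min(references, key=lambda name: k2p_distance(query, references[name]))
```

Restricting this exhaustive comparison to a shortlist produced by a fast alignment-free screen is what makes the two-stage design scale.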
DNA Nanotubes for NMR Structure Determination of Membrane Proteins
Bellot, Gaëtan; McClintock, Mark A.; Chou, James J; Shih, William M.
2013-01-01
Structure determination of integral membrane proteins by solution NMR represents one of the most important challenges of structural biology. A Residual-Dipolar-Coupling-based refinement approach can be used to solve the structure of membrane proteins up to 40 kDa in size; however, a weak-alignment medium that is detergent-resistant is required. Previously, the availability of media suitable for weak alignment of membrane proteins was severely limited. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400-nm-long six-helix bundles, each self-assembled from a M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge, as well as modulation of molecular alignment toward collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes via counter ions and small DNA-binding molecules. This detergent-resistant liquid-crystal medium offers a number of properties conducive to membrane protein alignment, including high-yield production, thermal stability, buffer compatibility, and structural programmability. Production of sufficient nanotubes for 4–5 NMR experiments can be completed in one week by a single individual. PMID:23518667
DNA nanotubes for NMR structure determination of membrane proteins.
Bellot, Gaëtan; McClintock, Mark A; Chou, James J; Shih, William M
2013-04-01
Finding a way to determine the structures of integral membrane proteins using solution nuclear magnetic resonance (NMR) spectroscopy has proved to be challenging. A residual-dipolar-coupling-based refinement approach can be used to resolve the structure of membrane proteins up to 40 kDa in size, but to do this you need a weak-alignment medium that is detergent-resistant and it has thus far been difficult to obtain such a medium suitable for weak alignment of membrane proteins. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400-nm-long six-helix bundles, each self-assembled from a M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge as well as modulation of molecular alignment, toward collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes using counter ions and small DNA-binding molecules. This detergent-resistant liquid-crystal medium offers a number of properties conducive for membrane protein alignment, including high-yield production, thermal stability, buffer compatibility and structural programmability. Production of sufficient nanotubes for four or five NMR experiments can be completed in 1 week by a single individual.
Simultaneous phylogeny reconstruction and multiple sequence alignment
Yue, Feng; Shi, Jian; Tang, Jijun
2009-01-01
Background A phylogeny is the evolutionary history of a group of organisms. To date, sequence data are still the most widely used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all current multiple sequence alignment programs use a guide tree to produce the alignment, and experiments have shown that good guide trees can significantly improve multiple alignment quality. Results We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can take a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method performs well in terms of both phylogeny accuracy and alignment quality. Conclusion We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problem for three sequences. Our extensive experiments suggest that this method is very promising and can produce high-quality phylogenies and alignments. PMID:19208110
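The three-sequence median problem can be solved exactly by dynamic programming over a three-dimensional matrix. The Python sketch below assumes unit edit costs over the DNA alphabet and returns only the optimal median score. It is the straightforward O(n^3) formulation, not the more efficient median algorithm the authors use, and the function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"
GAP = "-"


def column_cost(column) -> int:
    """Cost of the best median symbol for one alignment column under unit costs:
    0 for identical symbols (including gap vs gap), 1 otherwise."""
    return min(sum(0 if x == c else 1 for x in column) for c in ALPHABET + GAP)


def median_score(a: str, b: str, c: str) -> int:
    """Minimum of d(m, a) + d(m, b) + d(m, c) over median strings m, computed by
    DP over all three-way alignments of a, b and c."""
    INF = float("inf")
    D = [[[INF] * (len(c) + 1) for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    D[0][0][0] = 0
    # Each column consumes a character from any non-empty subset of the sequences.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            for k in range(len(c) + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if min(pi, pj, pk) < 0:
                        continue
                    column = (a[pi] if di else GAP,
                              b[pj] if dj else GAP,
                              c[pk] if dk else GAP)
                    D[i][j][k] = min(D[i][j][k], D[pi][pj][pk] + column_cost(column))
    return D[len(a)][len(b)][len(c)]
```

The cubic table is why efficient median solvers matter: a tree search that evaluates many internal nodes would call this routine repeatedly.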
Widespread recombination in published animal mtDNA sequences.
Tsaousis, A D; Martin, D P; Ladoukakis, E D; Posada, D; Zouros, E
2005-04-01
Mitochondrial DNA (mtDNA) recombination has been observed in several animal species, but there are doubts as to whether it is common or only occurs under special circumstances. Animal mtDNA sequences retrieved from public databases were unambiguously aligned and rigorously tested for evidence of recombination. At least 30 recombination events were detected among 186 alignments examined. Recombinant sequences were found in invertebrates and vertebrates, including primates. It appears that mtDNA recombination may occur regularly in the animal cell but rarely produces new haplotypes because of homoplasmy. Common animal mtDNA recombination would necessitate a reexamination of phylogenetic and biohistorical inference based on the assumption of clonal mtDNA transmission. Recombination may also have an important role in producing and purging mtDNA mutations and thus in mtDNA-based diseases and senescence.
CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.
Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan
2017-06-24
The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep running times at an acceptable level. Although there is a large body of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the characteristics of users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and are generally unknown before submission, which previous work unfortunately ignores. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, on a heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of computing resources by running the workload on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves up to an 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the alignment of multiple similar RNA/DNA sequences and proposes a novel bitmap-based algorithm to improve the center star strategy.
We can conclude that harnessing the high performance of modern GPUs is a promising approach to accelerate multiple sequence alignment. In addition, adopting the co-run computation model can significantly improve overall system utilization. The source code is available at https://github.com/wangvsa/CMSA .
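The center star strategy first picks a center sequence and then aligns every other sequence to it. A naive center selection, which compares every pair of sequences, is sketched below in Python as a baseline. CMSA's bitmap-based selection is not described in enough detail here to reproduce; using edit distance as the pairwise measure is an assumption, and the helper names are hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (a != b)))   # match or substitution
        prev = cur
    return prev[-1]


def choose_center(seqs: list[str]) -> int:
    """Naive center-star selection: index of the sequence whose summed distance
    to all others is smallest (every pair is compared once)."""
    totals = [0] * len(seqs)
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            d = edit_distance(seqs[i], seqs[j])
            totals[i] += d
            totals[j] += d
    return min(range(len(seqs)), key=totals.__getitem__)
```

The pairwise loop is what makes this stage expensive as the number of sequences grows, which is the bottleneck CMSA's improved selection targets.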
MSuPDA: A Memory Efficient Algorithm for Sequence Alignment.
Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon
2016-03-01
Space complexity is a central problem in DNA sequence alignment. In this regard, memory saving under pushdown automata can help reduce the space occupied in computer memory. In the proposed process, an anchor seed (AS) is selected from a given data set of nucleotide base pairs for local sequence alignment. Quick splitting techniques separate the AS from all the DNA genome segments. The selected AS is placed in the input unit of the pushdown automaton (PDA), and whole DNA genome segments are placed on the PDA's stack. The AS from the input unit is matched with the DNA genome segments from the PDA's stack. Matched, mismatched and indel nucleotides are popped from the stack under the control unit of the pushdown automaton. Each POP operation on the stack frees the memory cell occupied by the nucleotide base pair.
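The description above can be read as a stack-driven comparison. The toy Python sketch below is one possible interpretation, not the authors' implementation: the genome segment is pushed onto a stack, the anchor seed acts as the input, and each POP compares one base and releases its stack cell. It performs only ungapped comparison; indel handling and quick splitting are omitted, and all names are hypothetical.

```python
def pda_compare(seed: str, segment: str):
    """Toy ungapped comparison of an anchor seed against a genome segment held
    on a stack; returns the per-base events and the number of cells still held."""
    stack = list(reversed(segment))  # top of the stack is segment[0]
    events = []
    for base in seed:                # the seed plays the role of the input tape
        if not stack:
            break
        top = stack.pop()            # the popped cell is no longer held by the stack
        events.append("match" if top == base else "mismatch")
    return events, len(stack)
```

The second return value shows the memory argument in miniature: stack storage shrinks by one cell for every base consumed.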
Acceleration of short and long DNA read mapping without loss of accuracy using suffix array.
Tárraga, Joaquín; Arnau, Vicente; Martínez, Héctor; Moreno, Raul; Cazorla, Diego; Salavert-Torres, José; Blanquer-Espert, Ignacio; Dopazo, Joaquín; Medina, Ignacio
2014-12-01
HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies. https://github.com/opencb/hpg-aligner. © The Author 2014. Published by Oxford University Press.
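A suffix array lists the starting positions of all suffixes of a text in lexicographic order, so every occurrence of a pattern occupies one contiguous block that binary search can locate. The minimal Python sketch below illustrates exact seed lookup only. It is not HPG Aligner's code, uses naive O(n^2 log n) construction suitable only for small inputs, and its function names are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, sorted lexicographically (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern, found by two binary searches over sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

Each lookup costs O(m log n) comparisons regardless of how many times the seed occurs, which is why suffix-array seeding scales well to large references.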
JavaScript DNA translator: DNA-aligned protein translations.
Perry, William L
2002-12-01
There are many instances in molecular biology when it is necessary to identify ORFs in a DNA sequence. While programs exist for displaying protein translations in multiple ORFs in alignment with a DNA sequence, they are often expensive, exist as add-ons to software that must be purchased, or are only compatible with a particular operating system. JavaScript DNA Translator is a shareware application written in JavaScript, a scripting language interpreted by the Netscape Communicator and Internet Explorer Web browsers, which makes it compatible with several different operating systems. While the program uses a familiar Web page interface, it requires no connection to the Internet since calculations are performed on the user's own computer. The program analyzes one or multiple DNA sequences and generates translations in up to six reading frames aligned to a DNA sequence, in addition to displaying translations as separate sequences in FASTA format. ORFs within a reading frame can also be displayed as separate sequences. Flexible formatting options are provided, including the ability to hide ORFs below a minimum size specified by the user. The program is available free of charge at the BioTechniques Software Library (www.Biotechniques.com).
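The core of such a translator, six-frame translation plus ORF extraction with a minimum size, can be sketched in Python. The original tool is written in JavaScript; this is an independent illustration with hypothetical function names. It assumes the standard genetic code, labels frames +1 to +3 on the given strand and -1 to -3 on the reverse complement (frame-naming conventions vary), and takes an ORF to run from the first M to the next stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; codons with non-ACGT symbols become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Translations of the three forward and three reverse-complement frames."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq[f:])
        frames[f"-{f + 1}"] = translate(rc[f:])
    return frames


def orfs(protein: str, min_len: int = 1) -> list[str]:
    """ORFs in one translated frame: from the first M up to a stop '*',
    keeping only those with at least min_len residues."""
    found = []
    for segment in protein.split("*")[:-1]:  # the last piece has no stop codon
        start = segment.find("M")
        if start != -1 and len(segment) - start >= min_len:
            found.append(segment[start:])
    return found
```

The `min_len` argument corresponds to the option of hiding ORFs below a user-specified minimum size.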
MSuPDA: A memory efficient algorithm for sequence alignment.
Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon
2015-01-16
Space complexity is a central problem in DNA sequence alignment. In this regard, MSuPDA (Memory Saving under Pushdown Automata) can help reduce the space occupied in computer memory. In the proposed process, an Anchor Seed (AS) is selected from a given data set of nucleotide base pairs for local sequence alignment. Quick Splitting (QS) techniques separate the Anchor Seed from all the DNA genome segments. The selected Anchor Seed is placed in the input unit of the pushdown automaton (PDA), and whole DNA genome segments are placed on the PDA's stack. The Anchor Seed from the input unit is matched with the DNA genome segments from the PDA's stack. Matched, mismatched and indel nucleotides are popped from the stack under the control unit of the pushdown automaton. Each POP operation on the stack frees the memory cell occupied by the nucleotide base pair.
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts the computing power of hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput compared with a single CPU core while delivering the same level of alignment fidelity. The software can also use multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
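BWA, on which BarraCUDA is based, locates reads with an FM-index built from the Burrows-Wheeler transform of the reference. The Python sketch below shows exact-match backward search only; BWA additionally tolerates mismatches and gaps by backtracking, and BarraCUDA runs the search on the GPU. This is an illustrative CPU sketch with hypothetical names, not code from either tool.

```python
from collections import Counter


def build_fm_index(text: str):
    """Suffix array, C table and Occ table of the Burrows-Wheeler transform."""
    text += "$"  # sentinel; sorts before A, C, G and T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    counts = Counter(bwt)
    C, running = {}, 0
    for ch in sorted(counts):  # C[ch] = number of symbols smaller than ch
        C[ch] = running
        running += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):  # occ[ch][i] = occurrences of ch in bwt[:i]
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, C, occ


def backward_search(index, pattern: str) -> list[int]:
    """Exact-match positions of pattern, processing it from its last character."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Each step narrows the interval with two table lookups, independent of reference length, so many reads can be searched independently in parallel, which suits GPU execution.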
HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing
Karimi, Ramin; Hajdu, Andras
2016-01-01
Comprehensive efforts toward low-cost sequencing in the past few years have led to the growth of complete genome databases. In parallel, driven by a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Because of the difficulties, high costs and computational challenges of alignment-based approaches, an alternative universal identification method is highly desirable. As an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases to detect extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in arbitrarily selected target and nontarget databases. Hadoop and MapReduce, as parallel and distributed computing tools running on commodity hardware, are used in this pipeline. This approach brings the power of high-performance computing to ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genome databases. The considerable number of unique and common DNA signatures detected in the target database creates opportunities to improve the identification process, not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. PMID:26884678
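The k-mer counting at the heart of the pipeline fits the map/reduce pattern: mappers emit k-mers per genome and reducers group them. The single-process Python sketch below imitates that pattern without Hadoop. It assumes that a unique signature is a k-mer present in every target genome and in no non-target genome, and that a common signature is present in every target genome and in at least one non-target genome; these definitions and all function names are assumptions, not taken from HTSFinder.

```python
from collections import defaultdict


def map_phase(genome_id: str, seq: str, k: int):
    """Mapper: emit (k-mer, genome_id) once per distinct k-mer in one genome."""
    for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
        yield kmer, genome_id


def reduce_phase(pairs):
    """Reducer: group emitted pairs into k-mer -> set of genomes containing it."""
    grouped = defaultdict(set)
    for kmer, genome_id in pairs:
        grouped[kmer].add(genome_id)
    return grouped


def find_signatures(targets: dict, nontargets: dict, k: int):
    """Return (unique, common) signature sets; genome IDs must be distinct."""
    genomes = {**targets, **nontargets}
    pairs = [p for gid, seq in genomes.items() for p in map_phase(gid, seq, k)]
    grouped = reduce_phase(pairs)
    target_ids, nontarget_ids = set(targets), set(nontargets)
    unique, common = set(), set()
    for kmer, present_in in grouped.items():
        if target_ids <= present_in:  # found in every target genome
            if present_in & nontarget_ids:
                common.add(kmer)
            else:
                unique.add(kmer)
    return unique, common
```

In a real MapReduce job the mapper and reducer would run on different machines, with the framework's shuffle step doing the grouping that `reduce_phase` performs here in memory.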
DNA barcoding the native flowering plants and conifers of Wales.
de Vere, Natasha; Rich, Tim C G; Ford, Col R; Trinder, Sarah A; Long, Charlotte; Moore, Chris W; Satterthwaite, Danielle; Davies, Helena; Allainguillaume, Joel; Ronca, Sandra; Tatarinova, Tatiana; Garbett, Hannah; Walker, Kevin; Wilkinson, Mike J
2012-01-01
We present the first national DNA barcode resource that covers the native flowering plants and conifers for the nation of Wales (1143 species). Using the plant DNA barcode markers rbcL and matK, we have assembled 97.7% coverage for rbcL, 90.2% for matK, and a dual-locus barcode for 89.7% of the native Welsh flora. We have sampled multiple individuals for each species, resulting in 3304 rbcL and 2419 matK sequences. The majority of our samples (85%) are from DNA extracted from herbarium specimens. Recoverability of DNA barcodes is lower using herbarium specimens, compared to freshly collected material, mostly due to lower amplification success, but this is balanced by the increased efficiency of sampling species that have already been collected, identified, and verified by taxonomic experts. The effectiveness of the DNA barcodes for identification (level of discrimination) is assessed using four approaches: the presence of a barcode gap (using pairwise and multiple alignments), formation of monophyletic groups using Neighbour-Joining trees, and sequence similarity in BLASTn searches. These approaches yield similar results, providing relative discrimination levels of 69.4 to 74.9% of all species and 98.6 to 99.8% of genera using both markers. Species discrimination can be further improved using spatially explicit sampling. Mean species discrimination using barcode gap analysis (with a multiple alignment) is 81.6% within 10×10 km squares and 93.3% for 2×2 km squares. Our database of DNA barcodes for Welsh native flowering plants and conifers represents the most complete coverage of any national flora, and offers a valuable platform for a wide range of applications that require accurate species identification. PMID:22701588
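The barcode gap criterion mentioned in the de Vere et al. abstract can be stated compactly: a species is discriminated if its largest within-species distance is smaller than its smallest distance to any other species. The sketch below assumes uncorrected p-distances on sequences that have already been aligned. The abstract gives neither the study's distance model nor its alignment settings, so the function names and choices here are illustrative only.

```python
from itertools import combinations
from typing import Dict, List

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in sites) / len(sites)

def has_barcode_gap(species: str, samples: Dict[str, List[str]]) -> bool:
    """True if every intraspecific distance is smaller than every distance to another species."""
    own = samples[species]
    intra = [p_distance(a, b) for a, b in combinations(own, 2)]
    inter = [p_distance(a, b)
             for other, seqs in samples.items() if other != species
             for a in own for b in seqs]
    if not inter:
        raise ValueError("need at least one other species")
    # A species with a single sample has no intraspecific distances; treat its maximum as 0.
    return max(intra, default=0.0) < min(inter)
```

In practice, the distances would be computed on rbcL, matK or concatenated alignments, and the proportion of species passing this test gives a discrimination level of the kind reported above.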
NASA Astrophysics Data System (ADS)
Reifenberger, Jeffrey; Dorfman, Kevin; Cao, Han
Human DNA is not a polymer consisting of a uniform distribution of all four nucleotides; rather, it contains regions of high AT and high GC content. When confined, these regions could stretch differently because of the extra hydrogen bond in the GC base pair. To measure this potential difference, human genomic DNA was nicked with Nt.BspQI, labeled with a Cy3-like fluorophore at the nick site, stained with YOYO, loaded into a device containing an array of nanochannels, and imaged. Over 473,000 individual DNA molecules, corresponding to roughly 30x coverage of a human genome, were collected and aligned to the human reference. Based on the known AT/GC content between aligned pairs of labels, the stretch was measured for regions of similar size but different AT/GC content. Regions of high GC content were consistently more stretched than regions of high AT content, for label pairs spanning 2.5 kbp to 500 kbp. For every 1% increase in GC content, stretch increased by roughly 0.06%. While this effect is small, accounting for differences in stretch between AT- and GC-rich regions is important for improving the sensitivity of structural variation detection. NIH Grant: R01-HG006851.
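The reported trend implies simple arithmetic. If the trend is treated as linear (an assumption, since the abstract gives only an approximate slope), two regions that differ by 20 percentage points in GC content would differ by about 0.06 × 20 = 1.2% in stretch. The helper functions below are hypothetical illustrations of that calculation and are not part of the study's analysis.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G/C among the unambiguous A/C/G/T bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)

def predicted_stretch_change(gc_from: float, gc_to: float, slope: float = 0.06) -> float:
    """Relative change in stretch (%) when GC content moves from gc_from to gc_to (both in %).

    slope is the approximate 0.06% stretch per 1% GC reported above; linearity is assumed.
    """
    return slope * (gc_to - gc_from)
```

For example, predicted_stretch_change(40, 60) returns approximately 1.2. A correction of this size could be applied to measured label spacings before comparing them with the reference.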
Scherer, N M; Basso, D M
2008-09-16
DNATagger is a web-based tool for coloring and editing DNA, RNA and protein sequences and alignments. It is dedicated to the visualization of protein coding sequences and also protein sequence alignments to facilitate the comprehension of evolutionary processes in sequence analysis. The distinctive feature of DNATagger is the use of codons as informative units for coloring DNA and RNA sequences. The codons are colored according to their corresponding amino acids. It is the first program that colors codons in DNA sequences without being affected by "out-of-frame" gaps of alignments. It can handle single gaps and gaps inside the triplets. The program also provides the possibility to edit the alignments and change color patterns and translation tables. DNATagger is a JavaScript application, following the W3C guidelines, designed to work on standards-compliant web browsers. It therefore requires no installation and is platform independent. The web-based DNATagger is available as free and open source software at http://www.inf.ufrgs.br/~dmbasso/dnatagger/.
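The key idea is to read codons from the ungapped bases, so that a gap inside a triplet does not shift the reading frame. This is a hypothetical Python illustration of that idea, not DNATagger's JavaScript source. It uses only the standard genetic code (DNATagger also lets users change translation tables), and it marks codons containing ambiguous bases as X.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_columns(aligned: str, frame: int = 0) -> List[Tuple[List[int], str, str]]:
    """Group the non-gap bases of one aligned row into codons, even when gaps split a triplet.

    Returns (alignment_columns, codon, amino_acid) for each complete codon; the columns
    are the cells a viewer would paint with that amino acid's color.
    """
    positions = [i for i, ch in enumerate(aligned) if ch not in "-."][frame:]
    codons = []
    for start in range(0, len(positions) - 2, 3):
        cols = positions[start:start + 3]
        codon = "".join(aligned[i] for i in cols).upper().replace("U", "T")  # accept RNA too
        codons.append((cols, codon, CODON_TABLE.get(codon, "X")))
    return codons
```

For the aligned row "AT-GAAA-T", the gap at column 2 falls inside the first triplet. The sketch still yields ATG (M) at columns 0, 1 and 3, and AAA (K) at columns 4 to 6.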
Akopiants, Konstantin; Zhou, Rui-Zhe; Mohapatra, Susovan; Valerie, Kristoffer; Lees-Miller, Susan P; Lee, Kyung-Jong; Chen, David J; Revy, Patrick; de Villartay, Jean-Pierre; Povirk, Lawrence F
2009-07-01
XLF/Cernunnos is a core protein of the nonhomologous end-joining pathway of DNA double-strand break repair. To better define the role of Cernunnos in end joining, whole-cell extracts were prepared from Cernunnos-deficient human cells. These extracts effected little joining of DNA ends with cohesive 5' or 3' overhangs, and no joining at all of partially complementary 3' overhangs that required gap filling prior to ligation. Assays in which gap-filled but unligated intermediates were trapped using dideoxynucleotides revealed that there was no gap filling on aligned DSB ends in the Cernunnos-deficient extracts. Recombinant Cernunnos protein restored gap filling and end joining of partially complementary overhangs, and stimulated joining of cohesive ends more than twentyfold. XLF-dependent gap filling was nearly eliminated by immunodepletion of DNA polymerase lambda, but was restored by addition of either polymerase lambda or polymerase mu. Thus, Cernunnos is essential for gap filling by either polymerase during nonhomologous end joining, suggesting that it plays a major role in aligning the two DNA ends in the repair complex.
King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach
2014-01-01
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success in detecting coding regions in genes. More recently, these methods have been used to establish similarity between gene DNA sequences. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transform, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence; these signatures are then used to compute relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
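The abstract does not give the ICD formula, so the sketch below shows only the general family of methods it extends. A DNA sequence is mapped to four binary indicator sequences (the Voss mapping, a common choice in genomic signal processing). Fourier sums of these sequences are combined into a power spectrum at fixed normalized frequencies, so that sequences of different lengths share one frequency grid. The spectra are then compared with a Euclidean distance. The frequency grid and the distance are illustrative assumptions, not the authors' method.

```python
import cmath
import math
from typing import List

def spectral_signature(seq: str, n_freqs: int = 32) -> List[float]:
    """Length-normalized Voss power spectrum at frequencies k / (2 * n_freqs), k = 0..n_freqs-1."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    signature = []
    for k in range(n_freqs):
        f = k / (2 * n_freqs)  # cycles per base, in [0, 0.5)
        power = 0.0
        for base in "ACGT":
            # Fourier sum of the 0/1 indicator sequence for this base at frequency f
            coeff = sum(cmath.exp(-2j * math.pi * f * t) for t, b in enumerate(seq) if b == base)
            power += abs(coeff) ** 2
        signature.append(power / n)
    return signature

def spectral_distance(a: str, b: str, n_freqs: int = 32) -> float:
    """Alignment-free distance: Euclidean distance between two spectral signatures."""
    return math.dist(spectral_signature(a, n_freqs), spectral_signature(b, n_freqs))
```

Distances computed this way for every pair of sequences form a matrix that a standard tree-building method such as neighbor joining could take as input. That is the kind of comparison against alignment-based trees described above.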
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-05-01
Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Borozan, Ivan; Watt, Stuart; Ferretti, Vincent
2015-01-01
Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
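Here is a minimal sketch of a combined similarity score. Each candidate reference receives a weighted average of several similarity measures, and the query is assigned the label with the highest combined score. In this sketch the weights are simply passed in. In the model described by Borozan et al., they are determined per test sequence from each measure's discriminatory ability on the training set, and that step is not reproduced here. The measure names, and the assumption that every score is scaled to [0, 1], are hypothetical.

```python
from typing import Dict

def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of similarity scores (each assumed to be scaled to [0, 1])."""
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("weights for the available measures must sum to a positive value")
    return sum(weights[m] * s for m, s in scores.items()) / total

def classify(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """candidates[label][measure] = similarity of the query to the reference for `label`."""
    return max(candidates, key=lambda label: combined_score(candidates[label], weights))
```

Suppose the query scores 0.2 (alignment) and 0.9 (k-mer) against virusA, and 0.6 and 0.4 against virusB. Weights of 0.7/0.3 then predict virusB, while weights of 0.2/0.8 predict virusA. This illustrates why allowing different weights for different sequences can change the result.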
DNA motif alignment by evolving a population of Markov chains.
Bi, Chengpeng
2009-01-30
Deciphering cis-regulatory elements, or de novo motif finding in genomes, remains elusive although much algorithmic effort has been expended. Markov chain Monte Carlo (MCMC) methods such as Gibbs motif samplers have been widely employed to solve the de novo motif-finding problem through sequence local alignment. Nonetheless, MCMC-based motif samplers still suffer from local maxima, as EM does. Therefore, as a prerequisite for finding good local alignments, these motif algorithms are often run independently many times, but without information exchange between the different chains. Hence, it is worth designing a new algorithm that enables such information exchange. This paper presents a novel motif-finding algorithm that evolves a population of Markov chains with information exchange (PMC), each of which is initialized as a random alignment and run by the Metropolis-Hastings sampler (MHS). Each chain is progressively updated through a series of stochastically sampled local alignments. Explicitly, the PMC motif algorithm performs stochastic sampling as specified by a population-based proposal distribution rather than individual ones, and adaptively evolves the population as a whole towards a global maximum. Alignment information is exchanged by taking advantage of the pooled motif site distributions. A distinct method for running multiple independent Markov chains (IMC) without information exchange, dubbed the IMC motif algorithm, is also devised for comparison with its PMC counterpart. Experimental studies demonstrate that performance can be improved if pooled information is used to run a population of motif samplers. The new PMC algorithm improved convergence and outperformed other popular algorithms tested on simulated and biological motif sequences.
Kim, Jeremie S; Senol Cali, Damla; Xin, Hongyi; Lee, Donghyuk; Ghose, Saugata; Alser, Mohammed; Hassan, Hasan; Ergin, Oguz; Alkan, Can; Mutlu, Onur
2018-05-09
Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x-6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x-3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. GRIM-Filter exploits 3D-stacked memory, which enables the efficient use of processing-in-memory, to overcome the memory bandwidth bottleneck in seed location filtering. We show that GRIM-Filter significantly improves the performance of a state-of-the-art read mapper. 
GRIM-Filter is a universal seed location filter that can be applied to any read mapper. We hope that our results provide inspiration for new works to design other bioinformatics algorithms that take advantage of emerging technologies and new processing paradigms, such as processing-in-memory using 3D-stacked memory devices.
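The coarse-grained filtering idea can be sketched in software. The reference is cut into bins, and each bin records which k-mers it contains (GRIM-Filter stores this as a bitvector in 3D-stacked memory). A read is sent to alignment in a bin only if enough of its k-mers are present there. The threshold below uses the standard q-gram lemma: a read of length m within e edits of a location shares at least (m - k + 1) - k·e of its k-mers with it. The bin size, overlap and k are illustrative parameters, and this plain-Python version is not the authors' implementation.

```python
from typing import List, Set

def build_bins(reference: str, bin_size: int, k: int, overlap: int) -> List[Set[str]]:
    """Split the reference into bins and record which k-mers occur in each.

    Each bin is extended by `overlap` bases so that reads near a boundary are still covered.
    """
    bins = []
    for start in range(0, len(reference), bin_size):
        segment = reference[start:start + bin_size + overlap]
        bins.append({segment[i:i + k] for i in range(len(segment) - k + 1)})
    return bins

def candidate_bins(read: str, bins: List[Set[str]], k: int, max_edits: int) -> List[int]:
    """Indices of bins that pass the k-mer count threshold; all other bins are filtered out."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    threshold = len(kmers) - k * max_edits  # q-gram lemma lower bound
    return [idx for idx, present in enumerate(bins)
            if sum(kmer in present for kmer in kmers) >= threshold]
```

When the threshold drops to zero or below (short reads, large k, or many allowed edits), every bin passes. A presence-based filter of this kind therefore helps most at low error tolerances such as the 0.05 evaluated above.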
Fast single-pass alignment and variant calling using sequencing data
USDA-ARS's Scientific Manuscript database
Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. A new program, findmap.f90, reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...
Wallen, Rachel; Gokarn, Nirmal; Bercea, Priscila; Grzincic, Elissa; Bandyopadhyay, Krisanu
2015-12-01
Vertically aligned single-walled carbon nanotube (VASWCNT) assemblies are generated on cysteamine and 2-mercaptoethanol (2-ME)-functionalized gold surfaces through amide bond formation between carboxylic groups generated at the end of acid-shortened single-walled carbon nanotubes (SWCNTs) and amine groups present on the gold surfaces. Atomic force microscopy (AFM) imaging confirms the vertical alignment mode of SWCNT attachment through significant changes in surface roughness compared to bare gold surfaces and the lack of any horizontally aligned SWCNTs present. These SWCNT assemblies are further modified with an amine-terminated single-stranded probe-DNA. Subsequent hybridization of the surface-bound probe-DNA in the presence of complementary strands in solution is followed using impedance measurements in the presence of Fe(CN)₆³⁻/⁴⁻ as the redox probe in solution, which show changes in the interfacial electrochemical properties, specifically the charge-transfer resistance, due to hybridization. In addition, hybridization of the probe-DNA is also compared when it is attached directly to the gold surfaces without any intermediary SWCNTs. Contrary to our expectations, impedance measurements show a decrease in charge-transfer resistance with time due to hybridization with 300 nM complementary DNA in solution with the probe-DNA attached to SWCNTs. In contrast, an increase in charge-transfer resistance is observed with time during hybridization when the probe-DNA is attached directly to the gold surfaces. The decrease in charge-transfer resistance during hybridization in the presence of VASWCNTs indicates an enhancement in the electron transfer process of the redox probe at the VASWCNT-modified electrode. The results suggest that VASWCNTs are acting as mediators of electron transfer, which facilitate the charge transfer of the redox probe at the electrode-solution interface.
NASA Astrophysics Data System (ADS)
Wallen, Rachel; Gokarn, Nirmal; Bercea, Priscila; Grzincic, Elissa; Bandyopadhyay, Krisanu
2015-06-01
Vertically aligned single-walled carbon nanotube (VASWCNT) assemblies are generated on cysteamine and 2-mercaptoethanol (2-ME)-functionalized gold surfaces through amide bond formation between carboxylic groups generated at the end of acid-shortened single-walled carbon nanotubes (SWCNTs) and amine groups present on the gold surfaces. Atomic force microscopy (AFM) imaging confirms the vertical alignment mode of SWCNT attachment through significant changes in surface roughness compared to bare gold surfaces and the lack of any horizontally aligned SWCNTs present. These SWCNT assemblies are further modified with an amine-terminated single-stranded probe-DNA. Subsequent hybridization of the surface-bound probe-DNA in the presence of complementary strands in solution is followed using impedance measurements in the presence of Fe(CN)₆³⁻/⁴⁻ as the redox probe in solution, which show changes in the interfacial electrochemical properties, specifically the charge-transfer resistance, due to hybridization. In addition, hybridization of the probe-DNA is also compared when it is attached directly to the gold surfaces without any intermediary SWCNTs. Contrary to our expectations, impedance measurements show a decrease in charge-transfer resistance with time due to hybridization with 300 nM complementary DNA in solution with the probe-DNA attached to SWCNTs. In contrast, an increase in charge-transfer resistance is observed with time during hybridization when the probe-DNA is attached directly to the gold surfaces. The decrease in charge-transfer resistance during hybridization in the presence of VASWCNTs indicates an enhancement in the electron transfer process of the redox probe at the VASWCNT-modified electrode. The results suggest that VASWCNTs are acting as mediators of electron transfer, which facilitate the charge transfer of the redox probe at the electrode-solution interface.
Direct growth of aligned graphitic nanoribbons from a DNA template by chemical vapour deposition.
Sokolov, Anatoliy N; Yap, Fung Ling; Liu, Nan; Kim, Kwanpyo; Ci, Lijie; Johnson, Olasupo B; Wang, Huiliang; Vosgueritchian, Michael; Koh, Ai Leen; Chen, Jihua; Park, Jinseong; Bao, Zhenan
2013-01-01
Graphene, laterally confined within narrow ribbons, exhibits a bandgap and is envisioned as a next-generation material for high-performance electronics. To take advantage of this phenomenon, there is a critical need to develop methodologies that result in graphene ribbons <10 nm in width. Here we report the use of metal salts infused within stretched DNA as catalysts to grow nanoscopic graphitic nanoribbons. The nanoribbons are termed graphitic as they have been determined to consist of regions of sp² and sp³ character. The nanoscopic graphitic nanoribbons are micrometres in length, <10 nm in width, and take on the shape of the DNA template. The DNA strand is converted to a graphitic nanoribbon by utilizing chemical vapour deposition conditions. Depending on the growth conditions, metallic or semiconducting graphitic nanoribbons are formed. Improvements in the growth method have potential to lead to bottom-up synthesis of pristine single-layer graphene nanoribbons.
Discovering Sequence Motifs with Arbitrary Insertions and Deletions
Frith, Martin C.; Saunders, Neil F. W.; Kobe, Bostjan; Bailey, Timothy L.
2008-01-01
Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. glam2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for “motif-like” alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2. PMID:18437229
Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal
2012-01-01
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results.
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.
Villard, Pierre; Malausa, Thibaut
2013-07-01
SP-Designer is an open-source program providing a user-friendly tool for the design of specific PCR primer pairs from a DNA sequence alignment containing sequences from various taxa. SP-Designer selects PCR primer pairs for the amplification of DNA from a target species on the basis of several criteria: (i) primer specificity, as assessed by interspecific sequence polymorphism in the annealing regions, (ii) the biochemical characteristics of the primers and (iii) the intended PCR conditions. SP-Designer generates tables, detailing the primer pair and PCR characteristics, and a FASTA file locating the primer sequences in the original sequence alignment. SP-Designer is Windows-compatible and freely available from http://www2.sophia.inra.fr/urih/sophia_mart/sp_designer/info_sp_designer.php. © 2013 John Wiley & Sons Ltd.
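Two of the criteria listed above can be illustrated with a hypothetical sketch. The first is specificity: the minimum number of mismatches between a candidate primer (taken from the target row of the alignment) and the same annealing region in every nontarget row. The second is a rough melting temperature from the Wallace rule, Tm = 2 °C × (A+T) + 4 °C × (G+C), which is only a crude approximation for short oligonucleotides. The abstract does not describe SP-Designer's actual scoring and thresholds; the function names and the default of two mismatches are assumptions.

```python
from typing import Dict

def mismatches(primer: str, site: str) -> int:
    """Number of aligned positions where the primer and the annealing site differ."""
    return sum(p != s for p, s in zip(primer.upper(), site.upper()))

def primer_is_specific(alignment: Dict[str, str], target: str, start: int,
                       length: int, min_mismatches: int = 2) -> bool:
    """True if the primer taken from the target row differs from every nontarget row
    by at least `min_mismatches` positions in the annealing region."""
    primer = alignment[target][start:start + length]
    if "-" in primer or len(primer) < length:
        return False  # primer would span a gap or run off the end of the alignment
    return all(mismatches(primer, seq[start:start + length]) >= min_mismatches
               for name, seq in alignment.items() if name != target)

def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature in degrees Celsius (rough, short primers only)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))
```

A real design tool would also weight mismatches by their distance from the primer's 3' end and use a nearest-neighbor Tm model. This sketch leaves both out for brevity.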
Sequence analysis of Leukemia DNA
NASA Astrophysics Data System (ADS)
Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa
2018-03-01
Cancer is a very deadly disease, and one form of it is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in laboratory tests. This study focused on the local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was introduced by T. F. Smith and M. S. Waterman in 1981. The algorithm seeks the most similar region of a pair of sequences by assigning a negative score to unequal base pairs (mismatches) and a positive score to identical base pairs (matches). The alignment therefore ends at the cell with the maximum positive score and begins where the score falls to zero, the minimum value allowed. This study uses leukemia sequences and three non-leukemia sequences.
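A compact Smith-Waterman implementation illustrates the scoring and traceback described above. Cells are floored at zero, the alignment ends at the highest-scoring cell, and the traceback stops at the first zero. The match, mismatch and linear gap scores are illustrative defaults, because the abstract does not state the values used in the study.

```python
from typing import List, Tuple

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the maximum cell until a zero cell is reached.
    i, j = best_pos
    top: List[str] = []
    bottom: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, smith_waterman("TTACGTAA", "GGACGTCC") returns (8, "ACGT", "ACGT"): only the shared core is reported, and the dissimilar flanks are ignored. In a leukemia versus non-leukemia comparison, the top score from each pair would serve as the similarity measure.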
Genomic signal processing methods for computation of alignment-free distances from DNA sequences.
Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro
2014-01-01
Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409
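The doublet idea in the Borrayo et al. abstract can be illustrated as follows. Each overlapping dinucleotide is mapped to one of 16 amplitude levels (instead of 4 for single bases), and the normalized amplitude distributions of two signals are compared. The value assignment (4 × index of the first base + index of the second, with A, C, G, T = 0 to 3) and the Euclidean distance are assumptions for illustration. The abstract specifies neither GAFD's doublet values nor its three DSP metrics.

```python
import math
from typing import List

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def doublet_signal(seq: str) -> List[int]:
    """Map each overlapping dinucleotide to an amplitude in 0..15; pairs with non-ACGT bases are skipped."""
    s = seq.upper()
    return [4 * BASE_INDEX[a] + BASE_INDEX[b]
            for a, b in zip(s, s[1:]) if a in BASE_INDEX and b in BASE_INDEX]

def amplitude_distribution(signal: List[int]) -> List[float]:
    """Fraction of signal samples at each of the 16 amplitude levels."""
    if not signal:
        raise ValueError("signal is empty")
    counts = [0] * 16
    for value in signal:
        counts[value] += 1
    return [c / len(signal) for c in counts]

def doublet_distance(a: str, b: str) -> float:
    """Alignment-free distance between two sequences via their doublet amplitude distributions."""
    return math.dist(amplitude_distribution(doublet_signal(a)),
                     amplitude_distribution(doublet_signal(b)))
```

For instance, "ACGT" becomes the signal [1, 6, 11]. Because the comparison uses distributions rather than positions, sequences of different lengths can be compared without aligning them.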
Yu, Yi-Kuo; Capra, John A.; Stojmirović, Aleksandar; Landsman, David; Altschul, Stephen F.
2015-01-01
Motivation: DNA and protein patterns are usefully represented by sequence logos. However, the methods for logo generation in common use lack a proper statistical basis, and are non-optimal for recognizing functionally relevant alignment columns. Results: We redefine the information at a logo position as a per-observation multiple alignment log-odds score. Such scores are positive or negative, depending on whether a column’s observations are better explained as arising from relatedness or chance. Within this framework, we propose distinct normalized maximum likelihood and Bayesian measures of column information. We illustrate these measures on High Mobility Group B (HMGB) box proteins and a dataset of enzyme alignments. Particularly in the context of protein alignments, our measures improve the discrimination of biologically relevant positions. Availability and implementation: Our new measures are implemented in an open-source Web-based logo generation program, which is available at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/logoddslogo/index.html. A stand-alone version of the program is also available from this site. Contact: altschul@ncbi.nlm.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25294922
GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.
Alser, Mohammed; Hassan, Hasan; Xin, Hongyi; Ergin, Oguz; Mutlu, Onur; Alkan, Can
2017-11-01
High-throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments, called short reads, that cause a significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and 'candidate' locations in that reference genome. The similarity measurement, called alignment, formulated as an approximate string matching problem, is the computational bottleneck because: (i) it is implemented using quadratic-time dynamic programming algorithms and (ii) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms. We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10. Availability and implementation: https://github.com/BilkentCompGen/GateKeeper. Contact: mohammedalser@bilkent.edu.tr or onur.mutlu@inf.ethz.ch or calkan@cs.bilkent.edu.tr. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
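To make the idea of a pre-alignment filter concrete, here is a simplified software sketch in the spirit of the Shifted Hamming Distance (SHD) filter that the GateKeeper abstract names as a baseline. It is not GateKeeper's FPGA design. It assumes the candidate reference window holds `len(read) + 2*e` bases with the read expected at offset `e`, builds one Hamming mismatch mask per shift in `[-e, e]`, ANDs the masks, and rejects the candidate if more than `e` positions mismatch under every shift. SHD's "amendment" of short match streaks is omitted, which makes this version more permissive (it rejects fewer candidates). `hamming_mask` and `shd_like_filter` are hypothetical names.

```python
def hamming_mask(read, ref, offset):
    """1 where read[i] differs from ref[offset + i], else 0."""
    return [0 if read[i] == ref[offset + i] else 1 for i in range(len(read))]

def shd_like_filter(read, ref, e):
    """Return True if the candidate survives (it may align within e edits).

    ref must hold len(read) + 2*e bases, with the read expected at offset e.
    """
    if len(ref) != len(read) + 2 * e:
        raise ValueError("ref must be len(read) + 2*e long")
    combined = [1] * len(read)
    for shift in range(-e, e + 1):
        mask = hamming_mask(read, ref, e + shift)
        # A position stays 1 only if it mismatches under every shift.
        combined = [c & m for c, m in zip(combined, mask)]
    return sum(combined) <= e

print(shd_like_filter("ACGTACGT", "TACCTACGTG", 1))  # one substitution, e = 1
```

The appeal for hardware is that every step is a bitwise comparison or AND over fixed-length vectors, with no dynamic-programming dependencies.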
Alignment of high-throughput sequencing data inside in-memory databases.
Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias
2014-01-01
In times of high-throughput DNA sequencing techniques, high-performance analysis of DNA sequences is of great importance. Computer-supported DNA analysis is still an intensive, time-consuming task. In this paper we explore the potential of a new in-memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is a high potential within the new in-memory concepts, leading to further developments of DNA analysis procedures in the future.
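The exact search that BWA builds on is FM-index backward search over the Burrows-Wheeler transform (BWT). The following is a minimal in-memory Python sketch of that technique, not the paper's stored procedures and not BWA's implementation: it builds the suffix array naively (fine only for toy inputs), derives the BWT, the `C` table and occurrence counts, and then narrows a suffix-array interval one pattern character at a time from the right. All function names are illustrative.

```python
def bwt_and_sa(text):
    """Naive suffix array and BWT of text + '$' (toy inputs only)."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    return bwt, sa

def build_fm_index(bwt):
    """C[c]: symbols smaller than c; occ[c][i]: occurrences of c in bwt[:i]."""
    alphabet = sorted(set(bwt))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ

def backward_search(pattern, C, occ, n):
    """Half-open suffix-array interval [lo, hi) of exact matches of pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi

def exact_positions(text, pattern):
    """0-based start positions of every exact occurrence of pattern in text."""
    bwt, sa = bwt_and_sa(text)
    C, occ = build_fm_index(bwt)
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(sa[lo:hi])

print(exact_positions("ACGTACGT", "ACG"))
```

Inexact matching extends this loop by branching on substitutions and gaps, which is the recursive part the abstract reports could not be expressed on HANA.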
TaxI: a software tool for DNA barcoding using distance methods
Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel
2005-01-01
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses; hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and they are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets, of fish larvae and juveniles from Lake Constance and of juvenile land snails, under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
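As an illustration of "sequence divergences under different models of sequence evolution", here is a minimal sketch of two common distances, the uncorrected p-distance and the Kimura two-parameter (K2P) distance, plus a nearest-reference lookup. It assumes the query and reference sequences are already pairwise aligned (TaxI performs that alignment itself) and skips any column with a gap or ambiguity code. The function names and the `references` dictionary are hypothetical, not TaxI's interface.

```python
import math

PURINES, PYRIMIDINES = set("AG"), set("CT")

def compared_sites(a, b):
    """Aligned base pairs, skipping columns with a gap or ambiguity code."""
    return [(x, y) for x, y in zip(a.upper(), b.upper())
            if x in "ACGT" and y in "ACGT"]

def p_distance(a, b):
    """Proportion of differing sites (no correction for multiple hits)."""
    sites = compared_sites(a, b)
    return sum(x != y for x, y in sites) / len(sites)

def k2p_distance(a, b):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    sites = compared_sites(a, b)
    n = len(sites)
    transitions = sum(1 for x, y in sites if x != y and
                      ({x, y} <= PURINES or {x, y} <= PYRIMIDINES))
    transversions = sum(1 for x, y in sites if x != y) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

def closest_reference(query, references, distance=k2p_distance):
    """(name, distance) of the reference closest to query; references: name -> aligned sequence."""
    return min(((name, distance(query, seq)) for name, seq in references.items()),
               key=lambda item: item[1])

print(closest_reference("ACGTACGTAC", {"sp1": "ACGTACGTAT", "sp2": "ACGTACGTAC"}))
```

K2P weights transitions and transversions separately, which is why it can rank references differently from the raw p-distance on saturated markers.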
Sutherland, John C.
2017-04-15
Linear dichroism provides information on the orientation of chromophores that are part of, or bound to, an orientable molecule such as DNA. For molecular alignment induced by hydrodynamic shear, the principal axes orthogonal to the direction of alignment are not equivalent. Thus, the magnitude of the flow-induced change in absorption for light polarized parallel to the direction of flow can be more than a factor of two greater than the corresponding change for light polarized perpendicular to both that direction and the shear axis. The ratio of the two flow-induced changes in absorption, the dichroic increment ratio, is characterized using the orthogonal orientation model, which assumes that each absorbing unit is aligned parallel to one of the principal axes of the apparatus. The absorption of the alignable molecules is characterized by components parallel and perpendicular to the orientable axis of the molecule. The dichroic increment ratio indicates that for the alignment of DNA in rectangular flow cells, average alignment is not uniaxial, but for higher shear, as produced in a Couette cell, it can be. The results from the simple model are identical to those of tensor models for typical experimental configurations. Approaches for measuring the dichroic increment ratio with modern dichrometers are further discussed.
Sutherland, John C
2017-04-15
Linear dichroism provides information on the orientation of chromophores that are part of, or bound to, an orientable molecule such as DNA. For molecular alignment induced by hydrodynamic shear, the principal axes orthogonal to the direction of alignment are not equivalent. Thus, the magnitude of the flow-induced change in absorption for light polarized parallel to the direction of flow can be more than a factor of two greater than the corresponding change for light polarized perpendicular to both that direction and the shear axis. The ratio of the two flow-induced changes in absorption, the dichroic increment ratio, is characterized using the orthogonal orientation model, which assumes that each absorbing unit is aligned parallel to one of the principal axes of the apparatus. The absorption of the alignable molecules is characterized by components parallel and perpendicular to the orientable axis of the molecule. The dichroic increment ratio indicates that for the alignment of DNA in rectangular flow cells, average alignment is not uniaxial, but for higher shear, as produced in a Couette cell, it can be. The results from the simple model are identical to those of tensor models for typical experimental configurations. Approaches for measuring the dichroic increment ratio with modern dichrometers are discussed. Copyright © 2017. Published by Elsevier Inc.
enoLOGOS: a versatile web tool for energy normalized sequence logos
Workman, Christopher T.; Yin, Yutong; Corcoran, David L.; Ideker, Trey; Stormo, Gary D.; Benos, Panayiotis V.
2005-01-01
enoLOGOS is a web-based tool that generates sequence logos from various input sources. Sequence logos have become a popular way to graphically represent DNA and amino acid sequence patterns from a set of aligned sequences. Each position of the alignment is represented by a column of stacked symbols with its total height reflecting the information content in this position. Currently, the available web servers are able to create logo images from a set of aligned sequences, but none of them generates weighted sequence logos directly from energy measurements or other sources. With the advent of high-throughput technologies for estimating the contact energy of different DNA sequences, tools that can create logos directly from binding affinity data are useful to researchers. enoLOGOS generates sequence logos from a variety of input data, including energy measurements, probability matrices, alignment matrices, count matrices and aligned sequences. Furthermore, enoLOGOS can represent the mutual information of different positions of the consensus sequence, a unique feature of this tool. Another web interface for our software, C2H2-enoLOGOS, generates logos for the DNA-binding preferences of the C2H2 zinc-finger transcription factor family members. enoLOGOS and C2H2-enoLOGOS are accessible over the web at . PMID:15980495
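The column height described above is conventionally the information content of the column. The following is a minimal sketch of that conventional measure (the one the Yu et al. abstract argues lacks a proper statistical basis), computed from base counts as log2(4) minus the Shannon entropy, with each stacked letter drawn at its frequency times that height. No small-sample correction is applied, and enoLOGOS's energy-based normalization is not reproduced; the function names are illustrative.

```python
import math

def column_information(counts):
    """Information content (bits) of one DNA alignment column.

    counts: dict such as {"A": 7, "C": 1, "G": 1, "T": 1}.
    No small-sample correction is applied.
    """
    total = sum(counts.values())
    freqs = [c / total for c in counts.values() if c > 0]
    entropy = -sum(f * math.log2(f) for f in freqs)
    return math.log2(4) - entropy

def letter_heights(counts):
    """Height of each stacked letter: its frequency times the column's information."""
    total = sum(counts.values())
    info = column_information(counts)
    return {base: (c / total) * info for base, c in counts.items()}

print(letter_heights({"A": 2, "C": 2, "G": 0, "T": 0}))
```

A fully conserved column reaches the 2-bit maximum, while a uniform column has zero height.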
Zemali, El-Amine; Boukra, Abdelmadjid
2015-08-01
The multiple sequence alignment (MSA) problem is one of the most challenging problems in bioinformatics; it involves discovering similarity among a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography, named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we have introduced a new concept allowing better exploration of the search space. It consists of manipulating multiple populations, each with its own parameters. These parameters are used to build up progressive alignments, allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to the operators' efficiency. To test the performance of the proposed approach, we considered a set of datasets from BAliBASE 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment
Kam, Alfred; Kwak, Daniel; Leung, Clarence; Wu, Chu; Zarour, Eleyine; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme
2012-01-01
Background Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. Methodology/Principal Findings We introduce Phylo, a human-based computing framework applying “crowd sourcing” techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we have received more than 350,000 solutions submitted by more than 12,000 registered users. Our results show that the submitted solutions contributed to improving the accuracy of up to 70% of the alignment blocks considered. Conclusions/Significance We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of “human-brain peta-flops” of computation that are spent every day playing games.
Phylo is available at: http://phylo.cs.mcgill.ca. PMID:22412834
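Both the BBOMP and Phylo abstracts score candidate MSAs. A common objective for that is the sum-of-pairs score: every pair of rows is scored column by column and the results are summed. The sketch below uses hypothetical linear scores (match +1, mismatch -1, gap -2, gap-gap 0); it is not the scoring scheme of Phylo, BBOMP or BAliBASE, all of which may differ (e.g. affine gaps or substitution matrices).

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned character pair; a gap aligned to a gap scores 0."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sum_of_pairs(alignment, **scores):
    """Sum-of-pairs score of an MSA given as equal-length gapped rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("rows must have equal length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y, **scores)
    return total

print(sum_of_pairs(["A-", "AC", "AT"]))
```

In a game such as Phylo, a score of this kind can be recomputed instantly after every block move, which is what makes it usable as player feedback.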
Tóth, Annamária; Hausknecht, Anton; Krisai-Greilhuber, Irmgard; Papp, Tamás; Vágvölgyi, Csaba; Nagy, László G.
2013-01-01
Reconciling traditional classifications, morphology, and the phylogenetic relationships of brown-spored agaric mushrooms has proven difficult in many groups, due to extensive convergence in morphological features. Here, we address the monophyly of the Bolbitiaceae, a family with over 700 described species, and examine the higher-level relationships within the family using a newly constructed multilocus dataset (ITS, nrLSU rDNA and EF1-alpha). We tested whether the fast-evolving Internal Transcribed Spacer (ITS) sequences can be accurately aligned across the family by comparing the outcome of two iterative alignment-refining approaches (one automated and one manual) and various indel-treatment strategies. We used PRANK to align sequences in both cases. Our results suggest that – although PRANK successfully evades overmatching of gapped sites, previously referred to as alignment overmatching – it infers an unrealistically high number of indel events with natively generated guide trees. This 'alignment undermatching' could be avoided by using more rigorous (e.g. ML) guide trees. The trees inferred in this study support the monophyly of the core Bolbitiaceae, with the exclusion of Panaeolus, Agrocybe, and some of the genera formerly placed in the family. Bolbitius and Conocybe were found to be monophyletic; however, Pholiotina and Galerella require redefinition. The phylogeny revealed that stipe coverage type is a poor predictor of phylogenetic relationships, indicating the need for a revision of the intrageneric relationships within Conocybe. PMID:23418526
Optimized Smith-Waterman processor design for breast cancer early diagnosis
NASA Astrophysics Data System (ADS)
Nurdin, D. S.; Isa, M. N.; Ismail, R. C.; Ahmad, M. I.
2017-09-01
This paper presents an optimized design of the Processing Element (PE) of a Systolic Array (SA) that implements the affine-gap-penalty Smith-Waterman (SW) algorithm on the Xilinx Virtex-6 XC6VLX75T Field Programmable Gate Array (FPGA) for Deoxyribonucleic Acid (DNA) sequence alignment. The PE optimization aims to reduce PE logic resources so that more PEs fit in the FPGA, giving a higher degree of parallelism during alignment matrix computations. This is useful for aligning long DNA sequences associated with diseases such as Breast Cancer (BC) for early diagnosis. The optimized PE architecture has the smallest PE area, with 15 slices per PE, and 776 PEs were implemented in the Virtex-6 FPGA.
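For reference, this is the computation that each PE carries out cell by cell. The sketch below is a plain software version of affine-gap Smith-Waterman using Gotoh's three-matrix recurrences. It is not the paper's hardware design, and the scoring parameters are hypothetical defaults. A gap of length k costs `gap_open + (k - 1) * gap_extend`. Because cell (i, j) depends only on (i-1, j), (i, j-1) and (i-1, j-1), all cells on one anti-diagonal are independent. That is the property a systolic array generally exploits by letting PEs work on one anti-diagonal in parallel.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score with affine gaps (Gotoh recurrences)."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a gap consuming b
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a gap consuming a
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap from H, or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 restarts a local alignment
            best = max(best, H[i][j])
    return best

print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))  # 8 matches, one 1-nt gap
```

Hardware PEs typically store only the current anti-diagonal's H, E and F values rather than full matrices, which is part of what keeps each PE small.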
Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan
2016-01-01
Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilobase pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200%) and precise in their alignments (nearly 99% precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.
Westhoff, Connie M.; Uy, Jon Michael; Aguad, Maria; Smeland‐Wagman, Robin; Kaufman, Richard M.; Rehm, Heidi L.; Green, Robert C.; Silberstein, Leslie E.
2015-01-01
BACKGROUND There are 346 serologically defined red blood cell (RBC) antigens and 33 serologically defined platelet (PLT) antigens, most of which have known genetic changes in 45 RBC or six PLT genes that correlate with antigen expression. Polymorphic sites associated with antigen expression in the primary literature and reference databases are annotated according to nucleotide positions in cDNA. This makes antigen prediction from next‐generation sequencing data challenging, since it uses genomic coordinates. STUDY DESIGN AND METHODS The conventional cDNA reference sequences for all known RBC and PLT genes that correlate with antigen expression were aligned to the human reference genome. The alignments allowed conversion of conventional cDNA nucleotide positions to the corresponding genomic coordinates. RBC and PLT antigen prediction was then performed using the human reference genome and whole genome sequencing (WGS) data with serologic confirmation. RESULTS Some major differences and alignment issues were found when attempting to convert the conventional cDNA to human reference genome sequences for the following genes: ABO, A4GALT, RHD, RHCE, FUT3, ACKR1 (previously DARC), ACHE, FUT2, CR1, GCNT2, and RHAG. However, it was possible to create usable alignments, which facilitated the prediction of all RBC and PLT antigens with a known molecular basis from WGS data. Traditional serologic typing for 18 RBC antigens was in agreement with the WGS‐based antigen predictions, providing proof of principle for this approach. CONCLUSION Detailed mapping of conventional cDNA annotated RBC and PLT alleles can enable accurate prediction of RBC and PLT antigens from whole genome sequencing data. PMID:26634332
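The core of the cDNA-to-genome conversion described above is walking along a transcript's exons. The following is a minimal sketch under stated assumptions: exons are given as 1-based inclusive genomic intervals listed in transcript order (highest coordinates first for minus-strand genes), `cds_start` is the transcript position of c.1, and only exonic positions c.N with N >= 1 are handled (intronic c.N+k and 5' UTR c.-N positions are not). The exon coordinates in the example are made up, not real RBC or PLT gene models.

```python
def cdna_to_genomic(c_pos, exons, strand, cds_start=1):
    """Convert a coding cDNA position (c.N, N >= 1) to a 1-based genomic coordinate.

    exons: (start, end) genomic intervals, 1-based inclusive, in transcript
    order (for '-' strand genes, highest coordinates first).
    cds_start: transcript position of c.1 (the A of the start codon).
    """
    if c_pos < 1:
        raise ValueError("only c.N with N >= 1 is handled in this sketch")
    tx_pos = c_pos + cds_start - 1  # position along the spliced transcript
    for start, end in exons:
        length = end - start + 1
        if tx_pos <= length:
            return start + tx_pos - 1 if strand == "+" else end - tx_pos + 1
        tx_pos -= length
    raise ValueError("position lies beyond the last exon")

# Made-up two-exon gene: exon 1 at 100-109, exon 2 at 200-219.
print(cdna_to_genomic(11, [(100, 109), (200, 219)], "+"))  # first base of exon 2
```

The alignment issues the study reports (e.g. for RHD and RHCE) are exactly the cases where such a simple exon table does not match the reference genome.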
DNA looping by FokI: the impact of twisting and bending rigidity on protein-induced looping dynamics
Laurens, Niels; Rusling, David A.; Pernstich, Christian; Brouwer, Ineke; Halford, Stephen E.; Wuite, Gijs J. L.
2012-01-01
Protein-induced DNA looping is crucial for many genetic processes such as transcription, gene regulation and DNA replication. Here, we use tethered-particle motion to examine the impact of DNA bending and twisting rigidity on loop capture and release, using the restriction endonuclease FokI as a test system. To cleave DNA efficiently, FokI bridges two copies of an asymmetric sequence, invariably aligning the sites in parallel. On account of the fixed alignment, the topology of the DNA loop is set by the orientation of the sites along the DNA. We show that both the separation of the FokI sites and their orientation, altering, respectively, the twisting and the bending of the DNA needed to juxtapose the sites, have profound effects on the dynamics of the looping interaction. Surprisingly, the presence of a nick within the loop does not affect the observed rigidity of the DNA. In contrast, the introduction of a 4-nt gap fully relaxes all of the torque present in the system but does not necessarily enhance loop stability. FokI therefore employs torque to stabilise its DNA-looping interaction by acting as a ‘torsional’ catch bond. PMID:22373924
Harper, B; McClain, S; Ganko, E W
2012-08-01
Global regulatory agencies require bioinformatic sequence analysis as part of their safety evaluation for transgenic crops. Analysis typically focuses on encoded proteins and adjacent endogenous flanking sequences. Recently, regulatory expectations have expanded to include all reading frames of the inserted DNA. The intent is to provide biologically relevant results that can be used in the overall assessment of safety. This paper evaluates the relevance of assessing the allergenic potential of all DNA reading frames found in common food genes using methods considered for the analysis of T-DNA sequences used in transgenic crops. FASTA and BLASTX algorithms were used to compare genes from maize, rice, soybean, cucumber, melon, watermelon, and tomato using international regulatory guidance. Results show that BLASTX for maize yielded 7254 alignments that exceeded allergen similarity thresholds and 210,772 alignments that matched eight or more consecutive amino acids with an allergen; other crops produced similar results. This analysis suggests that each nontransgenic crop has a much greater potential for allergenic risk than what has been observed clinically. We demonstrate that a meaningful safety assessment is unlikely to be provided by using methods with inherently high frequencies of false positive alignments when broadly applied to all reading frames of DNA sequence. Copyright © 2012 Elsevier Inc. All rights reserved.
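One of the two screens described above (eight or more consecutive amino acids matching an allergen), applied to all reading frames, can be sketched as follows. The sketch translates all six frames with the standard genetic code, encoded in the usual TCAG codon order, and reports exact shared 8-mers. It does not reproduce the FASTA/BLASTX sliding-window identity thresholds. Codons containing non-ACGT symbols become `X`, and windows containing a stop (`*`) are ignored. The function names and the example peptides are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = dict(zip(
    (a + b + c for a in "TCAG" for b in "TCAG" for c in "TCAG"),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate from the first base; codons with other symbols become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame_translations(dna):
    """Frames +1, +2, +3 of the sequence, then of its reverse complement."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(s[f:]) for s in (dna, rev) for f in range(3)]

def kmer_hits(protein, allergen, k=8):
    """Exact k-residue peptides shared with an allergen; stop codons never count."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return {protein[i:i + k] for i in range(len(protein) - k + 1)
            if "*" not in protein[i:i + k] and protein[i:i + k] in allergen_kmers}

print(six_frame_translations("ATGAAA"))
```

Because short exact matches arise by chance in long six-frame translations, a screen like this produces many hits on ordinary crop genes, which is the false-positive problem the abstract quantifies.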
2009-01-01
Background Sequence identification of ESTs from non-model species offers distinct challenges, particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities. Results Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene with all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by use of data from multiple tissues and treatments. Conclusion The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST-unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of distinguishing genes with a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from the large-scale microarray data. PMID:19939286
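The study clusters a whole correlation network into "mountains". A much simpler nearest-neighbour version of "correlation as evidence of identity" can be sketched as follows: compute the Pearson correlation between an unidentified EST's expression profile and each known gene's profile, and suggest the best-correlated gene if it clears a cut-off. The `min_r` cut-off, the gene names and the profiles are hypothetical, and constant profiles (zero variance) are not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def suggest_identity(unknown_profile, known_profiles, min_r=0.9):
    """(gene, r) for the best-correlated known gene, or None if r < min_r.

    known_profiles: dict gene name -> profile; min_r is a hypothetical cut-off.
    """
    gene, r = max(((g, pearson(unknown_profile, p)) for g, p in known_profiles.items()),
                  key=lambda item: item[1])
    return (gene, r) if r >= min_r else None

print(suggest_identity([1, 2, 3, 4], {"gene_a": [2, 4, 6, 8.5], "gene_b": [4, 3, 2, 1]}))
```

Using profiles that span many tissues and treatments lengthens the vectors, which makes high correlations less likely by chance. This is consistent with the abstract's note that multiple tissues and treatments improved precision.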
The practical evaluation of DNA barcode efficacy.
Spouge, John L; Mariño-Ramírez, Leonardo
2012-01-01
This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman-Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, "the probability of correct identification" (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
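The PCI definition above (average the per-species PCIs over all species) can be sketched with leave-one-out nearest-neighbour assignment on a distance matrix. One assumption here is not taken from the chapter: a specimen counts as correctly identified only if every tied nearest neighbour is conspecific. The chapter's exact tie handling and its sampling-error estimate are not reproduced, and the function name is hypothetical.

```python
from collections import defaultdict

def probability_correct_identification(species, dist):
    """Overall PCI: mean over species of the fraction of specimens correctly identified.

    species[i]: species label of specimen i; dist[i][j]: barcode distance.
    Each specimen is matched (leave-one-out) to its nearest neighbour(s); it is
    counted as correct only if all tied nearest neighbours share its species
    (an assumption of this sketch).
    """
    n = len(species)
    outcomes = defaultdict(list)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        d_min = min(dist[i][j] for j in others)
        nearest = [j for j in others if dist[i][j] == d_min]
        outcomes[species[i]].append(all(species[j] == species[i] for j in nearest))
    species_pci = [sum(hits) / len(hits) for hits in outcomes.values()]
    return sum(species_pci) / len(species_pci)

labels = ["a", "a", "b", "b"]
d = [[0, 1, 5, 0.5], [1, 0, 5, 5], [5, 5, 0, 1], [0.5, 5, 1, 0]]
print(probability_correct_identification(labels, d))
```

Averaging per species first, rather than over all specimens, keeps heavily sampled species from dominating the overall PCI.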
GATA: A graphic alignment tool for comparative sequence analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nix, David A.; Eisen, Michael B.
2005-01-01
Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
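To illustrate the trade-off described above, here is a minimal word-match dot plot. It is not GATA's method. It records a dot for every shared k-mer, and also for matches to the reverse complement, so an inversion shows up as an anti-diagonal run that a collinear dynamic-programming alignment would miss. The output is still only a set of dots, not an alignment, so indels merely shift diagonals. The function name and the `k` default are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def dot_plot(seq_x, seq_y, k=4):
    """(i, j, strand) for every k-mer of seq_x found in seq_y.

    '+' : seq_x[i:i+k] == seq_y[j:j+k] (collinear similarity, diagonal runs)
    '-' : the reverse complement of seq_x[i:i+k] equals seq_y[j:j+k]
          (inverted similarity, anti-diagonal runs)
    """
    words_y = {}
    for j in range(len(seq_y) - k + 1):
        words_y.setdefault(seq_y[j:j + k], []).append(j)
    dots = []
    for i in range(len(seq_x) - k + 1):
        word = seq_x[i:i + k]
        dots.extend((i, j, "+") for j in words_y.get(word, []))
        rc = word.translate(COMPLEMENT)[::-1]
        dots.extend((i, j, "-") for j in words_y.get(rc, []))
    return dots

print(dot_plot("AAACCC", "GGGTTT", k=3))  # an inverted copy gives an anti-diagonal
```

Palindromic k-mers (equal to their own reverse complement) produce both a '+' and a '-' dot at the same cell, one of several reasons raw dot plots need careful interpretation.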
DNA-dependent protein kinase in nonhomologous end joining: a lock with multiple keys?
Weterings, Eric; Chen, David J
2007-10-22
The DNA-dependent protein kinase (DNA-PK) is one of the central enzymes involved in DNA double-strand break (DSB) repair. It facilitates proper alignment of the two ends of the broken DNA molecule and coordinates access of other factors to the repair complex. We discuss the latest findings on DNA-PK phosphorylation and offer a working model for the regulation of DNA-PK during DSB repair.
Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools.
Cer, Regina Z; Donohue, Duncan E; Mudunuri, Uma S; Temiz, Nuri A; Loss, Michael A; Starner, Nathan J; Halusa, Goran N; Volfovsky, Natalia; Yi, Ming; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M
2013-01-01
The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motif coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0 (human, chimpanzee, dog, macaque and mouse) and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, non-B DB v2.0 provides an overall improved graphical user interface and faster query performance.
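As an example of the kind of motif definition such a database applies, here is a regular-expression sketch for putative G-quadruplex motifs. The pattern used here (four runs of at least three G separated by loops of 1-7 nt, plus the C-rich pattern for the opposite strand) is a widely used consensus. It is an assumption, not a statement of non-B DB's exact thresholds. Python's `re.finditer` returns non-overlapping matches only, so overlapping candidate motifs are not all reported.

```python
import re

# Assumed consensus: four G-runs (>= 3 G) separated by 1-7 nt loops.
G4_PLUS = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}", re.IGNORECASE)
# The same motif on the opposite strand appears as C-runs on this strand.
G4_MINUS = re.compile(r"C{3,}(?:[ACGTN]{1,7}C{3,}){3}", re.IGNORECASE)

def find_g4_motifs(seq):
    """(start, end, strand) of non-overlapping putative G4 motifs; 0-based, end-exclusive."""
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)

print(find_g4_motifs("TTGGGAGGGAGGGAGGGTT"))
```

Greedy loops can make a match span more G-runs than the minimum four, so reported spans can be longer than a single quadruplex.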
Hong, Jungeui; Gresham, David
2017-11-01
Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
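The duplicate rule described above (identically mapped reads are PCR duplicates only if their UMIs are also identical) can be sketched as follows. Reads are represented as plain dicts with hypothetical keys. Paired-end mate positions and sequencing errors within the UMI itself are not handled, and real tools usually also choose which copy to keep by quality rather than by input order.

```python
def deduplicate(reads):
    """Keep the first read for each (chrom, pos, strand, umi) key.

    reads: iterable of dicts with hypothetical keys 'name', 'chrom', 'pos',
    'strand' and 'umi'. Reads mapping to the same place but carrying
    different UMIs are kept as independent molecules.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept

reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},
    {"name": "r2", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},  # PCR duplicate
    {"name": "r3", "chrom": "chr1", "pos": 100, "strand": "+", "umi": "TTTT"},  # distinct molecule
]
print([r["name"] for r in deduplicate(reads)])
```

Without the UMI in the key, r3 would be discarded as a duplicate. That coordinate-only behaviour is the source of bias the abstract describes.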
Paging through history: parchment as a reservoir of ancient DNA for next generation sequencing
Teasdale, M. D.; van Doorn, N. L.; Fiddyment, S.; Webb, C. C.; O'Connor, T.; Hofreiter, M.; Collins, M. J.; Bradley, D. G.
2015-01-01
Parchment represents an invaluable cultural reservoir. Retrieving an additional layer of information from these abundant, dated livestock skins via the use of ancient DNA (aDNA) sequencing has been mooted by a number of researchers. However, prior PCR-based work has indicated that this may be challenged by cross-individual and cross-species contamination, perhaps from the bulk parchment preparation process. Here we apply next generation sequencing to two parchments of seventeenth and eighteenth century northern English provenance. Following alignment to the published sheep, goat, cow and human genomes, it is clear that the only genome displaying substantial unique homology is sheep, and this species identification is confirmed by collagen peptide mass spectrometry. Only 4% of sequence reads align preferentially to a different species, indicating low contamination across species. Moreover, mitochondrial DNA sequences suggest an upper bound of contamination at 5%. Over 45% of reads aligned to the sheep genome, and even this limited sequencing exercise yielded 9% and 7% of the two sampled sheep genomes after filtering, allowing the mapping of genetic affinity to modern British sheep breeds. We conclude that parchment represents an excellent substrate for genomic analyses of historical livestock. PMID:25487331
Kinetics of DNA tile dimerization
Jiang, Shuoxing; Yan, Hao; Liu, Yan
2014-06-24
Investigating how individual molecular components interact with one another within DNA nanoarchitectures, both in terms of their spatial and temporal interactions, is fundamentally important for a better understanding of their physical behaviors. This will provide researchers with valuable insight for designing more complex higher-order structures that can be assembled more efficiently. In this report, we examined several spatial factors that affect the kinetics of bivalent, double-helical (DH) tile dimerization, including the orientation and number of sticky ends (SEs), the flexibility of the double helical domains, and the size of the tiles. The rate constants we obtained confirm our hypothesis that increased nucleation opportunities and well-aligned SEs accelerate tile-tile dimerization. Increased flexibility in the tiles causes slower dimerization rates, an effect that can be reversed by introducing restrictions to the tile flexibility. The higher dimerization rates of more rigid tiles result from the opposing effects of higher activation energies and higher pre-exponential factors from the Arrhenius equation, where the pre-exponential factor dominates. We believe that the results presented here will assist in improved implementation of DNA tile based algorithmic self-assembly, DNA based molecular robotics, and other specific nucleic acid systems, and will provide guidance to design and assembly processes to improve overall yield and efficiency. PMID:24794259
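The claim that the pre-exponential factor dominates can be made concrete with the Arrhenius equation, k = A exp(-Ea / (R T)). The Python sketch below uses invented parameter values, not measurements from this study. In it the rigid tile has both a higher activation energy and a higher pre-exponential factor. A is taken in M^-1 s^-1, as for a bimolecular association.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def arrhenius_k(A: float, Ea: float, T: float) -> float:
    """Rate constant k = A * exp(-Ea / (R * T)); Ea in J/mol, T in K."""
    return A * math.exp(-Ea / (R * T))


T = 298.15  # K
# Invented illustrative parameters (A in M^-1 s^-1, Ea in J/mol), not fitted values.
k_flexible = arrhenius_k(A=1e10, Ea=50e3, T=T)
k_rigid = arrhenius_k(A=1e12, Ea=60e3, T=T)  # higher Ea AND higher A

print(f"k_flexible = {k_flexible:.1f}, k_rigid = {k_rigid:.1f}, "
      f"ratio = {k_rigid / k_flexible:.2f}")
```

With these numbers, the 10 kJ/mol higher barrier alone would slow the rigid tile roughly 56-fold at room temperature. Its 100-fold larger pre-exponential factor more than compensates, leaving it about 1.8 times faster overall.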
In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library
Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul
2005-01-01
The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals, and Scarpa's ganglia (which contain the somata of the primary afferent neurons), microdissected from 104 male and female rats. The inserts from the 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was compared with the GenBank nonredundant, rat genome, mouse genome and human genome databases using the BLAST algorithm to search for high-homology alignments. Of the initial 2400 clones, 315 (13%) were of poor quality, did not yield useful information, and were therefore eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) represented 758 unique genes with useful annotations identified in public databases or in the published literature; these were designated known characterized sequences. A further 1141 sequences (55%) aligned with 1011 unique sequences that had no useful annotations and were designated known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences but matched no previously described rat expressed sequence tags or mRNAs, and the last 2 showed no significant alignment to rat or human genomic sequences. Of the 2085 sequences analyzed, 86% were singletons. The known characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that had not previously been shown to be expressed in the vestibular system.
Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642
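As a quick consistency check, the counts reported above can be recomputed as percentages. The Python snippet below only re-derives the abstract's own numbers. The category labels are paraphrases.

```python
# Counts as reported in the abstract above.
total_sequenced = 2400
poor_quality = 315
analyzed = total_sequenced - poor_quality

categories = {
    "known, characterized": 918,
    "known, uncharacterized": 1141,
    "genomic match only or no match": 26,
}
# The three categories should account for every analyzed sequence (2085).
assert sum(categories.values()) == analyzed

print(f"discarded: {poor_quality / total_sequenced:.1%}")
for label, count in categories.items():
    print(f"{label}: {count} ({count / analyzed:.1%})")
```

The unrounded shares (about 13.1%, 44.0%, 54.7% and 1.2%) round to the 13%, 44%, 55% and 1% quoted in the abstract.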
Method for promoting specific alignment of short oligonucleotides on nucleic acids
Studier, F. William; Kieleczawa, Jan; Dunn, John J.
1996-01-01
Disclosed is a method for promoting specific alignment of short oligonucleotides on a nucleic acid polymer. The nucleic acid polymer is incubated in a solution containing a single-stranded DNA-binding protein and a plurality of oligonucleotides that are perfectly complementary to distinct but adjacent regions of a predetermined contiguous nucleotide sequence in the nucleic acid polymer. The oligonucleotides anneal to the nucleic acid polymer to form a contiguous region of double-stranded nucleic acid. Specific applications of the disclosed method include priming DNA synthesis and template-directed ligation.
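The design constraint in this method can be sketched in Python: several short oligonucleotides, each perfectly complementary to a distinct but adjacent stretch of one target sequence. This is a hypothetical design helper only, and the function names are illustrative. It does not model the single-stranded DNA-binding protein that the method relies on to promote specific annealing. Oligos are written 5'→3' as reverse complements of the target strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_oligos(target: str, start: int, length: int, oligo_len: int):
    """Return (position, oligo) pairs, each oligo perfectly complementary to
    one of the adjacent, non-overlapping segments of target[start:start+length]."""
    region = target[start:start + length]
    if len(region) != length or length % oligo_len != 0:
        raise ValueError("region must lie inside target and be a multiple of oligo_len")
    return [(start + i, reverse_complement(region[i:i + oligo_len]))
            for i in range(0, length, oligo_len)]


def forms_contiguous_duplex(target: str, oligos, oligo_len: int) -> bool:
    """True if every oligo pairs perfectly with its segment and the segments abut."""
    for k, (pos, oligo) in enumerate(oligos):
        if reverse_complement(oligo) != target[pos:pos + oligo_len]:
            return False
        if k + 1 < len(oligos) and oligos[k + 1][0] != pos + oligo_len:
            return False
    return True


target = "ATGCGTACGTTAGCCATG"
oligos = tile_oligos(target, start=0, length=18, oligo_len=6)
print(oligos)  # [(0, 'ACGCAT'), (6, 'TAACGT'), (12, 'CATGGC')]
print(forms_contiguous_duplex(target, oligos, 6))  # True
```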
A polarized view on DNA under tension
NASA Astrophysics Data System (ADS)
van Mameren, Joost; Vermeulen, Karen; Wuite, Gijs J. L.; Peterman, Erwin J. G.
2018-03-01
In the past decades, sensitive fluorescence microscopy techniques have contributed significantly to our understanding of the dynamics of DNA. The specific labeling of DNA using intercalating dyes has allowed for quantitative measurement of the thermal fluctuations the polymers undergo. On the other hand, recent advances in single-molecule manipulation techniques have unraveled the mechanical and elastic properties of this intricate polymer. Here, we have combined these two approaches to study the conformational dynamics of DNA under a wide range of tensions. Using polarized fluorescence microscopy in conjunction with optical-tweezers-based manipulation of YOYO-intercalated DNA, we controllably align the YOYO dyes using DNA tension, enabling us to disentangle the rapid dynamics of the dyes from that of the DNA itself. With unprecedented control of the DNA alignment, we resolve an inconsistency in reports about the tilted orientation of intercalated dyes. We find that intercalated dyes are on average oriented perpendicular to the long axis of the DNA, yet undergo fast dynamics on the time scale of absorption and fluorescence emission. In the overstretching transition of double-stranded DNA, we do not observe changes in orientation or orientational dynamics of the dyes. Only beyond the overstretching transition is considerable depolarization observed, presumably caused by an average tilting of the DNA base pairs. Our combined approach thus contributes to the elucidation of unique features of the molecular dynamics of DNA.
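An idealized absorption model shows why perpendicular dye orientation and fast dye dynamics have to be separated with polarized excitation. The Python sketch below is a toy built on stated assumptions, not the authors' analysis. Each dye is treated as a linear absorption dipole, with absorption proportional to cos² of the angle to the excitation polarization. The dipole is tilted by a fixed angle from the DNA axis and uniformly distributed in azimuth around it. Emission polarization and microscope optics are ignored.

```python
import math


def excitation_weights(tilt_deg: float):
    """Relative absorption (along-axis, perpendicular-to-axis excitation) for a
    dipole tilted tilt_deg from the DNA axis, averaged uniformly over azimuth.

    Along the axis: cos^2(tilt). Perpendicular: sin^2(tilt) * <cos^2(phi)> = sin^2(tilt) / 2.
    """
    t = math.radians(tilt_deg)
    return math.cos(t) ** 2, math.sin(t) ** 2 / 2


def polarization_ratio(i_par: float, i_perp: float) -> float:
    """P = (I_par - I_perp) / (I_par + I_perp); 0 means no polarization contrast."""
    return (i_par - i_perp) / (i_par + i_perp)


magic = math.degrees(math.acos(1 / math.sqrt(3)))  # magic angle, about 54.7 degrees
for tilt in (90.0, 70.0, magic):
    p = polarization_ratio(*excitation_weights(tilt))
    print(f"tilt {tilt:5.1f} deg -> P = {p:+.3f}")
```

In this model, perpendicular dyes give P = -1. A static tilt near 54.7° gives P = 0, the same value as fully randomized dipoles. That ambiguity is one reason a static tilt and fast dye dynamics must be disentangled experimentally.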
Optimization of sequence alignment for simple sequence repeat regions.
Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C
2011-07-20
Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, consisting of tandem copies of specific sequences no longer than six bases, that are distributed throughout the genome. SSRs have been used as molecular markers because they are easy to detect, in a range of applications including genetic diversity studies, genome mapping, and marker-assisted selection. SSRs are also highly mutable because of DNA polymerase slippage during replication. This distinctive mutation mechanism raises the frequency of insertion/deletion (INDEL) mutations at SSRs above that seen for other types of molecular markers such as single nucleotide polymorphisms (SNPs). Genome-wide, however, SNPs are more frequent than INDELs, so existing sequence alignment algorithms are designed for the vast majority of genomic sequence and do not treat microsatellite regions as distinct sequences that require special consideration. Conventional alignment is therefore limited at SSR loci, where overlaps between different repeat units produce false evolutionary relationships. To overcome this limitation, a new algorithm was developed as a Perl script with a Tk graphical interface. The program first determines the repeat unit and the position of the last SSR nucleotide and then aligns the sequences, shifting them according to the type of inserted repeat unit. Comparing phylogenetic trees built before and after applying the new algorithm revealed differences that increased with SSR length and complexity, and smaller distances between lineages were observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relationships between lineages. It reduces overlapping during SSR alignment, which results in more realistic phylogenetic relationships.
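The first step of the new algorithm, determining each repeat unit and the position of the last SSR nucleotide before aligning, might look like the Python sketch below. It is a simplified stand-in, not the authors' Perl/Tk program. It detects only perfect (uninterrupted) repeats with units of 1 to 6 bases and at least three copies, both assumed thresholds. It does not perform the subsequent alignment shift.

```python
import re


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter unit ('AT' yes, 'ATAT' no)."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(seq: str, max_unit: int = 6, min_copies: int = 3):
    """Find perfect tandem repeats with unit length 1..max_unit.

    Returns (start, end, unit) tuples, 0-based with end exclusive, so `end`
    is the position just after the last SSR nucleotide.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by at least min_copies - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1)))
    # Longest hit first at each start, then drop hits nested inside a kept one.
    hits.sort(key=lambda h: (h[0], -(h[1] - h[0])))
    kept = []
    for h in hits:
        if not any(k[0] <= h[0] and h[1] <= k[1] for k in kept):
            kept.append(h)
    return kept


seq = "TT" + "CA" * 5 + "G" + "GAT" * 4 + "C"
print(find_ssrs(seq))  # [(2, 12, 'CA'), (13, 25, 'GAT')]
```

Filtering to primitive units keeps, for example, a (CA)5 tract from also being reported as (CACA) repeats, so each SSR gets a single repeat unit that a later shifting step could use.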
A Novel Partial Sequence Alignment Tool for Finding Large Deletions
Aruk, Taner; Ustek, Duran; Kursun, Olcay
2012-01-01
Finding large deletions in genome sequences has become increasingly useful in bioinformatics, for example in clinical research and diagnosis. Although a number of publicly available next generation sequencing mapping and sequence alignment programs exist, these software packages do not correctly align fragments containing deletions larger than 1 kb. We present a fast alignment software package, BinaryPartialAlign, that wet-lab scientists can use to find long structural variations in their experiments. BinaryPartialAlign uses the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps, which we call partial alignment. The BinaryPartialAlign implementation is compared with other straightforward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. PMID:22566777
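A plain Smith-Waterman scorer in Python can illustrate the idea of aligning parts of a read separately to expose a large deletion. This is a hedged sketch of generic split-read alignment, not BinaryPartialAlign. The scoring values are arbitrary. Instead of searching for the breakpoint (for example by binary search), it simply assumes the breakpoint lies at the middle of the read.

```python
import random


def sw_best_end(query: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, end), where end is the exclusive end of the best
    local alignment in ref. Only two DP rows are kept (O(len(ref)) memory).
    """
    prev = [0] * (len(ref) + 1)
    best, best_end = 0, 0
    for i in range(1, len(query) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_end = cur[j], j
        prev = cur
    return best, best_end


def locate_deletion(read: str, ref: str):
    """Align the two halves of a read separately (split-read idea).

    Assumes the deletion breakpoint sits at the middle of the read.
    Returns (left_end, right_start, deletion_length) in ref coordinates.
    """
    half = len(read) // 2
    left, right = read[:half], read[half:]
    _, left_end = sw_best_end(left, ref)
    # Aligning the reversed right half to the reversed reference turns the
    # alignment's start into an end position that sw_best_end reports.
    _, rev_end = sw_best_end(right[::-1], ref[::-1])
    right_start = len(ref) - rev_end
    return left_end, right_start, right_start - left_end


rng = random.Random(42)
ref = "".join(rng.choice("ACGT") for _ in range(200))
read = ref[30:60] + ref[140:170]  # 80 bp deleted between the two halves
print(locate_deletion(read, ref))
```

With a random 200 bp reference each 30 bp half aligns uniquely. The call therefore reports the left alignment ending at 60, the right one starting at 140, and an 80 bp deletion. When the breakpoint position is unknown, a search over candidate split points would be needed.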
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.
Blazewicz, Jacek; Frohmberg, Wojciech; Kierzynka, Michal; Pesch, Erwin; Wojciechowski, Pawel
2011-05-20
Pairwise sequence alignment methods are widely used in biological research. The growing number of sequences is seen as one of the main challenges facing sequence alignment methods in the near future. To address this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed recently. These solutions show the great potential of the GPU platform, but in most cases they address sequence database scanning and compute only the alignment score, omitting the alignment itself. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and Smith-Waterman algorithms with the backtracking procedure required to construct the alignment. In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods as well as for DNA recognition at the DNA assembly stage. Tests show that the implementation, with performance of up to 6.3 GCUPS (giga cell updates per second) on a single GPU for affine gap penalties, is very efficient in comparison with other CPU- and GPU-based solutions. Moreover, multiple-GPU support with load balancing makes the application very scalable. The article shows that the backtracking procedure of sequence alignment algorithms can be designed to fit the GPU architecture, so our algorithm computes pairwise alignments as well as scores. This opens a wide range of new possibilities, allowing other methods from molecular biology to take advantage of the new computational architecture. Tests also show that the speed of our GPU-based algorithms increases almost linearly when more than one graphics card is used.
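Score-only kernels fill the dynamic-programming matrix, but producing the alignment itself requires a backtracking pass. A minimal CPU-side Python sketch of global (Needleman-Wunsch) alignment with backtracking is below, for orientation only. It uses a linear gap penalty rather than the affine penalties benchmarked in the paper and keeps the full matrix in memory. The scoring values are illustrative. The `gcups` helper relates a throughput figure such as 6.3 GCUPS to matrix size and run time.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Backtracking: walk from the bottom-right cell to the origin,
    # re-deriving which move produced each cell's score.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Giga cell updates per second: DP cells filled per second, divided by 1e9."""
    return len_a * len_b / seconds / 1e9


score, top, bottom = needleman_wunsch("GATTACA", "GCATGCU")
print(score)  # 0
print(top)
print(bottom)
```

For scale, filling the 10^8 cells for two 10,000-residue sequences at 6.3 GCUPS would take about 16 ms.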
Lenis, Vasileios Panagiotis E; Swain, Martin; Larkin, Denis M
2018-05-01
Cross-species whole-genome sequence alignment is a critical first step for comparative genome analyses, ranging from the detection of sequence variants to studies of chromosome evolution. Animal genomes are large and complex, and whole-genome alignment is a computationally intensive process that requires expensive high-performance computing systems because extensive local alignments must be explored. With hundreds of animal genomes sequenced by multiple projects, demand for comparative genome analyses is increasing. Here, we introduce G-Anchor, a new, fast, and efficient pipeline that uses a strictly limited but highly effective set of local sequence alignments to anchor (or map) an animal genome to another species' reference genome. G-Anchor makes novel use of a databank of highly conserved DNA sequence elements. We demonstrate how these elements may be aligned to a pair of genomes, creating anchors. These anchors enable the rapid mapping of scaffolds from a de novo assembled genome to chromosome assemblies of a reference species. Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto the genome of a phylogenetically related reference species within a few hours on a desktop or laptop computer, with accuracy comparable to that achieved by a highly accurate whole-genome alignment tool such as LASTZ. G-Anchor thus makes whole-genome comparisons accessible to researchers with limited computational resources. G-Anchor is a freely available, ready-to-use tool for anchoring a pair of vertebrate genomes. It may be used with large genomes that contain a significant fraction of evolutionarily conserved DNA sequences and that are not highly repetitive, polyploid, or excessively fragmented. G-Anchor is not a substitute for whole-genome alignment software but can be used for fast and accurate initial genome comparisons.
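The anchoring idea, where conserved elements found in both genomes tie each scaffold to a reference chromosome, can be sketched as a majority vote in Python. This is a hypothetical illustration, not G-Anchor's implementation. The `Anchor` record, the minimum of two anchors per scaffold, and placement by median reference position and majority strand are all assumptions. The sketch starts from precomputed element hits rather than performing any alignment.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import NamedTuple


class Anchor(NamedTuple):
    """Hypothetical record: one conserved element hit on both genomes."""
    scaffold: str
    ref_chrom: str
    ref_pos: int
    same_strand: bool


def place_scaffolds(anchors, min_anchors=2):
    """Assign each scaffold to the reference chromosome most of its anchors hit.

    Returns {scaffold: (chrom, median_ref_pos, orientation, support)}, where
    support is the fraction of the scaffold's anchors on that chromosome.
    Scaffolds with fewer than min_anchors anchors are left unplaced.
    """
    by_scaffold = defaultdict(list)
    for a in anchors:
        by_scaffold[a.scaffold].append(a)

    placements = {}
    for scaffold, hits in by_scaffold.items():
        if len(hits) < min_anchors:
            continue
        chrom, n_on_chrom = Counter(h.ref_chrom for h in hits).most_common(1)[0]
        on_chrom = [h for h in hits if h.ref_chrom == chrom]
        forward = sum(h.same_strand for h in on_chrom)
        orientation = "+" if 2 * forward >= len(on_chrom) else "-"
        placements[scaffold] = (chrom, median(h.ref_pos for h in on_chrom),
                                orientation, n_on_chrom / len(hits))
    return placements


anchors = [
    Anchor("scf1", "chr1", 5000, True),
    Anchor("scf1", "chr1", 5800, True),
    Anchor("scf1", "chr7", 100, False),   # stray hit outvoted by chr1
    Anchor("scf2", "chr2", 20000, False),
    Anchor("scf2", "chr2", 19650, False),
    Anchor("scf3", "chr3", 1, True),      # too few anchors to place
]
print(place_scaffolds(anchors))
```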
Prophagic DNA Fragments in Streptococcus agalactiae Strains and Association with Neonatal Meningitis
van der Mee-Marquet, Nathalie; Domelier, Anne-Sophie; Mereghetti, Laurent; Lanotte, Philippe; Rosenau, Agnès; van Leeuwen, Willem; Quentin, Roland
2006-01-01
Using randomly amplified polymorphic DNA (RAPD) analysis at the population level, followed by DNA differential display, cloning, and sequencing, we identified three prophage DNA fragments (F5, F7, and F10) in Streptococcus agalactiae that displayed significant sequence similarity to the DNA of S. agalactiae and Streptococcus pyogenes. The F5 sequence aligned with a prophagic gene encoding the large subunit of a terminase, F7 aligned with a phage-associated cell wall hydrolase and a phage-associated lysin, and F10 aligned with a transcriptional regulator (ArpU family) and a phage-associated endonuclease. We first determined the prevalence of F5, F7, and F10 by PCR in a collection of 109 strains isolated in the 1980s and divided into two populations: one with a high risk of causing meningitis (HR group) and the other with a lower risk of causing meningitis (LR group). These fragments were significantly more prevalent in the HR group than in the LR group (P < 0.001). Our findings suggest that lysogeny has increased the ability of some S. agalactiae strains to invade the neonatal brain endothelium. We then determined the prevalence of F5, F7, and F10 by PCR in a collection of 40 strains recently isolated from neonatal meningitis cases for comparison with the cerebrospinal fluid (CSF) strains isolated in the 1980s. The prevalence of the three prophage DNA fragments was similar in these two populations isolated 15 years apart. We suggest that the prophage DNA fragments identified have remained stable in many CSF S. agalactiae strains, possibly due to their importance in virulence or fitness. PMID:16517893
The Kinetic Mechanism for DNA Unwinding by Multiple Molecules of Dda Helicase Aligned on DNA
Eoff, Robert L.; Raney, Kevin D.
2010-01-01
Helicases catalyze the separation of double-stranded nucleic acids to form single-stranded intermediates. Using transient state kinetic methods we have determined the kinetic properties of DNA unwinding under conditions that favor a monomeric form of the Dda helicase as well as conditions that allow multiple molecules to function on the same substrate. Multiple helicase molecules can align like a train on the DNA track. The number of base pairs unwound in a single binding event for Dda is increased from ~19 bp for the monomeric form to ~64 bp when as many as four Dda molecules are aligned on the same substrate, while the kinetic step-size (3.2 ± 0.7 bp) and unwinding rate (242 ± 25 bp s−1) appear to be independent of the number of Dda molecules present on a given substrate. The data support a model in which the helicase molecules bound to the same substrate move along the DNA track independently during DNA unwinding. The observed increase in processivity arises from the increased probability that at least one of the helicases will completely unwind the DNA prior to dissociation. These results are in contrast to previous reports in which multiple Dda molecules on the same track greatly enhanced the rate and amplitude for displacement of protein blocks on the track. Therefore, only when the progress of the lead molecule in the train is impeded by some type of block, such as a protein bound to DNA, do the trailing molecules interact with the lead molecule in order to overcome the block. The fact that trailing helicase molecules have little impact on the lead molecule in the train during routine DNA unwinding suggests that the trailing molecules are moving at similar rates as the lead molecule. This result implicates a step in the translocation mechanism as contributing greatly to the overall rate-limiting step for unwinding of duplex DNA. PMID:20408588
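The explanation offered for the higher processivity, that more molecules on the track raise the chance that at least one completes unwinding, corresponds to a simple independence model. The Python sketch below illustrates that reasoning with an invented per-molecule probability. It is not the authors' kinetic model, and real molecules on one substrate need not act independently. It also converts the reported step size and unwinding rate into a stepping frequency.

```python
def p_at_least_one(p_single: float, n: int) -> float:
    """Probability that at least one of n helicases, each independently
    completing the unwinding with probability p_single, does so."""
    return 1.0 - (1.0 - p_single) ** n


# Hypothetical per-molecule probability, chosen only for illustration.
for n in (1, 2, 4):
    print(n, f"{p_at_least_one(0.3, n):.3f}")

# Reported values: kinetic step size 3.2 bp, unwinding rate 242 bp/s.
steps_per_second = 242 / 3.2
print(f"{steps_per_second:.1f} steps per second")
```

Under this model, four molecules that each succeed 30% of the time give roughly a 76% chance that the substrate is fully unwound. The reported rate and step size correspond to roughly 76 kinetic steps per second.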
Extracting DNA words based on the sequence features: non-uniform distribution and integrity.
Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo
2016-01-25
DNA sequence can be viewed as an unknown language with words as its functional units. Because most sequence alignment algorithms, such as motif discovery algorithms, depend on the quality of background information about the sequences, it is necessary to develop an ab initio algorithm that extracts the "words" from the DNA sequences alone. We considered non-uniform distribution and integrity to be two important features of a word, and on that basis we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used to test the consistency of word distributions in the DNA sequences with a uniform distribution, and integrity was judged by sequence and position alignment. Two random base sequences were used as negative controls, and an English book was used as a positive control to verify the algorithm. We applied the algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the method. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Our method provides a fast way to screen for important DNA elements at large scale and offers potential insights into the understanding of a genome.
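The non-uniform-distribution criterion can be illustrated with a one-sample Kolmogorov-Smirnov statistic on the positions at which a candidate word occurs. The Python sketch below is an assumption about how such a test could be set up: positions are scaled by sequence length, compared with a uniform distribution, and judged against the standard large-sample critical value. The paper's exact procedure and its integrity criterion are not reproduced.

```python
import math


def word_positions(seq: str, word: str):
    """0-based start positions of all (possibly overlapping) occurrences."""
    positions, i = [], seq.find(word)
    while i != -1:
        positions.append(i)
        i = seq.find(word, i + 1)
    return positions


def ks_uniform_statistic(positions, seq_len: int) -> float:
    """One-sample Kolmogorov-Smirnov D of positions/seq_len against Uniform(0, 1)."""
    if not positions:
        raise ValueError("need at least one occurrence")
    xs = sorted(p / seq_len for p in positions)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def looks_nonuniform(positions, seq_len: int, c_alpha: float = 1.36) -> bool:
    """Reject uniformity when D exceeds the large-sample critical value
    c_alpha / sqrt(n); c_alpha = 1.36 corresponds to alpha of about 0.05."""
    n = len(positions)
    return ks_uniform_statistic(positions, seq_len) > c_alpha / math.sqrt(n)


spread = list(range(5, 100, 10))  # evenly spaced occurrences
clustered = list(range(40, 50))   # occurrences bunched in one region
print(looks_nonuniform(spread, 100), looks_nonuniform(clustered, 100))  # False True
```

The large-sample critical value is only approximate for small counts such as the ten occurrences used here; an exact table or p-value would be preferable in practice.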
Parson, Walther; Strobl, Christina; Huber, Gabriela; Zimmermann, Bettina; Gomes, Sibylle M.; Souto, Luis; Fendt, Liane; Delport, Rhena; Langit, Reina; Wootton, Sharon; Lagacé, Robert; Irwin, Jodi
2013-01-01
Insights into the human mitochondrial phylogeny have been primarily achieved by sequencing full mitochondrial genomes (mtGenomes). In forensic genetics, (partial) mtGenome information can be used to assign haplotypes to their phylogenetic backgrounds, which may, in turn, have characteristic geographic distributions that would offer useful information in a forensic case. In addition, and perhaps even more relevant in the forensic context, haplogroup-specific patterns of mutations form the basis for quality control of mtDNA sequences. The current method for establishing (partial) mtDNA haplotypes is Sanger-type sequencing (STS), which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing (NGS) technologies, the body of available mtDNA data can potentially be extended much more quickly and cost-efficiently. Customized chemistries, laboratory workflows and data analysis packages could support the community and increase the utility of mtDNA analysis in forensics. We have evaluated the performance of mtGenome sequencing using the Personal Genome Machine (PGM) and compared the resulting haplotypes directly with conventional Sanger-type sequencing. A total of 64 mtGenomes (>1 million bases) were established that yielded high concordance with the corresponding STS haplotypes (<0.02% differences). About two-thirds of the differences were observed in or around homopolymeric sequence stretches. In addition, the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of alignment software would be desirable to facilitate the application of NGS in mtDNA forensic genetics. PMID:23948325
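Checking whether a discordant position falls in or around a homopolymeric stretch is simple to automate. The Python sketch below is illustrative only. The minimum run length of four bases and the two-base flank counted as "around" are assumed thresholds, not values from the study, and the difference positions are invented.

```python
def homopolymer_runs(seq: str, min_len: int = 4):
    """Return (start, end, base) for runs of one base of length >= min_len (end exclusive)."""
    runs, start = [], 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or seq[i] != seq[start]:
            if i - start >= min_len:
                runs.append((start, i, seq[start]))
            start = i
    return runs


def in_or_near_homopolymer(pos: int, runs, flank: int = 2) -> bool:
    """True if pos lies inside a run or within `flank` bases of either end."""
    return any(s - flank <= pos < e + flank for s, e, _ in runs)


seq = "ACCCCCTGAAAAGTCAGTCA"
runs = homopolymer_runs(seq)
print(runs)  # [(1, 6, 'C'), (8, 12, 'A')]

diffs = [3, 13, 18]  # hypothetical positions where NGS and Sanger calls differ
hits = sum(in_or_near_homopolymer(p, runs) for p in diffs)
print(hits, "of", len(diffs))  # 2 of 3
```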
Reconstructing evolutionary trees in parallel for massive sequences.
Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam
2017-12-14
Building evolutionary trees for massive sets of unaligned DNA sequences is crucial but challenging, and massive multiple sequence alignment is itself time- and space-consuming. The recently developed Hadoop and Spark platforms offer new promise for such classical computational biology problems. In this paper, we perform multiple sequence alignment and evolutionary tree reconstruction in parallel. HPTree, developed in this paper, processes large DNA sequence files quickly; it works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud), and it could help in population evolution research and metagenomics analysis. Using the Hadoop and Spark platforms, we designed HPTree as an evolutionary tree reconstruction tool for massive unaligned DNA sequences: clustering and multiple sequence alignment are done in parallel, and the neighbour-joining model is employed to build the tree. The software and its source code are available at http://lab.malab.cn/soft/HPtree/.
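A compact, single-machine Python sketch of the neighbour-joining step is shown below for orientation. It is not HPTree's code and includes none of its Hadoop/Spark parallelisation, clustering, or alignment stages. It assumes a precomputed symmetric distance matrix and uses the standard Q-criterion, with O(n^3) cost overall.

```python
def neighbor_joining(names, dist):
    """Neighbour-joining on a symmetric distance matrix; returns a Newick string."""
    nodes = list(names)
    D = [list(map(float, row)) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in D]
        # Pick the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from each joined node to the new internal node.
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
        D = [[D[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        D.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    return f"({nodes[0]},{nodes[1]}:{D[0][1]:g});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

The five-taxon matrix is a standard textbook example whose distances are exactly additive, so the recovered branch lengths reproduce them. The final two nodes are written as an unrooted join.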
Molecular Identification and Databases in Fusarium
USDA-ARS's Scientific Manuscript database
DNA sequence-based methods for identifying pathogenic and mycotoxigenic Fusarium isolates have become the gold standard worldwide. Moreover, fusarial DNA sequence data are increasing rapidly in several web-accessible databases for comparative purposes. Unfortunately, the use of Basic Alignment Sea...
Gentle Masking of Low-Complexity Sequences Improves Homology Search
Frith, Martin C.
2011-01-01
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with “gentle” masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0, S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to “harsh” masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search. PMID:22205972
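Under gentle masking, an alignment column involving a masked letter scores min(0, S), where S is the unmasked score. That rule can be written as a small wrapper around any substitution score. In the Python sketch below, masked letters are represented as lowercase, a common soft-masking convention assumed here. The +1/-1 base score is illustrative; real homology search uses full substitution matrices and gapped alignment.

```python
def simple_score(x: str, y: str) -> int:
    """Illustrative unmasked score S: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


def gentle_score(x: str, y: str, base=simple_score) -> int:
    """Gentle masking: if either letter is masked (lowercase here), use min(0, S)."""
    s = base(x.upper(), y.upper())
    if x.islower() or y.islower():
        return min(0, s)
    return s


def ungapped_score(a: str, b: str, score=gentle_score) -> int:
    """Score of an ungapped alignment of two equal-length strings."""
    return sum(score(x, y) for x, y in zip(a, b))


# Lowercase marks a low-complexity tract (e.g. from a soft-masking tool).
query = "GATTCatatatatCGA"
subject = "GATTCatatatatCGT"
print(ungapped_score(query, subject), ungapped_score(query, subject, simple_score))  # 6 14
```

The masked (at)4 tract contributes nothing instead of +8, so the shared low-complexity stretch no longer inflates the score. A mismatch inside a masked tract would still be penalised, which is what distinguishes this rule from discarding masked letters entirely.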
Applications of alignment-free methods in epigenomics.
Pinello, Luca; Lo Bosco, Giosuè; Yuan, Guo-Cheng
2014-05-01
Epigenetic mechanisms play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have supported a role of DNA sequences in recruitment of epigenetic regulators. Alignment-free methods have been applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles. Here, we review recent advances in such applications, including the methods to map DNA sequence to feature space, sequence comparison and prediction models. Computational studies using these methods have provided important insights into the epigenetic regulatory mechanisms.
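A common first step in alignment-free analysis is mapping each sequence to a feature space of k-mer frequencies. Sequences can then be compared, or fed to a prediction model, without any alignment. The Python sketch below shows that mapping and a cosine similarity. The choice of k and of cosine similarity are illustrative, not specific methods from the reviewed studies.

```python
from itertools import product
from math import sqrt


def kmer_vector(seq: str, k: int = 3):
    """Map a DNA sequence to normalised counts of all 4**k k-mers (fixed order)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers containing N or other letters are skipped
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


cpg_rich = "CGCGGCGCCGCGTACGCG"
at_rich = "ATATTAAATATTTAATAT"
print(f"{cosine_similarity(kmer_vector(cpg_rich, 2), kmer_vector(at_rich, 2)):.3f}")
```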
Tamang, Abiral; Ghosh, Sujoy Kumar; Garain, Samiran; Alam, Md Mehebub; Haeberle, Jörg; Henkel, Karsten; Schmeisser, Dieter; Mandal, Dipankar
2015-08-05
A flexible nanogenerator (NG) is fabricated with a poly(vinylidene fluoride) (PVDF) film, in which deoxyribonucleic acid (DNA) is the agent for electroactive β-phase nucleation. Denatured DNA cooperates in aligning the molecular -CH2/-CF2 dipoles of PVDF, producing piezoelectricity without electrical poling. The NG can harvest energy from a variety of easily accessible mechanical stresses such as human touch, machine vibration, football juggling, and walking. The NG exhibits high piezoelectric energy conversion efficiency, enabling several green or blue light-emitting diodes to be turned on instantly. The generated energy can be used to charge capacitors, providing wide scope for the design of self-powered portable devices.
NASA Astrophysics Data System (ADS)
Novianti, T.; Sadikin, M.; Widia, S.; Juniantito, V.; Arida, E. A.
2018-03-01
Identifying specific genes that have not yet been characterized is essential for analyzing their involvement in biological processes. Identifying the HIF 1α gene is important for analyzing its contribution to tissue regeneration in the lizard tail (Hemidactylus platyurus). Bioinformatics and PCR techniques provide a relatively easy way to identify an unidentified gene. The most widely used method is BLAST (Basic Local Alignment Search Tool), which aligns a query with sequences from other organisms. BLAST is available online at https://blast.ncbi.nlm.nih.gov/Blast.cgi and retrieves similar sequences from both closely and distantly related organisms. Gekko japonicus is the species most closely related to H. platyurus. The HIF 1α gene sequence of G. japonicus was compared with those of other species by multiple alignment in MEGA7 software, and conserved regions were identified using the Clustal IX method. Primers for the HIF 1α gene were designed with Primer3 software. The HIF 1α gene of the lizard (H. platyurus) was successfully amplified on a real-time PCR machine using the primers we had designed from Gekko japonicus. The previously unidentified lizard HIF 1α gene was thus identified successfully with the multiple alignment method. The analysis covered tail regrowth on days 1, 3, 5, 7, 10, 13 and 17 after autotomy. Amplification of the HIF 1α gene was described by Ct values on the real-time PCR machine, and HIF 1α gene expression was quantified with the Livak formula. The chi-square test result was 0.000, indicating that HIF 1α gene expression differed between growth days.
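The Livak formula expresses relative expression as 2^(-ΔΔCt). Here ΔCt is the target gene's Ct minus a reference gene's Ct, and ΔΔCt is the sample's ΔCt minus a calibrator's ΔCt. The Python sketch below uses invented Ct values. The abstract names neither the reference gene nor the calibrator, so treating day 1 as the calibrator is an assumption.

```python
def livak_fold_change(ct_target_sample: float, ct_ref_sample: float,
                      ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression 2^(-ddCt) by the Livak method.

    Assumes target and reference genes amplify with ~100% efficiency.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Invented Ct values: HIF 1α on a later regeneration day vs. day 1 (calibrator),
# each normalised to an unspecified reference (housekeeping) gene.
print(livak_fold_change(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A ΔΔCt of -2 means the target reached threshold two cycles earlier relative to the reference gene, which corresponds to a fourfold increase under the doubling-per-cycle assumption.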
DNA looping by FokI: the impact of synapse geometry on loop topology at varied site orientations
Rusling, David A.; Laurens, Niels; Pernstich, Christian; Wuite, Gijs J. L.; Halford, Stephen E.
2012-01-01
Most restriction endonucleases, including FokI, interact with two copies of their recognition sequence before cutting DNA. On DNA with two sites they act in cis looping out the intervening DNA. While many restriction enzymes operate symmetrically at palindromic sites, FokI acts asymmetrically at a non-palindromic site. The directionality of its sequence means that two FokI sites can be bridged in either parallel or anti-parallel alignments. Here we show by biochemical and single-molecule biophysical methods that FokI aligns two recognition sites on separate DNA molecules in parallel and that the parallel arrangement holds for sites in the same DNA regardless of whether they are in inverted or repeated orientations. The parallel arrangement dictates the topology of the loop trapped between sites in cis: the loop from inverted sites has a simple 180° bend, while that with repeated sites has a convoluted 360° turn. The ability of FokI to act at asymmetric sites thus enabled us to identify the synapse geometry for sites in trans and in cis, which in turn revealed the relationship between synapse geometry and loop topology. PMID:22362745
Dunnican, Ward J; Singh, T Paul; Ata, Ashar; Bendana, Emma E; Conlee, Thomas D; Dolce, Charles J; Ramakrishnan, Rakesh
2010-06-01
Reverse alignment (mirror image) visualization is a disconcerting situation occasionally faced during laparoscopic operations. This occurs when the camera faces back at the surgeon in the opposite direction from which the surgeon's body and instruments are facing. Most surgeons will attempt to optimize trocar and camera placement to avoid this situation. The authors' objective was to determine whether the intentional use of reverse alignment visualization during laparoscopic training would improve performance. A standard box trainer was configured for reverse alignment, and 34 medical students and junior surgical residents were randomized to train with either forward alignment (DIRECT) or reverse alignment (MIRROR) visualization. Enrollees were tested on both modalities before and after a 4-week structured training program specific to their modality. Student's t test was used to determine differences in task performance between the 2 groups. Twenty-one participants completed the study (10 DIRECT, 11 MIRROR). There were no significant differences in performance time between DIRECT and MIRROR participants during initial forward or reverse alignment testing. At final testing, DIRECT participants had improved times only in forward alignment performance; they demonstrated no significant improvement in reverse alignment performance. MIRROR participants had significant time improvement in both forward and reverse alignment performance at final testing. Reverse alignment imaging for laparoscopic training improves task performance for both reverse alignment and forward alignment tasks. This may translate into improved performance in the operating room when surgeons face reverse alignment situations. Minimal laboratory training can produce drastic adaptation to this environment.
Bae, Dong Geun; Jeong, Ji-Eun; Kang, Seok Hee; Byun, Myunghwan; Han, Dong-Wook; Lin, Zhiqun; Woo, Han Young; Hong, Suck Won
2016-08-01
DNA molecules have been widely recognized as promising building blocks for constructing functional nanostructures because of two main features: self-assembly and rich chemical functionality. The intrinsic feature size of DNA makes it attractive for creating versatile nanostructures. Moreover, the ease with which the surface of DNA can be tuned by chemical functionalization offers numerous opportunities for many applications. Herein, a simple yet robust strategy is developed to self-assemble DNA by exploiting controlled evaporative assembly of DNA solution in a unique confined geometry. Intriguingly, depending on the concentration of the DNA solution, highly aligned nanostructured fibrillar-like arrays and well-positioned concentric ring-like superstructures composed of DNA are formed. Subsequently, the ring-like negatively charged DNA superstructures are employed as templates to produce conductive organic nanowires on a silicon substrate by complexing with a positively charged conjugated polyelectrolyte poly[9,9-bis(6'-N,N,N-trimethylammoniumhexyl)fluorene dibromide] (PF2) through strong electrostatic interaction. Finally, monolithic integration of aligned arrays of DNA-templated PF2 nanowires to yield two DNA/PF2-based devices is demonstrated. It is envisioned that this strategy can be readily extended to pattern other biomolecules and may enable a broad range of potential applications, from nucleotide sequence and hybridization recognition events to transducing elements in chemical sensors. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
BEAUTY-X: enhanced BLAST searches for DNA queries.
Worley, K C; Culpepper, P; Wiese, B A; Smith, R F
1998-01-01
BEAUTY (BLAST Enhanced Alignment Utility) is an enhanced version of the BLAST database search tool that facilitates identification of the functions of matched sequences. Three recent improvements to the BEAUTY program described here make the enhanced output (1) available for DNA queries, (2) available for searches of any protein database, and (3) more up-to-date, with periodic updates of the domain information. BEAUTY searches of the NCBI and EMBL non-redundant protein sequence databases are available from the BCM Search Launcher Web pages (http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html). BEAUTY Post-Processing of submitted search results is available using the BCM Search Launcher Batch Client (version 2.6) (ftp://gc.bcm.tmc.edu/pub/software/search-launcher/). Example figures are available at http://dot.bcm.tmc.edu:9331/papers/beautypp.html. Contact: kworley@bcm.tmc.edu, culpep@bcm.tmc.edu
Phylo-VISTA: Interactive visualization of multiple DNA sequence alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.
The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient, these visualization algorithms must consistently accommodate a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. Results: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. Multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produce a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. Availability: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plugin 1.4.2 and is integrated into the global alignment program LAGAN at http://lagan.stanford.edu
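The abstract does not define Phylo-VISTA's similarity measure. VISTA-style plots, however, are commonly built from percent identity in a sliding window over an alignment, and changing the window width gives a coarser or finer view. The following is a minimal sketch under that assumption. The function name `window_identity` is hypothetical and is not part of Phylo-VISTA.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Percent identity of two aligned, equal-length sequences in every
    window of `window` columns, sliding one column at a time.
    A column counts as identical only if both residues match and neither is a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not 1 <= window <= len(seq_a):
        raise ValueError("window must be between 1 and the alignment length")
    # 1 for an identical, non-gap column, 0 otherwise
    matches = [1 if a == b and a != "-" else 0
               for a, b in zip(seq_a.upper(), seq_b.upper())]
    current = sum(matches[:window])  # identities in the first window
    scores = [100.0 * current / window]
    for i in range(window, len(matches)):
        # slide right: add the entering column, drop the leaving one
        current += matches[i] - matches[i - window]
        scores.append(100.0 * current / window)
    return scores
```

For example, `window_identity("ACGTACGT", "ACGAACGT", 4)` returns `[75.0, 75.0, 75.0, 75.0, 100.0]`. Calling the function again with a larger window smooths the profile, which loosely mirrors viewing the data at a lower resolution.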
Dzikowski, R; Levy, M G; Poore, M F; Flowers, J R; Paperna, I
2004-04-01
Infections by metacercariae of Clinostomum (Leidy, 1856) species adversely affect aquacultured fish and are potentially transmissible to humans. Molecular methodologies are efficient tools, which enable diagnosis of all life-history stages of trematodes in their diverse hosts. The small subunit of ribosomal DNA genes of adults of the Old World Clinostomum complanatum (Rudolphi, 1819) and the New World Clinostomum marginatum (Rudolphi, 1819), obtained from a little egret Egretta garzetta (Linnaeus, 1766) and the great blue heron Ardea herodias (Linnaeus, 1758), respectively, were amplified, sequenced, and aligned. The resulting alignment was used to develop a genetic assay to differentiate between these species.
Fischer, Christiane; Daniel, Rolf; Wubet, Tesfaye
2012-01-01
The ribosomal DNA comprising the ITS1-5.8S-ITS2 regions is widely used as a fungal marker in molecular ecology and systematics but cannot be aligned with confidence across genetically distant taxa. In order to study the diversity of Agaricomycotina in forest soils, we designed primers targeting the more alignable 28S (LSU) gene, which should be more useful for phylogenetic analyses of the detected taxa. This paper compares the performance of the established ITS1F/4B primer pair, which targets basidiomycetes, to that of two new pairs. Key factors in the comparison were the diversity covered, off-target amplification, rarefaction at different Operational Taxonomic Unit (OTU) cutoff levels, the sensitivity of the method used to process the alignment to missing data and insecure positional homology, and the congruence of monophyletic clades with OTU assignments and BLAST-derived OTU names. The ITS primer pair yielded no off-target amplification but also exhibited the least fidelity to the expected phylogenetic groups. The LSU primers gave complementary pictures of diversity, but were more sensitive to modifications of the alignment such as the removal of difficult-to-align stretches. The LSU primers also yielded greater numbers of singletons and had a greater tendency to produce OTUs containing sequences from a wider variety of species as judged by BLAST similarity. We introduced some new parameters to describe alignment heterogeneity based on Shannon entropy and the extent and contents of the OTUs in a phylogenetic tree space. Our results suggest that ITS should not be used when calculating phylogenetic trees from genetically distant sequences obtained from environmental DNA extractions and that it is inadvisable to define OTUs on the basis of very heterogeneous alignments. PMID:22363808
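The abstract mentions entropy-based parameters for alignment heterogeneity but does not give their definitions. The usual building block is the Shannon entropy of each alignment column. A column that is perfectly conserved scores 0 bits, and a column in which all four bases are equally frequent scores 2 bits. The sketch below computes that quantity. Ignoring gaps is an assumption on our part, and `column_entropies` is a hypothetical name. This is not the authors' parameter set.

```python
import math
from collections import Counter


def column_entropies(alignment: list[str], ignore_gaps: bool = True) -> list[float]:
    """Shannon entropy in bits of each column of a multiple alignment."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    entropies = []
    for col in range(length):
        symbols = [row[col].upper() for row in alignment]
        if ignore_gaps:
            symbols = [s for s in symbols if s != "-"]
        if not symbols:  # an all-gap column carries no information
            entropies.append(0.0)
            continue
        total = len(symbols)
        # H = -sum(p * log2 p) over the symbols observed in this column
        entropies.append(sum(-(n / total) * math.log2(n / total)
                             for n in Counter(symbols).values()))
    return entropies
```

For instance, `column_entropies(["ACGT", "ACGA", "ACTC", "ACAG"])` gives `[0.0, 0.0, 1.5, 2.0]`. Averaging the column values is one simple way to obtain a single heterogeneity score for an entire alignment.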
A generalized global alignment algorithm.
Huang, Xiaoqiu; Chao, Kun-Mao
2003-01-22
Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
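The complexity quoted above, with time proportional to the product of the sequence lengths and space proportional to their sum, is the standard bound for global alignment dynamic programming when the matrix is stored two rows at a time. The sketch below shows only that ordinary global alignment score with a linear gap penalty. It does not reproduce GAP3's generalized model for intermittent similarities, and the scoring values are illustrative assumptions. Recovering the alignment itself in linear space would additionally require a divide-and-conquer step such as Hirschberg's method.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch style) alignment score with a linear gap
    penalty: O(len(a) * len(b)) time, but only two DP rows of length len(b) + 1."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # a[:i] against ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # best of substitution, gap in b (up), gap in a (left)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

With these default scores, `global_score("ACGT", "AGT")` returns 1: three matches (+3) and one gap (-2). Passing the shorter sequence as `b` keeps the memory footprint at its minimum.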
NASA Astrophysics Data System (ADS)
Oiwa, Nestor; Cordeiro, Claudette; Heermann, Dieter
2016-05-01
Instead of ATCG letter alignments, typically used in bioinformatics, we propose a new alignment method using the probability distribution functions of the bottom occupied molecular orbital (BOMO), highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO). We apply the technique to transcription factors with Cys2His2 zinc fingers. These transcription factors search for binding sites by probing the electronic patterns at the minor and major DNA grooves. The eukaryotic Cys2His2 zinc finger proteins bind to DNA ubiquitously at highly conserved domains. They are responsible for gene regulation and the spatial organization of DNA. To study and understand these zinc finger DNA-protein interactions, we use the extended ladder DNA model proposed by Zhu, Rasmussen, Balatsky & Bishop (2007). Considering one single spinless electron in each nucleotide π-orbital along a double DNA chain (dDNA), we find a characteristic pattern for the BOMO, HOMO and LUMO along the binding sites. We specifically looked at two members of the zinc finger protein family: specificity protein 1 (SP1) and early growth response 1 (EGR1) transcription factors. When the valence band is filled, we find electrons in the purines along the nucleotide sequence, compatible with the electric charges of the binding amino acids in the SP1 and EGR1 zinc fingers.
Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C
2007-09-01
The bisulfite genomic sequencing technique is one of the most widely used techniques to study sequence-specific DNA methylation because it can unambiguously reveal DNA methylation status at single-nucleotide resolution. One characteristic feature of the technique is that a number of sample sequence files are produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous in nature; therefore they must be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files and adds extra plasmid bases to the sequences, which causes problems in the final sequence comparison. Because finding the methylation status of each CpG in each sample sequence is not an easy job, CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain CpG and non-CpG-C residue position information; (ii) to tailor sample sequence files (delete insertions and mark deletions in the sample sequence files) based on a ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text and a summary report that can easily be exported to the Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware program available on the internet. It can handle up to 100 files of sample DNA sequences simultaneously, and the total CpG pattern analysis process can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies to determine the differential methylation pattern in a large number of individuals in a population.
Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
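Step (iii) relies on a simple rule. Bisulfite converts unmethylated cytosine to uracil, which is read as T after PCR, whereas methylated cytosine is protected and still reads as C. The sketch below applies that rule to a reference and a sample read that have already been tailored to the same length, as described in step (ii). It treats the forward strand only and uses non-CpG cytosines to estimate conversion efficiency. Both choices are simplifying assumptions. The function name is hypothetical, and CpG PatternFinder's internal logic is not described in the abstract.

```python
def call_methylation(reference: str, sample: str) -> tuple[dict[int, str], float]:
    """Return CpG calls keyed by 0-based reference position, plus the bisulfite
    conversion efficiency measured at non-CpG cytosines (forward strand only)."""
    ref, smp = reference.upper(), sample.upper()
    if len(ref) != len(smp):
        raise ValueError("reference and sample must be aligned to the same length")
    calls: dict[int, str] = {}
    converted = unconverted = 0
    for i, base in enumerate(ref):
        if base != "C":
            continue
        read = smp[i]
        if i + 1 < len(ref) and ref[i + 1] == "G":  # CpG cytosine
            if read == "C":
                calls[i] = "methylated"    # protected from conversion
            elif read == "T":
                calls[i] = "unmethylated"  # C -> U -> T after PCR
            else:
                calls[i] = "undetermined"  # mutation, N or sequencing error
        elif read == "T":
            converted += 1                 # non-CpG C converted as expected
        elif read == "C":
            unconverted += 1               # incomplete conversion
    checked = converted + unconverted
    efficiency = converted / checked if checked else float("nan")
    return calls, efficiency
```

For the reference `TCGACCGTCA` and the sample `TCGATTGTTA`, the function calls the CpG at position 1 methylated and the CpG at position 5 unmethylated. Both non-CpG cytosines read as T, so the estimated conversion efficiency is 1.0.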
Liston, A; Robinson, W A; Piñero, D; Alvarez-Buylla, E R
1999-02-01
A 650-bp portion of the nuclear ribosomal DNA internal transcribed spacer region was sequenced in 47 species of Pinus, representing all recognized subsections of the genus, and 2 species of Picea and Cathaya as outgroups. Parsimony analyses of these length-variable sequences were conducted using a manual alignment, 13 different automated alignments, elision of the automated alignments, and exclusion of all alignment-ambiguous sites. High and moderately supported clades were consistently resolved across the different analyses, while poorly supported clades were inconsistently recovered. Comparison of the topologies highlights taxa of particularly problematic placement including Pinus nelsonii and P. aristata. Within subgenus Pinus, there is moderate support for the monophyly of a narrowly circumscribed subsect. Pinus (=subsect. Sylvestres) and strong support for a clade of North and Central American hard pines. The Himalayan P. roxburghii may be sister species to these "New World hard pines," which have two well-supported subgroups, subsect. Ponderosae and a clade of the remaining five subsections. The position of subsect. Contortae conflicts with its placement in a chloroplast DNA restriction site study. Within subgenus Strobus there is consistent support for the monophyly of a broadly circumscribed subsect. Strobi (including P. krempfii and a polyphyletic subsect. Cembrae) derived from a paraphyletic grade of the remaining soft pines. Relationships among subsects. Gerardianae, Cembroides, and Balfourianae are poorly resolved. Support for the monophyly of subgenus Pinus and subgenus Strobus is not consistently obtained. Copyright 1999 Academic Press.
Absence of ancient DNA in sub-fossil insect inclusions preserved in 'Anthropocene' Colombian copal.
Penney, David; Wadsworth, Caroline; Fox, Graeme; Kennedy, Sandra L; Preziosi, Richard F; Brown, Terence A
2013-01-01
Insects preserved in copal, the sub-fossilized resin precursor of amber, have potential value in molecular ecological studies of recently-extinct species and of extant species that have never been collected as living specimens. The objective of the work reported in this paper was therefore to determine if ancient DNA is present in insects preserved in copal. We prepared DNA libraries from two stingless bees (Apidae: Meliponini: Trigonisca ameliae) preserved in 'Anthropocene' Colombian copal, dated to 'post-Bomb' and 10,612±62 cal yr BP, respectively, and obtained sequence reads using the GS Junior 454 System. Read numbers were low, but were significantly higher for DNA extracts prepared from crushed insects compared with extracts obtained by a non-destructive method. The younger specimen yielded sequence reads up to 535 nucleotides in length, but searches of these sequences against the nucleotide database revealed very few significant matches. None of these hits was to stingless bees though one read of 97 nucleotides aligned with two non-contiguous segments of the mitochondrial cytochrome oxidase subunit I gene of the East Asia bumblebee Bombus hypocrita. The most significant hit was for 452 nucleotides of a 470-nucleotide read that aligned with part of the genome of the root-nodulating bacterium Bradyrhizobium japonicum. The other significant hits were to proteobacteria and an actinomycete. Searches directed specifically at Apidae nucleotide sequences only gave short and insignificant alignments. All of the reads from the older specimen appeared to be artefacts. We were therefore unable to obtain any convincing evidence for the preservation of ancient DNA in either of the two copal inclusions that we studied, and conclude that DNA is not preserved in this type of material. Our results raise further doubts about claims of DNA extraction from fossil insects in amber, many millions of years older than copal. PMID:24039876
Direct Observation of Azimuthal Correlations between DNA in Hydrated Aggregates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kornyshev, Alexei A.; Lee, Dominic J.; Wynveen, Aaron
2005-09-30
This study revisits the classical x-ray diffraction patterns from hydrated, noncrystalline fibers originally used to establish the helical structure of DNA. We argue that changes in these diffraction patterns with DNA packing density reveal strong azimuthally dependent interactions between adjacent molecules up to ~40 Å interaxial or ~20 Å surface-to-surface separations. These interactions appear to force significant torsional 'straightening' of DNA and strong azimuthal alignment of nearest neighbor molecules. The results are in good agreement with the predictions of recent theoretical models relating DNA-DNA interactions to the helical symmetry of their surface charge patterns.
Biologically inspired EM image alignment and neural reconstruction.
Knowles-Barley, Seymour; Butcher, Nancy J; Meinertzhagen, Ian A; Armstrong, J Douglas
2011-08-15
Three-dimensional reconstruction of consecutive serial-section transmission electron microscopy (ssTEM) images of neural tissue currently requires many hours of manual tracing and annotation. Several computational techniques have already been applied to ssTEM images to facilitate 3D reconstruction and ease this burden. Here, we present an alternative computational approach for ssTEM image analysis. We have used biologically inspired receptive fields as a basis for a ridge detection algorithm to identify cell membranes, synaptic contacts and mitochondria. Detected line segments are used to improve alignment between consecutive images and we have joined small segments of membrane into cell surfaces using a dynamic programming algorithm similar to the Needleman-Wunsch and Smith-Waterman DNA sequence alignment procedures. A shortest path-based approach has been used to close edges and achieve image segmentation. Partial reconstructions were automatically generated and used as a basis for semi-automatic reconstruction of neural tissue. The accuracy of partial reconstructions was evaluated and 96% of membrane could be identified at the cost of 13% false positive detections. An open-source reference implementation is available in the Supplementary information. Contact: seymour.kb@ed.ac.uk; douglas.armstrong@ed.ac.uk. Supplementary data are available at Bioinformatics online.
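For readers unfamiliar with the sequence procedures used as a point of comparison, here is a minimal Smith-Waterman local alignment score. It uses a linear gap penalty, and the scoring values are illustrative assumptions. Unlike global alignment, every cell is floored at zero, so the best-scoring segment can start and end anywhere. This sketch covers only the DNA version of the recurrence, not the authors' membrane-joining algorithm, which the abstract describes only as "similar".

```python
def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # first row/column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # flooring at 0 lets a new local alignment start at any cell
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

`local_score("TTACGTTT", "GGACGTGG")` returns 8, the score of the shared `ACGT` core. Two sequences with no matching bases score 0 rather than a negative value.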
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
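The database summarizes binding affinities as position weight matrices, but the abstract does not say how they were built. A common textbook construction starts from equal-length aligned binding sites. It counts each base at each position, adds a pseudocount, and takes the log2 ratio against a background frequency. A candidate site is then scored by summing the per-position weights. The pseudocount of 0.5, the uniform 0.25 background and the function names in the sketch below are assumptions for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Log2-odds position weight matrix (uniform 0.25 background)
    from equal-length, aligned binding sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("binding sites must be aligned to equal length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [s[pos].upper() for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                    for b in BASES})
    return pwm


def best_hit(pwm: list[dict[str, float]], sequence: str) -> tuple[int, float]:
    """Highest-scoring window of `sequence`: (0-based start, log-odds score).
    Ambiguous bases such as N contribute 0."""
    width = len(pwm)
    best = (-1, float("-inf"))
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        score = sum(pwm[k].get(base, 0.0) for k, base in enumerate(window))
        if score > best[1]:
            best = (start, score)
    return best
```

For example, building a matrix from the hypothetical sites `TGACA`, `TGACA` and `TGACT` and scanning `GGGTGACAGGG` places the best hit at position 3.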
Gopal, J; Yebra, M J; Bhagwat, A S
1994-01-01
The methyltransferase (MTase) in the DsaV restriction-modification system methylates within 5'-CCNGG sequences. We have cloned the gene for this MTase and determined its sequence. The predicted sequence of the MTase protein contains sequence motifs conserved among all cytosine-5 MTases and is most similar to other MTases that methylate CCNGG sequences, namely M.ScrFI and M.SsoII. All three MTases methylate the internal cytosine within their recognition sequence. The 'variable' region within the three enzymes that methylate CCNGG can be aligned with the sequences of two enzymes that methylate CCWGG sequences. Remarkably, two segments within this region contain significant similarity with the region of M.HhaI that is known to contact DNA bases. These alignments suggest that many cytosine-5 MTases are likely to interact with DNA using a similar structural framework. PMID:7971279
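The recognition sequences above use IUPAC degenerate codes. N stands for any base and W for A or T. A degenerate site can be located by expanding each code into a regular-expression character class. Both CCNGG and CCWGG are their own reverse complements, so scanning one strand finds every site. The sketch below assumes this approach. The `find_sites` helper and its small code table are hypothetical and cover only the codes needed here.

```python
import re

# IUPAC codes needed for CCNGG and CCWGG, plus the four plain bases
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "W": "[AT]"}


def find_sites(sequence: str, site: str) -> list[int]:
    """0-based start positions of a degenerate recognition site,
    including overlapping occurrences."""
    pattern = "".join(IUPAC[code] for code in site.upper())
    # a zero-width lookahead lets re.finditer report overlapping matches
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]
```

`find_sites("ACCAGGTTCCTGGA", "CCWGG")` returns `[1, 8]`. Both of those sites also match CCNGG, whereas `find_sites("ACCCGGA", "CCWGG")` returns `[]` because C is not a W base.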
Mapping the yeast genome by melting in nanofluidic devices
NASA Astrophysics Data System (ADS)
Welch, Robert L.; Czolkos, Ilja; Sladek, Rob; Reisner, Walter
2012-02-01
Optical mapping of DNA provides large-scale genomic information that can be used to assemble contigs from next-generation sequencing, and to detect re-arrangements between single cells. A recent optical mapping technique called denaturation mapping has the unique advantage of using physical principles rather than the action of enzymes to probe genomic structure. The absence of reagents or reaction steps makes denaturation mapping simpler than other protocols. Denaturation mapping uses fluorescence microscopy to image the pattern of partial melting along a DNA molecule extended in a channel of cross-section ~100 nm at the heart of a nanofluidic device. We successfully aligned melting maps from single DNA molecules to a theoretical map of the yeast genome (11.6 Mbp) to identify their location. By aligning hundreds of molecules we assembled a consensus melting map of the yeast genome with 95% coverage.
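The abstract does not describe how single-molecule melting maps were matched to the theoretical map. One simple way to place a short one-dimensional profile on a long reference is to slide it along the reference and keep the offset with the highest Pearson correlation. The profile is tested in both orientations, because a molecule can lie in the channel either way round. The sketch below assumes both maps are already sampled at the same resolution and ignores stretching differences. The function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_offset(profile: list[float], reference: list[float]) -> tuple[int, float]:
    """Offset into `reference` where `profile` correlates best, and that correlation."""
    n = len(profile)
    if n < 2 or n > len(reference):
        raise ValueError("profile needs >= 2 points and must fit inside the reference")
    best = (0, -2.0)  # correlations lie in [-1, 1], so any real value beats -2
    for offset in range(len(reference) - n + 1):
        r = pearson(profile, reference[offset:offset + n])
        if r > best[1]:
            best = (offset, r)
    return best


def locate(profile: list[float], reference: list[float]) -> tuple[str, int, float]:
    """Try both orientations of the molecule and keep the better placement."""
    fwd = best_offset(profile, reference)
    rev = best_offset(profile[::-1], reference)
    return ("forward", *fwd) if fwd[1] >= rev[1] else ("reverse", *rev)
```

For the toy reference `[0, 0, 1, 1, 0, 1, 0, 0, 1, 0]`, the profile `[0, 1, 0, 1]` is placed in reverse orientation at offset 3 with a correlation of 1.0.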
Rumen Microbiome, Probiotics, and Fermentation Additives.
McCann, Joshua C; Elolimy, Ahmed A; Loor, Juan J
2017-11-01
Fermentation of a variety of feedstuffs by the ruminal microbiome is the distinctive feature of the ruminant digestive tract. The host derives energy and nutrients from microbiome activity; these organisms are essential to survival. Advances in DNA sequencing and bioinformatics have redefined the rumen microbial community. Current research seeks to connect our understanding of the rumen microbiome with nutritional strategies in ruminant livestock systems and their associated digestive disorders. These efforts align with a growing number of products designed to improve ruminal fermentation to benefit the overall efficiency of ruminant livestock production and health. Copyright © 2017 Elsevier Inc. All rights reserved.
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.
Herold, Keith E; Rasooly, Avraham
2003-12-01
Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
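The tiling feature is described only as producing short, overlapping, temperature-matched oligonucleotides of variable length. The sketch below illustrates the idea under an assumption the source does not state: melting temperature is estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. This is a rough estimate suitable only for short oligos, and Oligo Design may use a different model. A probe starts every `step` bases and grows from a minimum length until it reaches the target Tm. The function names and parameters are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Wallace-rule melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def tile(sequence: str, target_tm: int, step: int,
         min_len: int = 10, max_len: int = 30) -> list[tuple[int, str, int]]:
    """Overlapping probes (start, oligo, Tm) starting every `step` bases, each
    grown from `min_len` until its Tm reaches `target_tm`."""
    probes = []
    for start in range(0, len(sequence) - min_len + 1, step):
        length = min_len
        # grow until hot enough, at max_len, or at the end of the sequence
        while (wallace_tm(sequence[start:start + length]) < target_tm
               and length < max_len and start + length < len(sequence)):
            length += 1
        oligo = sequence[start:start + length]
        tm = wallace_tm(oligo)
        if tm >= target_tm:  # drop probes that never reached the target
            probes.append((start, oligo, tm))
    return probes
```

AT-rich stretches need longer probes to reach the same temperature. For example, `tile("ATATATATATGCGCGCGCGC", 40, 5)` produces probes of length 15, 13 and 10 with Tm values of 40, 42 and 40 °C.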
Functionalization of quantum rods with oligonucleotides for programmable assembly with DNA origami
NASA Astrophysics Data System (ADS)
Doane, Tennyson L.; Alam, Rabeka; Maye, Mathew M.
2015-02-01
The DNA-mediated self-assembly of CdSe/CdS quantum rods (QRs) onto DNA origami is described. Two QR types with unique optical emission and high polarization were synthesized, and then functionalized with oligonucleotides (ssDNA) using a novel protection-deprotection approach, which harnessed ssDNA's tailorable rigidity and denaturation temperature to increase DNA coverage by reducing non-specific coordination and wrapping. The QR assembly was programmable, and occurred at two different assembly zones that had capture strands in parallel alignment. QRs with different optical properties were assembled, opening up future studies on orientation dependent QR FRET. The QR-origami conjugates could be purified via gel electrophoresis and sucrose gradient ultracentrifugation. Assembly yields, QR stoichiometry and orientation, as well as energy transfer implications were studied in light of QR distances, origami flexibility, and conditions.
Electronic supplementary information (ESI) available: Experimental conditions, DNA origami blueprint and sequences, FRET calculations. Additional Fig. S1-S13. See DOI: 10.1039/c4nr07662a
Adachi, Noboru; Umetsu, Kazuo; Shojo, Hideki
2014-01-01
Mitochondrial DNA (mtDNA) is widely used for DNA analysis of highly degraded samples because of its polymorphic nature and high number of copies in a cell. However, as endogenous mtDNA in deteriorated samples is scarce and highly fragmented, it is not easy to obtain reliable data. In the current study, we report the risks of directly sequencing mtDNA in highly degraded material, and suggest a strategy to ensure the quality of sequencing data. It was observed that direct sequencing data of the hypervariable segment (HVS) 1 obtained using primer sets that generate an amplicon of 407 bp (long-primer sets) differed from results obtained using newly designed primer sets that produce an amplicon of 120-139 bp (mini-primer sets). The data aligned with the results of the mini-primer set analysis in an amplicon length-dependent manner; the shorter the amplicon, the more evident the endogenous sequence became. Coding region analysis using multiplex amplified product-length polymorphisms revealed the incongruence of single nucleotide polymorphisms between the coding region and HVS 1 caused by contamination with exogenous mtDNA. Although the sequencing data obtained using long-primer sets turned out to be erroneous, they were unambiguous and reproducible. These findings suggest that PCR primers that produce amplicons shorter than those currently recognized should be used for mtDNA analysis in highly degraded samples. Haplogroup motif analysis of the coding region and HVS should also be performed to improve the reliability of forensic mtDNA data. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Majoros, William H; Ohler, Uwe
2010-12-16
The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence. We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.
Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro
2016-01-01
Next-generation sequencing determines DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for handling files in the SAM/BAM format. It has various functions, including variant detection, alignment visualization, indexing, extraction of parts of the data and loci, and file format conversion. It is written in C and runs fast. However, running SAMtools in parallel requires additional implementation with, for example, OpenMP (Open Multi-Processing) libraries. To keep up with the accumulation of next-generation sequencing data, a simple parallelization program that supports cloud and PC cluster environments is required. We have developed cljam to handle SAM/BAM data using the Clojure programming language, which simplifies parallel programming. Cljam runs in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. Its execution time is almost the same as that of SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
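The chunked map-and-reduce pattern that makes SAM processing easy to parallelize can be sketched in Python. This is not cljam's API (cljam is a Clojure library), and the helper names are hypothetical. The sketch relies only on the SAM conventions that each alignment line has 11 mandatory tab-separated fields and that FLAG bit 0x4 marks an unmapped read.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# The 11 mandatory, tab-separated fields of a SAM alignment line.
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def parse_sam_line(line):
    """Split one SAM alignment line into its mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = cols[11:]
    return record


def count_mapped(chunk):
    """Map step: count mapped reads (FLAG bit 0x4 unset) per reference."""
    counts = Counter()
    for line in chunk:
        if line.startswith("@"):  # skip header lines
            continue
        rec = parse_sam_line(line)
        if not rec["FLAG"] & 0x4:
            counts[rec["RNAME"]] += 1
    return counts


def parallel_mapped_counts(lines, n_chunks=4):
    """Run count_mapped over chunks concurrently, then reduce by summing."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    chunks = [lines[k:k + size] for k in range(0, len(lines), size)]
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(count_mapped, chunks):
            total.update(partial)
    return total
```

Threads keep the example portable, but CPython's global interpreter lock limits their speedup for CPU-bound parsing; swapping in `ProcessPoolExecutor` (under an `if __name__ == "__main__":` guard) gives true parallelism with the same map-then-reduce structure.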
rVISTA 2.0: Evolutionary Analysis of Transcription Factor Binding Sites
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loots, G G; Ovcharenko, I
2004-01-28
Identifying and characterizing the patterns of DNA cis-regulatory modules represents a challenge that has the potential to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under positive selection and therefore are often conserved between related species. Using this evolutionary principle we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. The rVISTA tool combines transcription factor binding site (TFBS) predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are highly conserved and present in a specific configuration within an alignment. Here we present the newly developed version 2.0 of the rVISTA tool that can process alignments generated by both zPicture and PipMaker alignment programs or use pre-computed pairwise alignments of seven vertebrate genomes available from the ECR Browser. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users to either search for matrices present in the TRANSFAC library collection or search for user-defined consensus sequences. The rVISTA tool is publicly available at http://rvista.dcode.org/.
Single-molecule optical genome mapping of a human HapMap and a colorectal cancer cell line.
Teo, Audrey S M; Verzotto, Davide; Yao, Fei; Nagarajan, Niranjan; Hillmer, Axel M
2015-01-01
Next-generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, the identification of genome structural variations based on NGS approaches with read lengths of 35-300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and as such are suitable for the identification of large-scale genome structural variations, and for de novo genome assemblies when combined with short-read NGS data. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116. High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells, respectively, in agarose plugs, followed by DNA extraction under mild conditions. Genomic DNA was digested with KpnI and 310,000 and 296,000 DNA molecules (≥ 150 kb and 10 restriction fragments), respectively, were analyzed per cell line using the Argus optical mapping system. Maps were aligned to the human reference by OPTIMA, a new glocal alignment method. Genome coverage of 6.8× and 5.7× was obtained, respectively; 2.9× and 1.7× more than the coverage obtained with previously available software. Optical mapping allows the resolution of large-scale structural variations of the genome, and the scaffold extension of NGS-based de novo assemblies. OPTIMA is an efficient new alignment method; our optical mapping data provide a resource for genome structure analyses of the human HapMap reference cell line GM12878, and the colorectal cancer cell line HCT116.
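An optical map records restriction-fragment sizes rather than bases. The sketch below assumes an error-free in silico digest. It computes KpnI fragment lengths; KpnI cuts GGTAC^C, and because the site is palindromic, scanning one strand suffices. It then applies the molecule filter quoted above, reading "10 restriction fragments" as a minimum. The function names are hypothetical, and real optical maps also carry sizing errors that an aligner must tolerate.

```python
def digest(seq, site="GGTACC", cut_offset=5):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the cut position within the site on the given strand;
    for KpnI (GGTAC^C) it is 5.
    """
    seq = seq.upper()
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def passes_molecule_filter(fragments, min_total=150_000, min_fragments=10):
    """Keep molecules of at least 150 kb carrying at least 10 fragments."""
    return sum(fragments) >= min_total and len(fragments) >= min_fragments
```

For example, `digest("AAGGTACCTTTGGTACCAA")` yields `[7, 9, 3]`: the two sites start at positions 2 and 11, so the cuts fall at 7 and 16.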
Iterative refinement of structure-based sequence alignments by Seed Extension
Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook
2009-01-01
Background Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. 
It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133
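The evaluation metric, the fraction of correctly aligned residues, is easy to state precisely. The sketch below is minimal and assumes that alignments are represented as collections of (i, j) residue-index pairs and that "shift error" means a tolerated offset in the second sequence's index. The function name and this representation are hypothetical and are not taken from the RSE paper.

```python
def fraction_correct(test_pairs, ref_pairs, shift=0):
    """Fraction of reference residue pairs reproduced by a test alignment.

    Alignments are collections of (i, j) pairs meaning residue i of
    sequence 1 is aligned to residue j of sequence 2. With shift=0 a pair
    counts only if it matches the reference exactly; a larger shift
    tolerates that many residues of offset in sequence 2.
    """
    ref = set(ref_pairs)
    test = dict(test_pairs)  # residue i of sequence 1 -> residue j of sequence 2
    hits = sum(1 for i, j in ref if i in test and abs(test[i] - j) <= shift)
    return hits / len(ref) if ref else 0.0
```

With the reference pairs `(0,0) (1,1) (2,2) (3,3)` and a test alignment that shifts the last two residues by one, the fraction is 0.5 with no shift allowed and 1.0 when a shift of one residue is tolerated.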
Budavari, Tamas; Langmead, Ben; Wheelan, Sarah J.; Salzberg, Steven L.; Szalay, Alexander S.
2015-01-01
When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware. We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found. We then carried out a read-by-read comparison of Arioc’s reported alignments with the alignments found by several leading read aligners. With simulated reads, Arioc has comparable or better accuracy than the other read aligners we tested. With human sequencing reads, Arioc demonstrates significantly greater throughput than the other aligners we evaluated across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license. PMID:25780763
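The prioritization idea can be illustrated on a CPU: sort the seed hits, then reduce them to per-location counts. The sketch below is a hypothetical simplification, not Arioc's GPU code. It assumes seed hits are given as (read_offset, genome_pos) pairs and uses the diagonal genome_pos - read_offset as the candidate location.

```python
from itertools import groupby


def prioritize_diagonals(seed_hits, min_hits=2):
    """Rank candidate alignment diagonals by the number of supporting seed hits.

    seed_hits: iterable of (read_offset, genome_pos) pairs for one read.
    Hits on the same diagonal (genome_pos - read_offset) imply the same
    alignment start, so they support the same candidate location.
    """
    diagonals = sorted(g - r for r, g in seed_hits)                   # sort step
    counts = [(d, len(list(run))) for d, run in groupby(diagonals)]  # reduce step
    # Highest support first; ties broken by genome position.
    return sorted((dc for dc in counts if dc[1] >= min_hits),
                  key=lambda dc: (-dc[1], dc[0]))
```

Here `itertools.groupby` plays the role of a segmented reduction over the sorted keys. On a GPU the same two steps would map onto parallel sort and reduce primitives, and only the top-ranked diagonals would go on to full alignment.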
Identification of true EST alignments for recognising transcribed regions.
Ma, Chuang; Wang, Jia; Li, Lun; Duan, Mo-Jie; Zhou, Yan-Hong
2011-01-01
Transcribed regions can be determined by aligning Expressed Sequence Tags (ESTs) with genome sequences. The key to this strategy is to distinguish true EST alignments from spurious ones effectively. In this study, three measures, Direction Check, Identity Check, and Terminal Check, were introduced to eliminate spurious EST alignments more effectively. Based on these measures and other widely used ones, a computational tool named ESTCleanser was developed to identify true EST alignments and thus obtain reliable transcribed regions. ESTCleanser was evaluated on the well-annotated human ENCyclopedia of DNA Elements (ENCODE) regions using human ESTs from the dbEST database. The results show that ESTCleanser achieves markedly higher accuracy at the exon and intron levels than UCSC spliced EST alignments. This work should be helpful for EST-based research on finding new genes, complementing genome annotation, and recognising alternative splicing events and Single Nucleotide Polymorphisms (SNPs).
Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel
2010-01-15
With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
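Two of the features named above, percent mismatch and k-mer frequencies within an overlap, can be computed as follows. This is a minimal sketch that assumes the overlapping segments have already been extracted and aligned without gaps. The function name and the default k are hypothetical. The comparative-genomics score and the Weka classifier are not shown.

```python
from collections import Counter


def overlap_features(a, b, k=3):
    """Percent mismatch and k-mer counts for the overlapping segments a and b.

    a and b are the overlapping portions of two reads, aligned without gaps.
    """
    if not a or len(a) != len(b):
        raise ValueError("overlap segments must be non-empty and equal in length")
    mismatches = sum(x != y for x, y in zip(a, b))
    kmers = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    return {"percent_mismatch": 100.0 * mismatches / len(a),
            "kmer_counts": kmers}
```

A repeat-induced false overlap tends to show up as overrepresented k-mers (for example, `ACGT` appearing twice in `ACGTACGT`), which is the kind of signal a trained classifier can exploit.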
Generation of non-genomic oligonucleotide tag sequences for RNA template-specific PCR
Pinto, Fernando Lopes; Svensson, Håkan; Lindblad, Peter
2006-01-01
Background In order to overcome genomic DNA contamination in transcriptional studies, reverse template-specific polymerase chain reaction, a modification of reverse transcriptase polymerase chain reaction, is used. Using tags whose sequences are not found in the genome further improves reverse template-specific polymerase chain reaction experiments. Given the absence of software for producing tags suited to a given genome, a simple tool to fulfill this need was developed. Results The program was developed in Perl, with separate use of the Basic Local Alignment Search Tool, making the tool platform independent (known to run on Windows XP and Linux). Several molecular experiments were performed to test the performance of the generated tags. The results show that Tagenerator can generate tags with good priming properties that, by design, do not amplify genomic DNA in PCR. Conclusion The program Tagenerator is capable of generating tag sequences that combine genome absence with good priming properties for RT-PCR based experiments, circumventing the effects of genomic DNA contamination in an RNA sample. PMID:16820068
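Tag generation reduces to proposing candidates and rejecting any that occur in the genome. The sketch below makes two stated assumptions. Exact-substring absence on both strands stands in for Tagenerator's BLAST search; this is a weaker criterion, since BLAST also catches near matches. A GC-content window serves as a crude proxy for good priming properties. All names and defaults are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def generate_tags(genome, n_tags=3, length=20, gc_range=(0.4, 0.6),
                  seed=0, max_tries=10_000):
    """Return up to n_tags random tags found on neither strand of genome."""
    rng = random.Random(seed)
    genome = genome.upper()
    tags, tries = [], 0
    while len(tags) < n_tags and tries < max_tries:
        tries += 1
        tag = "".join(rng.choice("ACGT") for _ in range(length))
        gc = (tag.count("G") + tag.count("C")) / length
        if not gc_range[0] <= gc <= gc_range[1]:
            continue  # crude priming proxy: reject extreme GC content
        if tag in genome or revcomp(tag) in genome or tag in tags:
            continue  # present in the genome (either strand) or a duplicate
        tags.append(tag)
    return tags
```

For a real genome, the substring test would be replaced by an alignment search, and melting temperature and self-complementarity would also be checked.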
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically
Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel
2015-01-01
Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequences predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effector evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082
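FuncTAL compares predicted position weight matrices by correlation. The sketch below shows one such comparison. It assumes that both PWMs are already given as equal-length lists of A, C, G, T probability rows and uses the Pearson correlation of the flattened matrices. Predicting PWMs from RVD sequences is not shown, and the function name is hypothetical.

```python
import math


def pwm_correlation(pwm_a, pwm_b):
    """Pearson correlation between two equal-length PWMs.

    Each PWM is a list of per-position probability rows in A, C, G, T order;
    both matrices are flattened before correlating.
    """
    a = [p for row in pwm_a for p in row]
    b = [p for row in pwm_b for p in row]
    if len(a) != len(b):
        raise ValueError("PWMs must have the same length")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a uniform PWM has no variance to correlate")
    return cov / math.sqrt(var_a * var_b)
```

A pairwise matrix of such correlations (or of 1 minus the correlation, as a distance) is the kind of input a tree-building step can consume.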
Li, Qin; Cui, Chenchen; Higgins, Daniel A; Li, Jun
2012-09-05
The potential-dependent reorientation dynamics of double-stranded DNA (ds-DNA) attached to planar glassy carbon electrode (GCE) surfaces were investigated. The orientation state of surface-bound ds-DNA was followed by monitoring the fluorescence from a 6-carboxyfluorescein (FAM6) fluorophore covalently linked to the distal end of the DNA. Positive potentials (i.e., +0.2 V vs open circuit potential, OCP) caused the ds-DNA to align parallel to the electrode surface, resulting in strong dipole-electrode quenching of FAM6 fluorescence. Switching the GCE potential to negative values (i.e., -0.2 V vs OCP) caused the ds-DNA to reorient perpendicular to the electrode surface, with a concomitant increase in FAM6 fluorescence. In addition to the very fast (submillisecond) dynamics of the initial reorientation process, slow (0.1-0.9 s) relaxation of FAM6 fluorescence to intermediate levels was also observed after potential switching. These dynamics have not been previously described in the literature. They are too slow to be explained by double layer charging, and chronoamperometry data showed no evidence of such effects. Both the amplitude and rate of the dynamics were found to depend on buffer concentration and ds-DNA length, demonstrating a dependence on the double layer field. The dynamics are concluded to arise from previously undetected complexities in the mechanism of potential-dependent ds-DNA reorientation. The possible origins of these dynamics are discussed. A better understanding of these dynamics will lead to improved models for potential-dependent ds-DNA reorientation at electrode surfaces and will facilitate the development of advanced electrochemical devices for detection of target DNAs.
Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard
2010-03-31
Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstruction, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented the routine application of alignment masking. In this study, using different data sets and alignment methods, we compared the effects on tree reconstruction of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with those of a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences, which is used to assess the randomness of the observed sequence similarity. When tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction.
Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.
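The core of a Monte Carlo sliding-window test can be sketched as follows. This is a deliberately simplified illustration, not ALISCORE's scoring scheme. It assumes two aligned, ungapped sequences, scores each window by the number of identical positions, and builds the null distribution by shuffling one sequence's window. The function names, window width, shuffle count, and 95th-percentile cutoff are all hypothetical.

```python
import random


def window_identity(s1, s2):
    """Number of identical positions between two equal-length sequences."""
    return sum(a == b for a, b in zip(s1, s2))


def nonrandom_windows(seq1, seq2, width=8, n_shuffles=200, seed=0):
    """Flag sliding windows whose identity exceeds a shuffled-sequence null.

    For each window, seq2's window is shuffled n_shuffles times. The window
    is called non-random if its observed identity exceeds the 95th
    percentile of the shuffled identities.
    """
    rng = random.Random(seed)
    flags = []
    for start in range(len(seq1) - width + 1):
        w1, w2 = seq1[start:start + width], seq2[start:start + width]
        observed = window_identity(w1, w2)
        null = []
        for _ in range(n_shuffles):
            shuffled = list(w2)
            rng.shuffle(shuffled)
            null.append(window_identity(w1, shuffled))
        null.sort()
        threshold = null[int(0.95 * (n_shuffles - 1))]
        flags.append(observed > threshold)
    return flags
```

Windows flagged `False` are the candidates for masking. Because the null is built from each window's own composition, a compositionally biased window is not automatically scored as conserved.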
A Linked Series of Laboratory Exercises in Molecular Biology Utilizing Bioinformatics and GFP
ERIC Educational Resources Information Center
Medin, Carey L.; Nolin, Katie L.
2011-01-01
Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…
Fine-tuning structural RNA alignments in the twilight zone.
Bremges, Andreas; Schirmer, Stefanie; Giegerich, Robert
2010-04-30
A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually but consistently with the consensus, aligns the structures irrespective of sequence by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, which is resubmitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally one step suffices. Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.
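The structure conservation index used as the criterion above is the ratio of the consensus minimum free energy from alignment folding to the mean minimum free energy of the individually folded sequences, SCI = E_A / mean(E_i). The sketch below is minimal and assumes the energies (in kcal/mol) have already been computed, for example with RNAalifold and RNAfold. The function name is hypothetical.

```python
def structure_conservation_index(consensus_energy, single_energies):
    """SCI = consensus (alignment-folding) MFE / mean single-sequence MFE."""
    if not single_energies:
        raise ValueError("need at least one single-sequence MFE")
    mean_single = sum(single_energies) / len(single_energies)
    if mean_single == 0:
        raise ValueError("mean single-sequence MFE is zero")
    return consensus_energy / mean_single
```

Values near 1 indicate that the consensus structure is about as stable as the sequences folded on their own, that is, a well-conserved structure. Values near 0 indicate that no common structure is supported. An alignment that obscures covariance, as in the twilight zone, tends to lower E_A's magnitude and hence the SCI.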
Kimura, Tomohiro; Nakano, Toshiki; Yamaguchi, Toshiyasu; Sato, Minoru; Ogawa, Tomohisa; Muramoto, Koji; Yokoyama, Takehiko; Kan-No, Nobuhiro; Nagahisa, Eizou; Janssen, Frank; Grieshaber, Manfred K
2004-01-01
The complete complementary DNA sequences of genes presumably coding for opine dehydrogenases from Arabella iricolor (sandworm), Haliotis discus hannai (abalone), and Patinopecten yessoensis (scallop) were determined, and partial cDNA sequences were derived for Meretrix lusoria (Japanese hard clam) and Spisula sachalinensis (Sakhalin surf clam). The primers ODH-9F and ODH-11R proved useful for amplifying the sequences for opine dehydrogenases from the 4 mollusk species investigated in this study. The sequence of the sandworm was obtained using primers constructed from the amino acid sequence of tauropine dehydrogenase, the main opine dehydrogenase in A. iricolor. The complete cDNA sequences of A. iricolor, H. discus hannai, and P. yessoensis encode 397, 400, and 405 amino acids, respectively. All sequences were aligned and compared with published databank sequences of Loligo opalescens, Loligo vulgaris (squid), Sepia officinalis (cuttlefish), and Pecten maximus (scallop). As expected, a high level of homology was observed for the cDNA from closely related species, such as cephalopods or scallops, whereas cDNA from the other species showed lower levels of homology. A similar trend was observed when the deduced amino acid sequences were compared. Furthermore, alignment of these sequences revealed some structural motifs that are possibly related to the binding sites of the substrates. The phylogenetic trees derived from the nucleotide and amino acid sequences were consistent with the classification of species resulting from classical taxonomic analyses.
CRITICA: coding region identification tool invoking comparative analysis
NASA Technical Reports Server (NTRS)
Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)
1999-01-01
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
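The comparative test can be illustrated with a toy computation. The sketch below is minimal. It assumes the two sequences are already aligned in frame without gaps and uses the standard genetic code. The raw comparison stands in for CRITICA's statistical test of whether amino acid identity exceeds what the nucleotide identity would predict, and all names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base in TCAG order.
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length, non-empty strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def coding_evidence(seq1, seq2):
    """Return (nucleotide identity, amino acid identity) for in-frame aligned sequences."""
    return identity(seq1, seq2), identity(translate(seq1), translate(seq2))
```

For `"TTTCTG"` versus `"TTCCTA"`, nucleotide identity is 4/6 while both translate to `"FL"`, so amino acid identity is 1.0. The differences are synonymous, which is the pattern CRITICA reads as evidence for coding.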
Overcoming low-alignment signal contrast induced alignment failure by alignment signal enhancement
NASA Astrophysics Data System (ADS)
Lee, Byeong Soo; Kim, Young Ha; Hwang, Hyunwoo; Lee, Jeongjin; Kong, Jeong Heung; Kang, Young Seog; Paarhuis, Bart; Kok, Haico; de Graaf, Roelof; Weichselbaum, Stefan; Droste, Richard; Mason, Christopher; Aarts, Igor; de Boeij, Wim P.
2016-03-01
Overlay is one of the key factors that enable the extension of optical lithography to 1X node DRAM manufacturing. Accurate wafer alignment is naturally a prerequisite for good device overlay. However, alignment failures or misalignments are commonly observed in a fab. Many factors can induce alignment problems, and low alignment signal contrast is one of the main issues. Alignment signal contrast can be degraded by opaque stack materials or by alignment mark degradation due to processes like CMP. This issue can be compounded by mark sub-segmentation from design rules in combination with double or quadruple spacer processes. Alignment signal contrast can be improved by applying new materials or by process optimization, which sometimes adds another process step at higher cost. If we can amplify the signal components containing the position information and reduce other unwanted signal and background contributions, then we can improve alignment performance without process change. In this paper we use ASML's new alignment sensor (as introduced and released on the NXT:1980Di) and sample wafers with special stacks that can induce a poor alignment signal to demonstrate alignment and overlay improvement.
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
DOE Office of Scientific and Technical Information (OSTI.GOV)
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis.
Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
BlackOPs: increasing confidence in variant detection through mappability filtering.
Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil
2013-10-01
Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifacts is the incorrect alignment of experimentally observed sequences to their true genomic origin ('mismapping'), followed by the inference that differences in mismapped sequences are true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences with custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
Phylogenetic inference under varying proportions of indel-induced alignment gaps
Dwivedi, Bhakti; Gadagkar, Sudhindra R
2009-01-01
Background: The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods.
Results: (1) In general, there was a strong, almost deterministic, relationship between the amount of gaps in the data and the level of phylogenetic accuracy when the alignments were very "gappy"; (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference; (3) the probabilistic methods (Bayesian, PhyML, and "MLε", a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage than parsimony (MP) and distance (NJ) methods, with Bayesian analysis clearly the best; (4) methods that treat gapped sites as missing data yielded less accurate trees than those that attribute phylogenetic signal to the gapped sites (by coding them as binary presence/absence characters, or as in the MLε method); and (5) in general, the accuracy of phylogenetic inference depended on the amount of available data when the gaps resulted mainly from deletion events, and on the amount of missing data when insertion events were equally likely to have caused the alignment gaps.
Conclusion: When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that use the phylogenetic signal in indels are also developed for distance methods. When the true homology is known and gaps make up 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90-100 percent accuracy. PMID:19698168
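As a rough illustration of two quantities discussed in this abstract, the sketch below computes the fraction of gapped alignment columns and recodes gaps as binary presence/absence characters. It is a simplified per-column coding written for this example, and the function names are hypothetical. Indel-coding schemes used in practice usually code each distinct gap event rather than each column.

```python
from typing import List

def gapped_columns(alignment: List[str]) -> List[int]:
    """Indices of alignment columns where at least one sequence has a gap ('-')."""
    return [j for j in range(len(alignment[0]))
            if any(seq[j] == "-" for seq in alignment)]

def gap_fraction(alignment: List[str]) -> float:
    """Share of alignment columns that are gapped sites."""
    return len(gapped_columns(alignment)) / len(alignment[0])

def gaps_as_binary(alignment: List[str]) -> List[List[int]]:
    """Per-sequence presence/absence characters (1 = gap, 0 = residue),
    one character per gapped column."""
    cols = gapped_columns(alignment)
    return [[1 if seq[j] == "-" else 0 for j in cols] for seq in alignment]

aln = ["ACG-T", "AC--T", "ACGAT"]
frac = gap_fraction(aln)      # 0.4: columns 2 and 3 contain gaps
binary = gaps_as_binary(aln)  # [[0, 1], [1, 1], [0, 0]]
```

The binary matrix could then be appended to the nucleotide data as extra characters, which is one way of letting gapped sites contribute signal instead of being treated as missing data.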
Tso, Kai-Yuen; Lee, Sau Dan; Lo, Kwok-Wai; Yip, Kevin Y
2014-12-23
Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subjected to DNA sequencing, the samples can contain varying amounts of mouse DNA, and it has been unclear how the mouse reads affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios, and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data. We found that the "filtering" and "combined reference" strategies performed better than aligning reads directly to the human reference in terms of alignment and variant-calling accuracy. The combined reference strategy was particularly good at reducing false-negative variant calls without significantly increasing the false-positive rate. In some scenarios, the performance gain of these two strategies was too small to make special handling cost-effective, but special handling was found to be crucial when false non-synonymous SNV calls must be minimized, especially in exome sequencing. Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts, and our findings provide information for designing data analysis pipelines for these data.
Wu, Ren-Guei; Yang, Chung-Shi; Wang, Pen-Cheng; Tseng, Fan-Gang
2009-06-01
We present a micro-CEC chip carrying out a highly efficient separation of dsDNA fragments through vertically aligned multi-wall carbon nanotubes (MWCNTs) in a microchannel. The vertically aligned MWCNTs were grown directly in the microchannel to form straight nanopillar arrays as ordered and directional chromatographic supports. 1-Pyrenedodecanoic acid was employed for the surface modification of the MWCNTs' stationary phase to adsorb analytes by hydrophobic interactions. This device was used for separating dsDNA fragments of three different lengths (254, 360, and 572 bp), and fluorescence detection was employed to verify the electrokinetic transport in the MWCNT array. The micro-CEC separation of the three compounds was achieved in less than 300 s at a field strength of 66 V/cm due to superior laminar flow patterns and a lower flow resistance resulting from the vertically aligned MWCNTs being used as the stationary phase medium. In addition, a fivefold reduction of band broadening was obtained when the analyte was separated by the chromatographic MWCNT array channel instead of the CE channel. From all of the results, we suggest that an in situ grown and directional MWCNT array can potentially be useful for preparing more diversified forms of stationary phases for vertically efficient chip-based electrochromatography.
Halter, Mathew C; Zahn, James A
2017-02-01
White biotechnology has made a positive impact on the chemical industry by providing safer, more efficient chemical manufacturing processes that have reduced the use of toxic chemicals, harsh reaction conditions, and expensive metal catalysts, which has improved alignment with the principles of Green Chemistry. The genetically-modified (GM) biocatalysts that are utilized in these processes are typically separated from high-value products and then recycled, or eliminated. Elimination routes include disposal in sanitary landfills, incineration, use as a fuel, animal feed, or reuse as an agricultural soil amendment or other value-added products. Elimination routes that have the potential to impact the food chain or environment have been more heavily scrutinized for the fate and persistence of biological products. In this study, we developed and optimized a method for monitoring the degradation of strain-specific DNA markers from a genetically-modified organism (GMO) used for the commercial production of 1,3-propanediol. Laboratory and field tests showed that a marker for heterologous DNA in the GM organism was no longer detectable by end-point polymerase chain reaction (PCR) after 14 days. The half-life of heterologous DNA was increased by 17% (from 42.4 to 49.7 h) after sterilization of the soil from a field plot, which indicated that abiotic factors were important in degradation of DNA under field conditions. There was no evidence for horizontal transfer of DNA target sequences from the GMO to viable organisms present in the soil.
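The half-life figures in this abstract can be checked with a little arithmetic. This sketch assumes simple first-order (exponential) decay, which the abstract does not state explicitly, and it ignores the detection limit of end-point PCR.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0

def fraction_remaining(t_hours: float, half_life_hours: float) -> float:
    """Assumed first-order decay: N(t) / N0 = 2 ** (-t / t_half)."""
    return 2.0 ** (-t_hours / half_life_hours)

# Half-life rose from 42.4 h to 49.7 h after soil sterilization.
increase = percent_change(42.4, 49.7)
print(f"half-life increase: {increase:.1f}%")  # half-life increase: 17.2%

# Under the decay assumption, the share of the marker left after 14 days (336 h)
# at the non-sterile half-life is well below 1%.
left = fraction_remaining(14 * 24, 42.4)
print(f"remaining after 14 days: {left:.2%}")
```

The 17% figure quoted in the abstract is the relative increase in half-life, not an absolute difference in hours.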
BLAST and FASTA similarity searching for multiple sequence alignment.
Pearson, William R
2014-01-01
BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). For protein-coding sequences, the most effective similarity searches compare protein sequences rather than DNA sequences, and use expectation values rather than percent identity to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or for searches that seek relatively close homologs (e.g., mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
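The expectation values mentioned above come from Karlin-Altschul statistics. The sketch below shows the standard relations E = K·m·n·e^(−λS) for a raw score S and E = m·n·2^(−S′) for a bit score S′. The λ and K values in the usage lines are placeholders chosen for illustration, not parameters for any particular scoring matrix. Real programs also use effective lengths that correct m and n for edge effects, which this sketch omits.

```python
import math

def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalize a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)

def evalue_from_raw(raw: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance hits scoring >= raw: E = K m n exp(-lambda S)."""
    return k * m * n * math.exp(-lam * raw)

def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """The same quantity from a bit score: E = m n 2^(-S')."""
    return m * n * 2.0 ** (-bits)

def pvalue(e: float) -> float:
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    return -math.expm1(-e)

# Placeholder statistics and lengths, for illustration only.
lam, k = 0.3, 0.1
m, n = 300, 50_000_000  # query length, database length (residues)
raw = 120
e = evalue_from_raw(raw, lam, k, m, n)
print(e, evalue_from_bits(bit_score(raw, lam, k), m, n), pvalue(e))
```

Because E scales with the database length n, searching a smaller database (as suggested above) lowers E for the same score, which is why it can increase effective sensitivity.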
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilkins, T.A.
1993-06-01
This study investigates the molecular events of vacuole ontogeny in rapidly elongated cotton plant cells. Within the DNA coding region, the cotton and carrot cDNA clones exhibit 82.2% nucleotide sequence homology; at the amino acid level cotton and carrot catalytic subunits exhibited 95.7% identity and 2.1% amino acid similarity. When aligned with the analogous sequences from yeast, the cotton protein shared only 60.5% amino acid identity and 12.7% similarity. 10 refs., 1 tab.
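Identity and similarity percentages like those quoted above are computed from aligned residue pairs. The sketch below counts identical pairs and, separately, non-identical pairs that fall in the same residue group, matching the way identity and similarity are reported as separate figures here. The grouping is a hypothetical stand-in. Tools usually count a pair as similar when its substitution-matrix score is positive, and conventions for the denominator vary; this sketch uses gap-free aligned columns.

```python
from typing import Tuple

# Hypothetical conservative-substitution groups, for illustration only.
GROUPS = ["ILMV", "FWY", "HKR", "DE", "ST", "NQ", "AG"]
GROUP_OF = {aa: g for g in GROUPS for aa in g}

def identity_similarity(a: str, b: str) -> Tuple[float, float]:
    """Return (% identical, % similar-but-not-identical) over gap-free columns
    of two pre-aligned, equal-length protein sequences ('-' marks a gap)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0.0
    same = sum(1 for x, y in pairs if x == y)
    similar = sum(1 for x, y in pairs
                  if x != y and x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
    return 100.0 * same / len(pairs), 100.0 * similar / len(pairs)

# 9 gap-free columns: 7 identical, 2 conservative (V/I and I/V).
ident, sim = identity_similarity("MKVLIDE-ST", "MKILVDEAST")
```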
Fine-tuning structural RNA alignments in the twilight zone
2010-01-01
Background: A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best at around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably.
Results: Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually but consistently with the consensus, aligns the structures irrespective of sequence by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, which is then re-submitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally one step suffices.
Conclusions: Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index. PMID:20433706
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important for determining gene structures, including alternative splice forms, and provides valuable resources for experimental analyses that reveal the biological functions of the encoded proteins. However, previous approaches to sequencing cDNA clones were expensive or time-consuming, so a fast and efficient sequencing approach was needed. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of an Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of the coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii to confirm that it also performs well for non-human species. The entire sequencing and shotgun assembly takes less than 1 week, and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Lin, C H; Patel, D J
1997-11-01
Structural studies by nuclear magnetic resonance (NMR) of RNA and DNA aptamer complexes identified through in vitro selection and amplification have provided a wealth of information on RNA and DNA tertiary structure and molecular recognition in solution. The RNA and DNA aptamers that target ATP (and AMP) with micromolar affinity exhibit distinct binding site sequences and secondary structures. We report below on the tertiary structure of the AMP-DNA aptamer complex in solution and compare it with the previously reported tertiary structure of the AMP-RNA aptamer complex in solution. The solution structure of the AMP-DNA aptamer complex shows, surprisingly, that two AMP molecules are intercalated at adjacent sites within a rectangular widened minor groove. Complex formation involves adaptive binding where the asymmetric internal bubble of the free DNA aptamer zippers up through formation of a continuous six-base mismatch segment which includes a pair of adjacent three-base platforms. The AMP molecules pair through their Watson-Crick edges with the minor groove edges of guanine residues. These recognition G·A mismatches are flanked by sheared G·A and reversed Hoogsteen G·G mismatch pairs. The AMP-DNA aptamer and AMP-RNA aptamer complexes have distinct tertiary structures and binding stoichiometries. Nevertheless, both complexes have similar structural features and recognition alignments in their binding pockets. Specifically, AMP targets both DNA and RNA aptamers by intercalating between purine bases and through identical G·A mismatch formation. The recognition G·A mismatch stacks with a reversed Hoogsteen G·G mismatch in one direction and with an adenine base in the other direction in both complexes. It is striking that DNA and RNA aptamers selected independently from libraries of 10^14 molecules in each case utilize identical mismatch alignments for molecular recognition with micromolar affinity within binding-site pockets containing common structural elements.
Image correlation method for DNA sequence alignment.
Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván
2012-01-01
The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques, and among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded into their pixel representations, and sequence alignment is handled as an object-recognition-in-a-scene problem: the query and the database become the object and the scene, respectively. An image correlation process is carried out to search for the best match between them. Because this procedure can be implemented in an optical correlator, the correlation could eventually be computed at the speed of light. This paper presents an initial research stage in which results were obtained "digitally", by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes, each represented by a 100 x 100 image (a one-million-base-pair database in total), were considered for the image correlation analysis. The correlations reached very high sensitivity (99.01%) and specificity (98.99%), and outperformed BLAST as the number of mutations increased. However, the digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator, through which we expect to fully exploit the light-based properties of optical correlation. As the optical correlator works jointly with the computer, the digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
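The core idea of matching a query "object" against a database "scene" by correlation can be sketched in one dimension. This is a simplification of the paper's 2-D images, not its implementation. The gray levels are hypothetical, and normalized cross-correlation is used so that the score is 1 for a perfect match. Because any scalar gray-level encoding imposes an order on the bases, a window whose intensities are a positively scaled and shifted copy of the query's also scores 1, so this toy can report spurious perfect matches for short queries.

```python
import math
from statistics import fmean
from typing import List, Tuple

# Hypothetical gray levels; the intensities used in the paper are not given here.
GRAY = {"A": 0.2, "C": 0.4, "G": 0.6, "T": 0.8}

def to_pixels(seq: str) -> List[float]:
    """Encode each base as a fixed gray intensity (a 1-D 'image')."""
    return [GRAY[base] for base in seq]

def ncc(x: List[float], y: List[float]) -> float:
    """Normalized cross-correlation (Pearson correlation) of two equal-length signals."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0

def best_match(query: str, scene: str) -> Tuple[float, int]:
    """Slide the query over the scene and return (best score, offset)."""
    q, s = to_pixels(query), to_pixels(scene)
    n = len(q)
    return max((ncc(q, s[i:i + n]), i) for i in range(len(s) - n + 1))

# The query occurs exactly at offset 5 of the scene.
score, offset = best_match("CTAGGCA", "TTGACCTAGGCATTAC")
```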
Score distributions of gapped multiple sequence alignments down to the low-probability tail
NASA Astrophysics Data System (ADS)
Fieth, Pascal; Hartmann, Alexander K.
2016-08-01
Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via knowledge of the score distribution of random sequences. However, this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences, this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and for global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific rare-event algorithms based on statistical mechanics can be applied. Previous studies achieved this for pairwise alignments and showed that, contrary to results from earlier simple-sampling studies, strong deviations from the Gumbel distribution occur for finite sequence lengths. Here we extend these studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pairs score) multiple alignments. We find that even after suitable rescaling, which eliminates the sequence-length dependence, the distributions for multiple alignments differ from the pairwise alignment case. Furthermore, we show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
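For reference, the Gumbel baseline mentioned above has a closed-form tail, P(S ≥ x) = 1 − exp(−e^(−λ(x−u))). The sketch evaluates it with placeholder λ and u. The point of the study is that finite-length gapped alignments deviate from exactly this curve in the far tail, so treat it as the null model being tested, not as the paper's result. `math.expm1` keeps the tail accurate when the probability is tiny.

```python
import math

def gumbel_tail(x: float, lam: float, u: float) -> float:
    """P(S >= x) for a Gumbel (extreme-value) distribution with scale 1/lam and location u."""
    y = math.exp(-lam * (x - u))
    return -math.expm1(-y)  # 1 - exp(-y), accurate even when y is tiny

lam, u = 0.3, 40.0  # placeholder parameters
for x in (40, 80, 400, 1200):
    print(x, gumbel_tail(x, lam, u))
```

Far in the tail the expression reduces to e^(−λ(x−u)), a pure exponential. Reaching probabilities like 10^-160 by simple sampling would need on the order of 10^160 random alignments, which is why rare-event sampling is required.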
Evolution of physician-hospital alignment models: a case study of comanagement.
Sowers, Kevin W; Newman, Paul R; Langdon, Jeffrey C
2013-06-01
Recently, quality, financial, and regulatory demands have driven physicians to seek alignment opportunities with hospitals. The motivation for alignment on the part of physicians and hospitals is now accelerating because the new paradigm under healthcare reform requires an increased focus on improving quality, cost, and efficiency. We (1) identify the key drivers for physician-hospital alignment models; (2) summarize comanagement as a physician-hospital alignment model; and (3) explore a detailed case study of comanagement as an option to better align physicians with hospital goals on quality, safety, and outcomes. A Medline abstract review identified 45 references that discuss options for physician-hospital alignment. None of the articles identified provides a detailed example of a successful alignment structure. A detailed case study of a successful comanagement alignment program is reviewed. The key drivers for alignment are inpatient growth rates, declining reimbursements, and the opportunity to improve quality, decrease costs, and increase efficiency. Two general strategies of alignment involve noneconomic and/or economic integration. In our example, comanagement with economic integration was chosen as the preferred structure for physician-hospital alignment. The choice of structure will vary depending on the existing relationships and governance of the hospital and the physicians in the targeted area of focus. Success in building physician-hospital alignment is measured by improvements in patient care, reduced cost of care delivery, and improved relations between physicians and hospital leadership.
NASA Astrophysics Data System (ADS)
Millard, Julie T.; Pilon, André M.
2003-04-01
A recent forensic approach for identification of unknown biological samples is mitochondrial DNA (mtDNA) sequencing. We describe a laboratory exercise suitable for an undergraduate biochemistry course in which the polymerase chain reaction is used to amplify a 440 base pair hypervariable region of human mtDNA from a variety of "crime scene" samples (e.g., teeth, hair, nails, cigarettes, envelope flaps, toothbrushes, and chewing gum). Amplification is verified via agarose gel electrophoresis and then samples are subjected to cycle sequencing. Sequence alignments are made via the program CLUSTAL W, allowing students to compare samples and solve the "crime."
Elucidating the role of transcription in shaping the 3D structure of the bacterial genome
NASA Astrophysics Data System (ADS)
Brandao, Hugo B.; Wang, Xindan; Rudner, David Z.; Mirny, Leonid
Active transcription has been linked to several genome conformation changes in bacteria, including the recruitment of chromosomal DNA to the cell membrane and formation of nucleoid clusters. Using genomic and imaging data as input into mathematical models and polymer simulations, we sought to explore the extent to which bacterial 3D genome structure could be explained by 1D transcription tracks. Using B. subtilis as a model organism, we investigated via polymer simulations the role of loop extrusion and DNA super-coiling on the formation of interaction domains and other fine-scale features that are visible in chromosome conformation capture (Hi-C) data. We then explored the role of the condensin structural maintenance of chromosome complex on the alignment of chromosomal arms. A parameter-free transcription traffic model demonstrated that mean chromosomal arm alignment can be quantitatively explained, and the effects on arm alignment in genomically rearranged strains of B. subtilis were accurately predicted. H.B. acknowledges support from the Natural Sciences and Engineering Research Council of Canada for a PGS-D fellowship.
FEAST: sensitive local alignment with multiple rates of evolution.
Hudek, Alexander K; Brown, Daniel G
2011-01-01
We present a pairwise local aligner, FEAST, which uses two new techniques: a sensitive extension algorithm for identifying homologous subsequences, and a descriptive probabilistic alignment model. We also present a new procedure for training alignment parameters and apply it to the human and mouse genomes, producing a better parameter set for these sequences. Our extension algorithm identifies homologous subsequences by considering all evolutionary histories. It has higher maximum sensitivity than Viterbi extensions and better balances sensitivity against specificity. We model alignments with several submodels, each with unique statistical properties, describing strongly similar and weakly similar regions of homologous DNA. Training parameters using two submodels produces superior alignments, even when we align with only the parameters from the weaker submodel. Our extension algorithm combined with our new parameter set achieves a sensitivity of 0.59 on synthetic tests. In contrast, LASTZ with default settings achieves a sensitivity of 0.35 at the same false positive rate. Using the weak submodel as parameters for LASTZ increases its sensitivity to 0.59, but with high error. FEAST is available at http://monod.uwaterloo.ca/feast/.
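The contrast between "considering all evolutionary histories" and a Viterbi extension can be shown with a toy dynamic program. The same recursion either keeps the single best incoming path (max, Viterbi-style) or sums over all alignment paths in log space (log-sum-exp, Forward-style). This is a simplified global alignment with made-up log-weights for matches, mismatches, and gaps. It is not FEAST's model, which is a local, multi-submodel probabilistic aligner.

```python
import math
from typing import Callable, List

def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def align(a: str, b: str, combine: Callable[[List[float]], float],
          match: float = 1.0, mismatch: float = -1.0, gap: float = -2.0) -> float:
    """Global alignment DP; `combine` merges the scores of paths entering a cell.
    max -> best single path (Viterbi); logsumexp -> all paths (Forward)."""
    n, m = len(a), len(b)
    d = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            incoming = []
            if i > 0 and j > 0:
                step = match if a[i - 1] == b[j - 1] else mismatch
                incoming.append(d[i - 1][j - 1] + step)
            if i > 0:
                incoming.append(d[i - 1][j] + gap)  # a[i-1] aligned to a gap
            if j > 0:
                incoming.append(d[i][j - 1] + gap)  # b[j-1] aligned to a gap
            d[i][j] = combine(incoming)
    return d[n][m]

viterbi = align("GATTACA", "GACTACA", max)        # 5.0: six matches, one mismatch
forward = align("GATTACA", "GACTACA", logsumexp)  # > viterbi: sums over all paths
```

The Forward score is always at least the Viterbi score, because every alternative alignment adds weight instead of being discarded. This is the intuition behind the higher sensitivity of an all-histories extension.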
Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso
2015-07-01
In this paper, we propose an alignment-free method for DNA barcode classification based on both a spectral representation and a neural gas network for unsupervised clustering. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of each DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, RIPPER, naïve Bayes, Ridor, and classification tree, also considering short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted on 10 real barcode datasets belonging to different animal species, provided by the online resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall, and precision, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when the classification task is performed on a set of short DNA sequences randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best of the other classifiers (random forest) reaches an accuracy of 20.9%. Our results indicate a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
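A minimal sketch of the spectral-representation step: each sequence becomes a vector of k-mer counts over a fixed ACGT ordering. The signature-selection and neural-gas clustering steps are not shown, and the value of k is a free choice in this example. The function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

def kmer_spectrum(seq: str, k: int) -> List[int]:
    """Counts of every possible DNA k-mer, in lexicographic ACGT order.
    k-mers containing other symbols (e.g. N) are counted but never reported."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]

vec = kmer_spectrum("ACGTAC", 2)  # 16-dimensional vector; 'AC' occurs twice
```

Because the vector length depends only on k (4^k entries), fragments of 200 bp and full-length barcodes map into the same feature space, which is what makes an alignment-free comparison of short fragments possible.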
Morise, Hisashi; Miyazaki, Erika; Yoshimitsu, Shoko; Eki, Toshihiko
2012-01-01
Soil nematodes play crucial roles in the soil food web and are a suitable indicator for assessing soil environments and ecosystems. Previous nematode community analyses based on morphological classification have been shown to be useful for assessing various soil environments. Here we conducted DNA barcode analysis for soil nematode community analyses in Japanese soils. We isolated nematodes from two different environmental soils, an unmanaged flowerbed and an agricultural field, using an improved flotation-sieving method. Small subunit (SSU) rDNA fragments were directly amplified from each of 68 (flowerbed samples) and 48 (field samples) isolated nematodes to determine the nucleotide sequence. Sixteen and thirteen operational taxonomic units (OTUs) were obtained by multiple sequence alignment from the flowerbed and agricultural field nematodes, respectively. All 29 SSU rDNA-derived OTUs (rOTUs) were further mapped onto a phylogenetic tree with 107 known nematode species. Interestingly, the two nematode communities examined were clearly distinct from each other in terms of trophic groups: animal predators and plant feeders were markedly abundant in the flowerbed soils, whereas bacterial feeders were dominant in the agricultural field soils. The data from the flowerbed nematodes suggest a possible food web among two different trophic nematode groups and plants (weeds) in the closed soil environment. Finally, DNA sequences derived from the mitochondrial cytochrome c oxidase subunit 1 (COI) gene were determined as a DNA barcode from 43 agricultural field soil nematodes. These nematodes were assigned to 13 rDNA-derived OTUs, but in the COI gene analysis they were assigned to 23 COI gene-derived OTUs (cOTUs), indicating that COI gene-based barcoding may provide higher taxonomic resolution than conventional SSU rDNA barcoding in soil nematode community analysis. PMID:23284767
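Grouping aligned sequences into OTUs is often done greedily against an identity threshold. The sketch below assumes the sequences are already aligned to equal length and uses a hypothetical threshold. The abstract does not state the clustering criterion the authors used, so this illustrates the general idea only.

```python
from typing import List

def identity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0

def greedy_otus(aligned: List[str], threshold: float) -> List[List[int]]:
    """Assign each sequence to the first OTU whose seed it matches at >= threshold;
    otherwise start a new OTU seeded by that sequence. Returns member indices."""
    seeds: List[int] = []
    otus: List[List[int]] = []
    for i, seq in enumerate(aligned):
        for k, seed in enumerate(seeds):
            if identity(aligned[seed], seq) >= threshold:
                otus[k].append(i)
                break
        else:  # no seed was close enough
            seeds.append(i)
            otus.append([i])
    return otus

otus = greedy_otus(["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"], threshold=0.85)
```

Greedy clustering depends on input order, and a more variable marker produces more OTUs at the same threshold. That is consistent with COI splitting specimens into more OTUs than SSU rDNA does.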
SNPmplexViewer--toward a cost-effective traceability system
2011-01-01
Background: Beef traceability has become mandatory in many regions of the world and is typically achieved through the use of unique numerical codes on ear tags and animal passports. DNA-based traceability uses the animal's own DNA code to identify it and the products derived from it. With SNaPshot, a primer-extension-based method, a multiplex of 25 SNPs in a single reaction has been used to reduce the expense of genotyping a panel of SNPs useful for identity control.
Findings: To further decrease SNaPshot's cost, we introduced the Perl script SNPmplexViewer, which facilitates the analysis of trace files for reactions performed without fluorescent size standards. SNPmplexViewer automatically aligns reference and target trace electropherograms, run with and without fluorescent size standards, respectively. SNPmplexViewer produces a modified target trace file containing a normalised trace in which the reference size standards are embedded. SNPmplexViewer also outputs aligned images of the two electropherograms together with a difference profile.
Conclusions: Modified trace files generated by SNPmplexViewer enable genotyping of SNaPshot reactions performed without fluorescent size standards, using common fragment-sizing software packages. SNPmplexViewer's normalised output may also improve the genotyping software's performance. Thus, SNPmplexViewer is a free, general tool that reduces SNaPshot's cost and enables fast viewing and comparison of trace electropherograms for fragment analysis. SNPmplexViewer is available at http://cowry.agri.huji.ac.il/cgi-bin/SNPmplexViewer.cgi. PMID:21600063
Farrington, Heather L.; Edwards, Christine E.; Guan, Xin; Carr, Matthew R.; Baerwaldt, Kelly; Lance, Richard F.
2015-01-01
Invasive Asian bighead and silver carp (Hypophthalmichthys nobilis and H. molitrix) pose a substantial threat to North American aquatic ecosystems. Recently, environmental DNA (eDNA), genetic material shed by organisms into their environment that can be detected by non-invasive sampling strategies and genetic assays, has gained recognition as a tool for tracking the invasion front of these species toward the Great Lakes. The goal of this study was to develop new species-specific conventional PCR (cPCR) and quantitative (qPCR) markers for detection of these species in North American surface waters. We first generated complete mitochondrial genome sequences from 33 bighead and 29 silver carp individuals collected throughout their introduced range. These sequences were aligned with those from other common and closely related fish species from the Illinois River watershed to identify and design new species-specific markers for the detection of bighead and silver carp DNA in environmental water samples. We then tested these genetic markers in the laboratory for species-specificity and sensitivity. Newly developed markers performed well in field trials, did not have any false positive detections, and many markers had much higher detection rates and sensitivity compared to the markers currently used in eDNA surveillance programs. We also explored the use of multiple genetic markers to determine whether it would improve detection rates, results of which showed that using multiple highly sensitive markers should maximize detection rates in environmental samples. The new markers developed in this study greatly expand the number of species-specific genetic markers available to track the invasion front of bighead and silver carp and will improve the resolution of these assays. Additionally, the use of the qPCR markers developed in this study may reduce sample processing time and cost of eDNA monitoring for these species. PMID:25706532
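Marker design of this kind starts by finding stretches that are shared by all target-species mitogenomes but absent from related species. The sketch shows only that first filter, using exact matches. Real primer and probe design also weighs mismatch positions, melting temperature, and amplicon length, none of which is modeled here. The function name and toy sequences are hypothetical.

```python
from typing import List, Set

def candidate_markers(targets: List[str], non_targets: List[str], length: int) -> List[str]:
    """Exact substrings of `length` present in every target sequence and absent
    from every non-target sequence (hypothetical helper; exact matching only)."""
    def kmers(s: str) -> Set[str]:
        return {s[i:i + length] for i in range(len(s) - length + 1)}
    shared = set.intersection(*(kmers(t) for t in targets))
    for other in non_targets:
        shared -= kmers(other)
    return sorted(shared)

# Toy example: two target "individuals" and one related non-target species.
markers = candidate_markers(["AACCGGTT", "TACCGGTA"], ["ACCGAAAA"], length=4)
```

Requiring presence in every target individual reflects why many individuals from across the introduced range were sequenced: a marker must not miss intraspecific variants.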
Zheng, Qi; Grice, Elizabeth A
2016-10-01
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes account for a large share of ambiguously (non-uniquely) mapped reads. Most available NGS aligners address this either by removing all non-uniquely mapping reads or by reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical, yet it is completely lacking at present. Here we developed a generalized software toolkit, "AlignerBoost", which uses a Bayesian framework to accurately estimate the mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost on both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, and especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising sensitivity, even without mapping-quality filters. When higher mapping-quality cutoffs are used, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity than the aligners' default modes, thereby significantly boosting the detection power of NGS aligners even at extreme thresholds. AlignerBoost is also SNP-aware: higher-quality alignments can be achieved if known SNPs are provided. AlignerBoost's algorithm is computationally efficient and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
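For context, mapping quality is conventionally reported on the Phred scale, MAPQ = -10 log10 P(mapping position is wrong), as in the SAM format specification. The sketch below turns per-candidate alignment log-likelihoods into a posterior for the chosen hit and then into a capped MAPQ. It is a generic illustration, not AlignerBoost's Bayesian model; the function name, the uniform prior over candidate positions, and the cap of 60 are assumptions.

```python
import math


def mapping_quality(log_likelihoods, chosen=None, cap=60):
    """Phred-scaled mapping quality, -10*log10(P(chosen hit is wrong)).

    log_likelihoods: log-likelihood of the read at each candidate position.
    A uniform prior over candidate positions is assumed (illustrative only).
    """
    if not log_likelihoods:
        raise ValueError("need at least one candidate hit")
    if chosen is None:
        # report the best-scoring candidate, as most aligners do
        chosen = max(range(len(log_likelihoods)), key=log_likelihoods.__getitem__)
    # subtract the maximum before exponentiating for numerical stability
    top = max(log_likelihoods)
    weights = [math.exp(ll - top) for ll in log_likelihoods]
    posterior = weights[chosen] / sum(weights)
    p_wrong = 1.0 - posterior
    if p_wrong <= 0.0:
        return cap
    return min(cap, round(-10.0 * math.log10(p_wrong)))
```

Two equally good placements give a posterior of 0.5 and therefore MAPQ 3, which is why a "best" hit chosen among repeats deserves little confidence.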
Samadian, Soroush; Bruce, Jeff P; Pugh, Trevor J
2018-03-01
Somatic copy number variations (CNVs) play a crucial role in the development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms that computationally infer CNV profiles from a variety of data types, including exome and targeted sequencing data, currently the most prevalent types of cancer genomics data. However, systematic evaluation and comparison of these tools remain challenging because of a lack of ground-truth reference sets. To address this need, we developed Bamgineer, a tool written in Python that introduces user-defined, haplotype-phased, allele-specific copy number events into an existing Binary Alignment/Map (BAM) file, with a focus on targeted and exome sequencing experiments. As input, the tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates at which to introduce gains and losses (BED format), and an optional file defining known haplotypes (VCF format). To improve runtime performance, Bamgineer introduces the desired CNVs in parallel, using queuing and parallel processing on a local machine or on a high-performance computing cluster. As proof of principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate the copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20-100%; 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments into a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be useful for developing and systematically benchmarking CNV-calling algorithms with locally generated data for a variety of applications. The source code is freely available at http://github.com/pughlab/bamgineer.
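Because the tool expects non-overlapping gain and loss regions in BED format, a quick pre-flight check can be useful. The sketch below parses BED lines (0-based, half-open intervals, first three columns only) and flags every interval that overlaps an earlier one on the same chromosome. Both functions are hypothetical helpers, not part of Bamgineer.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    intervals = []
    for line in lines:
        line = line.strip()
        # skip blank lines and header/comment lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start >= end:
            raise ValueError(f"empty or inverted interval: {line}")
        intervals.append((chrom, start, end))
    return intervals


def find_overlaps(intervals):
    """Return (chrom, earlier, later) for every interval overlapping an earlier one."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    overlaps = []
    for chrom, spans in by_chrom.items():
        spans.sort()
        reach = None  # the interval seen so far with the largest end coordinate
        for span in spans:
            # half-open intervals overlap only if this start lies before that end
            if reach is not None and span[0] < reach[1]:
                overlaps.append((chrom, reach, span))
            if reach is None or span[1] > reach[1]:
                reach = span
    return overlaps
```

Adjacent intervals such as 100-200 and 200-300 do not overlap under the half-open convention, so they pass the check.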
Sievers, Aaron; Bosiek, Katharina; Bisch, Marc; Dreessen, Chris; Riedel, Jascha; Froß, Patrick; Hausmann, Michael; Hildenbrand, Georg
2017-01-01
In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we examined whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distributions of k-mers) was developed and used. The results were analyzed with positional information included, by visualizing them as genomic maps and by applying basic vector correlation methods. The analysis concentrated on small word lengths (1 ≤ k ≤ 4) in the relatively small viral genomes of Papillomaviridae and Herpesviridae, while its usability for larger sequences was also checked on human chromosome 2 and the homologous chromosomes (2A, 2B) of the chimpanzee. Using this alignment-free analysis, we confirmed several regions with specific characteristics in Papillomaviridae and Herpesviridae that had previously been identified by independent, mostly alignment-based methods. Correlations between k-mer content and several genes in these genomes were found, showing similarities between classified and unclassified viruses that may be useful for further taxonomic research. Furthermore, previously unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which probably have a major biological function, were found and described. Using the currently known human and chimpanzee chromosomes, identities between the species were reproduced on every analyzed chromosome. This demonstrates the feasibility of our approach for large data sets of complex genomes.
Based on these results, we suggest k-mer analysis with positional resolution as a method for closing the gap between the effectiveness of alignment-based methods (such as NCBI BLAST) and the high speed of standard k-mer analysis. PMID:28422050
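A minimal sketch of local k-mer spectra, assuming fixed-size sliding windows over a DNA string: each window yields a vector of k-mer counts in lexicographic ACGT order, and two windows can be compared with a Pearson correlation. The window size, step, and function names are illustrative choices, not the authors' algorithm or parameters.

```python
from collections import Counter
from itertools import product


def local_kmer_spectra(seq, k, window, step):
    """Return (window_start, counts) for each sliding window of `seq`.

    counts[i] is the number of occurrences of the i-th k-mer in
    lexicographic order over the alphabet ACGT.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    spectra = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        counts = Counter(win[i:i + k] for i in range(len(win) - k + 1))
        # k-mers containing N or other symbols are simply not reported
        spectra.append((start, [counts[km] for km in kmers]))
    return spectra


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for constant vectors
    return cov / (vx * vy) ** 0.5
```

For example, `pearson(spectra[0][1], spectra[1][1])` compares the k-mer composition of two neighbouring windows; plotting such values along the sequence gives a simple positional map.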
Sharma, Virag; Hiller, Michael
2017-08-21
Genome alignments provide a powerful basis for transferring gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that parameters with a higher sensitivity allow the detection of thousands of novel alignments between orthologous exons that had been missed before. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, such as human and non-placental vertebrates, benefit from increased sensitivity. To systematically test whether increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations, adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest that a more sensitive alignment strategy should generally be used for genome alignments between distantly related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Subbotin, S A; Vierstraete, A; De Ley, P; Rowe, J; Waeyenberge, L; Moens, M; Vanfleteren, J R
2001-10-01
The ITS1, ITS2, and 5.8S gene sequences of nuclear ribosomal DNA from 40 taxa of the family Heteroderidae (including the genera Afenestrata, Cactodera, Heterodera, Globodera, Punctodera, Meloidodera, Cryphodera, and Thecavermiculatus) were sequenced and analyzed. The ITS regions displayed high levels of sequence divergence both within Heteroderinae and relative to outgroup taxa. Unlike recent findings in root-knot nematodes, ITS sequence polymorphism does not appear to complicate phylogenetic analysis of cyst nematodes. Phylogenetic analyses with maximum-parsimony, minimum-evolution, and maximum-likelihood methods were performed with a range of computer alignments, including elision and culled alignments. All multiple alignments and phylogenetic methods yielded a similar basic structure for the phylogenetic relationships of Heteroderidae. The cyst-forming nematodes are represented by six main clades corresponding to morphological characters and host specialization, with certain clades assuming different positions depending on the alignment procedure and/or method of phylogenetic inference. Hypotheses of monophyly of Punctoderinae and Heteroderinae are, respectively, strongly and moderately supported by the ITS data across most alignments. Close relationships were revealed between the Avenae and the Sacchari groups and between the Humuli group and the species H. salixophila within Heteroderinae. The Goettingiana group occupies a basal position within this subfamily. The validity of the genera Afenestrata and Bidera was tested and is discussed based on molecular data. We conclude that ITS sequence data are appropriate for studies of relationships within the different species groups and less so for recovering more ancient speciation events within Heteroderidae. Copyright 2001 Academic Press.
Systematic Error in Seed Plant Phylogenomics
Zhong, Bojian; Deusch, Oliver; Goremykin, Vadim V.; Penny, David; Biggs, Patrick J.; Atherton, Robin A.; Nikiforova, Svetlana V.; Lockhart, Peter James
2011-01-01
Resolving the closest relatives of Gnetales has been an enigmatic problem in seed plant phylogeny. The problem is known to be difficult because of the extent of divergence between this diverse group of gymnosperms and their closest phylogenetic relatives. Here, we investigate the evolutionary properties of conifer chloroplast DNA sequences. To improve taxon sampling of Cupressophyta (non-Pinaceae conifers), we report sequences from three new chloroplast (cp) genomes of Southern Hemisphere conifers. We have applied a site pattern sorting criterion to study compositional heterogeneity, heterotachy, and the fit of conifer chloroplast genome sequences to a general time-reversible + G substitution model. We show that non-time-reversible properties of aligned sequence positions in the chloroplast genomes of Gnetales mislead phylogenetic reconstruction of these seed plants. When 2,250 of the most varied sites in our concatenated alignment are excluded, phylogenetic analyses favor a close evolutionary relationship between the Gnetales and Pinaceae (the Gnepine hypothesis). Our analytical protocol provides a useful approach for evaluating the robustness of phylogenomic inferences. Our findings highlight the importance of goodness of fit between substitution model and data for understanding seed plant phylogeny. PMID:22016337
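To make the idea of excluding highly varied sites concrete, the sketch below ranks alignment columns by their number of distinct character states and removes the top n. This is a deliberately simplified stand-in: the study's site pattern sorting criterion is not reproduced here, and the function names and the treatment of '-', '?' and 'N' as missing data are assumptions (upper-case input is also assumed).

```python
MISSING = set("-?N")


def column_variability(alignment):
    """Number of distinct observed states in each column of an alignment."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [len({row[j] for row in alignment} - MISSING) for j in range(length)]


def drop_most_variable(alignment, n_drop):
    """Remove the n_drop columns with the most distinct states."""
    scores = column_variability(alignment)
    # Python's sort is stable, so ties are broken left to right
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    drop = set(ranked[:n_drop])
    return ["".join(c for j, c in enumerate(row) if j not in drop) for row in alignment]
```

Re-running a tree search on alignments trimmed at several values of n is one simple way to see whether a topology depends on the most saturated positions.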
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base-triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where the symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primeval DNA repair system, could have made possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U, together with the non-specific base pairing of the hypothetical ancestral base D, used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Moreover, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present-day coding DNA sequences could also have existed in ancient coding DNA sequences. The phylogenetic analyses performed with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
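The period-3 property mentioned above is often quantified from the discrete Fourier transform of per-symbol indicator sequences, evaluated at frequency index N/3. The sketch below computes that spectral power for any alphabet, including an extended one containing a symbol such as D. It is a generic measure and not necessarily the spectrum computed in the study; the function name is hypothetical.

```python
import cmath


def period3_power(seq):
    """Total spectral power at frequency index N/3 over all symbols in seq.

    For each symbol b, u_b[n] = 1 where seq[n] == b, else 0, and
    U_b(N/3) = sum_n u_b[n] * exp(-2*pi*i*n/3).  Works for any alphabet,
    including an extended one containing a symbol such as D.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    total = 0.0
    for symbol in set(seq):
        coeff = sum(cmath.exp(-2j * cmath.pi * pos / 3)
                    for pos, ch in enumerate(seq) if ch == symbol)
        total += abs(coeff) ** 2
    return total
```

A strongly codon-periodic string such as "ATG" repeated four times gives a large value (48), while a homopolymer of length 6 gives essentially zero.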
Kimita, Gathii; Mutai, Beth; Nyanjom, Steven Ger; Wamunyokoli, Fred; Waitumbi, John
2016-07-01
Rickettsia africae, the etiological agent of African tick bite fever, is widely distributed in sub-Saharan Africa. Contrary to reports of its homogeneity, a localized study in Asembo, Kenya, recently reported high genetic diversity. The present study aims to elucidate the extent of this heterogeneity by examining archived Rickettsia africae DNA samples collected from different eco-regions of Kenya. To evaluate their phylogenetic relationships, archived genomic DNA obtained from 57 ticks, identified a priori as containing R. africae by comparison with the ompA, ompB and gltA genes, was used to amplify five rickettsial genes, i.e., gltA, ompA, ompB, 17kDa and sca4. The resulting amplicons were sequenced. Translated amino acid alignments were used to guide the nucleotide alignments. Single-gene and concatenated alignments were used to infer phylogenetic relationships. Of the 57 DNA samples, three were determined to be R. aeschlimannii and not R. africae. One sample turned out to be a novel rickettsia, for which the interim name "Candidatus Rickettsia moyalensis" is proposed. The bona fide R. africae formed two distinct clades. Clade I contained 9% of the samples and branched with the validated R. africae str ESF-5, while clade II (two samples) formed a distinct sub-lineage. These data support the use of multiple genes for phylogenetic inference. We conclude that, despite its recent emergence, the R. africae lineage is diverse. These data also provide evidence of a novel Rickettsia species, Candidatus Rickettsia moyalensis.
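Guiding nucleotide alignments with translated amino acid alignments typically means aligning the proteins first and then threading each coding sequence back onto its gapped protein row, codon by codon. A minimal sketch, assuming the coding sequence is in frame, has no internal frameshifts, and may end with a stop codon that has no residue in the protein row; the function name is hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped, in-frame coding sequence onto a gapped protein alignment row.

    Each residue becomes its codon and each '-' becomes '---'.  A trailing stop
    codon in `cds` without a matching residue is tolerated and dropped.
    """
    n_residues = len(aligned_protein.replace("-", ""))
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the number of residues")
    codons = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

Applying it to every row of the protein alignment yields a nucleotide alignment in which whole codons, rather than individual bases, stay in register.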
Gate-controlled conductance switching in DNA
Xiang, Limin; Palma, Julio L.; Li, Yueqi; Mujica, Vladimiro; Ratner, Mark A.; Tao, Nongjian
2017-01-01
Extensive evidence has shown that long-range charge transport can occur along double-helical DNA, but active control (switching) of single-DNA conductance with an external field has not yet been demonstrated. Here we demonstrate conductance switching in DNA by replacing a DNA base with a redox group. By applying an electrochemical (EC) gate voltage to the molecule, we switch the redox group between the oxidized and reduced states, leading to reversible switching of the DNA conductance between two discrete levels. We further show that monitoring individual conductance switching events allows the study of redox reaction kinetics and thermodynamics at the single-molecule level, using DNA as a probe. Our theoretical calculations suggest that the switching is due to a change in the energy level alignment of the redox states relative to the Fermi level of the electrodes. PMID:28218275
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background: Sequencing full-length cDNA clones is important for determining gene structures, including alternative splice forms, and provides valuable resources for experimental analyses that reveal the biological functions of the encoded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and a fast and efficient sequencing approach was therefore needed. Methodology: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of an Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 of 200 clones excluded), and the nucleotide-level accuracy of the coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, confirming that it also performs well for non-human species. Conclusions: The entire sequencing and shotgun assembly takes less than 1 week, and consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Response of human corneal fibroblasts on silk film surface patterns.
Gil, Eun Seok; Park, Sang-Hyug; Marchant, Jeff; Omenetto, Fiorenzo; Kaplan, David L
2010-06-11
Transparent, biodegradable, mechanically robust, and surface-patterned silk films were evaluated for the effect of surface morphology on human corneal fibroblast (hCF) proliferation, orientation, and ECM deposition and alignment. A series of dimensionally different surface groove patterns was prepared from optically graded glass substrates, followed by casting of poly(dimethylsiloxane) (PDMS) replica molds. The features on the patterned silk films formed an array of asymmetric triangles with 37-342 nm depths and 445-3,582 nm widths. hCF DNA content on all patterned films was not significantly different from that on flat silk films after 4 d in culture. However, the depth and width of the grooves influenced cell alignment, while depth differences affected cell orientation; overall, deeper and narrower grooves induced more hCF orientation. Over 14 d in culture, cell layers and actin filament organization showed that confluent hCFs and their cytoskeletal filaments were oriented along the groove axis of the patterned silk films. Collagen type V and proteoglycans (decorin and biglycan), important markers of corneal stromal tissue, were highly expressed with alignment. Understanding how corneal stromal fibroblasts respond to surface features on a protein-based biomaterial that could be used in vivo for corneal repair suggests options for improving corneal tissue mimics. Further, the approaches provide fundamental biomaterial designs useful for bioengineering oriented tissue layers, a feature common to most biological tissue structures that underlies critical tissue functions.
Baumler, David J.; Banta, Lois M.; Hung, Kai F.; Schwarz, Jodi A.; Cabot, Eric L.; Glasner, Jeremy D.; Perna, Nicole T.
2012-01-01
Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples related to bacterial pathogenesis. Students first examine alignments of genomes of Escherichia coli O157:H7 strains isolated from three food-poisoning outbreaks using the multiple-genome alignment tool Mauve. Students investigate conservation of virulence factors using the Mauve viewer and by browsing annotations available in the A Systematic Annotation Package for Community Analysis of Genomes (ASAP) database. In the second module, students use an alignment of five Yersinia pestis genomes to analyze single-nucleotide polymorphisms in three genes and classify strains into biovar groups. Students are then given sequences of bacterial DNA amplified from the teeth of corpses from the first and second pandemics of bubonic plague and asked to classify these new samples. Learning-assessment results reveal student improvement in self-efficacy and content knowledge, as well as in students' ability to use BLAST to identify genomic islands and to analyze virulence factors of E. coli O157:H7 or Y. pestis. Each of these educational modules offers educators new, ready-to-implement resources for integrating comparative genomics topics into their curricula. PMID:22383620
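The SNP-based classification step in the second module can be sketched as follows: find the alignment columns where the reference genomes differ, then assign a new aligned sequence to the reference it mismatches least at those columns. The group names and toy sequences are invented for illustration and are not real biovar data; gap positions in the query are ignored.

```python
def snp_columns(references):
    """Columns where the aligned reference sequences disagree (gaps ignored)."""
    rows = list(references.values())
    length = len(rows[0])
    return [j for j in range(length)
            if len({row[j] for row in rows if row[j] != "-"}) > 1]


def classify(query, references):
    """Name of the reference with the fewest mismatches to `query` at SNP columns."""
    cols = snp_columns(references)

    def mismatches(ref):
        # positions where the query has a gap carry no information
        return sum(1 for j in cols if query[j] != "-" and query[j] != ref[j])

    return min(references, key=lambda name: mismatches(references[name]))
```

With references `{"group_A": "ACGTAC", "group_B": "ACGTTC", "group_C": "GCGTTC"}`, the informative columns are 0 and 4, and a query differing from group_B only at a non-SNP column is still assigned to group_B.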
Bustamante, Carlos; Chemla, Yann R; Moffitt, Jeffrey R
2009-10-01
Optical traps, or "optical tweezers", have become an indispensable tool for understanding fundamental biological processes. With our design, a dual-trap optical tweezers with differential detection, we can detect 1-bp changes in the length of a DNA molecule tethering the trapped beads. By forming both traps from the same laser and maximizing the common optical paths of the two trapping beams, we decouple the instrument from many sources of environmental and instrumental noise that typically limit spatial resolution. The performance of a high-resolution instrument (for example, the formation of strong traps, the minimization of background signals from trap movements, or the mitigation of axial coupling) can be greatly improved through careful alignment. This procedure, which is described in this article, starts from the laser and advances through the instrument, component by component. Alignment is complicated by the fact that the trapping light is in the near-infrared (NIR) spectrum. Standard infrared viewing cards are commonly used to locate the beam, but unfortunately they bleach quickly. As an alternative, we use an IR-viewing charge-coupled device (CCD) camera equipped with a C-mount telephoto lens and display its image on a monitor. By visualizing the scattered light on a pair of irises of identical height separated by >12 in., the beam direction can be set very accurately along a fixed axis.
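A rough small-angle estimate (our own, not from the article) shows why widely separated irises help. If the beam can be centered on each iris only to within a lateral tolerance δ, and the irises are a distance L apart, the worst case places the beam δ off-center in opposite directions at the two irises, giving an angular error of about

$$\theta_{\max} \approx \frac{2\delta}{L}$$

With a hypothetical δ = 0.5 mm and L = 12 in ≈ 305 mm, θ_max ≈ 1 mm / 305 mm ≈ 3.3 mrad; doubling the iris separation halves this error.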
Zheng, Qi; Grice, Elizabeth A.
2016-01-01
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes account for a large share of ambiguously (non-uniquely) mapped reads. Most available NGS aligners address this either by removing all non-uniquely mapping reads or by reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical, yet it is completely lacking at present. Here we developed a generalized software toolkit, "AlignerBoost", which uses a Bayesian framework to accurately estimate the mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost on both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, and especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising sensitivity, even without mapping-quality filters. When higher mapping-quality cutoffs are used, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity than the aligners' default modes, thereby significantly boosting the detection power of NGS aligners even at extreme thresholds. AlignerBoost is also SNP-aware: higher-quality alignments can be achieved if known SNPs are provided. AlignerBoost's algorithm is computationally efficient and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost. PMID:27706155
Robinson, P J; Cranenburgh, R M; Head, I M; Robinson, N J
1997-04-01
The sequence 5'-GCGATCGC-3', designated HIP1 (highly iterated palindrome), was first identified at the borders of a gene-deletion event and subsequently shown to constitute up to 2.5% of the DNA in some cyanobacteria. It is now reported that HIP1 is polyphyletic, occurring in several distinct cyanobacterial lineages rather than defining a clade. HIP1 does not introduce gaps into sequence alignments: it aligns with partial HIP1 sites in related sequences, showing that it propagates by nucleotide substitution rather than by insertion. Constructs have been created to determine the frequencies at which deletion events occur between palindromes located within the selectable marker neo. Deletion between HIP1 sites was more frequent in Synechococcus PCC 7942 than deletion between control palindromes, 5'-CCGATCGG-3', designated PAL0. However, this is not due to a recombinase that recognises HIP1 and is peculiar to cyanobacteria, because similar deletion frequencies were detected in Escherichia coli. Furthermore, the frequency of deletion of DNA flanked asymmetrically by one HIP1 site and one PAL0 site was lower than the frequency of deletion of DNA flanked asymmetrically by identical copies of either palindrome. This is consistent with deletion by copy-choice.
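Both HIP1 and the PAL0 control are palindromic in the molecular sense: each equals its own reverse complement. The sketch below checks that property and estimates what fraction of a sequence is covered by occurrences of a site, the kind of figure behind the 2.5% quoted above. The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if a site reads the same on both strands (equals its reverse complement)."""
    return site.upper() == reverse_complement(site)


def site_occurrences(seq, site):
    """Start positions of every (possibly overlapping) occurrence of site in seq."""
    seq, site = seq.upper(), site.upper()
    starts, pos = [], seq.find(site)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(site, pos + 1)
    return starts


def site_fraction(seq, site):
    """Fraction of bases in seq covered by at least one occurrence of site."""
    covered = set()
    for start in site_occurrences(seq, site):
        covered.update(range(start, start + len(site)))
    return len(covered) / len(seq) if seq else 0.0
```

Because HIP1 is its own reverse complement, searching one strand finds every site on both strands.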
NASA Astrophysics Data System (ADS)
Leonardi, Marcelo
The primary purpose of this study was to examine the impact of a scheduling change, from a trimester 4x4 block schedule to a modified hybrid schedule, on student achievement in ninth-grade biology courses. The study examined this impact through teacher-created benchmark assessments in Genetics, DNA, and Evolution and through the California Standardized Test (CST) in Biology. The secondary purpose of the study was to examine ninth-grade biology teachers' perceptions of ninth-grade biology student achievement. Using a mixed-methods approach, data were collected both quantitatively and qualitatively, in line with the research questions. Quantitative methods included gathering data from departmental benchmark exams and the CST in Biology and conducting multiple analysis of covariance and analysis of covariance to determine significant differences. Qualitative methods included journal-entry questions and focus group interviews. The results revealed a statistically significant increase in scores on both the DNA and Evolution benchmark exams following the change in scheduling format. The scheduling change was responsible for 1.5% of the increase in DNA benchmark scores and 2% of the increase in Evolution benchmark scores. The results revealed a statistically significant decrease in scores on the Genetics benchmark exam as a result of the scheduling change, which was responsible for 1% of the decrease in Genetics benchmark scores. The results also revealed a statistically significant increase in scores on the CST Biology exam; the scheduling change was responsible for 0.7% of this increase. Results of the focus group discussions indicated that all teachers preferred the modified hybrid schedule over the trimester schedule and that it improved student achievement.
JVM: Java Visual Mapping tool for next generation sequencing read.
Yang, Ye; Liu, Juan
2015-01-01
We developed a program, JVM (Java Visual Mapping), for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to align millions of short reads generated with Illumina sequencing technology. It employs a seed-index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-seq and RNA-seq analyses of single-end resequencing data. JVM is a desktop application that supports read data sets from 1 MB to 10 GB.
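For readers unfamiliar with seed indexing, a minimal sketch of the general idea: index every k-mer of the reference, look up non-overlapping seeds from a read, and let each exact hit vote for an implied read start position. This is a generic seed-and-vote illustration, not JVM's implementation; it does not reproduce JVM's octal encoding, and the names and toy data are hypothetical.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map each k-mer (seed) in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_starts(read, index, k):
    """Rank implied read start positions by the number of exact seed hits."""
    votes = defaultdict(int)
    # non-overlapping seeds taken from the read
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], []):
            votes[hit - offset] += 1
    # most votes first; ties broken by position
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
```

With the reference "TTTTACGTACGGATCCTTTT" and k = 4, the read "ACGTACGGATCC" gets three votes for start 4; a read with one mismatched seed still gets two, which is why a real mapper follows seeding with an extension step that scores the full alignment.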
Alignment method for solar collector arrays
Driver, Jr., Richard B
2012-10-23
The present invention is directed to an improved method for establishing camera fixture location for aligning mirrors on a solar collector array (SCA) comprising multiple mirror modules. The method aligns the mirrors on a module by comparing the location of the receiver image in photographs with the predicted theoretical receiver image location. To accurately align an entire SCA, a common reference is used for all of the individual module images within the SCA. The improved method can use relative pixel location information in digital photographs along with alignment fixture inclinometer data to calculate relative locations of the fixture between modules. The absolute locations are determined by minimizing alignment asymmetry for the SCA. The method inherently aligns all of the mirrors in an SCA to the receiver, even with receiver position and module-to-module alignment errors.
Capture and alignment of phi29 viral particles in sub-40 nanometer porous alumina membranes.
Moon, Jeong-Mi; Akin, Demir; Xuan, Yi; Ye, Peide D; Guo, Peixuan; Bashir, Rashid
2009-02-01
Bacteriophage phi29 virus nanoparticles and their associated DNA-packaging nanomotor offer novel possibilities for the development of hybrid bio-nano structures. Toward the goal of interfacing phi29 viruses and nanomotors with artificial micro- and nanostructures, we fabricated nanoporous anodic aluminum oxide (AAO) membranes with a pore size of 70 nm and shrank the pores to sub-40 nm diameters by atomic layer deposition (ALD) of aluminum oxide. We were able to capture and align particles in the anodized nanopores using two methods. First, we developed a functionalization and polishing process that chemically attaches the particles to the inner surface of the pores. Second, we used centrifugation to align the particles in the pores of the nanoporous membranes. In addition, when a mixture of empty capsids and packaged particles was centrifuged at specific speeds, the empty capsids deformed and passed through the 40 nm diameter pores, whereas the DNA-packaged particles were mainly retained at the top surface of the nanoporous membranes. Fluorescence microscopy was used to verify the selective filtration of empty capsids through the nanoporous membranes.
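Assuming an idealized, perfectly conformal ALD coating of thickness t on the pore walls (an assumption; the abstract does not report the deposited thickness), a cylindrical pore of initial diameter d_0 narrows to d = d_0 - 2t. Reaching the sub-40 nm regime from 70 nm pores therefore requires

$$d = d_0 - 2t < 40\ \text{nm} \quad\Longrightarrow\quad t > \frac{70\ \text{nm} - 40\ \text{nm}}{2} = 15\ \text{nm}$$

of aluminum oxide deposited on each wall.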
The chromokinesin Kid is required for maintenance of proper metaphase spindle size.
Tokai-Nishizumi, Noriko; Ohsugi, Miho; Suzuki, Emiko; Yamamoto, Tadashi
2005-11-01
The human chromokinesin Kid/kinesin-10, a plus-end-directed microtubule (MT)-based motor with both microtubule- and DNA-binding domains, is required for proper chromosome alignment at the metaphase plate. Here, we performed RNA interference experiments to deplete endogenous Kid from HeLa cells and confirmed defects in metaphase chromosome arm alignment in Kid-depleted cells. In addition, we noted a shortening of the spindle, with a pole-to-pole distance only 80% of that in wild-type cells. The spindle microtubule bundles with which Kid normally colocalizes became less robust. Rescue of the two Kid deficiency phenotypes (imprecise chromosome alignment at metaphase and shortened spindles) had distinct requirements. Mutants lacking either the DNA-binding domain or the MT motor ATPase failed to rescue the former defect, whereas rescue of the shortened spindle phenotype required neither activity. Kid also exhibits microtubule-bundling activity in vitro, and rescue of the shortened spindle phenotype and the bundling activity had similar domain requirements, except that rescue required a coiled-coil domain not needed for bundling. These results suggest that, distinct from its role in chromosome movement, Kid contributes to spindle morphogenesis by stabilizing spindle microtubules.
The Chromokinesin Kid Is Required for Maintenance of Proper Metaphase Spindle Size
Tokai-Nishizumi, Noriko; Ohsugi, Miho; Suzuki, Emiko; Yamamoto, Tadashi
2005-01-01
The human chromokinesin Kid/kinesin-10, a plus-end-directed microtubule (MT)-based motor with both microtubule- and DNA-binding domains, is required for proper chromosome alignment at the metaphase plate. Here, we performed RNA interference experiments to deplete endogenous Kid from HeLa cells and confirmed defects in metaphase chromosome arm alignment in Kid-depleted cells. In addition, we noted a shortening of the spindle, with a pole-to-pole distance only 80% of that in wild-type cells. The spindle microtubule bundles with which Kid normally colocalizes became less robust. Rescue of the two Kid deficiency phenotypes (imprecise chromosome alignment at metaphase and shortened spindles) had distinct requirements. Mutants lacking either the DNA-binding domain or the MT motor ATPase failed to rescue the former defect, whereas rescue of the shortened spindle phenotype required neither activity. Kid also exhibits microtubule-bundling activity in vitro, and rescue of the shortened spindle phenotype and the bundling activity had similar domain requirements, except that rescue required a coiled-coil domain not needed for bundling. These results suggest that, distinct from its role in chromosome movement, Kid contributes to spindle morphogenesis by stabilizing spindle microtubules. PMID:16176979
Retention-error patterns in complex alphanumeric serial-recall tasks.
Mathy, Fabien; Varré, Jean-Stéphane
2013-01-01
We propose a new method, based on an algorithm usually dedicated to DNA sequence alignment, both to reliably score short-term memory performance on immediate serial-recall tasks and to analyse retention-error patterns. There can be considerable confusion about how performance on immediate serial list recall tasks should be scored, especially when the to-be-remembered items are sampled with replacement. We discuss the utility of sequence-alignment algorithms for comparing the stimuli with the participants' responses. The idea is that deletion, substitution, translocation, and insertion errors, which are typical in DNA, correspond to typical putative errors in short-term memory (omission, confusion, permutation, and intrusion errors, respectively). We analyse four data sets in which alphanumeric lists included a few (or many) repetitions. After examining the method on two simple data sets, we show that sequence alignment offers 1) a compelling method for measuring capacity in terms of chunks when many regularities are introduced in the material (third data set) and 2) a reliable estimator of individual differences in short-term memory capacity. This study illustrates the difficulty of arriving at a good measure of short-term memory performance, and also attempts to characterise the primary factors underpinning remembering and forgetting.
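To make the scoring idea concrete, here is a minimal Python sketch that aligns a presented list with a recalled list using unit-cost edit distance and then tallies omissions, intrusions and substitutions (confusions) along one optimal alignment. The unit costs, the function name `align_recall` and the example lists are assumptions for illustration, not the authors' scoring scheme. Plain edit distance has no permutation operation, so a swapped pair of items is counted as two edits.

```python
def align_recall(stimulus, response):
    """Align a presented list with a recalled list (unit-cost edit distance).

    Returns (distance, counts); counts tallies matches, substitutions
    (confusions), omissions (deletions) and intrusions (insertions)
    along one optimal alignment. Costs of 1 per edit are an assumption.
    """
    n, m = len(stimulus), len(response)
    # d[i][j] = minimal cost of aligning stimulus[:i] with response[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if stimulus[i - 1] == response[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # omission
                          d[i][j - 1] + 1)         # intrusion
    # Trace back one optimal alignment and classify each step.
    counts = {"match": 0, "substitution": 0, "omission": 0, "intrusion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and stimulus[i - 1] != response[j - 1]
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + mismatch:
            counts["substitution" if mismatch else "match"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["omission"] += 1
            i -= 1
        else:
            counts["intrusion"] += 1
            j -= 1
    return d[n][m], counts


print(align_recall("7K2R9", "7K29"))
# (1, {'match': 4, 'substitution': 0, 'omission': 1, 'intrusion': 0})
```

The function accepts strings or lists, so lists of multi-character items can be scored the same way.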
Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data.
da Silva Paiva, Thiago; do Nascimento Borges, Bárbara; da Silva-Neto, Inácio Domingos
2013-12-01
The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on two nucleotide alignment criteria: variation in the gap-opening and gap-extension parameters, and the use of rRNA secondary structure to guide the multiple alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. The sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata depended on how indels were inferred during multiple alignment of the nucleotide sequences. The phylogenetic analyses rejected the monophyly of the Armophorea in most cases and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree.
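The sensitivity analysis above varies gap-opening and gap-extension penalties. The Python sketch below shows why those two parameters matter: the same number of gap characters is penalized very differently depending on whether it forms one run or several. It assumes the common affine convention in which a run of L gaps costs `gap_open + (L - 1) * gap_extend` (some programs use `gap_open + L * gap_extend` instead); the function name and the example rows are hypothetical.

```python
def affine_gap_penalty(aligned, gap_open, gap_extend, gap_char="-"):
    """Sum the affine penalties of every gap run in one aligned row.

    Assumed convention: a run of L consecutive gap characters costs
    gap_open + (L - 1) * gap_extend.
    """
    total = 0
    run = 0  # length of the gap run currently being read
    for ch in aligned:
        if ch == gap_char:
            run += 1
        else:
            if run:
                total += gap_open + (run - 1) * gap_extend
            run = 0
    if run:  # the row may end inside a gap run
        total += gap_open + (run - 1) * gap_extend
    return total


# One run of three gaps versus three single gaps, under two parameter settings.
for params in [(10, 1), (3, 3)]:
    print(params, affine_gap_penalty("AC---GT", *params), affine_gap_penalty("A-C-G-T", *params))
# (10, 1) 12 30
# (3, 3) 9 9
```

With a high opening and low extension penalty, the single long gap is strongly preferred; when the two penalties are equal, the aligner has no preference between the placements, which is one way indel parameters can reshape a multiple alignment and the trees built from it.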
Phylogenetic study of Class Armophorea (Alveolata, Ciliophora) based on 18S-rDNA data
da Silva Paiva, Thiago; do Nascimento Borges, Bárbara; da Silva-Neto, Inácio Domingos
2013-01-01
The 18S rDNA phylogeny of Class Armophorea, a group of anaerobic ciliates, is proposed based on an analysis of 44 sequences (out of 195) retrieved from the NCBI/GenBank database. Emphasis was placed on two nucleotide alignment criteria: variation in the gap-opening and gap-extension parameters, and the use of rRNA secondary structure to guide the multiple alignment. A sensitivity analysis of 76 data sets was run to assess the effect of variations in indel parameters on tree topologies. Bayesian inference, maximum likelihood and maximum parsimony phylogenetic analyses were used to explore how different analytic frameworks influenced the resulting hypotheses. The sensitivity analysis revealed that the relationships among higher taxa of the Intramacronucleata depended on how indels were inferred during multiple alignment of the nucleotide sequences. The phylogenetic analyses rejected the monophyly of the Armophorea in most cases and consistently indicated that the Metopidae and Nyctotheridae were related to the Litostomatea. There was no consensus on the placement of the Caenomorphidae, which could be a sister group of the Metopidae + Nyctotheridae, or could have diverged at the base of the Spirotrichea branch or the Intramacronucleata tree. PMID:24385862
Ramos, Enrique; Levinson, Benjamin T; Chasnoff, Sara; Hughes, Andrew; Young, Andrew L; Thornton, Katherine; Li, Allie; Vallania, Francesco L M; Province, Michael; Druley, Todd E
2012-12-06
Rare genetic variation in the human population is a major source of pathophysiological variability and has been implicated in a host of complex phenotypes and diseases. Finding disease-related genes harboring disparate functional rare variants requires sequencing many individuals across many genomic regions and comparing them against unaffected cohorts. However, despite persistent declines in sequencing costs, population-based rare variant detection across large genomic target regions remains cost-prohibitive for most investigators. In addition, DNA samples are often precious, and hybridization methods typically require large amounts of input DNA. Pooled sample DNA sequencing is a cost- and time-efficient strategy for surveying populations of individuals for rare variants. We set out to 1) create a scalable, multiplexing method for custom capture, with or without individual DNA indexing, that was amenable to low amounts of input DNA and 2) expand the functionality of the SPLINTER algorithm for calling substitutions, insertions and deletions across either candidate genes or the entire exome by integrating the variant calling algorithm with the dynamic programming aligner Novoalign. We report methodology for pooled hybridization capture with pre-enrichment, indexed multiplexing of up to 48 individuals or non-indexed pooled sequencing of up to 92 individuals with as little as 70 ng of DNA per person. Modified solid phase reversible immobilization bead purification strategies eliminate sample transfers from sonication in 96-well plates through adapter ligation, resulting in 50% less library preparation reagent consumption. Custom Y-shaped adapters containing novel 7 base pair index sequences with a Hamming distance of ≥2 were ligated directly onto fragmented source DNA, eliminating the need for PCR to incorporate indexes; ligation was followed by a custom blocking strategy using a single oligonucleotide regardless of index sequence.
These results were obtained by aligning raw reads against the entire genome using Novoalign, followed by variant calling using SPLINTER for non-indexed pools or SAMtools for indexed samples. With these pipelines, we find sensitivity and specificity of 99.4% and 99.7%, respectively, for pooled exome sequencing. Sensitivity, and to a lesser degree specificity, proved to be a function of coverage. For rare variants (≤2% minor allele frequency), we achieved sensitivity and specificity of ≥94.9% and ≥99.99% for custom capture of 2.5 Mb in multiplexed libraries of 22-48 individuals with only ≥5-fold coverage/chromosome, but these parameters improved to ≥98.7% and 100% with 20-fold coverage/chromosome. This highly scalable methodology enables accurate rare variant detection, with or without individual DNA sample indexing, while reducing the amount of required source DNA and total costs through less hybridization reagent consumption, multi-sample sonication in a standard PCR plate, multiplexed pre-enrichment pooling with a single hybridization, and the lower sequencing coverage required to obtain high sensitivity.
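The index design described above relies on a minimum pairwise Hamming distance of 2: with that spacing, a single substitution in a sequenced index can never turn one valid index into another, so single errors are detectable (correcting them would need a distance of at least 3). A minimal Python check of that property is sketched below; the 7-bp index sequences are hypothetical examples, not the adapter indexes used in the study.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indices):
    """Smallest Hamming distance between any two indexes in the set."""
    return min(hamming(a, b) for a, b in combinations(indices, 2))


# Hypothetical 7-bp index set (not the indexes used in the study).
indices = ["ACGTACG", "ACGTTCC", "TTGCAAG"]
print(min_pairwise_distance(indices) >= 2)  # True
```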
Zhang, Tao; Zhu, Yongyun; Zhou, Feng; Yan, Yaxiong; Tong, Jinwu
2017-06-17
Initial alignment of the strapdown inertial navigation system (SINS) is intended to determine the initial attitude matrix within a short time and to a given accuracy. The quaternion filter algorithm achieves remarkable alignment accuracy, but it converges slowly. To solve this problem, this paper proposes an improved quaternion filter algorithm for faster initial alignment based on the error model of the quaternion filter algorithm. The improved quaternion filter algorithm constructs the K matrix based on the principle of the optimal quaternion algorithm and rebuilds the measurement model to include acceleration and velocity errors, which speeds up convergence. A Doppler velocity log (DVL) provides the reference velocity for the improved quaternion filter alignment algorithm. To demonstrate the performance of the improved quaternion filter algorithm in the field, a turntable experiment and a vehicle test were carried out. The experimental results show that the proposed improved quaternion filter converges faster than the traditional quaternion filter algorithm. In addition, the improved quaternion filter algorithm also demonstrates advantages in terms of correctness, effectiveness, and practicability.
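The "optimal quaternion" K matrix referred to above is, in the usual textbook formulation, Davenport's K matrix from the q-method: the quaternion that best rotates a set of reference vectors onto their body-frame measurements is the eigenvector of K with the largest eigenvalue. The Python sketch below shows only that static step, not the authors' improved filter or its error model. It assumes noise-free unit-vector pairs with `body_i = A @ ref_i`, a scalar-last quaternion `[e1, e2, e3, q4]`, and uses shifted power iteration in place of a library eigensolver; all function names are hypothetical.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def davenport_k(body, ref, weights):
    """Davenport's 4x4 K matrix for weighted vector pairs with body_i ~= A ref_i."""
    B = [[0.0] * 3 for _ in range(3)]  # attitude profile matrix: sum w b r^T
    z = [0.0, 0.0, 0.0]                # sum w (b x r)
    for b, r, w in zip(body, ref, weights):
        for j in range(3):
            for k in range(3):
                B[j][k] += w * b[j] * r[k]
        bxr = cross(b, r)
        for j in range(3):
            z[j] += w * bxr[j]
    sigma = B[0][0] + B[1][1] + B[2][2]
    K = [[0.0] * 4 for _ in range(4)]
    for j in range(3):
        for k in range(3):
            K[j][k] = B[j][k] + B[k][j] - (sigma if j == k else 0.0)
        K[j][3] = K[3][j] = z[j]
    K[3][3] = sigma
    return K


def optimal_quaternion(K, total_weight, iters=500):
    """Eigenvector of K for its largest eigenvalue via shifted power iteration.

    Adding total_weight * I makes every eigenvalue non-negative, so the
    dominant eigenvector of the shifted matrix is the one we want.
    """
    best, best_gain = None, -math.inf
    for start in range(4):  # try each basis vector as a seed
        q = [1.0 if i == start else 0.0 for i in range(4)]
        for _ in range(iters):
            q = [sum(K[i][j] * q[j] for j in range(4)) + total_weight * q[i]
                 for i in range(4)]
            norm = math.sqrt(sum(x * x for x in q))
            if norm == 0.0:  # seed collapsed to zero; skip it
                break
            q = [x / norm for x in q]
        else:  # runs only if the loop finished without break
            gain = sum(q[i] * K[i][j] * q[j] for i in range(4) for j in range(4))
            if gain > best_gain:
                best, best_gain = q, gain
    return best


def quat_to_dcm(q):
    """Attitude matrix A(q) for a scalar-last unit quaternion [e1, e2, e3, q4]."""
    e1, e2, e3, q4 = q
    e = (e1, e2, e3)
    s = q4 * q4 - (e1 * e1 + e2 * e2 + e3 * e3)
    ex = [[0.0, -e3, e2], [e3, 0.0, -e1], [-e2, e1, 0.0]]  # cross-product matrix
    return [[s * (i == j) + 2 * e[i] * e[j] - 2 * q4 * ex[i][j] for j in range(3)]
            for i in range(3)]


# Usage: recover a known attitude from three noise-free vector pairs.
norm = math.sqrt(0.1 ** 2 + 0.3 ** 2 + 0.2 ** 2 + 0.9 ** 2)
q_true = [0.1 / norm, -0.3 / norm, 0.2 / norm, 0.9 / norm]
A_true = quat_to_dcm(q_true)
refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bodies = [[sum(A_true[i][k] * r[k] for k in range(3)) for i in range(3)] for r in refs]
weights = [1 / 3] * 3
q_est = optimal_quaternion(davenport_k(bodies, refs, weights), sum(weights))
print(q_est)  # equals q_true up to sign and rounding
```

The quaternion is recovered only up to sign, since q and -q describe the same attitude; comparing attitude matrices avoids that ambiguity.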
PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.
Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy
2015-05-01
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate, slightly better than SATé trees and substantially better than trees from other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.
A putative peroxidase cDNA from turnip and analysis of the encoded protein sequence.
Romero-Gómez, S; Duarte-Vázquez, M A; García-Almendárez, B E; Mayorga-Martínez, L; Cervantes-Avilés, O; Regalado, C
2008-12-01
A putative peroxidase cDNA was isolated from turnip roots (Brassica napus L. var. purple top white globe) by reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE). Total RNA extracted from mature turnip roots was used as a template for RT-PCR, using a degenerate primer designed to amplify the highly conserved distal motif of plant peroxidases. The resulting partial sequence was used to design the remaining specific primers for 5' and 3' RACE. Two cDNA fragments were purified, sequenced, and aligned with the partial sequence from RT-PCR, and a complete overlapping sequence was obtained and labeled BnPA (GenBank Accession No. AY423440, named podC). The full-length cDNA is 1167 bp long and contains a 1077 bp open reading frame (ORF) encoding a deduced 358-amino-acid peroxidase polypeptide. The putative peroxidase (BnPA) showed a calculated Mr of 34 kDa and an isoelectric point (pI) of 4.5, with no significant identity with other reported turnip peroxidases. Sequence alignment showed that only three peroxidases have significant identity with BnPA, namely AtP29a (84%) and AtPA2 (81%) from Arabidopsis thaliana, and HRPA2 (82%) from horseradish (Armoracia rusticana). Work is in progress to clone this gene into a suitable host to study the specific role and possible biotechnological applications of this alternative peroxidase source.
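The ORF and polypeptide lengths quoted above are mutually consistent once the stop codon, which is part of the ORF but is not translated, is counted:

$$
\frac{1077\ \text{bp}}{3\ \text{bp/codon}} = 359\ \text{codons} = 358\ \text{sense codons} + 1\ \text{stop codon},
\qquad 1167\ \text{bp} - 1077\ \text{bp} = 90\ \text{bp outside the ORF}.
$$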
Guo, Chun-Teng; McClean, Stephen; Shaw, Chris; Rao, Ping-Fan; Ye, Ming-Yu; Bjourson, Anthony J
2013-05-01
A novel Kunitz BPTI-like peptide, designated BBPTI-1, with chymotrypsin inhibitory activity was identified from the venom of Burmese Daboia russelii siamensis. It was purified by three chromatography steps: gel filtration, cation exchange and reversed phase. A partial N-terminal sequence of BBPTI-1, HDRPKFCYLPADPGECLAHMRSF, was obtained by automated Edman degradation, and a Ki value of 4.77 nM was determined. Cloning of BBPTI-1, including the open reading frame and 3' untranslated region, was achieved from cDNA libraries derived from lyophilized venom using a 3' RACE strategy. In addition, a cDNA sequence designated BBPTI-5 was obtained. Alignment of the cDNA sequences showed that BBPTI-5 was identical to the BBPTI-1 cDNA except for an eight-nucleotide deletion in the open reading frame. Gene variations that represented deletions in the BBPTI-5 cDNA resulted in a novel protease inhibitor analog. Amino acid sequence alignment revealed that the peptides deduced from their respective precursor cDNAs showed high similarity and homology with other Kunitz BPTI proteinase inhibitors. BBPTI-1 and BBPTI-5 consist of 60 and 66 amino acid residues, respectively, including six conserved cysteine residues. As these peptides have been reported to influence coagulation, fibrinolysis and inflammation, their potential application in biomedical contexts warrants further investigation. Copyright © 2013 Elsevier Inc. All rights reserved.
Is “Junk” DNA Mostly Intron DNA?
Wong, Gane Ka-Shu; Passey, Douglas A.; Huang, Ying-zong; Yang, Zhiyong; Yu, Jun
2000-01-01
Among higher eukaryotes, very little of the genome codes for protein. What is in the rest of the genome, the “junk” DNA that is estimated to make up almost 97% of the genome in Homo sapiens? Is it possible that much of this “junk” is intron DNA? This is not a question that can be answered just by looking at the published data, even from the finished genomes. One cannot assume that there are no genes in a sequenced region just because no genes were annotated. We introduce another approach to this problem, based on an analysis of cDNA-to-genomic alignments in all of the complete or nearly complete genomes of multicellular organisms. Our conclusion is that, in animals but not in plants, most of the “junk” is intron DNA. PMID:11076852
Mass transport through vertically aligned large diameter MWCNT embedded in parylene
Krishnakumar, P; Tiwari, P B; Staples, S; Luo, T; Darici, Y; He, J; Lindsay, SM
2013-01-01
We have fabricated porous membranes using parylene-encapsulated, vertically aligned forests of multi-walled carbon nanotubes (MWCNTs, about 7 nm inner diameter). The transport of charged particles in electrolyte through these membranes was studied by applying an electric field and pressure. Under electric fields of about 4.4×10⁴ V/m, electrophoresis rather than electroosmosis was found to be the main mechanism of ion transport. Small molecules and 5 nm gold nanoparticles can be driven through the membranes by an electric field. However, small biomolecules, such as DNA oligomers, cannot. Because the electric driving force is weak, interactions between the charged particles and the hydrophobic inner surface of the CNTs play an important role in transport, leading to enhanced selectivity for small molecules. Simple chemical modification of the CNT ends also has a marked effect on the translocation of single-stranded DNA oligomers and gold nanoparticles under a modest pressure (<294 Pa). PMID:23064678
Dzikowski, R; Levy, M G; Poore, M F; Flowers, J R; Paperna, I
2003-12-29
Metacercariae of Bolbophorus species are serious pathogens of farmed fish. Molecular diagnostic tools capable of identifying and differentiating these parasites may assist in the development of rational control strategies. The 18S rDNA (small subunit, SSU) genes of adult B. confusus and B. levantinus, obtained from a pelican, Pelecanus onocrotalus, and a night heron, Nycticorax nycticorax, respectively, were amplified, sequenced, and aligned. Based on this alignment, we developed a genetic assay to differentiate B. confusus from B. levantinus. These two species were compared genetically with the North American species B. damnificus and Bolbophorus sp. ('Type 2'). The relationships between the species are outlined and discussed. In addition to the molecular study, specimens of B. confusus and B. levantinus were compared morphologically using scanning electron microscopy. Morphologic analysis revealed interspecific differences in details of the holdfast organ and the position of the acetabulum.
NASA Astrophysics Data System (ADS)
Spinney, Patrick; Collins, Scott D.; Howitt, David G.; Smith, Rosemary L.
2012-06-01
Rapid and cost-effective DNA sequencing is a pivotal prerequisite for the genomics era. Many of the recent advances in forensics, medicine, agriculture, taxonomy, and drug discovery have paralleled critical advances in DNA sequencing technology. Nanopore modalities for DNA sequencing have recently emerged, including the electrical interrogation of protein ion channels and/or solid-state nanopores during DNA translocation. However, to date, most of this work has met with mixed success. In this work, we present a unique nanofabrication strategy that realizes an artificial nanopore articulated with carbon electrodes to sense current modulations during the transport of DNA through the nanopore. This embodiment overcomes most of the technical difficulties inherent in other artificial nanopore embodiments and presents a versatile platform for testing single-nucleotide detection in DNA. Characterization of the device using gold nanoparticles, silica nanoparticles, lambda dsDNA and 16-mer ssDNA is presented. Although single-molecule DNA sequencing has not yet been demonstrated, the device shows a path towards this goal.
Taggart, David J.; Camerlengo, Terry L.; Harrison, Jason K.; Sherrer, Shanen M.; Kshetry, Ajay K.; Taylor, John-Stephen; Huang, Kun; Suo, Zucai
2013-01-01
Cellular genomes are constantly damaged by endogenous and exogenous agents that covalently and structurally modify DNA to produce DNA lesions. Although most lesions are mended by various DNA repair pathways in vivo, a significant number of damage sites persist during genomic replication. Our understanding of the mutagenic outcomes derived from these unrepaired DNA lesions has been hindered by the low throughput of existing sequencing methods. Therefore, we have developed a cost-effective, high-throughput short oligonucleotide sequencing assay that uses next-generation DNA sequencing technology to assess the mutagenic profiles of translesion DNA synthesis catalyzed by any error-prone DNA polymerase. The large volume of sequencing data produced was aligned and quantified using our novel software. As an example, the high-throughput short oligonucleotide sequencing assay was used to analyze the types and frequencies of mutations generated individually by three lesion-bypass human Y-family DNA polymerases upstream of, downstream of, and at a site-specifically placed cis–syn thymidine–thymidine dimer. PMID:23470999
DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability
Little, Damon P.
2011-01-01
For DNA barcoding to succeed as a scientific endeavor, an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within-taxon variability. Here, I present a novel alignment-free sequence identification algorithm, BRONX, that accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within-taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus level for all query types. PMID:21857897
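The flanking-region idea described above can be illustrated with a toy Python sketch: invariant flanks locate each variable segment in the query without building a global alignment, and each taxon is scored by how many of those segments match a variant actually observed in that taxon, so within-taxon variability is allowed for. This is not BRONX's scoring procedure, which the abstract does not spell out; exact flank matching, presence/absence scoring, the function names and the example sequences are all assumptions.

```python
def extract_variable_region(query, left_flank, right_flank):
    """Segment of `query` between two invariant flanks, or None if a flank is missing."""
    start = query.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = query.find(right_flank, start)
    if end == -1:
        return None
    return query[start:end]


def identify(query, regions, references):
    """Score taxa by how many query segments match a variant observed in that taxon.

    regions:    list of (left_flank, right_flank) pairs
    references: {taxon: [set of observed variants for each region]}
    Returns (best-scoring taxa, all scores).
    """
    segments = [extract_variable_region(query, left, right) for left, right in regions]
    scores = {
        taxon: sum(seg is not None and seg in observed[k] for k, seg in enumerate(segments))
        for taxon, observed in references.items()
    }
    best = max(scores.values())
    return sorted(t for t, s in scores.items() if s == best), scores


# Hypothetical flanks and reference variants; several variants per taxon
# stand in for observed within-taxon variability.
regions = [("GATC", "TTAG"), ("CCGG", "AATT")]
references = {
    "Taxon A": [{"AC", "AG"}, {"TT", "TA"}],
    "Taxon B": [{"GG"}, {"T"}],
}
print(identify("CAGATCAGTTAGCACCGGTTAATTCA", regions, references))
# (['Taxon A'], {'Taxon A': 2, 'Taxon B': 0})
```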
MCM ring hexamerization is a prerequisite for DNA-binding
Froelich, Clifford A.; Nourse, Amanda; Enemark, Eric J.
2015-09-13
The hexameric Minichromosome Maintenance (MCM) protein complex forms a ring that unwinds DNA at the replication fork in eukaryotes and archaea. Our recent crystal structure of an archaeal MCM N-terminal domain bound to single-stranded DNA (ssDNA) revealed ssDNA associating across tight subunit interfaces but not at the loose interfaces, indicating that DNA-binding is governed not only by the DNA-binding residues of the subunits (MCM ssDNA-binding motif, MSSB) but also by the relative orientation of the subunits. We now extend these findings to show that DNA-binding by the MCM N-terminal domain of the archaeal organism Pyrococcus furiosus occurs specifically in the hexameric oligomeric form. We show that mutants defective for hexamerization are defective in binding ssDNA despite retaining all the residues observed to interact with ssDNA in the crystal structure. One mutation that exhibits severely defective hexamerization and ssDNA-binding is at a conserved phenylalanine that aligns with the mouse Mcm4(Chaos3) mutation associated with chromosomal instability, cancer, and decreased intersubunit association.
Fusion bonding and alignment fixture
Ackler, Harold D.; Swierkowski, Stefan P.; Tarte, Lisa A.; Hicks, Randall K.
2000-01-01
An improved vacuum fusion bonding structure and process for the aligned bonding of large-area glass plates, patterned with microchannels and access holes and slots, at elevated glass fusion temperatures. All components are evacuated through the bottom platform, which yields an untouched, defect-free top surface that greatly improves optical access through this smooth surface. Also, a completely non-adherent interlayer, such as graphite, with alignment and location features is located between the main steel platform and the glass plate pair; this greatly improves quality, yield, and ease of use, and enables aligned bonding of very large glass structures.
NASA Astrophysics Data System (ADS)
Li, Jing; Song, Ningfang; Yang, Gongliu; Jiang, Rui
2016-07-01
In the initial alignment of a strapdown inertial navigation system (SINS), large misalignment angles introduce nonlinearity, which is usually handled with the scaled unscented Kalman filter (SUKF). In this paper, the problem of large misalignment angles in SINS alignment is further investigated, and a strong tracking scaled unscented Kalman filter (STSUKF) with fixed parameters is proposed to improve convergence speed; however, these parameters are set manually and are uncertain in real applications. To further improve alignment stability and reduce the burden of parameter selection, this paper proposes a fuzzy adaptive strategy combined with the STSUKF (FUZZY-STSUKF). An initial alignment scheme for large misalignment angles based on the FUZZY-STSUKF is designed and verified by simulations and a turntable experiment. The results show that the scheme improves the accuracy and convergence speed of SINS initial alignment compared with schemes based on the SUKF and the STSUKF.
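The scaled unscented Kalman filter mentioned above is built on the scaled unscented transform, which propagates a small set of deterministically chosen sigma points through the nonlinear model instead of linearizing it. The Python sketch below implements only that transform (sigma points, weights, and the recombined mean and covariance); it does not implement the strong-tracking or fuzzy-adaptive extensions proposed in the paper. The defaults alpha = 1e-3, beta = 2 and kappa = 0 are common textbook choices assumed here, and the function names are hypothetical.

```python
import math


def cholesky(P):
    """Lower-triangular L with L L^T = P, for a symmetric positive-definite P."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(P[i][i] - s)
            else:
                L[i][j] = (P[i][j] - s) / L[j][j]
    return L


def scaled_sigma_points(mean, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """2n+1 sigma points plus mean weights (wm) and covariance weights (wc)."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    # Columns of a square root of (n + lam) * P give the sigma-point offsets.
    L = cholesky([[(n + lam) * P[i][j] for j in range(n)] for i in range(n)])
    points = [list(mean)]
    for sign in (1.0, -1.0):
        for col in range(n):
            points.append([mean[i] + sign * L[i][col] for i in range(n)])
    wm = [lam / (n + lam)] + [1.0 / (2 * (n + lam))] * (2 * n)
    wc = list(wm)
    wc[0] += 1.0 - alpha ** 2 + beta
    return points, wm, wc


def unscented_transform(f, mean, P, **params):
    """Propagate N(mean, P) through f and return the recombined mean and covariance."""
    points, wm, wc = scaled_sigma_points(mean, P, **params)
    ys = [f(p) for p in points]
    m = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(wm, ys)) for i in range(m)]
    y_cov = [[sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j]) for w, y in zip(wc, ys))
              for j in range(m)] for i in range(m)]
    return y_mean, y_cov


# x0**2 with x0 ~ N(0, 4) has mean 4; the transform recovers it.
y_mean, _ = unscented_transform(lambda x: [x[0] ** 2], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
print(round(y_mean[0], 6))  # 4.0
```

A filter such as the SUKF applies this transform in both its prediction and update steps; the strong-tracking variant additionally inflates the predicted covariance, which is outside the scope of this sketch.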
Warp-averaging event-related potentials.
Wang, K; Begleiter, H; Porjesz, B
2001-10-01
The aim was to align repeated single trials of the event-related potential (ERP) in order to obtain an improved estimate of the ERP. A new implementation of dynamic time warping is applied to compute a warp-average of the single trials. The trilinear modeling method is applied to filter the single trials prior to alignment. Alignment is based on normalized signals and their estimated derivatives. These features reduce the misalignment due to aligning the random alpha waves, explaining amplitude differences in latency differences, or the seemingly small amplitudes of some components. Simulations and applications to visually evoked potentials show significant improvement over some commonly used methods. The new implementation of dynamic time warping can be used to align the major components (P1, N1, P2, N2, P3) of the repeated single trials. The average of the aligned single trials is an improved estimate of the ERP. This could lead to more accurate results in subsequent analyses.
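For readers unfamiliar with dynamic time warping, the Python sketch below shows the classic algorithm on two one-dimensional trials, using the absolute amplitude difference as the local cost, together with a simplified pairwise warp-average that averages the samples matched along the warping path. It does not reproduce the paper's new implementation (trilinear pre-filtering, normalized signals and derivative features); the function names and the toy trials are assumptions.

```python
def dtw(x, y):
    """Classic dynamic time warping between two 1-D sequences.

    Local cost is |x[i] - y[j]|. Returns (total_cost, path), where path
    lists the matched (i, j) index pairs from start to end.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): D[i - 1][j - 1],
                 (i - 1, j): D[i - 1][j],
                 (i, j - 1): D[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return D[n][m], path


def warp_average(x, y):
    """Average two trials sample-by-sample along their DTW path (simplified)."""
    _, path = dtw(x, y)
    return [(x[i] + y[j]) / 2 for i, j in path]


# The second trial is the first one delayed by one sample.
cost, path = dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])
print(cost)  # 0.0
```

Because the warping absorbs the one-sample latency shift, the two trials align at zero cost; a plain sample-by-sample average would instead smear the peak.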
Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco
2016-03-01
Ultrafast metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classifying 16S rDNA sequences. The classifier is based on mapping short sequences to their lowest common ancestor and performing alignments to form subtrees with specific weights at each taxon node. This study aimed to evaluate the classification performance of Kraken on long 16S rDNA random environmental sequences produced by cloning and Sanger sequencing. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Kraken achieved deeper classification than the RDP: 73% of the contigs were classified to the species or variety level by Kraken, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken combines the information from both sequence senses (strands), providing additional elements for classification. In conclusion, the results demonstrate that Kraken is an excellent choice for the taxonomic assignment of sequences obtained by Sanger sequencing or by third-generation sequencing, whose main goal is to generate longer sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
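The lowest-common-ancestor idea behind Kraken can be sketched in a few lines of Python. Following the published description of Kraken as we understand it, every k-mer in the reference database is labelled with the LCA of the taxa whose genomes contain it, and a read is assigned to the node whose root-to-node path collects the most k-mer hits, with ties resolved to their LCA. The taxonomy, sequences, k = 4 and function names below are hypothetical, and the real tool uses far more compact data structures.

```python
from collections import Counter


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def lineage(taxon, parent):
    """The taxon followed by its ancestors up to the root (parent[root] is None)."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path


def lca(a, b, parent):
    ancestors = set(lineage(a, parent))
    return next(t for t in lineage(b, parent) if t in ancestors)


def build_db(genomes, parent, k):
    """Label every k-mer with the LCA of all taxa whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon, parent) if km in db else taxon
    return db


def classify(read, db, parent, k):
    """Assign a read to the hit node with the heaviest root-to-node path (ties -> LCA)."""
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return None  # unclassified

    def path_score(t):
        return sum(hits[a] for a in lineage(t, parent))

    best = max(path_score(t) for t in hits)
    winners = [t for t in hits if path_score(t) == best]
    call = winners[0]
    for t in winners[1:]:
        call = lca(call, t, parent)
    return call


parent = {"root": None, "genus": "root", "spA": "genus", "spB": "genus"}
genomes = {"spA": "ACGTACGTTG", "spB": "ACGTACGAAC"}
db = build_db(genomes, parent, k=4)
print(classify("TACGTTG", db, parent, 4))   # spA
print(classify("GTACG", db, parent, 4))     # genus (only shared k-mers)
print(classify("CGTTACGA", db, parent, 4))  # genus (tie between spA and spB)
```

Shared k-mers pull a read toward higher ranks, while taxon-specific k-mers push it down the tree, which is why longer reads with more informative k-mers can be classified more deeply.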
PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences
Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong
2015-01-01
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate, slightly better than SATé trees and substantially better than trees from other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory. PMID:25549288
Tucker, Valerie C; Kirkham, Amanda J; Hopwood, Andrew J
2012-05-01
We describe the forensic validation of Promega's PowerPlex® European Standard Investigator 16 (ESI 16) multiplex kit and compare results with those generated with the AmpFlSTR® SGM Plus® (SGM+) multiplex. ESI 16 combines the loci contained within the SGM+ multiplex with five additional loci: D2S441, D10S1248, D22S1045, D1S1656, and D12S391. A relative reduction in the amplicon sizes of the SGM+ loci increases the robustness and amplification success of these amplicons with degraded DNA samples. Tests performed here supplement previously published ESI 16 data with sensitivity, profile quality, mock casework, inhibitor and mixture study data collected in our laboratories in alignment with our internal technical and quality guidelines and those issued by the Scientific Working Group on DNA Analysis Methods (SWGDAM), the DNA Advisory Board (DAB) and the DNA working group (DNAWG) of the European Network of Forensic Science Institutes (ENFSI). Full profiles were routinely generated from a fully heterozygous single-source DNA template using 62.5 pg for ESI 16 and 500 pg for SGM+. This increase in sensitivity has a consequent effect on mixture analyses and the detection of minor mixture components. The improved PCR chemistry confers enhanced tolerance to high levels of laboratory-prepared inhibitors compared with SGM+. In summary, our results demonstrate that the ESI 16 multiplex kit is more robust and sensitive than SGM+ and will be a suitable replacement system for the analysis of forensic DNA samples, providing compliance with the European standard set of STR loci.
Ultraaccurate genome sequencing and haplotyping of single human cells.
Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun
2017-11-21
Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
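The N50 figure quoted above summarizes contig lengths: sort the contigs from longest to shortest and add up their lengths; N50 is the length of the contig at which the running total first reaches half of the total assembly length. A minimal Python sketch of that definition follows; the contig lengths in the example are hypothetical.

```python
def n50(lengths):
    """Length of the contig at which the running total (longest first) reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # compare against half the total without dividing
            return length
    return 0  # empty input


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8  (10 + 9 + 8 = 27, half of 54)
```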
A low-complexity add-on score for protein remote homology search with COMER.
Margelevicius, Mindaugas
2018-06-15
Protein sequence alignment forms the basis for comparative modeling, the most reliable approach to protein structure prediction, among many other applications. Alignment between sequence families, or profile-profile alignment, represents one of the most, if not the most, sensitive means for homology detection but still leaves room for improvement. We aim to improve the quality of profile-profile alignments, and the sensitivity they provide, by refining profile-profile substitution scores. We have developed a new score that represents an additional component of profile-profile substitution scores. A comprehensive evaluation shows that the new add-on score statistically significantly improves both the sensitivity and the alignment quality of the COMER method. We discuss why the score leads to the improvement and its near-optimal computational complexity, which makes it easy to implement in any profile-profile alignment method. An implementation of the add-on score in the open-source COMER software and data are available at https://sourceforge.net/projects/comer. The COMER software is also available on GitHub at https://github.com/minmarg/comer and as a Docker image (minmar/comer). Supplementary data are available at Bioinformatics online.
Dynamics of TBP binding to the TATA box
NASA Astrophysics Data System (ADS)
Schluesche, Peter; Heiss, Gregor; Meisterernst, Michael; Lamb, Don C.
2008-02-01
Gene expression is highly controlled and regulated in living cells. One of the first steps in gene transcription is recognition of the promoter site by the TATA box Binding Protein (TBP). TBP recruits other transcription factors and eventually RNA polymerase II to transcribe the DNA into mRNA. We developed a single-pair Förster Resonance Energy Transfer (spFRET) assay to investigate the mechanism of gene regulation. Here, we apply this assay to investigate the initial binding of TBP to the adenovirus major late (AdML) promoter site. From the spFRET measurements, we were able to identify two conformations of the TBP-DNA complex that correspond to TBP bound in the correct and in the opposite orientation. Longer incubation times or the presence of the transcription factor TFIIA improved the alignment of TBP on the promoter site. Binding of TBP to the TATA box shows rich dynamics, with abrupt transitions between multiple FRET states. A frame-wise histogram analysis revealed at least six discrete states, showing that TBP binding is more complicated than previously thought. Hence, the spFRET assay is very sensitive to the conformation of the TBP-DNA complex and is a very promising tool for investigating the pathway of TBP binding in detail.
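The discrete FRET states described above correspond to donor-acceptor distances through the standard Förster relation E = 1/(1 + (r/R0)^6). The Python sketch below converts between efficiency and distance. The Förster radius R0 is a dye-pair-specific input (the 5 nm used in the usage note is a placeholder, not a value from this study), the function names are hypothetical, and real spFRET analysis also needs corrections for detection efficiency and spectral crosstalk that are omitted here.

```python
def fret_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Förster relation: E = 1 / (1 + (r / R0)^6)."""
    if distance_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency: float, r0_nm: float) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    if r0_nm <= 0:
        raise ValueError("R0 must be > 0")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)
```

For example, `distance_from_efficiency(0.5, 5.0)` returns 5.0, since E = 0.5 occurs exactly at r = R0. The sixth-power dependence is why FRET is most sensitive to distance changes near R0.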
Sequence evaluation of four specific cDNA libraries for developmental genomics of sunflower.
Tamborindeguy, C; Ben, C; Liboz, T; Gentzbittel, L
2004-04-01
Four different cDNA libraries were constructed from sunflower protoplasts growing under embryogenic and non-embryogenic conditions: one standard library from each condition and two subtractive libraries in opposite senses. A total of 22,876 cDNA clones were obtained and 4800 ESTs were sequenced, giving rise to 2479 high-quality ESTs representing a unigene set of 1502 sequences. This set was compared with ESTs represented in public databases using the programs BLASTN and BLASTX, and its members were classified according to putative function using the catalog in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Some 33% of sequences failed to align with existing plant ESTs and therefore represent putative novel genes. The libraries show a low level of redundancy and, on average, 50% of the present ESTs have not been previously reported for sunflower. Several potentially interesting genes were identified, based on their homology with genes involved in animal zygotic division or plant embryogenesis. We also identified two ESTs that show significantly different levels of expression under embryogenic and non-embryogenic conditions. The libraries described here represent an original and valuable resource for the discovery of yet unknown genes putatively involved in dicot embryogenesis and for improving our knowledge of the mechanisms involved in polarity acquisition by plant embryos.
Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes
2007-10-01
reverse-complement unless otherwise stated. For strand x, let Nx denote its complement. A (perfect) Watson-Crick duplex is the joining of complement... is possible for complementary sequences to form a non-perfectly aligned duplex, we will call any x W Nx duplex a Watson-Crick (WC) duplex. Two... Contract number: FA8750-07...
Free Energy Gap and Statistical Thermodynamic Fidelity of DNA Codes (Postprint)
2007-01-01
Rueckert, Sonja; Simdyanov, Timur G.; Aleoshin, Vladimir V.; Leander, Brian S.
2011-01-01
Background: Environmental SSU rDNA surveys have significantly improved our understanding of microeukaryotic diversity. Many of the sequences acquired using this approach are closely related to lineages previously characterized at both morphological and molecular levels, making interpretation of these data relatively straightforward. Some sequences, by contrast, appear to be phylogenetic orphans and are sometimes inferred to represent “novel lineages” of unknown cellular identity. Consequently, interpretation of environmental DNA surveys of cellular diversity relies on an adequately comprehensive database of DNA sequences derived from identified species. Several major taxa of microeukaryotes, however, are still very poorly represented in these databases, and this is especially true for diverse groups of single-celled parasites, such as gregarine apicomplexans. Methodology/Principal Findings: This study attempts to address this paucity of DNA sequence data by characterizing four different gregarine species, isolated from the intestines of crustaceans, at both morphological and molecular levels: Thiriotia pugettiae sp. n. from the graceful kelp crab (Pugettia gracilis), Cephaloidophora cf. communis from two different species of barnacles (Balanus glandula and B. balanus), Heliospora cf. longissima from two different species of freshwater amphipods (Eulimnogammarus verrucosus and E. vittatus), and Heliospora caprellae comb. n. from a skeleton shrimp (Caprella alaskana). SSU rDNA sequences were acquired from isolates of these gregarine species and added to a global apicomplexan alignment containing all major groups of gregarines characterized so far. Molecular phylogenetic analyses of these data demonstrated that all of the gregarines collected from crustacean hosts formed a very strongly supported clade with 48 previously unidentified environmental DNA sequences.
Conclusions/Significance: This expanded molecular phylogenetic context enabled us to establish a major clade of intestinal gregarine parasites and infer the cellular identities of several previously unidentified environmental SSU rDNA sequences, including several sequences that have formerly been discussed broadly in the literature as a suspected “novel” lineage of eukaryotes. PMID:21483868
Evidence for a remodelling of DNA-PK upon autophosphorylation from electron microscopy studies
Morris, Edward P.; Rivera-Calzada, Angel; da Fonseca, Paula C. A.; Llorca, Oscar; Pearl, Laurence H.; Spagnolo, Laura
2011-01-01
The multi-subunit DNA-dependent protein kinase (DNA-PK), a crucial player in DNA repair by non-homologous end-joining in higher eukaryotes, consists of a catalytic subunit (DNA-PKcs) and the Ku heterodimer. Ku recruits DNA-PKcs to double-strand breaks, where DNA-PK assembles prior to DNA repair. The interaction of DNA-PK with DNA is regulated via autophosphorylation. Recent SAXS data addressed the conformational changes occurring in the purified catalytic subunit upon autophosphorylation. Here, we present the first structural analysis of the effects of autophosphorylation on the trimeric DNA-PK enzyme, performed by electron microscopy and single particle analysis. We observe a considerable degree of heterogeneity in the autophosphorylated material, which we resolved into subpopulations of intact complex, and separate DNA-PKcs and Ku, by using multivariate statistical analysis and multi-reference alignment on a partitioned particle image data set. The proportion of dimeric oligomers was reduced compared to non-phosphorylated complex, and those dimers remaining showed a substantial variation in mutual monomer orientation. Together, our data indicate a substantial remodelling of DNA-PK holo-enzyme upon autophosphorylation, which is crucial to the release of protein factors from a repaired DNA double-strand break. PMID:21450809
Charge Transport in 2D DNA Tunnel Junction Diodes.
Yoon, Minho; Min, Sung-Wook; Dugasani, Sreekantha Reddy; Lee, Yong Uk; Oh, Min Suk; Anthopoulos, Thomas D; Park, Sung Ha; Im, Seongil
2017-12-01
Recently, deoxyribonucleic acid (DNA) has been studied for electronics because of intrinsic benefits such as its natural abundance, biodegradability, biofunctionality, and low cost. However, its applications have been limited to passive components because of its inherently insulating properties. In this report, a metal-insulator-metal tunnel diode with Au/DNA/NiOx junctions is presented. Through the self-aligning process of DNA molecules, a 2D DNA nanosheet is synthesized and used as a tunneling barrier, and a semitransparent conducting oxide (NiOx) is applied as the top electrode to resolve metal penetration issues. This molecular device successfully operates as a nonresonant tunneling diode, and temperature-variable current-voltage analysis shows that Fowler-Nordheim tunneling is the dominant conduction mechanism at the junctions. DNA-based tunneling devices appear to be promising prototypes for nanoelectronics using biomolecules. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Aligned Single Wall Carbon Nanotube Polymer Composites Using an Electric Field
NASA Technical Reports Server (NTRS)
Park, Cheol; Wiklinson, John; Banda, Sumanth; Ounaies, Zoubeida; Wise, Kristopher E.; Sauti, Godfrey; Lillehei, Peter T.; Harrison, Joycelyn S.
2005-01-01
While high shear alignment has been shown to improve the mechanical properties of single wall carbon nanotubes (SWNT)-polymer composites, it is difficult to control and often results in degradation of the electrical and dielectric properties of the composite. Here, we report a novel method to actively align SWNTs in a polymer matrix, which allows for control over the degree of alignment of SWNTs without the side effects of shear alignment. In this process, SWNTs are aligned via field-induced dipolar interactions among the nanotubes under an AC electric field in a liquid matrix followed by immobilization by photopolymerization while maintaining the electric field. Alignment of SWNTs was controlled as a function of magnitude, frequency, and application time of the applied electric field. The degree of SWNT alignment was assessed using optical microscopy and polarized Raman spectroscopy and the morphology of the aligned nanocomposites was investigated by high resolution scanning electron microscopy. The structure of the field induced aligned SWNTs is intrinsically different from that of shear aligned SWNTs. In the present work, SWNTs are not only aligned along the field, but also migrate laterally to form thick, aligned SWNT percolative columns between the electrodes. The actively aligned SWNTs amplify the electrical and dielectric properties in addition to improving the mechanical properties of the composite. All of these properties of the aligned nanocomposites exhibited anisotropic characteristics, which were controllable by tuning the applied field conditions.
2017-03-01
possible. The thesis also utilized organizational alignment literature to include organizational alignment principles in the evaluation. Key principles include 1...
Study of DNA binding sites using the Rényi parametric entropy measure.
Krishnamachari, A; Mandal, Vijnan moy; Karmeshu
2004-04-07
Shannon's definition of uncertainty or surprisal has been applied extensively to measure the information content of aligned DNA sequences and to characterize DNA binding sites. In contrast to Shannon's uncertainty, this study investigates the applicability and suitability of a parametric uncertainty measure due to Rényi. This measure is observed to give results in agreement with Shannon's measure, pointing to its utility in analysing DNA binding site regions. To facilitate comparison between these uncertainty measures, a dimensionless quantity called "redundancy" has been employed. Rényi's measure at low parameter values is found to delineate binding sites (binding regions) better than Shannon's measure. The critical value of the parameter is chosen with an outlier criterion.
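A minimal Python sketch of the comparison above: per-position Shannon and Rényi entropies of a set of aligned binding sites, turned into a redundancy profile. The redundancy formula R = 1 - H/H_max (with H_max = log2 4 = 2 bits for DNA) is an assumed common definition, since the abstract does not state it; the outlier criterion for choosing the critical parameter is not implemented, and the function names are hypothetical.

```python
import math
from collections import Counter

H_MAX_DNA = 2.0  # log2(4) bits: maximum uncertainty for a 4-letter alphabet


def column_probs(column: str) -> list:
    """Relative A/C/G/T frequencies in one alignment column (gaps ignored)."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no A/C/G/T calls")
    return [n / total for n in counts.values()]


def shannon_entropy(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


def renyi_entropy(probs, alpha: float) -> float:
    """Rényi entropy of order alpha in bits; alpha -> 1 recovers Shannon."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if math.isclose(alpha, 1.0):
        return shannon_entropy(probs)
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)


def redundancy_profile(sites, alpha=None) -> list:
    """Per-position redundancy 1 - H/H_max for equal-length aligned sites.

    alpha=None uses Shannon entropy; otherwise Rényi entropy of that order.
    """
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be aligned to equal length")
    profile = []
    for column in zip(*sites):
        probs = column_probs("".join(column))
        h = shannon_entropy(probs) if alpha is None else renyi_entropy(probs, alpha)
        profile.append(1.0 - h / H_MAX_DNA)
    return profile
```

Calling `redundancy_profile(sites, alpha=0.5)` and `redundancy_profile(sites)` on the same aligned sites lets the two profiles be compared position by position. Because Rényi entropy does not increase with its order, a low-order profile never shows more redundancy than the Shannon profile at the same position.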
Things fall apart: biological species form unconnected parsimony networks.
Hart, Michael W; Sunday, Jennifer
2007-10-22
The generality of operational species definitions is limited by problematic definitions of between-species divergence. A recent phylogenetic species concept based on a simple objective measure of statistically significant genetic differentiation uses between-species application of statistical parsimony networks that are typically used for population genetic analysis within species. Here we review recent phylogeographic studies and reanalyse several mtDNA barcoding studies using this method. We found that (i) alignments of DNA sequences typically fall apart into a separate subnetwork for each Linnean species (but with a higher rate of true positives for mtDNA data) and (ii) DNA sequences from single species typically stick together in a single haplotype network. Departures from these patterns are usually consistent with hybridization or cryptic species diversity.
Holznecht, Catherine; Schmidt, Travis; Gould, Jon
2012-01-01
Circumstances may arise during laparoscopic procedures in which alignment of the laparoscope and the instruments is off by 180°, creating a mirror image of the operative field. It has been shown that task performance is degraded under these reverse-alignment conditions, and that the magnitude of performance impairment is directly related to laparoscopic experience and skill. The aim of this study was to determine whether reverse-alignment surgical skills could be developed through training. Twenty-two medical students were randomized to train under either reverse- or forward-alignment conditions on a standardized laparoscopic task in a video trainer (peg transfer). Baseline scores were obtained for each group under both orientations. Subjects participated in three 1-h training sessions during an 8-week period. Post-training scores were then obtained under both alignment conditions. Pre- and post-training scores were compared for subjects in each study group under both conditions. Post-training assessments in the forward orientation showed that subjects in the forward-training group improved significantly compared with pre-testing, while the performance of subjects in the reverse-training group did not improve. Under reverse-alignment conditions, both groups improved on post-test assessment, with dramatic improvements observed for those in the reverse-training group. Laparoscopic novices can learn to adapt to a sensorimotor discordance in a simulated training environment. While it is possible that skills developed by training under standard forward-alignment conditions can be utilized in situations of extreme visual-spatial discordance, the intentional development of reverse-alignment skills by training under these conditions may prove beneficial to novice surgeons.
A new method to cluster genomes based on cumulative Fourier power spectrum.
Dong, Rui; Zhu, Ziyue; Yin, Changchuan; He, Rong L; Yau, Stephen S-T
2018-06-20
Analyzing phylogenetic relationships using mathematical methods has always been important in bioinformatics. Quantitative research can interpret raw biological data precisely. Multiple Sequence Alignment (MSA) is frequently used to analyze biological evolution, but it is very time-consuming. When the data are large, alignment methods cannot finish the calculation in reasonable time. Therefore, we present a new method that uses moments of the cumulative Fourier power spectrum to cluster DNA sequences. Each sequence is translated into a vector in Euclidean space, and distances between the vectors reflect the relationships between sequences. The mapping between the spectra and the moment vector is one-to-one, which means that no information contained in the power spectra is lost during the calculation. We cluster and classify several datasets, including Influenza A, primate, and human rhinovirus (HRV) datasets, to build phylogenetic trees. Results show that the proposed cumulative Fourier power spectrum method is much faster and more accurate than MSA and than another alignment-free method known as k-mer. The research provides new insights into the study of phylogeny and evolution and into efficient DNA comparison algorithms for large genomes. The computer programs of the cumulative Fourier power spectrum are available at GitHub (https://github.com/YaulabTsinghua/cumulative-Fourier-power-spectrum). Copyright © 2018. Published by Elsevier B.V.
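The alignment-free pipeline above (sequence, power spectrum, cumulative power spectrum, moment vector, Euclidean distance) can be sketched in plain Python. This is a simplified illustration, not the authors' released code: it uses one 0/1 indicator sequence per nucleotide, a naive O(N²) DFT instead of an FFT, moment orders 1 to 3, and it rescales each cumulative spectrum to end at 1 before taking moments, a normalization choice of ours that may differ from the paper's. The function names are hypothetical.

```python
import cmath
import math
from itertools import accumulate

BASES = "ACGT"


def power_spectrum(indicator):
    """|DFT|^2 of a numeric sequence via a naive O(N^2) DFT (fine for short demos)."""
    n = len(indicator)
    spectrum = []
    for freq in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * freq * t / n)
                    for t, x in enumerate(indicator))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def cps_moment_vector(seq, orders=(1, 2, 3)):
    """Moment features of each nucleotide's cumulative power spectrum (CPS).

    Each CPS is rescaled to end at 1 (our normalization choice); order 1 is
    the mean, higher orders are signed r-th roots of r-th central moments.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    features = []
    for base in BASES:
        indicator = [1 if ch == base else 0 for ch in seq]
        cps = list(accumulate(power_spectrum(indicator)))
        total = cps[-1]
        if total == 0:  # nucleotide absent: contribute zeros
            features.extend([0.0] * len(orders))
            continue
        cps = [c / total for c in cps]
        mean = sum(cps) / len(cps)
        for r in orders:
            if r == 1:
                features.append(mean)
            else:
                m = sum((c - mean) ** r for c in cps) / len(cps)
                features.append(math.copysign(abs(m) ** (1.0 / r), m))
    return features


def cps_distance(a, b):
    """Euclidean distance between the moment vectors of two sequences."""
    return math.dist(cps_moment_vector(a), cps_moment_vector(b))
```

Pairwise `cps_distance` values can then feed any standard distance-based tree-building or clustering method; for genome-length sequences an FFT would replace the naive DFT.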
Yuan, Siqi; Zheng, Yuchi; Zeng, Xiaomao
2016-01-01
Recent improvements in next-generation sequencing (NGS) technologies can facilitate the acquisition of mitochondrial genomes. However, it is not clear whether NGS can be effectively used to reconstruct mitogenomes with extensive gene rearrangement. Such rearrangements can cause amplification failure and/or assembly and alignment errors. Here, we chose two frogs with rearranged gene order, Amolops chunganensis and Quasipaa boulengeri, to test whether gene rearrangements affect mitogenome assembly and alignment using NGS. The mitogenomes with gene rearrangements were sequenced by Illumina MiSeq genomic sequencing and assembled effectively with Trinity v2.1.0 and SOAPdenovo2. Gene order and content in the mitogenomes of A. chunganensis and Q. boulengeri follow the typical neobatrachian pattern except for rearrangements at the position of the “WANCY” tRNA gene cluster. Further, the mitogenome of Q. boulengeri is characterized by a tandem duplication of trnM. Moreover, we used 13 protein-coding genes of A. chunganensis, Q. boulengeri and other neobatrachians to reconstruct a phylogenetic tree for evaluating the mitochondrial sequence authenticity of A. chunganensis and Q. boulengeri. In this work, we provide nearly complete mitochondrial genomes of A. chunganensis and Q. boulengeri. PMID:27994980
Laser illumination of multiple capillaries that form a waveguide
Dhadwal, Harbans S.; Quesada, Mark A.; Studier, F. William
1998-08-04
A system and method are disclosed for efficient laser illumination of the interiors of multiple capillaries simultaneously, and collection of light emitted from them. Capillaries in a parallel array can form an optical waveguide wherein refraction at the cylindrical surfaces confines side-on illuminating light to the core of each successive capillary in the array. Methods are provided for determining conditions where capillaries will form a waveguide and for assessing and minimizing losses due to reflection. Light can be delivered to the arrayed capillaries through an integrated fiber optic transmitter or through a pair of such transmitters aligned coaxially at opposite sides of the array. Light emitted from materials within the capillaries can be carried to a detection system through optical fibers, each of which collects light from a single capillary, with little cross talk between the capillaries. The collection ends of the optical fibers can be in a parallel array with the same spacing as the capillary array, so that the collection fibers can all be aligned to the capillaries simultaneously. Applicability includes improving the efficiency of many analytical methods that use capillaries, including particularly high-throughput DNA sequencing and diagnostic methods based on capillary electrophoresis. 35 figs.
Huang, Shuguang; Yeo, Adeline A; Li, Shuyu Dan
2007-10-01
The Kolmogorov-Smirnov (K-S) test is a statistical method often used to compare two distributions. In high-throughput screening (HTS) studies, such distributions usually arise from the phenotypes of independent cell populations. However, the K-S test has been criticized for being overly sensitive in applications: it often detects a statistically significant difference that is not biologically meaningful. One major reason is a common phenomenon in HTS studies, namely systematic drift among the distributions due to causes such as instrument variation, plate edge effects, accidental differences in sample handling, etc. In particular, in high-content cellular imaging experiments, the location shift can be dramatic because some compounds are themselves fluorescent. This oversensitivity of the K-S test is especially pronounced in cellular assays where sample sizes are very large (usually several thousand). In this paper, a modified K-S test is proposed to deal with the nonspecific location-shift problem in HTS studies. Specifically, we propose that the distributions be "normalized" by density curve alignment before the K-S test is conducted. In applications to simulated and real experimental data, the results show that the proposed method has improved specificity.
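The idea of removing a nonspecific location shift before testing can be sketched in Python. As a simplification, each sample is shifted so that its median is zero, which stands in for the paper's density-curve alignment; only the two-sample K-S statistic D is computed (for p-values, a library routine such as SciPy's `scipy.stats.ks_2samp` could be applied to the shifted samples). The function names are hypothetical.

```python
import statistics
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample K-S statistic D = max |F_x(v) - F_y(v)| over the pooled sample."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:  # the ECDF gap can only change at observed values
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def location_aligned_ks(x, y):
    """K-S statistic after removing each sample's median (simple location alignment)."""
    mx, my = statistics.median(x), statistics.median(y)
    return ks_statistic([v - mx for v in x], [v - my for v in y])
```

Two samples that differ only by a constant offset give D = 1.0 before alignment and D = 0.0 after: compare `ks_statistic([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])` with `location_aligned_ks` on the same inputs. Differences in shape, not location, are what remain detectable after the shift.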
Identification of duck plague virus by polymerase chain reaction.
Hansen, W R; Brown, S E; Nashold, S W; Knudson, D L
1999-01-01
A polymerase chain reaction (PCR) assay was developed for detecting duck plague virus. A 765-bp EcoRI fragment cloned from the genome of the duck plague vaccine (DP-VAC) virus was sequenced for PCR primer development. GenBank alignment searches showed the fragment sequence to be similar to the 3' ends of an undefined open reading frame and of the DNA polymerase gene in other herpesviruses. Three of four primer sets were specific for the DP-VAC virus and detected 100% (7/7) of field isolates, but did not amplify DNA from inclusion body disease of cranes virus. The specificity of one primer set was tested with genome templates from other avian herpesviruses, including those from a golden eagle, bald eagle, great horned owl, snowy owl, peregrine falcon, prairie falcon, pigeon, psittacine, and chicken (infectious laryngotracheitis), but no amplicons were produced. Hence, this PCR test is highly specific for duck plague virus DNA. Two primer sets were able to detect 1 fg of DNA from the duck plague vaccine strain, equivalent to five genome copies. In addition, the ratio of tissue culture infectious doses to genome copies of duck plague vaccine virus from infected duck embryo cells was determined to be 1:100, making the PCR assay 20 times more sensitive than tissue culture for detecting duck plague virus. The speed, sensitivity, and specificity of this PCR provide a greatly improved diagnostic and research tool for studying the epizootiology of duck plague.
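The 20-fold figure follows from the numbers above if the 1:100 ratio is read as one tissue culture infectious dose (TCID) per 100 genome copies, so that tissue culture cannot register fewer than about 100 copies while PCR detects about 5 (this reading of the ratio is our interpretation, not a calculation stated by the authors):

```latex
\frac{100~\text{genome copies per TCID}}{5~\text{genome copies detected by PCR}} = 20
```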
Ivankin, A V; Kolesnikova, T D; Demakov, S A; Andreenkov, O V; Bil'danova, E R; Andreenkova, N G; Zhimulev, I F
2011-01-01
Methods of physical DNA mapping and direct visualization of replication and transcription in specific regions of the genome play a crucial role in research on the structural and functional organization of eukaryotic genomes. Because DNA in cells is organized into higher-order structures and packaged as highly compacted chromosomes, most of these methods have limited resolution at the chromosomal level. One approach to enhancing resolution and mapping accuracy is molecular combing. The method is based on stretching and aligning DNA molecules that are covalently attached by one end to the cover-glass surface. In this article we describe the major methodological steps of molecular combing and their adaptation for studying DNA replication parameters in polyploid and diploid tissues of Drosophila larvae.
NASA Astrophysics Data System (ADS)
van Eijck, L.; Merzel, F.; Rols, S.; Ollivier, J.; Forsyth, V. T.; Johnson, M. R.
2011-08-01
Quantifying the molecular elasticity of DNA is fundamental to our understanding of its biological functions. Recently, different groups, through experiments on tailored DNA samples and numerical models, have reported a range of stretching force constants (0.3 to 3 N/m). However, the most direct, microscopic measurement of DNA stiffness is obtained from the dispersion of its vibrations. A new neutron scattering spectrometer and aligned, wet-spun samples have enabled such measurements, which provide the first data on collective excitations of DNA and yield a force constant of 83 N/m. Structural and dynamic order persists unchanged to within 15 K of the melting point of the sample, precluding the formation of bubbles. These findings are supported by large-scale phonon and molecular dynamics calculations, which reconcile hard and soft force constants.
Molecular Mapping of the rosy Locus in Drosophila melanogaster
Coté, Babette; Bender, Welcome; Curtis, Daniel; Chovnick, Arthur
1986-01-01
The DNA from the chromosomal region of the Drosophila rosy locus has been examined in 83 rosy mutant strains. Several spontaneous and radiation-induced alleles were associated with insertions and deletions, respectively. The lesions are clustered in a 4-kb region. Some of the alleles identified on the DNA map have been located on the genetic map by fine-structure recombination experiments. The genetic and molecular maps are collinear, and the alignment identifies the DNA location of the rosy control region. A rosy RNA of 4.5 kb has been identified; its 5' end lies in or near the control region. PMID:2420682
SNPServer: a real-time SNP discovery tool.
Savage, David; Batley, Jacqueline; Erwin, Tim; Logan, Erica; Love, Christopher G; Lim, Geraldine A C; Mongin, Emmanuel; Barker, Gary; Spangenberg, German C; Edwards, David
2005-07-01
SNPServer is a real-time, flexible tool for the discovery of SNPs (single nucleotide polymorphisms) within DNA sequence data. The program uses BLAST to identify related sequences and CAP3 to cluster and align these sequences. The alignments are passed to the SNP discovery software autoSNP, a program that detects SNPs and insertion/deletion polymorphisms (indels). Alternatively, lists of related sequences or pre-assembled sequences may be entered for SNP discovery. SNPServer and autoSNP use redundancy to differentiate between candidate SNPs and sequence errors. For each candidate SNP, two measures of confidence are calculated: the redundancy of the polymorphism at a SNP locus and the co-segregation of the candidate SNP with other SNPs in the alignment. SNPServer is available at http://hornbill.cspp.latrobe.edu.au/snpdiscovery.html.
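The redundancy idea above can be sketched in Python: a column of a read alignment is reported as a candidate SNP only if at least two alleles are each observed in a minimum number of reads, so a base seen once is treated as a possible sequencing error. This is a hypothetical illustration, not autoSNP's code or interface; the threshold `min_redundancy`, the function name, and the choice to ignore gaps and indels are assumptions, and the co-segregation score is not implemented.

```python
from collections import Counter


def candidate_snps(alignment, min_redundancy=2):
    """Return (column, allele_counts, redundancy) for candidate SNP columns.

    `alignment` is a list of equal-length aligned reads; only A/C/G/T calls
    are counted (gaps and ambiguity codes are ignored). A column qualifies
    when at least two alleles each occur in >= min_redundancy reads; its
    redundancy is the read count of the second most common qualifying allele.
    """
    if not alignment:
        return []
    if len({len(read) for read in alignment}) != 1:
        raise ValueError("reads must be aligned to equal length")
    candidates = []
    for col in range(len(alignment[0])):
        counts = Counter(read[col].upper() for read in alignment
                         if read[col].upper() in "ACGT")
        supported = sorted((n for n in counts.values() if n >= min_redundancy),
                           reverse=True)
        if len(supported) >= 2:
            candidates.append((col, dict(counts), supported[1]))
    return candidates
```

With the alignment `["ACGTTA", "ACGTTA", "ACCTTA", "ACCTGA", "ACGTTA"]`, only column 2 (G in three reads, C in two) passes the default threshold; the single G at column 4 is also reported if `min_redundancy=1`.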
A new structural framework for integrating replication protein A into DNA processing machinery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brosey, Chris A; Yan, Chunli; Tsutakawa, Susan E
2013-01-01
By coupling the protection and organization of ssDNA with the recruitment and alignment of DNA processing factors, Replication Protein A (RPA) lies at the heart of dynamic multi-protein DNA processing machinery. Nevertheless, how RPA manages to coordinate the biochemical functions of its eight domains remains unknown. We examined the structural biochemistry of RPA's DNA binding activity, combining small-angle x-ray and neutron scattering with all-atom molecular dynamics simulations to investigate the architecture of RPA's DNA-binding core. It has long been held that RPA engages ssDNA in three stages, but our data reveal that RPA undergoes two rather than three transitions as it binds ssDNA. In contrast to previous models, RPA is more compact when fully engaged on 20-30 nucleotides of ssDNA than when DNA-free, and there is no evidence for significant population of a highly compacted structure in the initial 8-10 nucleotide binding mode. These results provide a new framework for understanding the integration of ssDNA into DNA processing machinery and how binding partners may manipulate RPA architecture to gain access to the substrate.
Cho, Woon; Jang, Jinbeum; Koschan, Andreas; Abidi, Mongi A; Paik, Joonki
2016-11-28
A fundamental limitation of hyperspectral imaging is the inter-band misalignment correlated with subject motion during data acquisition. One way of resolving this problem is to assess the alignment quality of hyperspectral image cubes derived from the state-of-the-art alignment methods. In this paper, we present an automatic selection framework for the optimal alignment method to improve the performance of face recognition. Specifically, we develop two qualitative prediction models based on: 1) a principal curvature map for evaluating the similarity index between sequential target bands and a reference band in the hyperspectral image cube as a full-reference metric; and 2) the cumulative probability of target colors in the HSV color space for evaluating the alignment index of a single sRGB image rendered using all of the bands of the hyperspectral image cube as a no-reference metric. We verify the efficacy of the proposed metrics on a new large-scale database, demonstrating a higher prediction accuracy in determining improved alignment compared to two full-reference and five no-reference image quality metrics. We also validate the ability of the proposed framework to improve hyperspectral face recognition.
Time-Resolved Small-Angle X-ray Scattering Reveals Millisecond Transitions of a DNA Origami Switch.
Bruetzel, Linda K; Walker, Philipp U; Gerling, Thomas; Dietz, Hendrik; Lipfert, Jan
2018-04-11
Self-assembled DNA structures enable the creation of specific shapes at the nanometer-micrometer scale with molecular resolution. The construction of functional DNA assemblies will likely require dynamic structures that can undergo controllable conformational changes. DNA devices based on shape-complementary stacking interactions have been shown to undergo reversible conformational changes triggered by changes in ionic environment or temperature. An experimentally unexplored aspect is how quickly conformational transitions of large synthetic DNA origami structures can actually occur. Here, we use time-resolved small-angle X-ray scattering to monitor large-scale conformational transitions of a two-state DNA origami switch in free solution. We show that the DNA device switches from its open to its closed conformation within milliseconds of the addition of MgCl₂, which is close to the theoretical diffusive speed limit. In contrast, measurements of the dimerization of DNA origami bricks reveal much slower, concentration-dependent assembly kinetics. DNA brick dimerization occurs on a time scale of minutes to hours, suggesting that the kinetics depend on local concentration and molecular alignment.
Implant alignment in total elbow arthroplasty: conventional vs. navigated techniques
NASA Astrophysics Data System (ADS)
McDonald, Colin P.; Johnson, James A.; King, Graham J. W.; Peters, Terry M.
2009-02-01
Incorrect selection of the native flexion-extension axis during implant alignment in elbow replacement surgery is likely a significant contributor to failure of the prosthesis. Computer and image-assisted surgery is emerging as a useful surgical tool for improving the accuracy of orthopaedic procedures. This study evaluated the accuracy of implant alignment using an image-based navigation technique compared with a conventional non-navigated approach. Implant alignment error was 0.8 +/- 0.3 mm in translation and 1.1 +/- 0.4° in rotation for the navigated alignment, compared with 3.1 +/- 1.3 mm and 5.0 +/- 3.8° for the non-navigated alignment. Five (5) of the 11 non-navigated alignments were malaligned by more than 5°, while none of the navigated alignments had an error greater than 2.0°. Improved implant positioning will likely reduce implant loading and wear, resulting in fewer implant-related complications and revision surgeries.
Improving scanner wafer alignment performance by target optimization
NASA Astrophysics Data System (ADS)
Leray, Philippe; Jehoul, Christiane; Socha, Robert; Menchtchikov, Boris; Raghunathan, Sudhar; Kent, Eric; Schoonewelle, Hielke; Tinnemans, Patrick; Tuffy, Paul; Belen, Jun; Wise, Rich
2016-03-01
At process nodes of 10 nm and below, the patterning complexity, together with the required processing and materials, has created a need to optimize alignment targets in order to achieve the required precision, accuracy and throughput. Recent industry publications on the metrology target optimization process have shown a move away from expensive and time-consuming empirical methodologies toward a faster computational approach. ASML's Design for Control (D4C) application, which is currently used to optimize YieldStar diffraction based overlay (DBO) metrology targets, has been extended to support the optimization of scanner wafer alignment targets. This allows the process information and design methodology used for DBO target designs to be leveraged for the optimization of alignment targets. In this paper, we show how we applied this computational approach to wafer alignment target design. We verify the correlation between predictions and measurements for the key alignment performance metrics and finally show the potential alignment and overlay performance improvements that an optimized alignment target could achieve.
A novel approach to multiple sequence alignment using hadoop data grids.
Sudha Sadasivam, G; Baktavatchalam, G
2010-01-01
Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered when aligning multiple sequences are the speed and accuracy of the alignment. Although dynamic programming algorithms produce accurate alignments, they are computationally intensive. In this paper we propose a time-efficient approach to sequence alignment that also produces high-quality alignments. The dynamic nature of the algorithm, coupled with the data and computational parallelism of Hadoop data grids, improves the accuracy and speed of sequence alignment. The block-splitting principle of Hadoop, together with its scalability, facilitates the alignment of very large sequences.
Performances of Different Fragment Sizes for Reduced Representation Bisulfite Sequencing in Pigs.
Yuan, Xiao-Long; Zhang, Zhe; Pan, Rong-Yang; Gao, Ning; Deng, Xi; Li, Bin; Zhang, Hao; Sangild, Per Torp; Li, Jia-Qi
2017-01-01
Reduced representation bisulfite sequencing (RRBS) has been widely used to profile genome-scale DNA methylation in mammalian genomes. However, the applications and technical performance of RRBS with different fragment sizes have not been systematically reported in pigs, which serve as one of the important biomedical models for humans. The aim of this study was to evaluate the capacity of RRBS libraries with different fragment sizes to characterize the porcine genome. We found that the MspI-digested segments between 40 and 220 bp had a pronounced distribution peak at 74 bp; these fragments overlapped heavily with repetitive elements and might reduce unique mapping alignment. The RRBS library of 110-220 bp fragment size had the highest unique mapping alignment and the lowest multiple alignment. The cost-effectiveness of the 40-110 bp, 110-220 bp and 40-220 bp fragment sizes might decrease when the dataset size exceeded 70, 50 and 110 million reads, respectively. Given a 50-million-read dataset, the average sequencing depth of the detected CpG sites in the 110-220 bp fragment size appeared to be greater than in the 40-110 bp and 40-220 bp fragment sizes, and the detected CpG sites were located differently in gene- and CpG island-related regions. Our results demonstrate that the choice of fragment size can affect the number and sequencing depth of detected CpG sites as well as cost-efficiency. No single RRBS configuration is optimal in all circumstances for investigating genome-scale DNA methylation. This work provides useful knowledge for designing and executing RRBS to investigate genome-wide DNA methylation in pig tissues.
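As a rough illustration of how RRBS fragment-size windows such as 40-110 bp, 110-220 bp and 40-220 bp are defined, the sketch below digests a sequence in silico with MspI (which recognizes CCGG and cuts C^CGG), keeps the fragments flanked by two cut sites, and counts CpGs in each size window. The function names and the toy sequence are hypothetical; real RRBS pipelines also deal with both strands, end repair and adapter ligation, which this ignores.

```python
import re

def mspi_fragments(seq):
    """Return the fragments lying between consecutive MspI cut sites.

    MspI recognizes CCGG and cuts after the first C (C^CGG). CCGG cannot
    overlap itself, so a plain regex scan finds every site. Pieces at the
    two ends of `seq` (not flanked by a cut on both sides) are dropped.
    """
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer("CCGG", seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]

def size_select(fragments, lo, hi):
    """Keep fragments whose length lies in [lo, hi], both ends inclusive."""
    return [f for f in fragments if lo <= len(f) <= hi]

def count_cpg(fragments):
    """Count CpG dinucleotides contained in the selected fragments."""
    return sum(f.count("CG") for f in fragments)

# Toy sequence with three MspI sites (hypothetical, for illustration only).
toy = "A" * 10 + "CCGG" + "A" * 50 + "CCGG" + "T" * 130 + "CCGG" + "A" * 5
frags = mspi_fragments(toy)
print([len(f) for f in frags])  # [54, 134]
for lo, hi in [(40, 110), (110, 220), (40, 220)]:
    selected = size_select(frags, lo, hi)
    print(lo, hi, len(selected), count_cpg(selected))
```

Note that the CpG spanning each cut site is split between neighbouring fragments, which is one reason RRBS coverage concentrates at fragment ends.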
Probabilistic topic modeling for the analysis and classification of genomic sequences
2015-01-01
Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, which represent a well-defined region of the whole genome. Recently, alignment-free techniques have gained importance because they can overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on k-mer representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared with the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches results very similar to those of the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM on ultra-short sequences, and it exhibits a smooth decrease in performance, at every taxonomic level, as the sequence length decreases. PMID:25916734
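The first step of this approach, turning each sequence into a "document" of fixed-length k-mer "words", can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, k-mers containing ambiguous bases are simply skipped, and the LDA fitting itself is not shown (the resulting count matrix could, for instance, be passed to scikit-learn's LatentDirichletAllocation, though the paper does not say which implementation was used).

```python
from collections import Counter

def kmer_document(seq, k):
    """Represent a DNA sequence as a bag of overlapping k-mer 'words'.
    k-mers containing anything other than A, C, G, T (e.g. N) are skipped."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= set("ACGT"))

def document_term_matrix(seqs, k):
    """Build a shared k-mer vocabulary and a per-sequence count matrix."""
    docs = [kmer_document(s, k) for s in seqs]
    vocab = sorted(set().union(*docs))
    return vocab, [[d[w] for w in vocab] for d in docs]

vocab, counts = document_term_matrix(["ACGT", "ACGA"], 2)
print(vocab)   # ['AC', 'CG', 'GA', 'GT']
print(counts)  # [[1, 1, 0, 1], [1, 1, 1, 0]]
```

Short snippets simply yield sparser count vectors, which is why a topic model can still assign them to classes when alignment-based tools struggle.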
A Kalman Filter for SINS Self-Alignment Based on Vector Observation.
Xu, Xiang; Xu, Xiaosu; Zhang, Tao; Li, Yao; Tong, Jinwu
2017-01-29
In this paper, a self-alignment method for strapdown inertial navigation systems (SINS) based on the q-method is studied. In addition, an improved method based on integrating gravitational apparent motion to form apparent velocity is designed, which can reduce the random noise of the observation vectors. For further analysis, a novel self-alignment method using a Kalman filter based on adaptive filter technology is proposed, which transforms the self-alignment procedure into an attitude estimation using the observation vectors. In the proposed method, a linear pseudo-measurement equation is adopted by employing the transfer method between the quaternion and the observation vectors. Analysis and simulation indicate that the accuracy of the self-alignment is improved. Meanwhile, to improve the convergence rate of the proposed method, a new method based on parameter recognition and a reconstruction algorithm for apparent gravitation is devised, which can reduce the influence of the random noise of the observation vectors. Simulations and turntable tests were carried out, and the results indicate that the proposed method achieves sound alignment results with lower standard deviations, higher alignment accuracy and a faster convergence rate. PMID:28146059
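The classical batch q-method that this work builds on (Davenport's method) can be sketched as follows. This is only that textbook building block, not the paper's Kalman filter, apparent-velocity integration or gravity reconstruction. The function name, the scalar-last quaternion and the convention body ≈ A(q)·ref are assumptions of the sketch, and power iteration on a shifted K matrix stands in for a library eigensolver so that only the standard library is needed.

```python
import math

def q_method(body, ref, weights, iters=500):
    """Davenport's q-method (batch, no filtering).

    body[i] and ref[i] are unit 3-vectors with body[i] ~= A(q) @ ref[i].
    Returns the optimal quaternion [q1, q2, q3, q4] (vector part first,
    scalar last): the eigenvector of Davenport's K matrix with the largest
    eigenvalue, found here by power iteration.
    """
    # Attitude profile matrix B = sum_i a_i * b_i * r_i^T
    B = [[sum(a * b[i] * r[j] for a, b, r in zip(weights, body, ref))
          for j in range(3)] for i in range(3)]
    sigma = B[0][0] + B[1][1] + B[2][2]
    z = [B[1][2] - B[2][1], B[2][0] - B[0][2], B[0][1] - B[1][0]]
    # K = [[B + B^T - sigma*I, z], [z^T, sigma]]
    K = [[B[i][j] + B[j][i] - (sigma if i == j else 0.0) for j in range(3)] + [z[i]]
         for i in range(3)]
    K.append(z + [sigma])
    # Eigenvalues of K lie in [-sum(a), sum(a)]; adding sum(a) to the diagonal
    # makes them non-negative, so power iteration converges to the largest one.
    shift = sum(weights)
    for i in range(4):
        K[i][i] += shift
    q = [0.5, 0.5, 0.5, 0.5]  # assumed not orthogonal to the optimal quaternion
    for _ in range(iters):
        q = [sum(K[i][j] * q[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(x * x for x in q))
        q = [x / norm for x in q]
    return q

# Two reference vectors observed after a 0.3 rad frame rotation about z.
th = 0.3
q = q_method(body=[[math.cos(th), -math.sin(th), 0.0], [math.sin(th), math.cos(th), 0.0]],
             ref=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], weights=[1.0, 1.0])
print([round(x, 4) for x in q])  # ~ [0, 0, sin(0.15), cos(0.15)]
```

With noisy vectors the largest eigenvalue of K is less well separated, so more iterations (or a proper symmetric eigensolver) may be needed; the sign of the returned quaternion is arbitrary.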
Thermal Conductivity of Polyimide/Carbon Nanofiller Blends
NASA Technical Reports Server (NTRS)
Delozier, D. M.; Watson, K. A.; Ghose, S.; Working, D. C.; Connell, J. W.; Smith, J. G.; Sun, Y. P.; Lin, Y.
2006-01-01
Ultem(TM) was mixed with three different carbon-based nanofillers in an effort to increase the thermal conductivity of the polymer. After initial mixing, the nanocomposites were extruded or processed with the Laboratory Mixing Molder (LMM). High-resolution scanning electron microscopy (HRSEM) revealed significant alignment of the nanofillers in the extruded samples. Thermal conductivity was measured both parallel and perpendicular to the nanofiller alignment direction, as well as for unaligned samples. The largest improvement in thermal conductivity was achieved for aligned samples measured along the alignment direction. Unaligned samples also showed a significant improvement in thermal conductivity and may be useful in applications where it is not possible to align the nanofiller. However, the improvements in thermal conductivity did not approach those expected from a rule of mixtures, likely because of poor phonon transfer through the matrix.
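For reference, the rule of mixtures for thermal conductivity is usually written in two limiting forms. The symbols are introduced here as assumptions (φ the filler volume fraction, k_f and k_m the filler and matrix conductivities); the abstract does not state which form the authors compared against.

```latex
\begin{aligned}
k_{\parallel} &= \phi\,k_f + (1-\phi)\,k_m && \text{(parallel, upper bound)}\\
\frac{1}{k_{\perp}} &= \frac{\phi}{k_f} + \frac{1-\phi}{k_m} && \text{(series, lower bound)}
\end{aligned}
```

Because both forms treat filler and matrix as perfectly coupled, they ignore interfacial thermal resistance, so poor phonon transfer across the filler-polymer interface would leave measured values below the parallel estimate.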
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
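MUMmerGPU matches queries against a reference stored as a suffix tree. The sketch below swaps in a suffix array, a closely related index, to show how such an index answers exact-match queries by binary search. It is not MUMmerGPU code: the function names are hypothetical, nothing runs on a GPU, and maximal-unique-match logic is omitted.

```python
def build_suffix_array(ref):
    """Naive suffix array: suffix start positions in lexicographic order.
    O(n^2 log n); fine for a toy reference, not for a genome."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])

def find_exact(ref, sa, query):
    """Start positions of every exact occurrence of `query` in `ref`.

    Truncating sorted suffixes to len(query) keeps them sorted, so two
    binary searches delimit the block of suffixes that begin with `query`.
    """
    m = len(query)

    def prefix(k):
        return ref[sa[k]:sa[k] + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix >= query
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose prefix > query
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])

ref = "ACGTACGTAC"
sa = build_suffix_array(ref)
for read in ["ACG", "GTA", "TTT"]:
    print(read, find_exact(ref, sa, read))
```

Each query is independent of the others, which is the property that lets tools of this kind process many queries in parallel.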
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
Manavski, Svetlin A; Valle, Giorgio
2008-01-01
Background Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm impractical for searching similarities in large sets of sequences. For these reasons, heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment. Results In this paper we present what we believe is the fastest implementation of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the latest-generation G80 Graphics Processing Units (GPUs). Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX cards. Exhaustive tests were performed to compare our implementation with SSEARCH and BLAST running on a 3 GHz Intel Pentium IV processor. Our solution was also compared with a recently published GPU implementation and with a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation is 2 to 30 times faster than any previous implementation available on commodity hardware.
Conclusions The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches. PMID:18387198
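The recurrence that such GPU implementations accelerate can be sketched on the CPU as follows. This is a plain Smith-Waterman with a linear gap penalty and traceback; the scoring values are illustrative defaults, not those used in the paper, and no substitution matrix or affine gap model is included. Each of the len(a) × len(b) cells filled here is one "cell update" in the GCUPS figure.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 term lets a local alignment start fresh at any cell.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("XXACGTXX", "YACGTY"))  # (8, 'ACGT', 'ACGT')
```

GPU versions typically keep only the scores (not the full matrix) and parallelize across database sequences or anti-diagonals, since cells on the same anti-diagonal do not depend on each other.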
Sarin, Hemant
2017-03-01
This study examines the conserved basis for gene expression in two comparative cell types at opposite ends of the cell pressuromodulation spectrum: the lymphatic endothelial cell and the blood microvascular capillary endothelial cell. The mechanism for gene expression is studied in terms of the 5' -> 3' direction paired point tropy quotients (prpTQs) and the final 5' -> 3' direction episodic sub-episode block sums split-integrated weighted average-averaged gene overexpression tropy quotient (esebssiwaagoTQ). The final 5' -> 3' esebssiwaagoTQ classifies a lymphatic endothelial cell overexpressed gene as a supra-pressuromodulated gene (0.25 ≤ esebssiwaagoTQ < 0.75) every time, and classifies a blood microvascular capillary endothelial cell overexpressed gene as an infra-pressuromodulated gene (esebssiwaagoTQ < 0.25) every time (100% sensitivity; 100% specificity). Horizontal alignment of 5' -> 3' intergene distance segment tropy with respect to the gene is the basis for DNA transcription in the pressuromodulated state.
Generate Optimized Genetic Rhythm for Enzyme Expression in Non-native systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-11-03
Most amino acids are encoded by more than one codon, resulting in redundancy in the genetic code. Silent codon substitutions that do not alter the amino acid sequence can still affect protein expression. We have developed an algorithm, GoGREEN, to enhance the expression of foreign proteins in a host organism. GoGREEN selects codons according to frequency patterns seen in the gene of interest, using the codon usage table of the host organism. GoGREEN is also designed to accommodate gaps in the sequence. This software takes as input (1) the aligned protein sequences for the genes the user wishes to express, (2) the codon usage table for the host organism, and (3) the DNA sequence for the target protein found in the host organism. The program selects codons based on codon usage patterns in the target DNA sequence. It also selects codons for "gaps" found in the aligned protein sequences, using the codon usage table of the host organism.
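To show what selecting codons from a host codon usage table involves, the sketch below uses the simpler, well-known "most frequent codon" strategy. It is named plainly as a different technique: GoGREEN's own rule (matching the frequency patterns of the gene of interest) is not specified in enough detail here to reproduce, and GoGREEN assigns codons for gaps, whereas this sketch just skips gap characters. The function name and the frequencies in the example table are made up for illustration.

```python
def back_translate(aligned_protein, usage):
    """Back-translate a protein using, for each residue, the host's most
    frequent synonymous codon. Gap characters ('-') are skipped.

    usage: amino acid (one-letter code) -> {codon: relative frequency}.
    """
    codons = []
    for aa in aligned_protein.upper():
        if aa == "-":
            continue  # alignment gap: no residue, so no codon in this sketch
        table = usage[aa]
        codons.append(max(table, key=table.get))
    return "".join(codons)

# Made-up frequencies for illustration; use a real host codon usage table in practice.
usage = {"M": {"ATG": 1.0},
         "K": {"AAA": 0.7, "AAG": 0.3},
         "L": {"CTG": 0.5, "TTA": 0.1, "CTT": 0.1}}
print(back_translate("MK-L", usage))  # ATGAAACTG
```

Always choosing the single most frequent codon can deplete the corresponding tRNA pool, which is one motivation for pattern-matching approaches such as the one described above.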
Ibarra, Ignacio L; Melo, Francisco
2010-07-01
Dynamic programming (DP) is a general optimization strategy that is successfully used across various disciplines of science. In bioinformatics, it is widely applied to calculate the optimal alignment between pairs of protein or DNA sequences. These alignments form the basis of new, verifiable biological hypotheses. Despite its importance, there are no interactive tools available for training and education on the DP algorithm. Here, we introduce an interactive computer application with a graphical interface for teaching students about DP. The program displays the DP scoring matrix and the resulting optimal alignment(s), while allowing the user to modify key parameters such as the values in the similarity matrix, the sequence alignment algorithm version and the gap opening/extension penalties. We hope that this software will be useful to teachers and students of bioinformatics courses, as well as to researchers who implement the DP algorithm for diverse applications. The software is freely available at: http:/melolab.org/sat. The software is written in Java, so it runs on all major platforms and operating systems, including Windows, Mac OS X and LINUX. All inquiries or comments about this software should be directed to Francisco Melo at fmelo@bio.puc.cl.
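Gap opening/extension penalties correspond to affine gap costs. The sketch below computes an optimal global (Needleman-Wunsch-style) alignment score with affine gaps using Gotoh's three-matrix recurrence. It is not the tool's code: the function name and scoring defaults are arbitrary, a simple match/mismatch score stands in for a similarity matrix, and it assumes a gap of length k costs gap_open + gap_extend * k.

```python
NEG = float("-inf")

def gotoh_global_score(a, b, match=1, mismatch=-1, gap_open=3, gap_extend=1):
    """Optimal global alignment score with affine gaps (Gotoh).

    A gap of length k costs gap_open + gap_extend * k.
    M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    """
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = -(gap_open + gap_extend * i)
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = -(gap_open + gap_extend * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays open + extend; continuing one pays extend only.
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])

print(gotoh_global_score("ACGT", "AT"))  # -3: A--T beats two separate gaps
```

With affine costs, one gap of length 2 (cost 5 here) is cheaper than two gaps of length 1 (cost 8), which is exactly the behaviour that changing the opening/extension penalties lets students explore.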
Dynamic programming algorithms for biological sequence comparison.
Pearson, W R; Miller, W
1992-01-01
Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N²)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N²) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q + rk) are readily available. Versions of these programs that run on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
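The O(N²)-time, O(N)-space property can be illustrated with a minimal score-only sketch that uses gap penalties of the form g = rk: each row of the DP matrix depends only on the previous row, so only two rows are ever stored. The function name and scoring values are assumptions, and recovering the alignment itself in linear space needs an additional divide-and-conquer step (Hirschberg's method), which is omitted here.

```python
def global_score_linear_space(a, b, match=1, mismatch=-1, r=2):
    """Global alignment score with gap penalty g = r*k (k = gap length).

    Time is O(len(a) * len(b)); memory is O(len(b)) because each DP row is
    computed from the previous row only. Returns the score, not the alignment.
    """
    prev = [-r * j for j in range(len(b) + 1)]  # row 0: b against leading gaps
    for i in range(1, len(a) + 1):
        cur = [-r * i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,  # align a[i-1] with b[j-1]
                         prev[j] - r,      # a[i-1] against a gap
                         cur[j - 1] - r)   # b[j-1] against a gap
        prev = cur
    return prev[-1]

print(global_score_linear_space("ACGT", "AT"))  # -2: two matches, two gap positions
```

Because only the final score is needed for tasks such as computing distances for evolutionary trees, this two-row form is often sufficient.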
DNA barcoding the floras of biodiversity hotspots.
Lahaye, Renaud; van der Bank, Michelle; Bogarin, Diego; Warner, Jorge; Pupulin, Franco; Gigot, Guillaume; Maurin, Olivier; Duthoit, Sylvie; Barraclough, Timothy G; Savolainen, Vincent
2008-02-26
DNA barcoding is a technique in which species identification is performed using DNA sequences from a small fragment of the genome, with the aim of contributing to a wide range of ecological and conservation studies in which traditional taxonomic identification is not practical. DNA barcoding is well established in animals, but there is not yet a universally accepted barcode for plants. Here, we undertook intensive field collections in two biodiversity hotspots (Mesoamerica and southern Africa). Using >1,600 samples, we compared eight potential barcodes. Going beyond previous plant studies, we assessed to what extent a "DNA barcoding gap" is present between intra- and interspecific variation, using multiple accessions per species. Given its adequate rate of variation, easy amplification, and alignment, we identified a portion of the plastid matK gene as a universal DNA barcode for flowering plants. Critically, we further demonstrate the applicability of DNA barcoding for biodiversity inventories. In addition, in an analysis of >1,000 species of Mesoamerican orchids, DNA barcoding with matK alone reveals cryptic species and proves useful in identifying species listed in the appendixes of the Convention on International Trade in Endangered Species (CITES). PMID:18258745
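The "DNA barcoding gap" idea, that the largest within-species distance should be smaller than the smallest between-species distance, can be checked with a short sketch. It assumes uncorrected p-distances on an existing alignment; the paper may have used a different distance model (for example a Kimura two-parameter correction), and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(s1, s2):
    """Uncorrected p-distance between two aligned sequences of equal length:
    the fraction of differing sites, ignoring columns with a gap in either."""
    sites = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)

def barcoding_gap(samples):
    """samples: (species, aligned_sequence) pairs, several accessions per species.
    Returns (largest intraspecific, smallest interspecific) distance; a
    barcoding gap exists when the first is strictly smaller than the second."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=float("inf"))

# Hypothetical toy alignment: two accessions for each of two species.
samples = [("sp_A", "ACGTACGTAC"), ("sp_A", "ACGTACGTAA"),
           ("sp_B", "TCGAACGTTT"), ("sp_B", "TCGAACGTTA")]
max_intra, min_inter = barcoding_gap(samples)
print(max_intra, min_inter, max_intra < min_inter)  # 0.1 0.3 True
```

Using multiple accessions per species, as the study did, is what makes the intraspecific side of this comparison measurable at all.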
Self-Alignment MEMS IMU Method Based on the Rotation Modulation Technique on a Swing Base
Chen, Zhiyong; Yang, Haotian; Wang, Chengbin; Lin, Zhihui; Guo, Meifeng
2018-01-01
The micro-electro-mechanical-system (MEMS) inertial measurement unit (IMU) has been widely used in inertial navigation because of its small size, low cost, and light weight, but aligning MEMS IMUs remains a challenge. MEMS IMUs have conventionally been aligned on a static base, requiring other sensors, such as magnetometers or satellites, to provide auxiliary information, which limits their range of application. Therefore, improving the alignment accuracy of MEMS IMUs under swing conditions is of considerable value. This paper proposes an alignment method based on the rotation modulation technique (RMT) which, unlike existing alignment techniques, is completely self-aligned. The effect of inertial sensor errors is mitigated by rotating the IMU. Inertial frame-based alignment using the rotation modulation technique (RMT-IFBA) then achieves coarse alignment on the swing base, and a strong tracking filter (STF) further improves the alignment accuracy. The performance of the proposed method was validated with a physical experiment: the standard deviations of pitch, roll, and heading angle were 0.0140°, 0.0097°, and 0.91°, respectively, which verifies the practicality and efficacy of the proposed method for the self-alignment of MEMS IMUs on a swing base. PMID:29649150
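The core idea behind rotation modulation, that a constant sensor bias fixed in the body frame averages out in the navigation frame when the IMU turns through full revolutions, can be illustrated numerically. This is only a sketch of that general principle under simplifying assumptions (constant bias, rotation about the vertical axis at a constant rate, no base motion); it does not reproduce the paper's RMT-IFBA or strong tracking filter, and the function name and bias values are hypothetical.

```python
import math

def integrated_bias_error(bias_xy, rate, duration, steps=10000):
    """Integrate a constant horizontal body-frame bias after projecting it
    into the navigation frame while the IMU spins about the vertical axis at
    `rate` rad/s (midpoint rule). Returns the accumulated (x, y) error."""
    bx, by = bias_xy
    dt = duration / steps
    ex = ey = 0.0
    for k in range(steps):
        heading = rate * (k + 0.5) * dt
        c, s = math.cos(heading), math.sin(heading)
        ex += (c * bx - s * by) * dt  # body -> navigation rotation about z
        ey += (s * bx + c * by) * dt
    return ex, ey

bias = (0.01, 0.02)  # hypothetical bias, arbitrary units per second
print(integrated_bias_error(bias, 0.0, 60.0))               # stationary: about (0.6, 1.2)
print(integrated_bias_error(bias, 2 * math.pi / 60, 60.0))  # one full turn: about (0, 0)
```

The stationary IMU accumulates error linearly with time, while the rotating one cancels the bias over each complete revolution; errors that are not constant in the body frame are not removed this way.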
Nucleotide sequencing and identification of some wild mushrooms.
Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari
2013-01-01
The rDNA-ITS (ribosomal DNA internal transcribed spacer) fragment of the genomic DNA of 8 wild edible mushrooms (collected from the Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacer 1) and ITS2 primers and sequenced in order to identify the mushrooms. The sequences were aligned using the ClustalW program. The aligned sequences showed identity (percent homology against the GenBank database) with Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although only 4 of the 8 mushrooms could be identified to species level, the nucleotide sequences of the rest may be useful for further characterization. A phylogenetic tree was constructed using the Neighbor-Joining method, showing the interrelationships among the mushrooms. The determined nucleotide sequences may add to the GenBank database, aiding molecular taxonomy and facilitating the domestication and characterization of these mushrooms for human benefit.
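The "% identity" values above come from comparing aligned sequences. A minimal sketch of one common definition (identical columns divided by alignment length) follows; the function name is hypothetical, and tools such as BLAST or ClustalW may use a different denominator, so values from this sketch need not match theirs exactly.

```python
def percent_identity(aln1, aln2):
    """Identical aligned positions as a percentage of alignment length.

    Gap columns count toward the length but never as identities. Tools differ
    on the denominator (alignment length, shorter sequence, ungapped columns),
    so this is one convention among several.
    """
    if len(aln1) != len(aln2) or not aln1:
        raise ValueError("need two non-empty aligned sequences of equal length")
    identical = sum(x == y and x != "-" for x, y in zip(aln1, aln2))
    return 100.0 * identical / len(aln1)

print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```

A low identity such as 76% against the closest database entry is usually read as "genus at best", which is consistent with several of the mushrooms above remaining unidentified at species level.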
Cytotoxic and genotoxic affects of acid mine drainage on fish Channa punctata (Bloch).
Talukdar, B; Kalita, H K; Basumatary, S; Saikia, D J; Sarma, D
2017-10-01
This investigation deals with the effects of acid mine drainage (AMD) from a coal mine on the fish Channa punctata (Bloch), assessed through the incidence of haematological, morphological and histological changes and DNA fragmentation in tissues of C. punctata under laboratory conditions. For this study, fish were exposed to 10% AMD for 30 days. Scanning electron microscopy detected several important changes: in the gills, fusion of the primary and secondary lamellae, distortion, loss of alignment, and deposition of worn-out tissue and mucus on the lamellar surface; in the kidney, degeneration of morphological architecture, loss of alignment of tubules and mucus deposition; and in the liver, cellular damage, cellular necrosis, extraneous deposition on the surface and pore formation. Fish in the AMD-treated group showed a gradual, significant decrease in TEC and Hb and an increase in TLC and DLC compared with the control. DNA fragmentation observed in the kidneys of treated fish indicates a complex pollutant present in the AMD. The high incidence of morphological and histological alterations and haematological changes, along with DNA breakage, in C. punctata is evidence of the cytotoxic and genotoxic potential of AMD from coal mines. Copyright © 2017 Elsevier Inc. All rights reserved.
GPU-BSM: A GPU-Based Tool to Map Bisulfite-Treated Reads
Manconi, Andrea; Orro, Alessandro; Manca, Emanuele; Armano, Giuliano; Milanesi, Luciano
2014-01-01
Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold-standard technique for studying methylation. This technique changes the genomic DNA by converting cytosines to uracils, while 5-methylcytosines remain unreactive. During PCR amplification, 5-methylcytosines are amplified as cytosine, whereas uracils and thymines are amplified as thymine. To measure methylation levels, bisulfite-treated reads must be aligned against a reference genome. Mapping these reads to a reference genome is a significant computational challenge, mainly because of the enlarged search space and the loss of information introduced by the treatment. To address this challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units, hardware accelerators that are increasingly used to speed up general-purpose scientific applications. GPU-BSM maps bisulfite-treated reads from whole-genome bisulfite sequencing and reduced representation bisulfite sequencing and estimates methylation levels, with the goal of detecting methylation. Thanks to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of uniquely mapped reads. PMID:24842718
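The bisulfite logic described above can be sketched in a few lines: reads and reference are compared in a reduced three-letter alphabet (C converted to T), and at each covered reference C a read C counts as methylated while a read T counts as unmethylated. This is an illustration of the general principle only, not GPU-BSM's mapping or calling procedure; it assumes reads are already placed on the forward strand without mismatches, and all names are hypothetical.

```python
def bisulfite_convert(seq):
    """In-silico bisulfite conversion of a forward strand: every C read as T
    (i.e. assuming no methylation), used for three-letter matching."""
    return seq.upper().replace("C", "T")

def matches_three_letter(reference, start, read):
    """True if the read matches the reference window once both are C->T converted."""
    window = reference[start:start + len(read)]
    return len(window) == len(read) and bisulfite_convert(window) == bisulfite_convert(read)

def methylation_levels(reference, reads):
    """reads: (start, sequence) pairs already placed on the forward strand.
    At each covered reference C, a read C counts as methylated (protected from
    conversion) and a read T as unmethylated. Returns {position: C / (C + T)}."""
    counts = {}
    for start, read in reads:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos < len(reference) and reference[pos].upper() == "C" and base in ("C", "T"):
                c, t = counts.get(pos, (0, 0))
                counts[pos] = (c + (base == "C"), t + (base == "T"))
    return {pos: c / (c + t) for pos, (c, t) in sorted(counts.items())}

reference = "ACGTTCGA"
reads = [(0, "ACGTTCGA"), (0, "ATGTTTGA"), (2, "GTTCGA")]
print(all(matches_three_letter(reference, s, r) for s, r in reads))  # True
print(methylation_levels(reference, reads))  # {1: 0.5, 5: 0.6666666666666666}
```

The three-letter comparison is what enlarges the search space mentioned above: a converted read can match many more genomic positions than the original four-letter read would.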
Histone H1 functions as a stimulatory factor in backup pathways of NHEJ
Rosidi, Bustanur; Wang, Minli; Wu, Wenqi; Sharma, Aparna; Wang, Huichen; Iliakis, George
2008-01-01
DNA double-strand breaks (DSBs) induced in the genome of higher eukaryotes by ionizing radiation (IR) are predominantly removed by two pathways of non-homologous end-joining (NHEJ) termed D-NHEJ and B-NHEJ. While D-NHEJ depends on the activities of the DNA-dependent protein kinase (DNA-PK) and DNA ligase IV/XRCC4/XLF, B-NHEJ utilizes, at least partly, DNA ligase III/XRCC1 and PARP-1. Using in vitro end-joining assays and protein fractionation protocols similar to those previously applied for the characterization of DNA ligase III as an end-joining factor, we identify here histone H1 as an additional putative NHEJ factor. H1 strongly enhances DNA-end joining and shifts the product spectrum from circles to multimers. While H1 enhances the DNA-end-joining activities of both DNA Ligase IV and DNA Ligase III, the effect on ligase III is significantly stronger. Histone H1 also enhances the activity of PARP-1. Since histone H1 has been shown to counteract D-NHEJ, these observations and the known functions of the protein identify it as a putative alignment factor operating preferentially within B-NHEJ. PMID:18250087
Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong
2015-01-01
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequences are becoming increasingly important. The majority of current algorithms rely on feature-based prediction, but their accuracy still needs improvement. Here we propose a sequence-based hybrid algorithm, SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder), which merges a feature predictor, SNBRFinderF, and a template predictor, SNBRFinderT. SNBRFinderF was built with a support vector machine whose inputs include the sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with a sequence alignment algorithm based on profile hidden Markov models to capture weakly homologous templates of the query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and that SNBRFinderT achieves performance comparable to structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder improved the performance of both DNA- and RNA-binding residue prediction. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has clear advantages over existing sequence-based prediction algorithms. The value of our algorithm is highlighted by an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Morgan, Jess A T; Godwin, Rosamond M
2017-08-30
Modern molecular approaches have vastly improved diagnostic capabilities for differentiating among chicken-infecting Eimeria species. Consolidating information from multiple genetic markers, adding further poultry Eimeria species and increasing the size of available datasets is improving the resolving power of DNA-based methods and, consequently, our understanding of the genus. This study adds information from 25 complete mitochondrial DNA genomes from Australian chicken Eimeria isolates representing all 10 species known to occur in Australia, including OTU-X, -Y and -Z. The resulting phylogeny provides a comprehensive view of species relatedness, highlighting where the OTUs align with respect to other members of the genus. All three OTUs fall within the Eimeria clade that contains only chicken-infecting species, with close affinities to E. maxima, E. brunetti and E. mitis. Mitochondrial genetic diversity was low among Australian isolates, likely reflecting their recent introduction to the country after European settlement. The lack of observed genetic diversity is a promising outcome, as it suggests that the currently used live vaccines should continue to offer widespread protection against Eimeria outbreaks in all states and territories. Flocks were frequently found to host multiple strains of the same species, a factor that should be considered when studying disease epidemiology in the field. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
Spatially-Interactive Biomolecular Networks Organized by Nucleic Acid Nanostructures
Fu, Jinglin; Liu, Minghui; Liu, Yan; Yan, Hao
2013-01-01
Conspectus: Living systems have evolved a variety of nanostructures to control the molecular interactions that mediate many functions, including the recognition of targets by receptors, the binding of enzymes to substrates, and the regulation of enzymatic activity. Mimicking these structures outside the cell requires methods that offer nanoscale control over the organization of individual network components. Advances in DNA nanotechnology have enabled the design and fabrication of sophisticated one-, two- and three-dimensional (1D, 2D and 3D) nanostructures that rely on spontaneous, sequence-specific DNA hybridization. Compared with other self-assembling biopolymers, DNA nanostructures offer predictable and programmable interactions, as well as surface features on which other nanoparticles and biomolecules can be precisely positioned. The ability to control the spatial arrangement of components while constructing highly organized networks will lead to various applications of these systems. For example, DNA nanoarrays displaying molecular probes on their surfaces can sense noncovalent hybridization interactions with DNA, RNA and proteins, as well as covalent chemical reactions. DNA nanostructures can also align external molecules into well-defined arrays, which may improve the resolution of many structure-determination methods, such as X-ray diffraction, cryo-EM, NMR and super-resolution fluorescence. Moreover, by constraining target entities to specific conformations, self-assembled DNA nanostructures can serve as molecular rulers to evaluate conformation-dependent activities. This Account describes the most recent advances in DNA nanostructure-directed assembly of biomolecular networks and explores the possibility of applying this technology to other fields of study. Recently, several reports have demonstrated the DNA nanostructure-directed assembly of spatially-interactive biomolecular networks.
For example, researchers have constructed synthetic multi-enzyme cascades by organizing the positions of the components with DNA nanoscaffolds in vitro or with RNA matrices in vivo. These structures display enhanced efficiency compared with the corresponding unstructured enzyme mixtures. Such systems are designed to mimic cellular function, in which substrate diffusion between enzymes is facilitated and reactions are catalyzed with high efficiency and specificity. In addition, researchers have assembled multiple chromophores into arrays on a DNA nanoscaffold that optimizes the relative distances between the dyes and their spatial organization. The resulting artificial light-harvesting system exhibits efficient cascading energy transfer. Finally, DNA nanostructures have been used as assembly templates to construct nanodevices that execute rationally designed behaviors, including cargo loading, transportation and route control. PMID:22642503
Processing and population genetic analysis of multigenic datasets with ProSeq3 software.
Filatov, Dmitry A
2009-12-01
The current tendency in molecular population genetics is to use increasing numbers of genes in the analysis. Here I describe a program for handling and population genetic analysis of DNA polymorphism data collected from multiple genes. The program includes a sequence/alignment editor and an internal relational database that simplify the preparation and manipulation of multigenic DNA polymorphism datasets. The most commonly used DNA polymorphism analyses are implemented in ProSeq3, facilitating population genetic analysis of large multigenic datasets. Extensive input/output options make ProSeq3 a convenient hub for sequence data processing and analysis. The program is available free of charge from http://dps.plants.ox.ac.uk/sequencing/proseq.htm.
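As an illustration of the kind of DNA polymorphism statistic such programs compute, the sketch below calculates the number of segregating sites and nucleotide diversity (π, the mean number of pairwise differences per site) for one gene. It assumes an ungapped alignment of equal-length sequences with no missing data; it is not ProSeq3's code, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List

# Hypothetical sketch of two standard polymorphism statistics; not ProSeq3's implementation.


def _check(seqs: List[str]) -> int:
    """Validate the alignment and return its length."""
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    return len(seqs[0])


def segregating_sites(seqs: List[str]) -> int:
    """Number of alignment columns containing more than one distinct base."""
    length = _check(seqs)
    return sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)


def nucleotide_diversity(seqs: List[str]) -> float:
    """pi: average number of differences per site over all sequence pairs."""
    length = _check(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs) / length


sample = ["AAAA", "AAAT", "AATT"]
print(segregating_sites(sample))     # 2
print(nucleotide_diversity(sample))  # about 0.3333 (4 differences / 3 pairs / 4 sites)
```

In a multigenic dataset, the same functions would simply be applied per gene, which is where a database of per-gene alignments becomes convenient.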
Improved alignment evaluation and optimization : final report.
DOT National Transportation Integrated Search
2007-09-11
This report outlines the development of an enhanced highway alignment evaluation and optimization model. A GIS-based software tool that uses genetic algorithms for optimal search has been prepared for alignment optimization. The software is capable of ...
Sensitive Technique For Detecting Alignment Of Seed Laser
NASA Technical Reports Server (NTRS)
Barnes, Norman P.
1994-01-01
Frequency response near resonance measured. Improved technique for detection and quantification of alignment of injection-seeding laser with associated power-oscillator laser proposed. Particularly useful in indicating alignment at spectral purity greater than 98 percent because it becomes more sensitive as perfect alignment approached. In addition, implemented relatively easily, without turning on power-oscillator laser.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard
2004-09-09
Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pairwise sequence alignments, which are used as a first step towards multiple alignment, account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
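Point (a) rests on the fact that pairwise alignments of different sequence pairs do not depend on each other, so they can be farmed out to workers and collected in any order. The sketch below shows that pattern with Python's `concurrent.futures`. As an assumption for brevity, it scores pairs with a plain Needleman-Wunsch global score (match +1, mismatch -1, gap -1) rather than DIALIGN's segment-based scoring, and it uses threads so it runs anywhere; for CPU-bound work in CPython, a `ProcessPoolExecutor` or multiple processors, as in DIALIGN P, would be needed for real speed-ups. Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical sketch: independent pairwise scoring distributed to workers; not DIALIGN P's code.


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with linear gaps, using one rolling row."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def all_pair_scores(seqs: List[str], workers: int = 4) -> Dict[Tuple[int, int], int]:
    """Score every sequence pair independently; results do not depend on scheduling order."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda p: nw_score(seqs[p[0]], seqs[p[1]]), pairs))
    return dict(zip(pairs, scores))


seqs = ["ACGT", "AGT", "ACGT"]
print(all_pair_scores(seqs))  # {(0, 1): 2, (0, 2): 4, (1, 2): 2}
```

Because each task reads only its own two sequences, the output is identical for any number of workers, which is the property the abstract relies on.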
Structure and mechanism of human DNA polymerase η
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biertümpfel, Christian; Zhao, Ye; Kondo, Yuji
2010-11-03
The variant form of the human syndrome xeroderma pigmentosum (XPV) is caused by a deficiency in DNA polymerase η (Polη), a DNA polymerase that enables replication through ultraviolet-induced pyrimidine dimers. Here we report high-resolution crystal structures of human Polη at four consecutive steps during DNA synthesis through cis-syn cyclobutane thymine dimers. Polη acts like a 'molecular splint' to stabilize damaged DNA in a normal B-form conformation. An enlarged active site accommodates the thymine dimer with excellent stereochemistry for two-metal ion catalysis. Two residues conserved among Polη orthologues form specific hydrogen bonds with the lesion and the incoming nucleotide to assist translesion synthesis. On the basis of the structures, eight Polη missense mutations causing XPV can be rationalized as undermining the molecular splint or perturbing the active-site alignment. The structures also provide an insight into the role of Polη in replicating through D loop and DNA fragile sites.
Dfam: a database of repetitive DNA based on profile hidden Markov models.
Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D
2013-01-01
We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
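Genome coverage of an annotation, as in the 54.5% figure above, is generally the number of bases overlapped by at least one annotated repeat divided by sequence length, so overlapping hits must be merged before counting. A minimal sketch of that bookkeeping, assuming half-open `(start, end)` hit coordinates on a single sequence; the helper names and data are hypothetical, and this is not how Dfam or nhmmer compute their benchmark.

```python
from typing import Iterable, Tuple

# Hypothetical sketch: coverage from possibly overlapping repeat hits on one sequence.


def covered_bases(hits: Iterable[Tuple[int, int]]) -> int:
    """Count bases covered by at least one half-open [start, end) interval."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(hits):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: bank the run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_fraction(hits: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of the sequence covered by merged hits."""
    return covered_bases(hits) / genome_length


hits = [(0, 10), (5, 15), (20, 25)]  # hypothetical repeat hits
print(covered_bases(hits))           # 20
print(coverage_fraction(hits, 100))  # 0.2
```

Merging first avoids double-counting bases hit by several repeat families, which would otherwise inflate coverage.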
A new structural framework for integrating replication protein A into DNA processing machinery
Brosey, Chris A.; Yan, Chunli; Tsutakawa, Susan E.; Heller, William T.; Rambo, Robert P.; Tainer, John A.; Ivanov, Ivaylo; Chazin, Walter J.
2013-01-01
By coupling the protection and organization of single-stranded DNA (ssDNA) with recruitment and alignment of DNA processing factors, replication protein A (RPA) lies at the heart of dynamic multi-protein DNA processing machinery. Nevertheless, how RPA coordinates biochemical functions of its eight domains remains unknown. We examined the structural biochemistry of RPA’s DNA-binding activity, combining small-angle X-ray and neutron scattering with all-atom molecular dynamics simulations to investigate the architecture of RPA’s DNA-binding core. The scattering data reveal compaction promoted by DNA binding; DNA-free RPA exists in an ensemble of states with inter-domain mobility and becomes progressively more condensed and less dynamic on binding ssDNA. Our results contrast with previous models proposing RPA initially binds ssDNA in a condensed state and becomes more extended as it fully engages the substrate. Moreover, the consensus view that RPA engages ssDNA in initial, intermediate and final stages conflicts with our data revealing that RPA undergoes two (not three) transitions as it binds ssDNA with no evidence for a discrete intermediate state. These results form a framework for understanding how RPA integrates the ssDNA substrate into DNA processing machinery, provides substrate access to its binding partners and promotes the progression and selection of DNA processing pathways. PMID:23303776
A new structural framework for integrating replication protein A into DNA processing machinery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brosey, Chris; Yan, Chunli; Tsutakawa, Susan
2013-01-17
By coupling the protection and organization of single-stranded DNA (ssDNA) with recruitment and alignment of DNA processing factors, replication protein A (RPA) lies at the heart of dynamic multi-protein DNA processing machinery. Nevertheless, how RPA coordinates biochemical functions of its eight domains remains unknown. We examined the structural biochemistry of RPA's DNA-binding activity, combining small-angle X-ray and neutron scattering with all-atom molecular dynamics simulations to investigate the architecture of RPA's DNA-binding core. The scattering data reveal compaction promoted by DNA binding; DNA-free RPA exists in an ensemble of states with inter-domain mobility and becomes progressively more condensed and less dynamic on binding ssDNA. Our results contrast with previous models proposing RPA initially binds ssDNA in a condensed state and becomes more extended as it fully engages the substrate. Moreover, the consensus view that RPA engages ssDNA in initial, intermediate and final stages conflicts with our data revealing that RPA undergoes two (not three) transitions as it binds ssDNA with no evidence for a discrete intermediate state. These results form a framework for understanding how RPA integrates the ssDNA substrate into DNA processing machinery, provides substrate access to its binding partners and promotes the progression and selection of DNA processing pathways.
CORAL: aligning conserved core regions across domain families.
Fong, Jessica H; Marchler-Bauer, Aron
2009-08-01
Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method CORAL that aligns individual core regions as gap-free units. CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement. CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. Supplementary data are available at Bioinformatics online.
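The key constraint in CORAL is that a core region is placed as a single gap-free block. The toy sketch below shows only that constraint: it slides one block along a target and keeps the ungapped offset with the best score. As simplifying assumptions it compares plain residue strings with a +1/-1 identity score instead of profile columns, and it ignores CORAL's continuity heuristics and E-values; the function names are hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical toy sketch of gap-free block placement; not CORAL's profile-profile scoring.


def identity_score(x: str, y: str) -> int:
    """Toy column score: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


def best_ungapped_offset(
    block: str, target: str, score: Callable[[str, str], int] = identity_score
) -> Optional[Tuple[int, int]]:
    """Return (offset, score) of the best gap-free placement of `block`, or None if it cannot fit."""
    best = None
    for off in range(len(target) - len(block) + 1):
        s = sum(score(x, y) for x, y in zip(block, target[off:off + len(block)]))
        if best is None or s > best[1]:  # ties keep the leftmost offset
            best = (off, s)
    return best


print(best_ungapped_offset("GWL", "AAGWLKK"))  # (2, 3)
```

Keeping each block rigid is what preserves the "readability" mentioned above: gaps can only fall between core regions, never inside them.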
Ontology Alignment Repair through Modularization and Confidence-Based Heuristics
Santos, Emanuel; Faria, Daniel; Pesquita, Catia; Couto, Francisco M.
2015-01-01
Ontology Matching aims at identifying a set of semantic correspondences, called an alignment, between related ontologies. In recent years, there has been growing interest in efficient and effective matching methods for large ontologies. However, alignments produced for large ontologies are often logically incoherent. Only recently has the use of repair techniques to improve the coherence of ontology alignments begun to be explored. This paper presents a novel modularization technique for ontology alignment repair that extracts fragments of the input ontologies containing only the classes and relations necessary to resolve all detectable incoherences. The paper also presents an alignment repair algorithm that uses a global repair strategy to minimize both the degree of incoherence and the number of mappings removed from the alignment, while overcoming the scalability problem by employing the proposed modularization technique. Our evaluation shows that our modularization technique produces very small fragments of the ontologies and that our repair algorithm produces more complete alignments than other current alignment repair systems, while obtaining an equivalent degree of incoherence. We also present a variant of our repair algorithm that uses the confidence values of the mappings to improve alignment repair. Our repair algorithm was implemented as part of AgreementMakerLight, a free and open-source ontology matching system. PMID:26710335
Lake sedimentary DNA accurately records 20th Century introductions of exotic conifers in Scotland.
Sjögren, Per; Edwards, Mary E; Gielly, Ludovic; Langdon, Catherine T; Croudace, Ian W; Merkel, Marie Kristine Føreid; Fonville, Thierry; Alsos, Inger Greve
2017-01-01
Sedimentary DNA (sedDNA) has recently emerged as a new proxy for reconstructing past vegetation, but its taphonomy, source area and representation biases need better assessment. We investigated how sedDNA in recent sediments of two small Scottish lakes reflects a major vegetation change, using well-documented 20th-century plantations of exotic conifers as an experimental system. We used next-generation sequencing to barcode sedDNA retrieved from subrecent lake sediments. For comparison, pollen was analysed from the same samples. The sedDNA record contains 73 taxa (mainly identified to genus or species level), all but one of which are present in the study area. Pollen and sedDNA shared 35% of taxa, which partly reflects a difference in source area. More aquatic taxa were recorded in sedDNA, whereas taxa assumed to be of regional rather than local origin were recorded only as pollen. The sediment chronology and the planting records are well aligned, and sedDNA of exotic conifers appears in high quantities with the establishment of plantations around the lakes. SedDNA also recorded other changes in local vegetation that accompanied afforestation. There were no signs of DNA leaching in the sediments or of DNA originating from pollen. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Sequencing, Annotation and Analysis of the Syrian Hamster (Mesocricetus auratus) Transcriptome
Tchitchek, Nicolas; Safronetz, David; Rasmussen, Angela L.; Martens, Craig; Virtaneva, Kimmo; Porcella, Stephen F.; Feldmann, Heinz
2014-01-01
Background: The Syrian hamster (golden hamster, Mesocricetus auratus) is gaining importance as a new experimental animal model for multiple pathogens, including emerging zoonotic diseases such as Ebola. Nevertheless, there is currently no publicly available reference transcriptome or genome for this species. Results: A cDNA library derived from mRNA and snRNA isolated and pooled from the brains, lungs, spleens, kidneys, livers and hearts of three adult female Syrian hamsters was sequenced. Sequence reads were assembled into 62,482 contigs, and 111,796 reads remained unassembled (singletons). This combined contig/singleton dataset, designated the Syrian hamster transcriptome, represents a total of 60,117,204 nucleotides. Our Mesocricetus auratus transcriptome mapped to 11,648 mouse transcripts representing 9,562 distinct genes, and to a similar number of transcripts and genes in the rat. We identified 214 quasi-complete transcripts based on mouse annotations. Canonical pathways involved in a broad spectrum of fundamental biological processes were significantly represented in the library. The Syrian hamster transcriptome was aligned to the current release of the Chinese hamster ovary (CHO) cell transcriptome and genome to improve the genomic annotation of this species. Finally, our Syrian hamster transcriptome was aligned against 14 other rodent, primate and laurasiatherian species to gain insights into the genetic relatedness and placement of this species. Conclusions: This Syrian hamster transcriptome dataset significantly improves our knowledge of the Syrian hamster's transcriptome, especially for its future use in infectious disease research. Moreover, this library is an important resource for the wider scientific community to help improve genome annotation of the Syrian hamster and other closely related species.
Furthermore, these data provide the basis for development of expression microarrays that can be used in functional genomics studies. PMID:25398096
Evidence for Widespread Reticulate Evolution within Human Duplicons
Jackson, Michael S.; Oliver, Karen; Loveland, Jane; Humphray, Sean; Dunham, Ian; Rocchi, Mariano; Viggiano, Luigi; Park, Jonathan P.; Hurles, Matthew E.; Santibanez-Koref, Mauro
2005-01-01
Approximately 5% of the human genome consists of segmental duplications that can cause genomic mutations and may play a role in gene innovation. Reticulate evolutionary processes, such as unequal crossing-over and gene conversion, are known to occur within specific duplicon families, but the broader contribution of these processes to the evolution of human duplications remains poorly characterized. Here, we use phylogenetic profiling to analyze multiple alignments of 24 human duplicon families that span >8 Mb of DNA. Our results indicate that none of them are evolving independently, with all alignments showing sharp discontinuities in phylogenetic signal consistent with reticulation. To analyze these results in more detail, we have developed a quartet method that estimates the relative contribution of nucleotide substitution and reticulate processes to sequence evolution. Our data indicate that most of the duplications show a highly significant excess of sites consistent with reticulate evolution, compared with the number expected by nucleotide substitution alone, with 15 of 30 alignments showing a >20-fold excess over that expected. Using permutation tests, we also show that at least 5% of the total sequence shares 100% sequence identity because of reticulation, a figure that includes 74 independent tracts of perfect identity >2 kb in length. Furthermore, analysis of a subset of alignments indicates that the density of reticulation events is as high as 1 every 4 kb. These results indicate that phylogenetic relationships within recently duplicated human DNA can be rapidly disrupted by reticulate evolution. This finding has important implications for efforts to finish the human genome sequence, complicates comparative sequence analysis of duplicon families, and could profoundly influence the tempo of gene-family evolution. PMID:16252241
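Locating tracts of perfect identity in a pairwise alignment, such as the >2 kb tracts counted above, reduces to scanning for runs of identical, ungapped columns. A minimal sketch, assuming two equal-length alignment rows with `-` for gaps; it reports column ranges (not genomic coordinates) and does not reproduce the paper's quartet or permutation analyses. The function name is hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch: runs of perfect identity between two aligned rows.


def identity_tracts(row_a: str, row_b: str, min_len: int) -> List[Tuple[int, int]]:
    """Half-open column ranges where both rows are identical and ungapped for >= min_len columns."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    tracts, start = [], None
    for i, (x, y) in enumerate(zip(row_a, row_b)):
        same = x == y and x != "-"
        if same and start is None:
            start = i  # a new identical run begins
        elif not same and start is not None:
            if i - start >= min_len:
                tracts.append((start, i))  # close a long-enough run
            start = None
    if start is not None and len(row_a) - start >= min_len:
        tracts.append((start, len(row_a)))  # run extends to the last column
    return tracts


print(identity_tracts("ACGTACGTAA", "ACGTTCGTAA", 3))  # [(0, 4), (5, 10)]
print(identity_tracts("ACGTACGTAA", "ACGTTCGTAA", 5))  # [(5, 10)]
```

For the >2 kb criterion, `min_len` would be 2000 columns, with coordinates mapped back to each sequence afterwards.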
HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.
O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D
2015-04-01
The recent exponential growth of genomic databases has made the common task of sequence alignment one of the major bottlenecks in computational biology. These large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC). Parallelised solutions have therefore been proposed, but many exhibit scalability limitations and cannot effectively process "Big Data", the name given to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, consisting of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The "divide and conquer" parallelisation strategy for alignment algorithms can be applied to both datasets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to further performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast shows improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
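Partitioning both the database and the query set can be pictured as a grid of (query chunk, database chunk) tasks, each small enough for a memory-constrained node. The sketch below is an assumption about what "virtual" partitioning might look like: chunks are described as index ranges instead of physically splitting and reformatting the database. It uses no Hadoop API and is not HBlast's implementation; all names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical sketch of 2D task partitioning by index ranges; not HBlast's code.
Range = Tuple[int, int]


def chunk_ranges(n_items: int, n_chunks: int) -> List[Range]:
    """Split [0, n_items) into n_chunks contiguous half-open ranges whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def search_tasks(n_queries: int, n_db_seqs: int, q_chunks: int, db_chunks: int) -> List[Tuple[Range, Range]]:
    """Every (query range, database range) pair becomes one independent search task."""
    return list(product(chunk_ranges(n_queries, q_chunks), chunk_ranges(n_db_seqs, db_chunks)))


tasks = search_tasks(n_queries=10, n_db_seqs=100, q_chunks=3, db_chunks=4)
print(len(tasks))  # 12
print(tasks[0])    # ((0, 4), (0, 25))
```

Describing chunks by offsets means the number of chunks can be changed without re-segmenting or recompiling the database, which matches the goal stated in the abstract.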
Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A
2015-01-01
It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Sequence logo plots are therefore severely limited in that they convey no explicit information about the structural dynamics of the DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces with the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered both at SourceForge and as a download supplement to this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly higher information content at phosphate linkages than at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database and discovered that many general signatures of transcription factor binding are locally more information-rich at the level of DNA backbone dynamics than at the level of nucleobase sequence.
We used TRX-LOGOS in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the evolution of the human Forkhead box (FOX) proteins with the evolution of their binding sites. We also compared the DNA binding signatures of the human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center-aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides), where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
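The information content behind a classic sequence logo is, per position, the maximum entropy (log2 of the alphabet size) minus the observed Shannon entropy. The sketch below computes it both at single bases (at most 2 bits) and at the linkage between adjacent bases using dinucleotide identity (at most 4 bits). That dinucleotide stand-in is an assumption: TRX-LOGOS works with BI-BII conformational states at phosphate linkages, which this sketch does not model, and no small-sample correction is applied. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical sketch of logo information content; dinucleotides stand in for BI-BII states.


def information_content(column: Sequence[str], alphabet_size: int) -> float:
    """log2(alphabet_size) minus the Shannon entropy (bits) of the symbols in one column."""
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return math.log2(alphabet_size) - entropy


def base_ic(sites: List[str]) -> List[float]:
    """Per-nucleobase information content for equal-length aligned binding sites."""
    return [information_content([s[i] for s in sites], 4) for i in range(len(sites[0]))]


def linkage_ic(sites: List[str]) -> List[float]:
    """Per-linkage information content, using the dinucleotide spanning each phosphate link."""
    return [information_content([s[i:i + 2] for s in sites], 16) for i in range(len(sites[0]) - 1)]


sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
print(base_ic(sites))     # [2.0, 2.0, 2.0, 0.0]
print(linkage_ic(sites))  # [4.0, 4.0, 2.0]
```

Note how the last linkage keeps 2 bits even though the last base carries none: the conserved G still constrains the dinucleotide, which is the kind of position-versus-linkage contrast the abstract describes.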
Hartman, Amber L; Riddle, Sean; McPhillips, Timothy; Ludäscher, Bertram; Eisen, Jonathan A
2010-06-12
For more than two decades, microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA (16S rDNA) and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly. We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment and phylogenetic tree construction, as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform. By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system.
Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.
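The workflow-plus-provenance architecture described above can be pictured as a chain of steps ("actors") in which every step's input and output are fingerprinted, so a result can be traced back through the chain. This is a plain-Python sketch of that pattern only. It does not use Kepler or its API, and the actors `drop_short` and `dereplicate` are hypothetical stand-ins for real 16S rDNA steps such as chimera removal or OTU determination.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data):
    """Short, stable hash of JSON-serialisable data, used to link provenance records."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()[:12]


def run_workflow(data, actors):
    """Apply (name, function) actors in order and log what each consumed and produced."""
    provenance = []
    for name, actor in actors:
        record = {"actor": name, "input": fingerprint(data)}
        data = actor(data)
        record["output"] = fingerprint(data)
        record["finished"] = datetime.now(timezone.utc).isoformat()
        provenance.append(record)
    return data, provenance


# Hypothetical actors standing in for real 16S rDNA analysis steps.
def drop_short(reads, min_len=5):
    return [r for r in reads if len(r) >= min_len]


def dereplicate(reads):
    return sorted(set(reads))


reads = ["ACGTACGT", "ACG", "ACGTACGT", "TTGACCA"]
unique, log = run_workflow(reads, [("filter", drop_short), ("dereplicate", dereplicate)])
for entry in log:
    print(entry["actor"], entry["input"], "->", entry["output"])
```

A new step is added by appending another `(name, function)` pair, and each record's `input` matches the previous record's `output`, which is what makes the chain auditable.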
Gold nanocrystals with DNA-directed morphologies.
Ma, Xingyi; Huh, June; Park, Wounjhang; Lee, Luke P; Kwon, Young Jik; Sim, Sang Jun
2016-09-16
Precise control over the structure of metal nanomaterials is important for developing advanced nanobiotechnology. Assembly methods of nanoparticles into structured blocks have been widely demonstrated recently. However, synthesis of nanocrystals with controlled, three-dimensional structures remains challenging. Here we show a directed crystallization of gold by a single DNA molecular regulator in a sequence-independent manner and its applications in three-dimensional topological controls of crystalline nanostructures. We anchor DNA onto gold nanoseed with various alignments to form gold nanocrystals with defined topologies. Some topologies are asymmetric including pushpin-, star- and biconcave disk-like structures, as well as more complex jellyfish- and flower-like structures. The approach of employing DNA enables the solution-based synthesis of nanocrystals with controlled, three-dimensional structures in a desired direction, and expands the current tools available for designing and synthesizing feature-rich nanomaterials for future translational biotechnology.
Gold nanocrystals with DNA-directed morphologies
NASA Astrophysics Data System (ADS)
Ma, Xingyi; Huh, June; Park, Wounjhang; Lee, Luke P.; Kwon, Young Jik; Sim, Sang Jun
2016-09-01
Precise control over the structure of metal nanomaterials is important for developing advanced nanobiotechnology. Assembly methods of nanoparticles into structured blocks have been widely demonstrated recently. However, synthesis of nanocrystals with controlled, three-dimensional structures remains challenging. Here we show a directed crystallization of gold by a single DNA molecular regulator in a sequence-independent manner and its applications in three-dimensional topological controls of crystalline nanostructures. We anchor DNA onto gold nanoseed with various alignments to form gold nanocrystals with defined topologies. Some topologies are asymmetric including pushpin-, star- and biconcave disk-like structures, as well as more complex jellyfish- and flower-like structures. The approach of employing DNA enables the solution-based synthesis of nanocrystals with controlled, three-dimensional structures in a desired direction, and expands the current tools available for designing and synthesizing feature-rich nanomaterials for future translational biotechnology.
Introducing difference recurrence relations for faster semi-global alignment of long sequences.
Suzuki, Hajime; Kasahara, Masahiro
2018-02-19
The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often use single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when the sequences to be aligned are long, which prevents use of the full SIMD width of modern processors. We propose a faster semi-global alignment algorithm, "difference recurrence relations," that runs 2.1 times faster than the state-of-the-art algorithm. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32 lanes) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate the acceleration of long-read nucleotide analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
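Storing differences between adjacent cells instead of absolute scores can be sketched for the simpler case of a linear gap penalty. This is a scalar illustration, assuming a linear gap cost instead of SWG's affine (Gotoh) gaps, with no SIMD and nothing taken from libgaba's API. The scoring values and function names are hypothetical. With `h = S(i,j) - S(i,j-1)` and `v = S(i,j) - S(i-1,j)`, each cell needs only `z = S(i,j) - S(i-1,j-1) = max(s, h_up - gap, v_left - gap)`, and both differences stay within `[-gap, match + gap]`, a range small enough for 8-bit lanes.

```python
def direct_last_row(query, ref, match=2, mismatch=-3, gap=2):
    """Baseline: absolute DP scores (linear gap penalty), computed row by row."""
    prev = [-gap * j for j in range(len(ref) + 1)]
    for i in range(1, len(query) + 1):
        cur = [-gap * i]
        for j in range(1, len(ref) + 1):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] - gap, cur[j - 1] - gap))
        prev = cur
    return prev


def difference_last_row(query, ref, match=2, mismatch=-3, gap=2):
    """Same last row, but the inner loop stores only adjacent-cell differences.

    h[j] = S(i, j) - S(i, j-1) and v = S(i, j) - S(i-1, j).
    Returns the rebuilt last row and the observed (min, max) of the differences.
    """
    n = len(ref)
    h = [-gap] * (n + 1)        # row 0: S(0, j) = -gap * j, so every h is -gap
    lo = hi = -gap
    for i in range(1, len(query) + 1):
        v = -gap                # column 0: S(i, 0) = -gap * i
        for j in range(1, n + 1):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            z = max(s, h[j] - gap, v - gap)   # S(i, j) - S(i-1, j-1)
            h[j], v = z - v, z - h[j]         # new h uses v(i, j-1); new v uses h(i-1, j)
            lo, hi = min(lo, h[j], v), max(hi, h[j], v)
    # Rebuild absolute scores of the last row by prefix sums from S(m, 0).
    row = [-gap * len(query)]
    for j in range(1, n + 1):
        row.append(row[-1] + h[j])
    return row, (lo, hi)


query, ref = "GATTACA", "GCATGCTACGT"
row, (lo, hi) = difference_last_row(query, ref)
assert row == direct_last_row(query, ref)
# Alignment fixed at the origin; the best end column in the last row gives the semi-global score.
print("best score:", max(row), "differences ranged over", lo, "to", hi)
```

Only the first row and column are needed to turn the stored differences back into absolute scores, which is why the narrow integer type costs nothing in correctness.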
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Tao; Meyer, Travis A.; Modlin, Charles
In this paper, we describe the co-assembly of two different building units: collagen-mimetic peptides and DNA origami. Two peptides CP ++ and sCP ++ are designed with a sequence comprising a central block (Pro-Hyp-Gly) and two positively charged domains (Pro-Arg-Gly) at both N- and C-termini. Co-assembly of peptides and DNA origami two-layer (TL) nanosheets affords the formation of one-dimensional nanowires with a repeating periodicity of ~10 nm. Structural analyses suggest a face-to-face stacking of DNA nanosheets with peptides aligned perpendicularly to the sheet surfaces. We demonstrate the potential of selective peptide-DNA association between face-to-face and edge-to-edge packing by tailoring the size of DNA nanostructures. Finally, this study presents an attractive strategy to create hybrid biomolecular assemblies from peptide and DNA-based building blocks that takes advantage of the intrinsic chemical and physical properties of the respective components to encode structural and, potentially, functional complexity within readily accessible biomimetic materials.
Anisotropic Brownian motion in ordered phases of DNA fragments.
Dobrindt, J; Rodrigo Teixeira da Silva, E; Alves, C; Oliveira, C L P; Nallet, F; Andreoli de Oliveira, E; Navailles, L
2012-01-01
Using Fluorescence Recovery After Photobleaching (FRAP), we investigate the Brownian motion of rod-like DNA fragments in two distinct anisotropic phases with local nematic symmetry. The height of the measurement volume ensures averaging of the anisotropy of the in-plane diffusive motion parallel or perpendicular to the local nematic director in aligned domains. Still, as shown using a model specifically designed to handle such a situation, which predicts a non-Gaussian shape for the bleached spot as fluorescence recovery proceeds, the two distinct diffusion coefficients of the DNA particles can be retrieved from data analysis. In the first system investigated (a ternary DNA-lipid lamellar complex), the magnitude and anisotropy of the diffusion coefficient of the DNA fragments confined by the lipid bilayers are obtained for the first time. In the second, a binary DNA-solvent system, the magnitude of the diffusion coefficient is found to decrease markedly as DNA concentration increases from the isotropic to the cholesteric phase. In addition, the diffusion coefficient anisotropy measured within cholesteric domains in the phase-coexistence region increases with concentration and eventually reaches a high value in the cholesteric phase.
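A moment calculation for a one-dimensional projection shows why averaging over domain orientations gives a non-Gaussian profile that still encodes both coefficients. This is our illustration, not the authors' model (which describes the full bleached-spot profile). It assumes that the in-plane director angle θ is uniformly distributed over the domains in the measurement volume, and that motion within each domain is Gaussian with coefficients D∥ and D⊥ along and across the director.

```latex
% x: displacement along a fixed lab axis after time t; \theta: director angle to that axis
x \mid \theta \sim \mathcal{N}\!\left(0,\; 2t\,\bigl(D_\parallel\cos^2\theta + D_\perp\sin^2\theta\bigr)\right)

\langle\cos^2\theta\rangle = \langle\sin^2\theta\rangle = \tfrac{1}{2}, \qquad
\langle\cos^4\theta\rangle = \langle\sin^4\theta\rangle = \tfrac{3}{8}, \qquad
\langle\cos^2\theta\,\sin^2\theta\rangle = \tfrac{1}{8}

\langle x^2\rangle = t\,(D_\parallel + D_\perp), \qquad
\langle x^4\rangle = 3\,(2t)^2\,\bigl\langle (D_\parallel\cos^2\theta + D_\perp\sin^2\theta)^2 \bigr\rangle
                   = \tfrac{3t^2}{2}\,\bigl(3D_\parallel^2 + 3D_\perp^2 + 2D_\parallel D_\perp\bigr)

\frac{\langle x^4\rangle}{\langle x^2\rangle^2} - 3
  = \frac{3}{2}\left(\frac{D_\parallel - D_\perp}{D_\parallel + D_\perp}\right)^2
```

The excess kurtosis vanishes only for isotropic diffusion. Together with ⟨x²⟩, it fixes both D∥ + D⊥ and |D∥ − D⊥|, although these two moments alone do not say which coefficient is the larger one.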
Jiang, Tao; Meyer, Travis A.; Modlin, Charles; ...
2017-09-26
In this paper, we describe the co-assembly of two different building units: collagen-mimetic peptides and DNA origami. Two peptides CP ++ and sCP ++ are designed with a sequence comprising a central block (Pro-Hyp-Gly) and two positively charged domains (Pro-Arg-Gly) at both N- and C-termini. Co-assembly of peptides and DNA origami two-layer (TL) nanosheets affords the formation of one-dimensional nanowires with a repeating periodicity of ~10 nm. Structural analyses suggest a face-to-face stacking of DNA nanosheets with peptides aligned perpendicularly to the sheet surfaces. We demonstrate the potential of selective peptide-DNA association between face-to-face and edge-to-edge packing by tailoring the size of DNA nanostructures. Finally, this study presents an attractive strategy to create hybrid biomolecular assemblies from peptide and DNA-based building blocks that takes advantage of the intrinsic chemical and physical properties of the respective components to encode structural and, potentially, functional complexity within readily accessible biomimetic materials.
Method for vacuum fusion bonding
Ackler, Harold D.; Swierkowski, Stefan P.; Tarte, Lisa A.; Hicks, Randall K.
2001-01-01
An improved vacuum fusion bonding structure and process for the aligned bonding of large-area glass plates, patterned with microchannels, access holes, and slots, at elevated glass fusion temperatures. All components are vacuum-pumped through the bottom platform, which yields an untouched, defect-free top surface that greatly improves optical access through this smooth surface. Also, a completely non-adherent interlayer, such as graphite, with alignment and location features is located between the main steel platform and the glass plate pair; this greatly improves quality, yield, and ease of use, and enables aligned bonding of very large glass structures.
Embedding strategies for effective use of information from multiple sequence alignments.
Henikoff, S.; Henikoff, J. G.
1997-01-01
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
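The step of turning a block of aligned family members into position-specific scores can be sketched as follows. This is a minimal illustration, not the Henikoffs' sequence-weighting or pseudocount scheme. It assumes an ungapped block, a uniform background over the 20 amino acids, and a flat pseudocount of 1. The motif block and function names are hypothetical. `consensus` shows the alternative the abstract mentions, embedding consensus residues instead of a PSSM.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Log-odds PSSM (bits) from an ungapped block of aligned family members.

    Assumes a uniform background and a flat pseudocount per residue type.
    """
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*block):
        counts = Counter(res for res in column if res in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background)
                     for a in alphabet})
    return pssm


def consensus(block):
    """Most frequent residue per column (ties go to the residue seen first)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*block))


def best_hit(pssm, sequence):
    """Best-scoring ungapped window of `sequence` as (score, offset).

    Raises ValueError if the sequence is shorter than the PSSM.
    """
    width = len(pssm)
    return max(
        (sum(col[res] for col, res in zip(pssm, sequence[start:start + width])), start)
        for start in range(len(sequence) - width + 1)
    )


block = ["GKSTL", "GKTTL", "GRSTL"]  # hypothetical conserved motif from three family members
pssm = build_pssm(block)
print(consensus(block))                      # -> GKSTL
print(best_hit(pssm, "AAGKSTLAA")[1])       # -> 2
```

In an embedded query, these columns would replace the conserved region of the representative sequence, while ordinary substitution scores would still be used elsewhere. That hybrid search is not shown here.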
NASA Astrophysics Data System (ADS)
Ghazali, M. F.; Razak, N. A. Abd; Abu Osman, N. A.; Gholizadeh, H.
2017-06-01
Knee flexion contracture on the stump side is a condition in which the stump cannot move through its normal range of motion (ROM) or cannot be fully extended. This study used the Biodex Stability System (BSS) to investigate the effect of stump flexion contracture on postural stability among transtibial prosthesis users, with alignment accommodation as the intervention. The BSS reports an anterior-posterior stability index (APSI), a medial-lateral stability index (MLSI), and an overall stability index (OSI); a higher index indicates lower stability. Each subject was tested in three sessions: Visit 1 (before contracture improvement), Visit 2 (after contracture improvement, without alignment readjustment), and Visit 3 (after contracture improvement, with alignment readjustment). The APSI was significantly higher at Visit 2 than at Visit 1 and Visit 3. The OSI at Visit 2 was also significantly higher than at Visit 3. At Visit 2, the degree of contracture was significantly improved, being 44.1% less than at Visit 1. The APSI was lower when the prosthetic alignment was adjusted according to the ROM of the knee. This finding suggests that setting up the alignment to accommodate the stump's ROM can contribute positively to maintaining postural stability.
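The three indices can be computed from sampled platform tilt. This is a minimal sketch, assuming the commonly cited Biodex definitions, which the study itself does not state. Under those definitions, each index is the root-mean-square tilt from level (0 degrees), X is medial-lateral tilt, Y is anterior-posterior tilt, and OSI combines both. The function name and tilt samples are hypothetical.

```python
import math


def stability_indices(ml_tilt, ap_tilt):
    """Return (OSI, APSI, MLSI) from paired tilt samples in degrees.

    Assumed definitions: each index is the root-mean-square deviation from
    level (0 degrees); MLSI uses medial-lateral tilt (X), APSI uses
    anterior-posterior tilt (Y), and OSI pools both.
    """
    n = len(ml_tilt)
    if n == 0 or len(ap_tilt) != n:
        raise ValueError("need equal, non-empty tilt series")
    ml_sq = sum(x ** 2 for x in ml_tilt)   # squared deviations from 0 degrees
    ap_sq = sum(y ** 2 for y in ap_tilt)
    mlsi = math.sqrt(ml_sq / n)
    apsi = math.sqrt(ap_sq / n)
    osi = math.sqrt((ml_sq + ap_sq) / n)
    return osi, apsi, mlsi


# Hypothetical tilt samples (degrees) from one trial.
osi, apsi, mlsi = stability_indices([0.5, -1.2, 0.8, -0.3], [1.5, 2.1, -0.4, 1.0])
print(f"OSI={osi:.2f} APSI={apsi:.2f} MLSI={mlsi:.2f}")
```

Under these definitions, OSI² = APSI² + MLSI², so a rise in APSI alone, as reported at Visit 2, also raises OSI.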
Modeling the interactions of the nucleotide excision repair UvrA(2) dimer with DNA.
Gantchev, Tsvetan G; Hunting, Darel J
2010-12-28
The UvrA protein initiates the DNA damage recognition process of the bacterial nucleotide excision repair (NER) system. Recently, crystallographic structures of holo-UvrA(2) dimers from two different microorganisms have been released (Protein Data Bank entries 2r6f, 2vf7, and 2vf8). However, the details of DNA binding by UvrA(2) and other peculiarities of the damage recognition process remain unknown. We have undertaken a molecular modeling approach to appraise the possible modes of DNA-UvrA(2) interaction using molecular docking and short-scale guided molecular dynamics [continuum field, constrained, and/or unrestricted simulated annealing (SA)], taking into account the three-dimensional location of a series of mutation-identified UvrA residues implicated in DNA binding. The molecular docking was based on the assumptions that the UvrA(2) dimer is preformed prior to DNA binding and that no major protein conformational rearrangements, except moderate domain reorientations, are required for binding of undamaged DNA. As a first approximation, DNA was treated as a rigid ligand. From the electrostatic relief of the ventral surface of UvrA(2), we initially identified three noncollinear DNA binding paths. Each of the three resulting nucleoprotein complexes (C1, C2, and C3) was analyzed separately, including calculation of binding energies, the number and type of interacting residues (including mutated ones), and the predominant mode of translational and rotational motion of specific protein domains after SA to ensure improved DNA binding. The UvrA(2) dimer can accommodate DNA in all three orientations, albeit with different binding strengths. One of the UvrA(2)-DNA complexes (C1) fulfilled most of the requirements (high interaction energy, proximity of DNA to mutated residues, etc.) expected for a natural, high-affinity DNA binding site. This nucleoprotein complex presents a structural organization that is designed to clamp and bend double-stranded DNA.
We examined the binding site in more detail by docking DNAs of significantly different (AT- vs CG-enriched) sequences and by submitting the complexes to DNA-unrestricted SA. We found that, independent of the DNA sequence and the MD protocol applied, UvrA(2) favors binding of a bent and unwound undamaged DNA, with a kink positioned near the Zn3 hairpins, which are anticollinearly aligned at the bottom of the ventral protein surface. It is further hypothesized that the Zn3 modules play an essential role in the damage recognition process and that the apparent existence of a family of DNA binding sites might be biologically relevant. Our data should prove useful in rational (structure-based) mutation studies.
Kraken: ultrafast metagenomic sequence classification using exact alignments
2014-01-01
Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/. PMID:24580807
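Classification by exact k-mer matches can be sketched in miniature. Kraken's database maps each k-mer to the lowest common ancestor (LCA) of the genomes containing it and labels a read by the root-to-leaf path carrying the most k-mer hits. This simplified version uses a plain dict instead of Kraken's database structure, a tiny k, and a hypothetical two-genome taxonomy. Ties are broken arbitrarily here, which may differ from Kraken.

```python
from collections import Counter


def lineage(taxon, parent):
    """Taxon followed by its ancestors up to the root (the root's parent is None)."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent.get(taxon)
    return path


def lowest_common_ancestor(a, b, parent):
    """First ancestor of b that is also an ancestor of a (assumes a single root)."""
    ancestors_of_a = set(lineage(a, parent))
    return next(t for t in lineage(b, parent) if t in ancestors_of_a)


def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_database(genomes, parent, k):
    """Map every k-mer to the LCA of all reference taxa whose genome contains it."""
    db = {}
    for taxon, genome in genomes.items():
        for kmer in kmers(genome, k):
            db[kmer] = taxon if kmer not in db else lowest_common_ancestor(db[kmer], taxon, parent)
    return db


def classify(read, db, parent, k):
    """Return the hit taxon whose root-to-taxon path carries the most k-mer hits."""
    hits = Counter(db[kmer] for kmer in kmers(read, k) if kmer in db)
    if not hits:
        return None  # unclassified
    return max(hits, key=lambda t: sum(hits[a] for a in lineage(t, parent)))


# Hypothetical taxonomy and reference genomes.
parent = {"taxonA": "family", "taxonB": "family", "family": "root"}
genomes = {"taxonA": "ACGTACGGTT", "taxonB": "ACGTTTGCAA"}
db = build_database(genomes, parent, k=4)
print(db["ACGT"])                          # -> family (k-mer shared by both genomes)
print(classify("ACGGTT", db, parent, 4))   # -> taxonA
print(classify("ACGTTTG", db, parent, 4))  # -> taxonB
```

Because each lookup is an exact dictionary hit rather than a scored alignment, the per-read cost is one hash lookup per k-mer. That design choice underlies the speed the abstract reports.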
A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS
Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T.; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J.; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A.; Lempicki, Richard A.; Huang, Da Wei
2013-01-01
PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. As a new platform, it is important to assess its sequencing error rate, as well as the quality control (QC) parameters associated with PacBio sequence data. In this study, a mixture of 10 previously known, closely related DNA amplicons was sequenced using the PacBio RS sequencing platform. After aligning the Circular Consensus Sequence (CCS) reads derived from this sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC and improved to 1.3% with an SVM-based multi-parameter QC method. In addition, a de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are already error-corrected, it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results. PMID:24179701
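A per-read error rate from a read-to-reference alignment, summarized by its median, can be sketched as follows. The sketch assumes errors are counted as mismatches plus insertions plus deletions per aligned reference base, which may differ from the study's exact definition. It does not reproduce the SVM-based QC. The alignments shown are hypothetical.

```python
from statistics import median


def error_rate(read_aln, ref_aln, gap="-"):
    """Errors per reference base for one gapped pairwise alignment.

    Mismatches, insertions (gap in the reference) and deletions (gap in the
    read) each count as one error; the denominator is the number of
    reference bases in the alignment (an assumed convention).
    """
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned strings must have equal length")
    errors = sum(1 for r, f in zip(read_aln, ref_aln) if r != f)
    ref_bases = sum(1 for f in ref_aln if f != gap)
    if ref_bases == 0:
        raise ValueError("alignment contains no reference bases")
    return errors / ref_bases


# Hypothetical (read, reference) alignments of CCS reads.
alignments = [("ACGT-A", "ACCTTA"), ("ACGTAC", "ACGTAC"), ("ACGTTAC", "ACG-TAC")]
rates = [error_rate(read, ref) for read, ref in alignments]
print(f"median error rate: {median(rates):.1%}")
```

Comparing this median before and after discarding reads that fail a QC filter is one way to express the 2.5% to 1.3% improvement the study reports.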
A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS.
Jiao, Xiaoli; Zheng, Xin; Ma, Liang; Kutty, Geetha; Gogineni, Emile; Sun, Qiang; Sherman, Brad T; Hu, Xiaojun; Jones, Kristine; Raley, Castle; Tran, Bao; Munroe, David J; Stephens, Robert; Liang, Dun; Imamichi, Tomozumi; Kovacs, Joseph A; Lempicki, Richard A; Huang, Da Wei
2013-07-31
PacBio RS, a newly emerging third-generation DNA sequencing platform, is based on a real-time, single-molecule, nano-nitch sequencing technology that can generate very long reads (up to 20 kb), in contrast to the shorter reads produced by first- and second-generation sequencing technologies. As a new platform, it is important to assess its sequencing error rate, as well as the quality control (QC) parameters associated with PacBio sequence data. In this study, a mixture of 10 previously known, closely related DNA amplicons was sequenced using the PacBio RS sequencing platform. After aligning the Circular Consensus Sequence (CCS) reads derived from this sequencing experiment to the known reference sequences, we found that the median error rate was 2.5% without read QC and improved to 1.3% with an SVM-based multi-parameter QC method. In addition, a de novo assembly was used as a downstream application to evaluate the effects of different QC approaches. This benchmark study indicates that even though CCS reads are already error-corrected, it is still necessary to perform appropriate QC on CCS reads in order to produce successful downstream bioinformatics analytical results.
Gao, Fuqiang; Ma, Jinhui; Sun, Wei; Guo, Wanshou; Li, Zirong; Wang, Weiguo
2017-01-01
There are unanswered questions about knee-ankle alignment after total knee arthroplasty (TKA) for varus and valgus osteoarthritis (OA) of the knee. The aim of this retrospective study was to assess knee-ankle alignment after TKA. The study consisted of 149 patients who had undergone TKA due to varus and valgus knee OA. The alignment and angles in the selected knees and ankles were measured on full-length standing anteroposterior radiographs, both pre-operatively and post-operatively. The paired t-test and Pearson's correlation tests were used for statistical analysis. The results showed that ankle alignment correlated with knee alignment both pre-operatively and postoperatively (P<0.05). The pre-operative malalignment of the knee was corrected (P<0.05), and the ankle tilt angle was accordingly improved in the operative side after TKA (P<0.05). In addition, TKA had little effect on knee-ankle alignment on the non-operative side (P>0.05). These findings indicated that routine TKA could correct the varus or valgus deformity of a knee, and improve the tilt of the ankle. Ankle alignment correlated with knee alignment both pre-operatively and postoperatively. Both pre-operative knee and ankle malalignment can be simultaneously corrected following TKA. Level III. Copyright © 2016 Elsevier B.V. All rights reserved.
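The paired t-test and Pearson's correlation used in this study reduce to short formulas. This is a minimal sketch with hypothetical angle values. It returns only the test statistics. P-values require the t distribution with n − 1 degrees of freedom, which in practice would come from a statistics library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.pearsonr`) rather than from this code.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched pre/post measurements."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pre- and post-operative angles (degrees) for the same knees.
pre_knee, post_knee = [8.0, 6.5, 9.2, 7.1], [2.1, 1.4, 3.0, 2.2]
pre_ankle = [4.0, 3.1, 5.2, 3.3]
print(paired_t(pre_knee, post_knee))      # (t statistic, degrees of freedom)
print(pearson_r(pre_knee, pre_ankle))     # knee-ankle correlation before surgery
```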
Suero, Eduardo M; Lueke, Ulrich; Stuebig, Timo; Hawi, Nael; Krettek, Christian; Liodakis, Emmanouil
2018-04-25
Procedure volume is an important determinant of total knee arthroplasty (TKA) outcomes. We aimed to determine whether computer navigation or patient-specific instrumentation (PSI) would improve postoperative alignment in a low-volume setting. We hypothesized that PSI for TKA achieves better limb and implant alignment than conventional TKA and computer-navigated TKA. This is a retrospective cohort study of 385 primary TKAs (women, 59%; mean age, 67 years; mean BMI, 30.1 kg/m²) performed using conventional instrumentation (n=117; 30%), computer navigation (n=209; 54%), or patient-specific instrumentation (n=59; 15%) in a low-volume center (<50 TKAs/year). The risk of postoperative leg and implant mechanical alignment outliers in the coronal plane (>3° from neutral), average alignment, and operation time were assessed. The risk of postoperative mechanical alignment outliers (>3°) was reduced by 89% in the navigated group (4% outliers) compared with the conventional group (35%) (relative risk [RR]=0.11; p<0.0001). No significant improvement was observed in the PSI group (27%) (RR=0.91; p=0.772). The risk of postoperative femoral component coronal alignment outliers was reduced by 63% in the navigated group (11%) compared with the conventional group (31%) (RR=0.37; p=0.018). No significant reduction in outliers was observed in the PSI group (32%) (RR=1.08; p=0.816). The risk of tibial component coronal malalignment was 66% lower in the navigated group (5%) than in the conventional group (13%) (RR=0.33; p=0.070). There was a two-fold increase in the risk of tibial component alignment outliers in the PSI group (29%) (RR=1.94; p=0.110). Computer navigation improved postoperative alignment in TKA. No evidence of improved alignment was seen with patient-specific instrumentation. The routine use of patient-specific instrumentation in low-volume centers is not supported by the currently available data. Retrospective cohort study. Level IV. Copyright © 2018 Elsevier Masson SAS.
All rights reserved.
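The relative risks (RR) in this abstract compare outlier rates between groups, and each percentage reduction is 1 − RR. The minimal sketch below uses the rounded limb-alignment outlier rates quoted above (4% navigated vs. 35% conventional). Because the abstract rounds the rates, recomputing the other RRs from them only approximates the reported values. The function names are hypothetical.

```python
def relative_risk(risk_exposed, risk_reference):
    """Outlier rate in the compared group divided by the rate in the reference group."""
    return risk_exposed / risk_reference


def relative_risk_reduction(rr):
    """Fractional reduction in risk implied by a relative risk below 1."""
    return 1.0 - rr


rr = relative_risk(0.04, 0.35)  # navigated vs. conventional, limb alignment outliers
print(f"RR = {rr:.2f}, reduction = {relative_risk_reduction(rr):.0%}")  # -> RR = 0.11, reduction = 89%
```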
"Performance Of A Wafer Stepper With Automatic Intra-Die Registration Correction."
NASA Astrophysics Data System (ADS)
van den Brink, M. A.; Wittekoek, S.; Linders, H. F. D.; van Hout, F. J.; George, R. A.
1987-01-01
An evaluation of a wafer stepper with the new, improved Philips/ASM-L phase grating alignment system is reported. It is shown that an accurate alignment system needs an accurate X-Y-θ wafer stage and an accurate reticle Z stage to achieve optimum overlay accuracy. This follows from a discussion of the overlay budget and an alignment procedure model. The accurate wafer stage permits high overlay accuracy using global alignment only, thus eliminating the throughput penalty of align-by-field schemes. The accurate reticle Z stage enables intra-die magnification control with respect to the wafer scale. Various overlay data are reported, which have been measured with the automatic metrology program of the stepper. It is demonstrated that the new dual alignment system (with the external spatial filter) has improved the ability to align to weakly reflecting layers. The results are supported by a Fourier analysis of the alignment signal. Resolution data are given for the PAS 2500 projection lenses, which show that the high overlay accuracy of the system is properly matched with submicron linewidth control. The results of a recently introduced 20 mm i-line lens with a numerical aperture of 0.4 (Zeiss 10-78-58) are included.
Feltus, F Alex; Wan, Jun; Schulze, Stefan R; Estill, James C; Jiang, Ning; Paterson, Andrew H
2004-09-01
Dense coverage of the rice genome with polymorphic DNA markers is an invaluable tool for DNA marker-assisted breeding, positional cloning, and a wide range of evolutionary studies. We have aligned drafts of two rice subspecies, indica and japonica, and analyzed levels and patterns of genetic diversity. After filtering multiple copy and low quality sequence, 408,898 candidate DNA polymorphisms (SNPs/INDELs) were discerned between the two subspecies. These filters have the consequence that our data set includes only a subset of the available SNPs (in particular excluding large numbers of SNPs that may occur between repetitive DNA alleles) but increase the likelihood that this subset is useful: Direct sequencing suggests that 79.8% ± 7.5% of the in silico SNPs are real. The SNP sample in our database is not randomly distributed across the genome. In fact, 566 rice genomic regions had unusually high (328 contigs/48.6 Mb/13.6% of genome) or low (237 contigs/64.7 Mb/18.1% of genome) polymorphism rates. Many SNP-poor regions were substantially longer than most SNP-rich regions, covering up to 4 Mb, and possibly reflecting introgression between the respective gene pools that may have occurred hundreds of years ago. Although 46.2% ± 8.3% of the SNPs differentiate other pairs of japonica and indica genotypes, SNP rates in rice were not predictive of evolutionary rates for corresponding genes in another grass species, sorghum. The data set is freely available at http://www.plantgenome.uga.edu/snp.
Feltus, F. Alex; Wan, Jun; Schulze, Stefan R.; Estill, James C.; Jiang, Ning; Paterson, Andrew H.
2004-01-01
Dense coverage of the rice genome with polymorphic DNA markers is an invaluable tool for DNA marker-assisted breeding, positional cloning, and a wide range of evolutionary studies. We have aligned drafts of two rice subspecies, indica and japonica, and analyzed levels and patterns of genetic diversity. After filtering multiple copy and low quality sequence, 408,898 candidate DNA polymorphisms (SNPs/INDELs) were discerned between the two subspecies. These filters have the consequence that our data set includes only a subset of the available SNPs (in particular excluding large numbers of SNPs that may occur between repetitive DNA alleles) but increase the likelihood that this subset is useful: Direct sequencing suggests that 79.8% ± 7.5% of the in silico SNPs are real. The SNP sample in our database is not randomly distributed across the genome. In fact, 566 rice genomic regions had unusually high (328 contigs/48.6 Mb/13.6% of genome) or low (237 contigs/64.7 Mb/18.1% of genome) polymorphism rates. Many SNP-poor regions were substantially longer than most SNP-rich regions, covering up to 4 Mb, and possibly reflecting introgression between the respective gene pools that may have occurred hundreds of years ago. Although 46.2% ± 8.3% of the SNPs differentiate other pairs of japonica and indica genotypes, SNP rates in rice were not predictive of evolutionary rates for corresponding genes in another grass species, sorghum. The data set is freely available at http://www.plantgenome.uga.edu/snp. PMID:15342564
Rudi, Knut; Zimonja, Monika; Kvenshagen, Bente; Rugtveit, Jarle; Midtvedt, Tore; Eggesbø, Merete
2007-01-01
We present a novel approach for comparing 16S rRNA gene clone libraries that is independent of both DNA sequence alignment and definition of bacterial phylogroups. These steps are the major bottlenecks in current microbial comparative analyses. We used direct comparisons of taxon density distributions in an absolute evolutionary coordinate space. The coordinate space was generated by using alignment-independent bilinear multivariate modeling. Statistical analyses for clone library comparisons were based on multivariate analysis of variance, partial least-squares regression, and permutations. Clone libraries from both adult and infant gastrointestinal tract microbial communities were used as biological models. We reanalyzed a library consisting of 11,831 clones covering complete colons from three healthy adults in addition to a smaller 390-clone library from infant feces. We show that it is possible to extract detailed information about microbial community structures using our alignment-independent method. Our density distribution analysis is also very efficient with respect to computer operation time, meeting the future requirements of large-scale screenings to understand the diversity and dynamics of microbial communities. PMID:17337554
Compositions and methods for detecting single nucleotide polymorphisms
Yeh, Hsin-Chih; Werner, James; Martinez, Jennifer S.
2016-11-22
Described herein are nucleic acid-based probes and methods for discriminating and detecting single nucleotide variants in nucleic acid molecules (e.g., DNA). The methods include the use of a pair of probes to detect and identify polymorphisms, for example single nucleotide polymorphisms in DNA. The pair of probes emits light at different fluorescent wavelengths depending on the association and alignment of the probes when hybridized to a target nucleic acid molecule. Each pair of probes is capable of discriminating at least two different nucleic acid molecules that differ by at least a single nucleotide. The methods and probes can be used, for example, to detect DNA polymorphisms that are indicative of a particular disease or condition.
Lyu, Weiwei; Cheng, Xianghong
2017-11-28
Transfer alignment is a key technology in strapdown inertial navigation systems (SINS) because of its speed and accuracy. In this paper, a transfer alignment model is established that contains the SINS error model and the measurement model. The time delay in the transfer alignment process is analyzed, and an H∞ filtering method with delay compensation is presented. The H∞ filtering theory and the robustness mechanism of the H∞ filter are then derived and analyzed in detail. To improve transfer alignment accuracy in SINS with time delay, an adaptive H∞ filtering method with delay compensation is proposed. Because the robustness factor plays an important role in the filtering process and affects filtering accuracy, the adaptive H∞ filter with delay compensation adjusts the value of the robustness factor adaptively according to the dynamic external environment. The vehicle transfer alignment experiment indicates that, with the adaptive H∞ filtering method with delay compensation, both transfer alignment accuracy and pure inertial navigation accuracy can be dramatically improved, which demonstrates the superiority of the proposed filtering method.
Carbon Nanotube Electrode Arrays For Enhanced Chemical and Biological Sensing
NASA Technical Reports Server (NTRS)
Han, Jie
2003-01-01
Applications of carbon nanotubes for ultrasensitive electrical sensing of chemical and biological species have been a major focus at the NASA Ames Center for Nanotechnology. Great progress has been made toward controlled growth and chemical functionalization of vertically aligned carbon nanotube arrays and their integration into microfabricated chip devices. Carbon nanotube electrode array devices have been used for sub-attomole detection of DNA molecules. Interdigitated carbon nanotube array devices have been applied to sub-ppb (parts per billion) chemical sensing of many molecules at room temperature. Stability and reliability have also been addressed in our device development. These results show order-of-magnitude improvements in device performance, size, and power consumption compared with microdevices, promising applications of carbon nanotube electrode arrays for clinical molecular diagnostics, personal medical testing and monitoring, and environmental monitoring.
Yang, Haozhe; Mei, Hui; Seela, Frank
2015-07-06
Reverse Watson-Crick DNA with parallel-strand orientation (ps DNA) has been constructed. Pyrrolo-dC (PyrdC) nucleosides with phenyl and pyridinyl residues linked to the 6 position of the pyrrolo[2,3-d]pyrimidine base have been incorporated in 12- and 25-mer oligonucleotide duplexes and utilized as silver-ion binding sites. Thermal-stability studies on the parallel DNA strands demonstrated extremely strong silver-ion binding and strongly enhanced duplex stability. Stoichiometric UV and fluorescence titration experiments verified that a single (2py) PyrdC-(2py) PyrdC pair captures two silver ions in ps DNA. A structure for the PyrdC silver-ion base pair that aligns 7-deazapurine bases head-to-tail instead of head-to-head, as suggested for canonical DNA, is proposed. The silver DNA double helix represents the first example of a ps DNA structure built up of bidentate and tridentate reverse Watson-Crick base pairs stabilized by a dinuclear silver-mediated PyrdC pair. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Star, Bastiaan; Nederbragt, Alexander J.; Hansen, Marianne H. S.; Skage, Morten; Gilfillan, Gregor D.; Bradbury, Ian R.; Pampoulie, Christophe; Stenseth, Nils Chr; Jakobsen, Kjetill S.; Jentoft, Sissel
2014-01-01
Degradation-specific processes and variation in laboratory protocols can bias the DNA sequence composition from samples of ancient or historic origin. Here, we identify a novel artifact in sequences from historic samples of Atlantic cod (Gadus morhua), which forms interrupted palindromes consisting of reverse complementary sequence at the 5′- and 3′-ends of sequencing reads. The palindromic sequences themselves have specific properties: the bases at the 5′-end align well to the reference genome, whereas extensive misalignments exist among the bases at the terminal 3′-end. The terminal 3′ bases are artificial extensions likely caused by the occurrence of hairpin loops in single-stranded DNA (ssDNA), which can be ligated and amplified in particular library creation protocols. We propose that such hairpin loops allow the inclusion of erroneous nucleotides, specifically at the 3′-end of DNA strands, with the 5′-end of the same strand providing the template. We also find these palindromes in previously published ancient DNA (aDNA) datasets, albeit at varying and substantially lower frequencies. This artifact can negatively affect the yield of endogenous DNA in these types of samples and introduces sequence bias. PMID:24608104
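The palindrome artifact described above can be screened for with a simple string check. The sketch below is an illustration, not the authors' pipeline: it reports the longest arm for which the 3′ end of a read is the exact reverse complement of its 5′ end. The function names and the `min_arm` cutoff of 8 bases are hypothetical choices. Because the artificial 3′ extensions contain erroneous bases, a practical screen would likely need to tolerate mismatches, which this exact-match version does not.

```python
# Translation table for uppercase DNA; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_arm_length(read: str, min_arm: int = 8) -> int:
    """Longest k (>= min_arm) such that the last k bases of `read` are the
    exact reverse complement of its first k bases; 0 if there is none.
    Arms are limited to half the read so that they cannot overlap."""
    read = read.upper()
    best = 0
    for k in range(min_arm, len(read) // 2 + 1):
        if read[-k:] == reverse_complement(read[:k]):
            best = k
        else:
            # A matching arm of length k implies matches for every shorter k,
            # so once k fails no longer arm can match.
            break
    return best


# 10-base 5' arm, 7-base loop, then its reverse complement at the 3' end.
print(palindrome_arm_length("AAACCCGGGT" + "GATTACA" + "ACCCGGGTTT"))  # 10
```

Reads with a nonzero arm length would be candidates for trimming or exclusion before mapping.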
Direct Single-Molecule Observation of Mode and Geometry of RecA-Mediated Homology Search.
Lee, Andrew J; Endo, Masayuki; Hobbs, Jamie K; Wälti, Christoph
2018-01-23
Genomic integrity, when compromised by accrued DNA lesions, is maintained through efficient repair via homologous recombination. For this process the ubiquitous recombinase A (RecA) and its homologues, such as human Rad51, are of central importance, as they are able to align and exchange homologous sequences within single-stranded and double-stranded DNA in order to swap out defective regions. Here, we directly observe the widely debated mechanism of RecA homology searching at the single-molecule level using high-speed atomic force microscopy (HS-AFM) in combination with tailored DNA origami frames that present the reaction targets in a way suitable for AFM imaging. We show that RecA nucleoprotein filaments move along DNA substrates via short-distance facilitated diffusion events, or slides, interspersed with longer-distance random moves, or hops. Importantly, from the specific interaction geometry, we find that the double-stranded substrate DNA resides in the secondary DNA-binding site within the helical groove of the RecA nucleoprotein filament during the homology search. This work demonstrates that tailored DNA origami, in conjunction with HS-AFM, can be employed to directly reveal conformational and geometrical information on dynamic protein-DNA interactions that was previously inaccessible at the level of individual molecules.
DNA-DNA interaction beyond the ground state
NASA Astrophysics Data System (ADS)
Lee, D. J.; Wynveen, A.; Kornyshev, A. A.
2004-11-01
The electrostatic interaction potential between DNA duplexes in solution is a basis for the statistical mechanics of columnar DNA assemblies. It may also play an important role in recombination of homologous genes. We develop a theory of this interaction that includes thermal torsional fluctuations of DNA using field-theoretical methods and Monte Carlo simulations. The theory extends and rationalizes the previously suggested variational approach, which was developed in the context of a ground-state theory of interaction of nonhomologous duplexes. It shows that the heuristic variational theory is equivalent to the Hartree self-consistent field approximation. By comparing the Hartree approximation with an exact solution based on the quantum-mechanical analogy of path integrals, as well as with Monte Carlo simulations, we show that this analytically tractable approximation works very well in most cases. Thermal fluctuations do not remove the ability of DNA molecules to attract each other at favorable azimuthal conformations, nor do they wash out the possibility of electrostatic “snap-shot” recognition of homologous sequences, considered earlier on the basis of ground-state calculations. At short distances DNA molecules undergo a “torsional alignment transition,” which is first order for nonhomologous DNA and weaker order for homologous sequences.
Total knee arthroplasty with a computer-navigated saw: a pilot study.
Garvin, Kevin L; Barrera, Andres; Mahoney, Craig R; Hartman, Curtis W; Haider, Hani
2013-01-01
Computer-aided surgery aims to improve implant alignment in total knee arthroplasty (TKA) but has been adopted for routine use by only a minority. A novel approach, navigated freehand bone cutting (NFC), is intended to achieve wider acceptance by eliminating the need for cumbersome, implant-specific mechanical jigs and avoiding the expense of navigation. We determined cutting time, surface quality, implant fit, and implant alignment after NFC of synthetic femoral specimens and the feasibility and alignment of a complete TKA performed with NFC technology in cadaveric specimens. Seven surgeons prepared six synthetic femoral specimens each, using our custom NFC system. Cutting times, quality of bone cuts, and implant fit and alignment were assessed quantitatively by CT surface scanning and computational measurements. Additionally, a single surgeon performed a complete TKA on two cadaveric specimens using the NFC system, with cutting time and implant alignment analyzed through plain radiographs and CT. For the synthetic specimens, femoral coronal alignment was within ± 2° of neutral in 94% of the specimens. Sagittal alignment was within 0° to 5° of flexion in all specimens. Rotation was within ± 1° of the epicondylar axis in 97% of the specimens. The mean time to make cuts improved from 13 minutes for the first specimen to 9 minutes for the fourth specimen. TKA was performed in two cadaveric specimens without complications, and implants were well aligned. TKA is feasible with NFC, which eliminates the need for implant-specific instruments. We observed a fast learning curve. NFC has the potential to improve TKA alignment, reduce operative time, and reduce the number of instruments in surgery. Fewer instruments and less sterilization could reduce costs associated with TKA.
Paramagnetic decoration of DNA origami nanostructures by Eu³⁺ coordination.
Opherden, Lars; Oertel, Jana; Barkleit, Astrid; Fahmy, Karim; Keller, Adrian
2014-07-15
The folding of DNA into arbitrary two- and three-dimensional shapes, called DNA origami, represents a powerful tool for the synthesis of functional nanostructures. Here, we present the first approach toward the paramagnetic functionalization of DNA origami nanostructures by utilizing postassembly coordination with Eu³⁺ ions. In contrast to the usual formation of toroidal dsDNA condensates in the presence of trivalent cations, planar as well as rod-like DNA origami maintain their shape and monomeric state even under high loading with the trivalent lanthanide. Europium coordination was demonstrated by the change in Eu³⁺ luminescence upon binding to the two DNA origami. Their natural circular dichroism in the Mg²⁺- and Eu³⁺-bound state was found to be very similar to that of genomic DNA, evidencing little influence of the DNA origami superstructure on the local chirality of the stacked base pairs. In contrast, the magnetic circular dichroism of the Mg²⁺-bound DNA origami deviates from that of genomic DNA. Furthermore, the lanthanide affects the magnetic properties of DNA in a superstructure-dependent fashion, indicative of the existence of superstructure-specific geometry of Eu³⁺ binding sites in the DNA origami that are not formed in genomic DNA. This simple approach lays the foundation for the generation of magneto-responsive DNA origami nanostructures. Such systems do not require covalent modifications and can be used for the magnetic manipulation of DNA nanostructures or for the paramagnetic alignment of molecules in NMR spectroscopy.
NASA Astrophysics Data System (ADS)
Carbeck, Jeffrey; Petit, Cecilia
2004-03-01
Current efforts in nanotechnology use one of two basic approaches: top-down fabrication and bottom-up assembly. Top-down strategies use lithography and contact printing to create patterned surfaces and microfluidic channels that, in turn, can corral and organize nanoscale structures. Bottom-up approaches use templates to direct the assembly of atoms, molecules, and nanoparticles through molecular recognition. The goal of this work is to integrate these strategies by first patterning and orienting DNA molecules with top-down tools so that single DNA chains can then serve as templates for the bottom-up construction of heterostructures composed of proteins and nanoparticles, both metallic and semiconducting. The first part of this talk focuses on the top-down strategies used to create microscopic patterns of stretched and aligned DNA molecules. Specifically, it presents a new method in which molecular combing (a process by which molecules are deposited and stretched onto a surface by the passage of an air-water interface) is performed in microchannels. This approach demonstrates that the shape and motion of this interface serve as an effective local field that directs the chains dynamically as they are stretched onto the surface. The geometry of the microchannel directs the placement of the DNA molecules, while the geometry of the air-water interface directs their local orientation and curvature. This ability to control both the placement and orientation of chains has implications for the use of this technique in genetic analysis and in the bottom-up approach to nanofabrication.
The second half of this talk presents our bottom-up strategy, which allows placement of nanoparticles along individual DNA chains with a theoretical resolution of less than 1 nm. Specifically, we demonstrate the sequence-specific patterning of nanoparticles via the hybridization of functionalized complementary probes to surface-bound chains of double-stranded DNA. Using this technique, we demonstrate the ability to assemble metals, semiconductors, and a composite of both on a single molecule.
Shear induced alignment of short nanofibers in 3D printed polymer composites.
Yunus, Doruk Erdem; Shi, Wentao; Sohrabi, Salman; Liu, Yaling
2016-12-09
3D printing of composite materials offers an opportunity to combine the desired properties of composite materials with the flexibility of additive manufacturing in geometric shape and complexity. In this paper, the shear-induced alignment of aluminum oxide nanowires during stereolithography printing was utilized to fabricate a nanowire-reinforced polymer composite. To align the fibers, a lateral oscillation mechanism was implemented and combined with a wall-pattern printing technique to generate shear flow in both vertical and horizontal directions. A series of specimens was fabricated to test the composite material's tensile strength. The results showed that the mechanical properties of the composite were improved by reinforcement with nanofibers through shear-induced alignment. Tensile strength improved by approximately 28% when the nanowires were aligned at a loading of 5 wt% (∼1.5% volume fraction) aluminum oxide nanowires.
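The 5 wt% loading and ∼1.5% volume fraction quoted above are related through the densities of filler and matrix. As a rough check, the sketch below applies the standard two-phase conversion, assuming ideal mixing with no voids, a bulk alumina density of about 3.95 g/cm³, and a cured resin density of 1.15 g/cm³. The resin value is an assumption for illustration, not a figure from the paper, and nanowire density may differ from the bulk value.

```python
def weight_to_volume_fraction(w_filler: float, rho_filler: float, rho_matrix: float) -> float:
    """Convert a filler weight fraction to a volume fraction for a two-phase
    composite, assuming ideal mixing and no voids."""
    if not 0.0 <= w_filler <= 1.0:
        raise ValueError("weight fraction must be between 0 and 1")
    v_filler = w_filler / rho_filler            # filler volume per unit composite mass
    v_matrix = (1.0 - w_filler) / rho_matrix    # matrix volume per unit composite mass
    return v_filler / (v_filler + v_matrix)


# Assumed densities in g/cm^3: bulk alumina ~3.95, photocurable resin 1.15 (illustrative).
phi = weight_to_volume_fraction(0.05, 3.95, 1.15)
print(f"{phi:.2%}")  # about 1.5%
```

With these assumed values the conversion gives roughly 1.5%, in line with the volume fraction quoted above; because the filler is much denser than the resin, the volume fraction is several times smaller than the weight fraction.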
Pair aligning improved motility of Quincke rollers.
Lu, Shi Qing; Zhang, Bing Yue; Zhang, Zhi Chao; Shi, Yan; Zhang, Tian Hui
2018-06-06
Density-dependent speed is studied in a two-dimensional active colloid in which the colloidal particles are propelled by an external electric field via Quincke rotation. Above the critical electric field, dense dynamic clusters form spontaneously, in which the particles are highly aligned in velocity and move much faster than isolated units. Detailed observations of pair collisions reveal that the velocity alignment is induced by long-ranged hydrodynamic interactions, and that the speed enhancement in the clusters arises from pair aligning, in which two particles are closely paired and rotate synchronously. In the aligning state, the short-range in-plane dipole-dipole attraction enhances the rotation torque and gives rise to a larger rolling speed. Pair aligning becomes difficult and unstable at high electric fields, where the normal dipole-dipole repulsion becomes dominant. As a consequence, the dependence of speed on density becomes increasingly weak as the electric field increases. This result offers an interpretation of the discrepancy between our observations and previous observations of Quincke rollers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pierce, Karisa M.; Wood, Lianna F.; Wright, Bob W.
2005-12-01
A comprehensive two-dimensional (2D) retention time alignment algorithm was developed using a novel indexing scheme. The algorithm is termed comprehensive because it functions to correct the entire chromatogram in both dimensions and it preserves the separation information in both dimensions. Although the algorithm is demonstrated by correcting comprehensive two-dimensional gas chromatography (GC x GC) data, the algorithm is designed to correct shifting in all forms of 2D separations, such as LC x LC, LC x CE, CE x CE, and LC x GC. This 2D alignment algorithm was applied to three different data sets composed of replicate GC x GC separations of (1) three 22-component control mixtures, (2) three gasoline samples, and (3) three diesel samples. The three data sets were collected using slightly different temperature or pressure programs to engender significant retention time shifting in the raw data and then demonstrate subsequent corrections of that shifting upon comprehensive 2D alignment of the data sets. Thirty 12-min GC x GC separations from three 22-component control mixtures were used to evaluate the 2D alignment performance (10 runs/mixture). The average standard deviation of the first column retention time improved 5-fold from 0.020 min (before alignment) to 0.004 min (after alignment). Concurrently, the average standard deviation of second column retention time improved 4-fold from 3.5 ms (before alignment) to 0.8 ms (after alignment). Alignment of the 30 control mixture chromatograms took 20 min. The quantitative integrity of the GC x GC data following 2D alignment was also investigated. The mean integrated signal was determined for all components in the three 22-component mixtures for all 30 replicates. The average percent difference in the integrated signal for each component before and after alignment was 2.6%.
Singular value decomposition (SVD) was applied to the 22-component control mixture data before and after alignment to show the restoration of trilinearity to the data, since trilinearity benefits chemometric analysis. By applying comprehensive 2D retention time alignment to all three data sets (control mixtures, gasoline samples, and diesel samples), classification by principal component analysis (PCA) substantially improved, resulting in 100% accurate scores clustering.
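The indexing-based algorithm above corrects shifting across the whole chromatogram and is not reproduced here. As a much simpler stand-in, the sketch below finds the single integer (first-dimension, second-dimension) shift that best overlays a sample GC x GC chromatogram on a reference by maximizing their inner product. The grid representation (rows as first-column time, columns as second-column time), the `max_shift` search window, and the function names are assumptions for illustration; a single global shift cannot capture the nonuniform retention shifting the comprehensive algorithm is designed to handle.

```python
from typing import List, Tuple

Grid = List[List[float]]  # rows: first-dimension time bins, columns: second-dimension bins


def shift_grid(grid: Grid, d1: int, d2: int) -> Grid:
    """Shift a 2D chromatogram by (d1, d2) samples, zero-filling vacated cells."""
    n1, n2 = len(grid), len(grid[0])
    out = [[0.0] * n2 for _ in range(n1)]
    for i in range(n1):
        for j in range(n2):
            si, sj = i - d1, j - d2
            if 0 <= si < n1 and 0 <= sj < n2:
                out[i][j] = grid[si][sj]
    return out


def best_global_shift(reference: Grid, sample: Grid, max_shift: int = 3) -> Tuple[int, int]:
    """Return the (d1, d2) shift of `sample` that maximizes its inner product
    with `reference`, searching |d1|, |d2| <= max_shift."""
    best, best_score = (0, 0), float("-inf")
    for d1 in range(-max_shift, max_shift + 1):
        for d2 in range(-max_shift, max_shift + 1):
            moved = shift_grid(sample, d1, d2)
            score = sum(
                r * s
                for ref_row, moved_row in zip(reference, moved)
                for r, s in zip(ref_row, moved_row)
            )
            if score > best_score:
                best, best_score = (d1, d2), score
    return best


# Synthetic example: two peaks, then a simulated retention-time drift.
ref = [[0.0] * 8 for _ in range(8)]
ref[4][4], ref[2][6] = 1.0, 0.5
drifted = shift_grid(ref, 1, -2)
print(best_global_shift(ref, drifted))  # (-1, 2)
```

Applying the returned shift with `shift_grid` restores the peaks to their reference positions; the exhaustive search is quadratic in the window size and is only practical for small grids.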
[Identification of Tibetan medicine "Dida" of Gentianaceae using DNA barcoding].
Liu, Chuan; Zhang, Yu-Xin; Liu, Yue; Chen, Yi-Long; Fan, Gang; Xiang, Li; Xu, Jiang; Zhang, Yi
2016-02-01
The ITS2 barcode was used to identify the Tibetan medicine "Dida" and to secure its quality and safety in medication. ITS2 sequences were amplified from 151 samples of 13 species collected on the Tibetan Plateau, including species of the Gentianaceae genera Swertia, Halenia, Gentianopsis, Comastoma, and Lomatogonium, and the purified PCR products were sequenced. Sequence assembly and consensus sequence generation were performed using CodonCode Aligner V3.7.1. Kimura 2-parameter (K2P) distances were calculated using MEGA 6.0, and neighbor-joining (NJ) phylogenetic trees were constructed. After alignment, the ITS2 sequences (231 bp) comprised 31 haplotypes, with an average G+C content of 61.40%. The NJ tree strongly supported the clustering of each species into its own clade, with a high identification success rate; the exception was Swertia bifolia and Swertia wolfangiana, which could not be distinguished from each other on the basis of sequence divergence. DNA barcoding can therefore be used as a fast and accurate method to identify the Tibetan medicine "Dida" and ensure its safe use. Copyright© by the Chinese Pharmaceutical Association.
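The Kimura 2-parameter distance used above treats transitions (A↔G, C↔T) and transversions (purine↔pyrimidine) separately. With P and Q the proportions of transition and transversion differences among the compared sites, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal sketch, assuming two pre-aligned sequences and skipping any position where either sequence has a gap or a non-ACGT symbol (an illustrative choice similar to pairwise deletion; MEGA offers other gap-handling options):

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # one transition in 10 sites
```

A matrix of such pairwise distances is the input from which a neighbor-joining tree is built.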
Global Analysis of Transcription Factor-Binding Sites in Yeast Using ChIP-Seq
Lefrançois, Philippe; Gallagher, Jennifer E. G.; Snyder, Michael
2016-01-01
Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28–36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy. PMID:25213249
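The barcode-parsing step described above can be sketched as follows. This assumes, hypothetically, that each barcode occupies the first bases of the read and that one mismatch is tolerated; the sample names and barcode sequences in the example are invented, and the actual adapter layout depends on the library protocol. The trimmed reads would then go to a short-read aligner for mapping against the yeast reference genome, which is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(
    reads: Iterable[str], barcodes: Dict[str, str], max_mismatches: int = 1
) -> Dict[str, List[str]]:
    """Bin reads by the barcode found at their 5' end and trim it off.

    A read is assigned only if exactly one barcode has the lowest mismatch
    count and that count is <= max_mismatches; otherwise it is 'unassigned'."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        hits = []
        for sample, bc in barcodes.items():
            prefix = read[: len(bc)]
            # Mismatched positions, plus any barcode bases missing from a too-short read.
            mismatches = sum(a != b for a, b in zip(prefix, bc)) + len(bc) - len(prefix)
            if mismatches <= max_mismatches:
                hits.append((mismatches, sample, len(bc)))
        hits.sort()
        if hits and (len(hits) == 1 or hits[0][0] < hits[1][0]):
            _, sample, bc_len = hits[0]
            bins[sample].append(read[bc_len:])
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes for a tagged-TF library and its untagged control.
example = demultiplex(
    ["ACGTGGGAAATTT", "TGCTCCCAAA", "GGGGGGGG"],
    {"tagged_TF": "ACGT", "untagged_ctrl": "TGCA"},
)
print(example)
```

Requiring a unique best match keeps reads that sit equally close to two barcodes out of both samples rather than assigning them arbitrarily.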
Weaver, Mitchell T; Lynch, Kyle B; Zhu, Zaifang; Chen, Huang; Lu, Joann J; Pu, Qiaosheng; Liu, Shaorong
2017-04-01
Laser-induced fluorescence (LIF) detectors for low-micrometer and sub-micrometer capillary on-column detection are not commercially available. In this paper, we describe in detail how to construct a confocal LIF detector to address this issue. We characterize the detector by determining its limit of detection (LOD), linear dynamic range (LDR), and background signal drift; a very low LOD (~70 fluorescein molecules or 12 yoctomole fluorescein), a wide LDR (greater than 3 orders of magnitude), and a small background signal drift (~1.2-fold the root-mean-square noise) are obtained. For detecting analytes inside a low-micrometer or sub-micrometer capillary, proper alignment is essential. We present a simple protocol to align the capillary with the optical system and use the position-lock capability of a translation stage to fix the capillary in position during the experiment. To demonstrate the feasibility of using this detector for narrow capillary systems, we build a 2-μm-i.d. capillary flow injection analysis (FIA) system using the newly developed LIF prototype as a detector and obtain an FIA LOD of 14 zeptomole fluorescein. We also separate a DNA ladder sample by bare narrow capillary hydrodynamic chromatography and use the LIF prototype to monitor the resolved DNA fragments. We obtain not only well-resolved peaks but also quantitative information on all DNA fragments. Copyright © 2016 Elsevier B.V. All rights reserved.
Liao, Ai-Jun; Su, Qi; Wang, Xun; Zeng, Bin; Shi, Wei
2008-01-01
AIM: To isolate and analyze the DNA sequences that are differentially methylated between gastric cancer and normal gastric mucosa. METHODS: The differentially methylated DNA sequences between gastric cancer and normal gastric mucosa were isolated by methylation-sensitive representational difference analysis (MS-RDA). Similarities between the separated fragments and human genomic DNA were analyzed with the Basic Local Alignment Search Tool (BLAST). RESULTS: Three differentially methylated DNA sequences were obtained, two of which have been accepted by GenBank. The accession numbers are AY887106 and AY887107. AY887107 was highly similar to the 11th exon of LOC440683 (98%), the 3’ end of LOC440887 (99%), and the promoter and exon regions of DRD5 (94%). AY887106 showed 98% similarity to a CpG island in ribosomal RNA isolated from colorectal cancer by Minoru Toyota in 1999. CONCLUSION: The degree of methylation differs between gastric cancer and normal gastric mucosa. Differentially methylated DNA sequences can be isolated effectively by MS-RDA. PMID:18322944
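The similarity percentages above come from BLAST, which reports identity as the number of identical positions over the length of the alignment, gap columns included. A minimal sketch of that calculation, assuming the two sequences are already aligned with "-" marking gaps (the function name is illustrative):

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns with identical residues, using the
    full alignment length (gap columns included) as the denominator."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1
        for a, b in zip(aligned_query.upper(), aligned_subject.upper())
        if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_query)


print(percent_identity("ACGT", "ACGA"))  # 75.0
```

Because gap columns count toward the length but never toward the identities, an alignment with an insertion scores lower than the same ungapped matches would.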
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
A High-Throughput Arabidopsis Reverse Genetics System
Sessions, Allen; Burke, Ellen; Presting, Gernot; Aux, George; McElver, John; Patton, David; Dietrich, Bob; Ho, Patrick; Bacwaden, Johana; Ko, Cynthia; Clarke, Joseph D.; Cotton, David; Bullis, David; Snell, Jennifer; Miguel, Trini; Hutchison, Don; Kimmerly, Bill; Mitzel, Theresa; Katagiri, Fumiaki; Glazebrook, Jane; Law, Marc; Goff, Stephen A.
2002-01-01
A collection of Arabidopsis lines with T-DNA insertions in known sites was generated to increase the efficiency of functional genomics. A high-throughput modified thermal asymmetric interlaced (TAIL)-PCR protocol was developed and used to amplify DNA fragments flanking the T-DNA left borders from ∼100,000 transformed lines. A total of 85,108 TAIL-PCR products from 52,964 T-DNA lines were sequenced and compared with the Arabidopsis genome to determine the positions of T-DNAs in each line. Predicted T-DNA insertion sites, when mapped, showed a bias against predicted coding sequences. Predicted insertion mutations in genes of interest can be identified using Arabidopsis Gene Index name searches or by BLAST (Basic Local Alignment Search Tool) search. Insertions can be confirmed by simple PCR assays on individual lines. Predicted insertions were confirmed in 257 of 340 lines tested (76%). This resource has been named SAIL (Syngenta Arabidopsis Insertion Library) and is available to the scientific community at www.tmri.org. PMID:12468722
Adenovirus 36 DNA in human adipose tissue.
Ponterio, E; Cangemi, R; Mariani, S; Casella, G; De Cesare, A; Trovato, F M; Garozzo, A; Gnessi, L
2015-12-01
Recent studies have suggested a possible correlation between obesity and adenovirus 36 (Adv36) infection in humans. As information on the presence of adenoviral DNA in human adipose tissue is limited, we evaluated the presence of Adv36 DNA in adipose tissue from 21 overweight or obese adult patients. Total DNA was extracted from adipose tissue biopsies. Virus detection was performed using PCR protocols with primers targeting the nucleotide sequences of the Adv36 fiber protein and the viral oncogenic E4orf1 protein. Sequences were compared against the NCBI database, and phylogenetic analyses were carried out with MEGA6 software. Adv36 DNA was found in four samples (19%). This study indicates that some individuals carry Adv36 in visceral adipose tissue. Further studies are needed to determine the specific effect of Adv36 infection on adipocytes, the prevalence of Adv36 infection, and its relationship with obesity, with a view to developing a vaccine that could potentially prevent or mitigate infection.
W-curve alignments for HIV-1 genomic comparisons.
Cork, Douglas J; Lembark, Steven; Tovanabutra, Sodsai; Robb, Merlin L; Kim, Jerome H
2010-06-01
The W-curve was originally developed as a graphical visualization technique for viewing DNA and RNA sequences. Its ability to render features of DNA also makes it suitable for computational studies. Its main advantage in this area is its use of a single-pass algorithm for comparing the sequences. Avoiding recursion during sequence alignments offers advantages for speed and in-process resources. The graphical technique also allows for multiple models of comparison to be used depending on the nucleotide patterns embedded in similar whole genomic sequences. The W-curve approach allows us to compare large numbers of samples quickly. We are currently tuning the algorithm to accommodate quirks specific to HIV-1 genomic sequences so that it can be used to aid in diagnostic and vaccine efforts. Tracking the molecular evolution of the virus has been greatly hampered by gap-associated problems predominantly embedded within the envelope gene of the virus. Gaps and hypermutation of the virus slow conventional string-based alignments of the whole genome. This paper describes the W-curve algorithm itself, and how we have adapted it for comparison of similar HIV-1 genomes. A tree-building method is developed with the W-curve that utilizes a novel Cylindrical Coordinate distance method and gap analysis method. HIV-1 C2-V5 env sequence regions from a Mother/Infant cohort study are used in the comparison. The output distance matrix and neighbor results produced by the W-curve are functionally equivalent to those from Clustal for C2-V5 sequences in the mother/infant pairs infected with CRF01_AE. Significant potential exists for utilizing this method in place of conventional string-based alignment of HIV-1 genomes, such as Clustal X. With W-curve heuristic alignment, it may be possible to obtain clinically useful results in a short time, short enough to affect clinical choices for acute treatment.
A description of the W-curve generation process, including a comparison technique of aligning extremes of the curves to effectively phase-shift them past the HIV-1 gap problem, is presented. Besides yielding similar neighbor-joining phenogram topologies, most Mother and Infant C2-V5 sequences in the cohort pairs geometrically map closest to each other, indicating that W-curve heuristics overcame any gap problem.
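The single-pass idea can be illustrated with a chaos-game-style construction, which is assumed here rather than taken from the paper: each base is assigned a corner of a square, each new point lies halfway between the previous point and the current base's corner, and the third coordinate records sequence position. The corner assignment and the simple mean planar distance below are hypothetical stand-ins, not the authors' Cylindrical Coordinate distance or gap analysis, but they show how one pass over each sequence yields curves that can be compared point by point without recursive alignment.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical corner assignment for illustration; the paper's mapping may differ.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (-1.0, 1.0),
    "C": (1.0, 1.0),
    "G": (1.0, -1.0),
    "T": (-1.0, -1.0),
}

Point = Tuple[float, float, int]


def w_curve(seq: str) -> List[Point]:
    """Chaos-game-style 3D curve: each base moves the (x, y) point halfway
    toward that base's corner, and z records the position in the sequence."""
    x, y = 0.0, 0.0
    points: List[Point] = []
    for z, base in enumerate(seq.upper(), start=1):
        cx, cy = CORNERS[base]  # raises KeyError for gaps or ambiguity codes
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y, z))
    return points


def mean_planar_distance(c1: List[Point], c2: List[Point]) -> float:
    """Average (x, y) distance between corresponding points of two curves,
    compared over the length of the shorter curve."""
    n = min(len(c1), len(c2))
    if n == 0:
        raise ValueError("both curves must be non-empty")
    return sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(c1, c2)) / n


print(mean_planar_distance(w_curve("ACGTACGT"), w_curve("ACGTACGT")))  # 0.0
```

In this construction a single substitution separates the curves by a distance that halves with every subsequent shared base (the last points of "TAAAAAAA" and "AAAAAAAA" are 1/128 apart), so a local difference does not propagate along the rest of the curve.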
NASA Astrophysics Data System (ADS)
Birrento, Monica L.; Bryan, Tracy M.; Samosorn, Siritron; Beck, Jennifer L.
2015-07-01
Electrospray ionization mass spectrometry (ESI-MS) conditions were optimized for simultaneous observation of a bimolecular quadruplex DNA (qDNA) and a Watson-Crick base-paired duplex DNA/RNA hybrid. The DNA sequence used was telomeric DNA, and the RNA contained the template for telomerase-mediated telomeric DNA synthesis. Addition of RNA to the qDNA resulted in formation of the duplex DNA/RNA hybrid. Melting profiles obtained using circular dichroism spectroscopy confirmed that the DNA/RNA hybrid exhibited greater thermal stability than the bimolecular qDNA in solution. Binding of a 13-substituted berberine derivative (1) to the bimolecular qDNA stabilized its structure, as evidenced by an increase in its stability in the mass spectrometer and an increase of 10°C in its circular dichroism (CD) melting temperature. The DNA/RNA hybrid did not bind the ligand extensively, and its thermal stability was unchanged in the presence of 1. The qDNA-ligand complex resisted unfolding in the presence of excess RNA, limiting the formation of the DNA/RNA hybrid. It has previously been proposed that DNA secondary structures, such as qDNA, may be involved in the telomerase mechanism. DNA/RNA hybrid structures occur at the active site of telomerase. The results presented in the current work show that, if telomeric DNA were folded into a qDNA structure, it would be possible for a DNA/RNA hybrid to form, as is required during template alignment. The selectivity of ligand 1 for the bimolecular qDNA over the DNA/RNA hybrid positions it as a useful compound for probing the role(s), if any, of antiparallel qDNA in the telomerase mechanism.
Synthesis of Bipartite Tetracysteine PNA Probes for DNA In Situ Fluorescent Labeling.
Fang, Ge-Min; Seitz, Oliver
2017-12-24
"Label-free" fluorescent probes that avoid additional steps or building blocks for conjugation of fluorescent dyes with oligonucleotides can significantly reduce the time and cost of parallel bioanalysis of a large number of nucleic acid samples. A method for the synthesis of "label-free" bicysteine-modified PNA probes using solid-phase synthesis and procedures for sequence-specific DNA in situ fluorescent labeling is described here. The concept is based on the adjacent alignment of two bicysteine-modified peptide nucleic acids on a DNA target to form a structurally optimized bipartite tetracysteine motif, which induces a sequence-specific fluorogenic reaction with commercially available biarsenic dyes, even in complex media such as cell lysate. This unit will help researchers to quickly synthesize bipartite tetracysteine PNA probes and carry out low-cost DNA in situ fluorescent labeling experiments. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Chen, Tianbao; Gagliardo, Ron; Walker, Brian; Zhou, Mei; Shaw, Chris
2005-12-01
Phylloxin is a novel prototype antimicrobial peptide from the skin of Phyllomedusa bicolor. Here, we describe parallel identification and sequencing of the phylloxin precursor transcript (mRNA) and partial gene structure (genomic DNA) from the same sample of lyophilized skin secretion using our recently described cloning technique. The open reading frame of the phylloxin precursor was identical in nucleotide sequence to that previously reported, and alignment with the nucleotide sequence derived from genomic DNA indicated the presence of a 175 bp intron located in a near-identical position to that found in the dermaseptins. The highly conserved structural organization of skin secretion peptide genes in P. bicolor can thus be extended to include that encoding phylloxin (plx). These data further reinforce our assertion that application of the described methodology can provide robust genomic/transcriptomic/peptidomic data without the need for specimen sacrifice.
Liquid crystalline pattern formation in drying droplets of biopolymers
NASA Astrophysics Data System (ADS)
Smalyukh, Ivan; Zribi, Olena; Butler, John; Lavrentovich, Oleg; Wong, Gerard
2006-03-01
When a droplet of DNA in water dries out, a ring-like deposit forms along the perimeter, similar to the stain left by a spilled drop of coffee. However, the dried DNA ring is a self-similar birefringent pattern composed of extended molecules. We examine the dynamics of pattern formation at the droplet's rim, which gives insight into the underlying physics. During most of the drying process the contact line is pinned, so DNA molecules are carried to the perimeter and extended by the radial capillary flow. A lyotropic nematic phase forms in which the highly concentrated DNA aligns along the triple line to minimize elastic energy. When the contact angle becomes small, the contact line starts to retract, and the radial dilative stress causes buckling distortions at the rim, which then propagate deep into the elastic liquid-crystalline medium and give rise to the pattern.
Single-Molecule Denaturation Mapping of DNA in Nanofluidic Channels
NASA Astrophysics Data System (ADS)
Reisner, Walter; Larsen, Niels; Silahtaroglu, Asli; Kristensen, Anders; Tommerup, Niels; Tegenfeldt, Jonas O.; Flyvbjerg, Henrik
2010-03-01
Nanochannel-based DNA stretching can serve as a platform for a new optical mapping technique based on measuring the pattern of partial melting along the extended molecules. We partially melt DNA extended in nanofluidic channels via a combination of local heating and added chemical denaturants. The melted molecules, imaged with a standard fluorescence videomicroscopy setup, exhibit a nonuniform fluorescence profile corresponding to a series of local dips and peaks in the intensity trace along the stretched molecule. We show that this barcode is consistent with the presence of locally melted regions along the molecule and can be explained by calculations of sequence-dependent melting probability. Specifically, we obtain experimental melting profiles for T4, T7, lambda-phage and bacterial artificial chromosome (BAC) DNA (from human chromosome 12) and compare these profiles to theory. In addition, we demonstrate that the BAC melting profile can be used to align the BAC to its correct position on chromosome 12.
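The barcodes are compared with calculated sequence-dependent melting probabilities. As a much cruder illustration of why melting depends on sequence, the Python sketch below computes a sliding-window GC fraction and flags AT-rich windows, which tend to melt first. The GC proxy, window size and threshold are assumptions for illustration, not the melting model used in the study.

```python
def gc_profile(seq, window, step=1):
    """GC fraction in each sliding window along `seq`."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append(sum(base in "GC" for base in chunk) / window)
    return profile


def likely_melted(profile, threshold=0.4):
    """Flag windows whose GC fraction is below `threshold` (AT-rich, less stable)."""
    return [frac < threshold for frac in profile]


prof = gc_profile("GGGGGAAAAA", window=5)
print(prof)                 # [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(likely_melted(prof))  # [False, False, False, False, True, True]
```

For a molecule imaged at optical resolution the window would span on the order of kilobases, and a proper statistical-mechanics melting calculation would replace the simple threshold.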
The post-genomic era of biological network alignment.
Faisal, Fazle E; Meng, Lei; Crawford, Joseph; Milenković, Tijana
2015-12-01
Biological network alignment aims to find regions of topological and functional (dis)similarity between molecular networks of different species. Network alignment can then guide the transfer of biological knowledge between conserved (aligned) network regions, from well-studied model species to less well-studied ones, thus complementing the insights already provided by genomic sequence alignment. Here, we review the computational challenges behind the network alignment problem, existing approaches for solving it, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts to improve the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of the principles of life, evolution, disease, and therapeutics.
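One widely used topological measure of alignment quality is edge correctness: the fraction of edges in the first network whose endpoints are mapped onto an edge of the second. A minimal Python sketch, assuming undirected networks given as edge lists and a node mapping given as a dict; the names and toy networks are hypothetical, and this is only one of several quality measures such a review covers.

```python
def edge_correctness(edges1, edges2, mapping):
    """Fraction of network-1 edges mapped onto network-2 edges (undirected)."""
    if not edges1:
        return 0.0
    target = {frozenset(e) for e in edges2}
    conserved = sum(
        1
        for u, v in edges1
        if u in mapping and v in mapping
        and frozenset((mapping[u], mapping[v])) in target
    )
    return conserved / len(edges1)


g1 = [("a", "b"), ("b", "c"), ("c", "a")]
g2 = [("x", "y"), ("y", "z")]
print(edge_correctness(g1, g2, {"a": "x", "b": "y", "c": "z"}))  # 2 of 3 edges conserved
```

Topological scores like this say nothing about functional similarity, which is why alignments are usually also judged by the biological annotations shared by aligned nodes.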
SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments
Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric
2014-01-01
This article introduces the SARA-Coffee web server, a service that allows online computation of 3D-structure-based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences, SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner, with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data are available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831
The Impact of Alignment Coaching on Christian Teachers' Worthy Performance
ERIC Educational Resources Information Center
Hines, Linda M.
2010-01-01
"The Impact of Alignment Coaching on Christian Teachers' Worthy Performance" uses Human Performance Technology and "teleonomics" (Gilbert 2007) to document several intersecting vantage points as one performance improvement system of alignment coaching (AC). Coaching relationships and accomplishments of consistently (daily) reading the Bible,…
Spatiotemporal alignment of in utero BOLD-MRI series.
Turk, Esra Abaci; Luo, Jie; Gagoski, Borjan; Pascau, Javier; Bibbo, Carolina; Robinson, Julian N; Grant, P Ellen; Adalsteinsson, Elfar; Golland, Polina; Malpica, Norberto
2017-08-01
We present a method for spatiotemporal alignment of in utero magnetic resonance imaging (MRI) time series acquired during maternal hyperoxia, to enable improved quantitative tracking of the blood oxygen level-dependent (BOLD) signal changes that characterize oxygen transport through the placenta to fetal organs. The proposed pipeline for images acquired with single-shot gradient echo echo-planar imaging includes 1) signal nonuniformity correction, 2) intravolume motion correction based on nonrigid registration, 3) correction of motion and nonrigid deformations across volumes, and 4) detection of outlier volumes to be discarded from subsequent analysis. BOLD MRI time series collected from 10 pregnant women during 3T scans were analyzed using this pipeline. To assess pipeline performance, signal fluctuations between consecutive timepoints were examined. In addition, the volume overlap and distance between manual region of interest (ROI) delineations in a subset of frames and the delineations obtained by propagating the ROIs from the reference frame were used to quantify alignment accuracy. A previously demonstrated rigid registration approach was used for comparison. The proposed pipeline improved anatomical alignment of the placenta and fetal organs over state-of-the-art rigid motion correction methods. In particular, unexpected temporal signal fluctuations during the first normoxia period were significantly decreased (P < 0.01), and both volume overlap and the distance between region boundaries were significantly improved (P < 0.01). The proposed approach enables more accurate quantitative studies of placental function by improving spatiotemporal alignment across the placenta and fetal organs. Level of Evidence: 1; Technical Efficacy: Stage 1. J. MAGN. RESON. IMAGING 2017;46:403-412. © 2017 International Society for Magnetic Resonance in Medicine.
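Alignment accuracy is quantified partly through volume overlap between manual and propagated ROI delineations. Assuming overlap is measured with the Dice coefficient (a common choice, although the abstract does not name the metric), a minimal Python sketch over ROIs represented as sets of voxel coordinates; the toy ROIs are hypothetical.

```python
def dice_overlap(roi_a, roi_b):
    """Dice coefficient 2|A & B| / (|A| + |B|) between two voxel sets."""
    if not roi_a and not roi_b:
        return 1.0  # two empty ROIs are treated as identical
    return 2 * len(roi_a & roi_b) / (len(roi_a) + len(roi_b))


manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
propagated = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
print(dice_overlap(manual, propagated))  # 4/7, about 0.571
```

A value of 1 means the propagated ROI coincides with the manual one; better frame-to-frame alignment should push the propagated delineations toward that limit.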
DNA curtains for high-throughput single-molecule optical imaging.
Greene, Eric C; Wind, Shalom; Fazio, Teresa; Gorman, Jason; Visnapuu, Mari-Liis
2010-01-01
Single-molecule approaches provide a valuable tool in the arsenal of the modern biologist, and new discoveries continue to be made possible through the use of these state-of-the-art technologies. However, it can be inherently difficult to obtain statistically relevant data from experimental approaches specifically designed to probe individual reactions. This problem is compounded with more complex biochemical reactions, heterogeneous systems, and/or reactions requiring the use of long DNA substrates. Here we give an overview of a technology developed in our laboratory, which relies upon simple micro- or nanofabricated structures in combination with "bio-friendly" lipid bilayers, to align thousands of long DNA molecules into defined patterns on the surface of a microfluidic sample chamber. We call these "DNA curtains," and we have developed several different versions varying in complexity and DNA substrate configuration, which are designed to meet different experimental needs. This novel approach to single-molecule imaging provides a powerful experimental platform that offers the potential for concurrent observation of hundreds or even thousands of protein-DNA interactions in real time. Copyright 2010 Elsevier Inc. All rights reserved.
Dielectrophoresis of gold nanoparticles conjugated to DNA origami structures
Wiens, Matthew; Lakatos, Mathias; Heerwig, Andreas; Ostermaier, Frieder; Haufe, Nora
2016-01-01
DNA nanostructures are promising construction materials to bridge the gap between self-assembly of functional molecules and conventional top-down fabrication methods in nanotechnology. Positioning them at specific locations on a microstructured substrate is an important step towards this aim. Here we study manipulation and positioning of pristine and gold nanoparticle-conjugated tubular DNA origami structures using ac dielectrophoresis. The dielectrophoretic behavior was investigated using fluorescence microscopy. For the pristine origami, a significant dielectrophoretic response was found in the megahertz range, whereas, owing to the higher polarizability of the metallic nanoparticles, the nanoparticle/DNA hybrid structures required a lower electric field strength and frequency for comparable trapping at the edges of the electrode structure. Nanoparticle conjugation also markedly altered the arrangement of the DNA structures: linear, chain-like structures grew between the electrodes at applied frequencies in the megahertz range. This long-range chain formation is caused by a local, gold nanoparticle-induced field concentration along the DNA nanostructures, which in turn creates dielectrophoretic forces that enable the observed self-alignment of the hybrid structures. PMID:27547612
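The frequency dependence of dielectrophoretic trapping is often explained with the Clausius-Mossotti factor K for a homogeneous sphere; Re[K] > 0 gives positive DEP, pulling the particle toward high-field regions such as electrode edges. The Python sketch below evaluates Re[K] for hypothetical material parameters. It is only a textbook illustration: DNA origami and gold/DNA hybrids are not homogeneous spheres, and the counterion polarization that dominates DNA's response is not captured by this model.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def cm_factor_real(freq_hz, eps_p, sigma_p, eps_m, sigma_m):
    """Re[K] for a homogeneous sphere; eps_* are relative permittivities, sigma_* in S/m.

    K = (e_p - e_m) / (e_p + 2 e_m), with complex permittivity e = eps*EPS0 - j*sigma/omega.
    """
    omega = 2 * math.pi * freq_hz
    ep = eps_p * EPS0 - 1j * sigma_p / omega
    em = eps_m * EPS0 - 1j * sigma_m / omega
    return ((ep - em) / (ep + 2 * em)).real


# Hypothetical parameters: conductive particle in a low-conductivity aqueous medium
for f in (1e3, 1e6, 1e9):
    print(f"{f:.0e} Hz: Re[K] = {cm_factor_real(f, 2.5, 1e-2, 78, 1e-4):+.3f}")
```

With these values Re[K] is close to +1 at low frequency (conductivity-dominated) and turns negative at high frequency (permittivity-dominated); Re[K] always lies between -0.5 and 1.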
Guidugli, Lucia; Shimelis, Hermela; Masica, David L; Pankratz, Vernon S; Lipton, Gary B; Singh, Namit; Hu, Chunling; Monteiro, Alvaro N A; Lindor, Noralane M; Goldgar, David E; Karchin, Rachel; Iversen, Edwin S; Couch, Fergus J
2018-01-17
Many variants of uncertain significance (VUS) have been identified in BRCA2 through clinical genetic testing. VUS pose a significant clinical challenge because the contribution of these variants to cancer risk has not been determined. We conducted a comprehensive assessment of VUS in the BRCA2 C-terminal DNA binding domain (DBD) by using a validated functional assay of BRCA2 homologous recombination (HR) DNA-repair activity and defined a classifier of variant pathogenicity. Among 139 variants evaluated, 54 had ≥99% probability of pathogenicity, and 73 had ≥95% probability of neutrality. Functional assay results were compared with predictions of variant pathogenicity from the Align-GVGD protein-sequence-based prediction algorithm, which has been used for variant classification. Relative to the HR assay, Align-GVGD significantly (p < 0.05) over-predicted pathogenic variants. We subsequently combined functional and Align-GVGD prediction results in a Bayesian hierarchical model (VarCall) to estimate the overall probability of pathogenicity for each VUS. In addition, to predict the effects of all other BRCA2 DBD variants and to prioritize variants for functional studies, we used the endoPhenotype-Optimized Sequence Ensemble (ePOSE) algorithm to train classifiers for BRCA2 variants by using data from the HR functional assay. Together, the results show that systematic functional assays in combination with in silico predictors of pathogenicity provide robust tools for clinical annotation of BRCA2 VUS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
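VarCall is a Bayesian hierarchical model; the Python sketch below is a deliberately simpler stand-in that shows the core idea of updating a prior probability of pathogenicity with likelihood ratios from separate lines of evidence (for example a functional assay and an in silico score). It assumes the evidence is conditionally independent, which VarCall does not require, and the likelihood ratios are hypothetical. The classification thresholds mirror those quoted in the abstract.

```python
def posterior_pathogenic(prior, *likelihood_ratios):
    """Bayes' rule in odds form, assuming conditionally independent evidence."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:  # LR = P(evidence | pathogenic) / P(evidence | neutral)
        odds *= lr
    return odds / (1 + odds)


def classify(p_path):
    """>= 99% probability of pathogenicity, or >= 95% probability of neutrality."""
    if p_path >= 0.99:
        return "pathogenic"
    if 1 - p_path >= 0.95:
        return "neutral"
    return "uncertain"


# Hypothetical likelihood ratios for an assay result and an in silico prediction
p = posterior_pathogenic(0.5, 10.0, 20.0)
print(round(p, 4), classify(p))  # 0.995 pathogenic
```

A hierarchical model additionally shares information across variants and accounts for assay noise, which this two-line update ignores.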
Wood, David L. A.; Nones, Katia; Steptoe, Anita; Christ, Angelika; Harliwong, Ivon; Newell, Felicity; Bruxner, Timothy J. C.; Miller, David; Cloonan, Nicole; Grimmond, Sean M.
2015-01-01
Genetic variation modulates gene expression transcriptionally or post-transcriptionally, and can profoundly alter an individual's phenotype. Measuring allelic differential expression at heterozygous loci within an individual, a phenomenon called allele-specific expression (ASE), can assist in identifying such factors. Massively parallel DNA and RNA sequencing and advances in bioinformatic methodologies provide an outstanding opportunity to measure ASE genome-wide. In this study, matched DNA and RNA sequencing, genotyping arrays and computationally phased haplotypes were integrated to comprehensively and conservatively quantify ASE in a single human brain and liver tissue sample. We describe a methodological evaluation and assessment of common bioinformatic steps for ASE quantification, and recommend a robust approach to accurately measure SNP, gene and isoform ASE through personalized haplotype genome alignment, strict alignment quality control and intragenic SNP aggregation. Our results indicate that accurate ASE quantification requires careful bioinformatic analysis and is adversely affected by sample-specific alignment confounders and random sampling, even at moderate sequence depths. We identified multiple known and several novel ASE genes in liver, including WDR72, DSP and UBD, as well as genes containing ASE SNPs whose imbalance direction was discordant with haplotype phase; this is explainable by annotated transcript structure, suggesting isoform-derived ASE. The methods evaluated in this study will be of use to researchers performing highly conservative quantification of ASE, and the genes and isoforms identified as showing ASE will be of interest to researchers studying those loci. PMID:25965996
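Intragenic SNP aggregation can be sketched as summing phased allele counts across a gene's heterozygous SNPs and testing the total against a balanced 50:50 expectation. The abstract does not state which test the authors used; an exact two-sided binomial test is assumed here as a common choice, and the counts are hypothetical.

```python
from math import comb


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum of outcome probabilities no larger than P(k).

    Fine for modest depths; for very deep coverage, converting comb(n, i) to float
    can overflow, so a log-space version or scipy.stats.binomtest is safer.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def gene_ase(phased_counts):
    """Aggregate phased counts over a gene's heterozygous SNPs.

    phased_counts: list of (haplotype1_reads, haplotype2_reads), one tuple per SNP.
    Caveat: a read spanning several SNPs is counted once per SNP here.
    """
    h1 = sum(a for a, _ in phased_counts)
    h2 = sum(b for _, b in phased_counts)
    n = h1 + h2
    return h1 / n, binom_test_two_sided(h1, n)


ratio, pval = gene_ase([(30, 10), (25, 5)])
print(f"haplotype-1 fraction {ratio:.3f}, p = {pval:.2e}")
```

Aggregating by haplotype rather than by reference/alternate allele is what makes phasing matter: without it, imbalances at SNPs on opposite haplotypes would cancel out.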
Drosophila CTCF tandemly aligns with other insulator proteins at the borders of H3K27me3 domains.
Van Bortle, Kevin; Ramos, Edward; Takenaka, Naomi; Yang, Jingping; Wahi, Jessica E; Corces, Victor G
2012-11-01
Several multiprotein DNA complexes capable of insulator activity have been identified in Drosophila melanogaster, yet only CTCF, a highly conserved zinc finger protein, and the transcription factor TFIIIC have been shown to function in mammals. CTCF is involved in diverse nuclear activities, and recent studies suggest that the proteins with which it associates and the DNA sequences that it targets may underlie these various roles. Here we show that the Drosophila homolog of CTCF (dCTCF) aligns in the genome with other Drosophila insulator proteins such as Suppressor of Hairy wing [SU(HW)] and Boundary Element Associated Factor of 32 kDa (BEAF-32) at the borders of H3K27me3 domains, which are also enriched for associated insulator proteins and additional cofactors. RNAi depletion of dCTCF and combinatorial knockdown of gene expression for other Drosophila insulator proteins leads to a reduction in H3K27me3 levels within repressed domains, suggesting that insulators are important for the maintenance of appropriate repressive chromatin structure in Polycomb (Pc) domains. These results shed new light on the roles of insulators in chromatin domain organization and support recent models suggesting that insulators underlie interactions important for Pc-mediated repression. We reveal an important relationship between dCTCF and other Drosophila insulator proteins and speculate that vertebrate CTCF may also align with other nuclear proteins to accomplish similar functions. PMID:22722341
Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign
2007-01-01
Background Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign, while simultaneously providing a small improvement in structural prediction accuracy. Savings are also realized in memory. In experiments on a 5S RNA dataset with an average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method compares favorably with other programs for pairwise RNA structure prediction, yielding better accuracy on average while requiring significantly fewer computational resources.
Conclusion Probabilistic analysis can be used to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their structural prediction accuracy. This extends the practical reach of these methods to longer sequences. The revised Dynalign code is freely available for download. PMID:17445273
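The thresholding step described above can be sketched in a few lines of Python. Computing the posteriors themselves requires pair-HMM forward-backward recursions, which are omitted; the matrices below are toy numbers, the layout of the insertion-posterior matrix is a hypothetical simplification, and the threshold value is illustrative rather than Dynalign's.

```python
def coincidence_constraint(align_post, ins_post, threshold):
    """Allowed (i, k) position pairs for a joint alignment/folding dynamic program.

    align_post[i][k]: posterior that nucleotide i of seq 1 aligns to k of seq 2.
    ins_post[i][k]:   posterior that i is inserted at position k (hypothetical layout).
    Co-incidence = alignment + insertion posterior; pairs below threshold are excluded.
    """
    allowed = set()
    for i, (a_row, s_row) in enumerate(zip(align_post, ins_post)):
        for k, (a, s) in enumerate(zip(a_row, s_row)):
            if a + s >= threshold:
                allowed.add((i, k))
    return allowed


align = [[0.90, 0.05], [0.02, 0.80]]
ins = [[0.00, 0.00], [0.10, 0.10]]
allowed = coincidence_constraint(align, ins, threshold=0.1)
print(sorted(allowed))  # [(0, 0), (1, 0), (1, 1)]
print(f"{len(allowed) / (len(align) * len(align[0])):.0%} of cells retained")
```

The computational savings come from the retained fraction: only the allowed position pairs need to be visited by the much more expensive joint folding recursion.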
Diffeomorphic functional brain surface alignment: Functional demons.
Nenning, Karl-Heinz; Liu, Hesheng; Ghosh, Satrajit S; Sabuncu, Mert R; Schwartz, Ernst; Langs, Georg
2017-08-01
Aligning brain structures across individuals is a central prerequisite for comparative neuroimaging studies. Typically, registration approaches assume a strong association between the features used for alignment, such as macro-anatomy, and the variable observed, such as functional activation or connectivity. Here, we propose to use the structure of intrinsic resting-state fMRI signal correlation patterns as a basis for alignment of the cortex in functional studies. Rather than assuming spatial correspondence of functional structures between subjects, we identified locations with similar connectivity profiles across subjects. We mapped functional connectivity relationships within the brain into an embedding space and aligned the resulting maps of multiple subjects. We then performed a diffeomorphic alignment of the cortical surfaces, driven by the corresponding features in the joint embedding space. Results show that functional alignment based on resting-state fMRI identifies functionally homologous regions across individuals with higher accuracy than alignment based on the spatial correspondence of anatomy. Further, functional alignment enables measurement of the strength of the anatomo-functional link across the cortex and reveals the uneven distribution of this link. Stronger anatomo-functional dissociation was found in higher association areas than in primary sensory and motor areas. Functional alignment based on resting-state features improves group analysis of task-based functional MRI data, increasing statistical power and improving the delineation of task-specific core regions. Finally, a comparison of the anatomo-functional dissociation between cohorts is demonstrated with a group of left- and right-handed subjects. Copyright © 2017 Elsevier Inc. All rights reserved.
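The idea of matching locations by connectivity profile rather than by anatomy can be illustrated with a small Python sketch. It is a simplified stand-in, not the embedding-plus-diffeomorphic method described above: it assumes a handful of landmark vertices whose correspondence across subjects is already known, represents each vertex by its correlations with those landmarks, and matches vertices by nearest profile. All names and toy time series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity_profiles(timeseries, landmark_idx):
    """Correlation of every vertex with landmark vertices given in a shared order."""
    return [[pearson(ts, timeseries[j]) for j in landmark_idx] for ts in timeseries]


def match_vertices(prof_a, prof_b):
    """For each subject-A vertex, the subject-B vertex with the closest profile."""
    return [
        min(range(len(prof_b)), key=lambda k: math.dist(pa, prof_b[k]))
        for pa in prof_a
    ]


s1, s2, s3 = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [1, 3, 2, 5, 4]
subj_a = connectivity_profiles([s1, s2, s3], landmark_idx=[0, 1])
subj_b = connectivity_profiles([s3, s2, s1], landmark_idx=[2, 1])  # vertices permuted
print(match_vertices(subj_a, subj_b))  # [2, 1, 0]
```

The published approach avoids the need for predefined landmarks by aligning subjects in a joint embedding space, and it enforces a smooth, invertible surface deformation instead of independent nearest-profile matches.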
DeBoy, Robert T; Mongodin, Emmanuel F; Emerson, Joanne B; Nelson, Karen E
2006-04-01
In the present study, the chromosomes of two members of the Thermotogales were compared. A whole-genome alignment of Thermotoga maritima MSB8 and Thermotoga neapolitana NS-E has revealed numerous large-scale DNA rearrangements, most of which are associated with CRISPR DNA repeats and/or tRNA genes. These DNA rearrangements do not include the putative origin of DNA replication but move within the same replichore, i.e., the same replicating half of the chromosome (delimited by the replication origin and terminus). Based on cumulative GC skew analysis, both the T. maritima and T. neapolitana lineages contain one or two major inverted DNA segments. Also, based on PCR amplification and sequence analysis of the DNA joints that are associated with the major rearrangements, the overall chromosome architecture was found to be conserved at most DNA joints for other strains of T. neapolitana. Taken together, the results from this analysis suggest that the observed chromosomal rearrangements in the Thermotogales likely occurred by successive inversions after their divergence from a common ancestor and before strain diversification. Finally, sequence analysis shows that size polymorphisms in the DNA joints associated with CRISPRs can be explained by expansion and possibly contraction of the DNA repeat and spacer unit, providing a tool for discerning the relatedness of strains from different geographic locations.
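Cumulative GC skew sums (G - C)/(G + C) over consecutive windows along the chromosome; on a circular bacterial chromosome its global minimum tends to mark the replication origin and its maximum the terminus, while an inverted segment shows up as a stretch whose slope runs opposite to that of its replichore. A minimal Python sketch; the window size and toy sequence are hypothetical, and the study's exact procedure is not given in the abstract.

```python
def cumulative_gc_skew(seq, window):
    """Cumulative sum of per-window GC skew (G - C) / (G + C) along `seq`."""
    seq = seq.upper()
    cumulative, total = [], 0.0
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if g + c else 0.0
        cumulative.append(total)
    return cumulative


curve = cumulative_gc_skew("C" * 10 + "G" * 10, window=5)
print(curve)  # [-1.0, -2.0, -1.0, 0.0]
print("minimum (origin-like) at window", curve.index(min(curve)))  # 1
```

For a real genome a window of a few kilobases smooths out local composition noise while keeping the replichore-scale trend visible.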
Polymorphisms in the leptin gene promoter in Brazilian beef herds.
Guimarães, R C; Azevedo, J S N; Corrêa, S C; Campelo, J E G; Barbosa, E M; Gonçalves, E C; Silva Filho, E
2016-12-02
Brazil is the world's largest producer of beef cattle; however, the quality of its herds needs to be improved. The use of molecular markers as auxiliary tools in selecting high-standard beef animals for reproduction would significantly improve the quality of the final beef product in Brazil. The leptin gene has been demonstrated to be an excellent candidate gene for bovine breeding. The objective of this study was to sequence and compare the leptin gene promoter of Brazil's important cattle breeds in order to identify polymorphisms in it. Blood samples of the Nellore, Guzerat, Tabapuã, and Senepol breeds were collected for genomic DNA extraction. The genomic DNA was used as a template for polymerase chain reaction (PCR) to amplify a 1575-bp fragment, which in turn was sequenced, aligned, and compared between animals of different breeds. Twenty-three single nucleotide polymorphic sites, including transitions and transversions, were detected at positions -1457, -1452, -1446, -1397, -1392, -1361, -1238, -963, -901, -578, -516, -483, -478, -470, -432, -430, -292, -282, -272, -211, -202, -170, and -147. Additionally, two insertion sites at positions -680 and -416 and two deletion sites at positions -1255 and -1059 were detected. As the promoter region of the leptin gene has been demonstrated to vary among breeds, these variants must be tested for their use as potential molecular markers for artificial selection of animals for enhanced beef production in different bovine production systems in Brazil.
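Once the promoter sequences are aligned, variant sites can be read off column by column: substitutions are split into transitions (purine to purine or pyrimidine to pyrimidine) and transversions, and gap columns into insertions and deletions relative to a reference. A minimal Python sketch; the function names, the `first_pos` numbering offset and the toy alignment are hypothetical, and promoter numbering conventions (negative positions, often with no position 0) would need extra handling.

```python
PURINES = set("AG")


def substitution_type(ref_base, alt_base):
    """Transition (same chemical class) vs transversion; non-ACGT bases are ambiguous."""
    if ref_base not in "ACGT" or alt_base not in "ACGT":
        return "ambiguous"
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"


def compare_aligned(ref, qry, first_pos=1):
    """List variants between two gapped, equal-length aligned sequences.

    Positions are numbered on the ungapped reference starting at `first_pos`;
    an insertion is reported at the reference position it precedes.
    """
    variants, ref_idx = [], 0
    for r, q in zip(ref.upper(), qry.upper()):
        pos = first_pos + ref_idx
        if r == "-" and q != "-":
            variants.append((pos, "insertion", q))
        elif q == "-" and r != "-":
            variants.append((pos, "deletion", r))
        elif r != q:
            variants.append((pos, substitution_type(r, q), f"{r}>{q}"))
        if r != "-":
            ref_idx += 1  # only reference bases advance the coordinate
    return variants


for v in compare_aligned("ACGT-AT", "GCTTCA-"):
    print(v)
```

This prints a transition at position 1 (A>G), a transversion at 3 (G>T), an insertion of C before position 5 and a deletion of T at position 6.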
Instructional Alignment under No Child Left Behind
ERIC Educational Resources Information Center
Polikoff, Morgan S.
2012-01-01
The alignment of instruction with the content of standards and assessments is the key mediating variable separating the policy of standards-based reform (SBR) from the outcome of improved student achievement. Few studies have investigated SBR's effects on instructional alignment, and most have serious methodological limitations. This research uses…
Alignment of Human Resource Practices and Teacher Performance Competency
ERIC Educational Resources Information Center
Heneman III, Herbert G.; Milanowski, Anthony T.
2004-01-01
In this article, we argue that human resource (HR) management practices are important components of strategies for improving student achievement in an accountability environment. We present a framework illustrating the alignment of educational HR management practices to a teacher performance competency model, which in turn is aligned with student…
Chunk Alignment for Corpus-Based Machine Translation
ERIC Educational Resources Information Center
Kim, Jae Dong
2011-01-01
Since sub-sentential alignment is critically important to the translation quality of an Example-Based Machine Translation (EBMT) system, which operates by finding and combining phrase-level matches against the training examples, we developed a new alignment algorithm for the purpose of improving the EBMT system's performance. This new…
Improving Business-IT Alignment through Business Architecture
ERIC Educational Resources Information Center
Li, Chingmei
2010-01-01
Business and Information Technology (IT) alignment has been one of the top 10 IT management issues since 1980. IT has continually strived to achieve alignment with business goals and objectives. These IT efforts include ERP implementation to benefit from the best practices; data center consolidation and server virtualization to keep…
High voltage electrophoretic deposition for electrochemical energy storage and other applications
NASA Astrophysics Data System (ADS)
Santhanagopalan, Sunand
High voltage electrophoretic deposition (HVEPD) has been developed as a novel technique to obtain vertically aligned forests of one-dimensional nanomaterials for efficient energy storage. The ability to control and manipulate nanomaterials is critical for their effective use in a variety of applications. Oriented structures of one-dimensional nanomaterials provide a unique opportunity to take full advantage of their excellent mechanical and electrochemical properties. However, it is still a significant challenge to obtain such oriented structures with great process flexibility, ease of processing under mild conditions and the capability to scale up, especially in the context of efficient device fabrication and system packaging. This work presents HVEPD as a simple, versatile and generic technique to obtain vertically aligned forests of different one-dimensional nanomaterials on flexible, transparent and scalable substrates. Improvements in material chemistry and reduction of contact resistance have enabled the fabrication of high-power supercapacitor electrodes using the HVEPD method. The investigations have also paved the way for further performance enhancements by employing hybrid material systems and AC/DC pulsed deposition. Multi-walled carbon nanotubes (MWCNTs) were used as the starting material to demonstrate the HVEPD technique. A comprehensive study of the key parameters was conducted to better understand the working mechanism of the HVEPD process. It was confirmed that HVEPD is enabled by three key factors: a high deposition voltage for alignment, a low dispersion concentration to avoid aggregation, and simultaneous formation of a holding layer by electrodeposition to reinforce the nanoforests. A set of suitable parameters was found for obtaining vertically aligned forests of MWCNTs.
Compared with their randomly oriented counterparts, the aligned MWCNT forests showed better electrochemical performance, lower electrical resistance and the capability to achieve superhydrophobicity, indicating their potential in a broad range of applications. The versatile and generic nature of the HVEPD process has been demonstrated by achieving deposition on flexible and transparent substrates, as well as aligned forests of manganese dioxide (MnO2) nanorods. A continuous roll-printing HVEPD approach was then developed to obtain aligned MWCNT forests with low contact resistance on large, flexible substrates. Such large-scale electrodes showed no deterioration in electrochemical performance and paved the way for practical device fabrication. The effect of a holding layer on the contact resistance between aligned MWCNT forests and the substrate was studied to improve the electrochemical performance of such electrodes. It was found that a suitable precursor salt such as nickel chloride could be used to achieve a conductive holding layer, which significantly reduced the contact resistance and in turn enhanced the electrochemical performance of the electrodes. High-power scalable redox capacitors were then prepared using HVEPD. Very high power/energy densities and excellent cyclability have been achieved by synergistically combining hydrothermally synthesized, highly crystalline α-MnO2 nanorods, vertically aligned forests and reduced contact resistance. To further improve performance, hybrid electrodes have been prepared in the form of vertically aligned forests of MWCNTs with branches of α-MnO2 nanorods on them. Large-scale electrodes with such hybrid structures were manufactured using continuous HVEPD and characterized, showing further improved power and energy densities. The alignment quality and density of MWCNT forests were also improved by using an AC/DC pulsed deposition technique.
In this case, AC voltage was first used to align the MWCNTs, followed by immediate DC voltage to deposit the aligned MWCNTs along with the conductive holding layer. Decoupling of alignment from deposition was proven to result in better alignment quality and higher electrochemical performance.
Tsui, Nancy B. Y.; Jiang, Peiyong; Chow, Katherine C. K.; Su, Xiaoxi; Leung, Tak Y.; Sun, Hao; Chan, K. C. Allen; Chiu, Rossa W. K.; Lo, Y. M. Dennis
2012-01-01
Background Fetal DNA in maternal urine, if present, would be a valuable source of fetal genetic material for noninvasive prenatal diagnosis. However, the existence of fetal DNA in maternal urine has remained controversial, owing to the lack of appropriate technology to robustly detect the potentially highly degraded fetal DNA in maternal urine. Methodology We used massively parallel paired-end sequencing to investigate cell-free DNA molecules in maternal urine. Catheterized urine samples were collected from seven pregnant women during the third trimester of pregnancy. We detected fetal DNA by identifying sequenced reads that contained fetal-specific alleles of single nucleotide polymorphisms. The sizes of individual urinary DNA fragments were deduced from the alignment positions of the paired reads. We measured the fractional fetal DNA concentration as well as the size distributions of fetal and maternal DNA in maternal urine. Principal Findings Cell-free fetal DNA was detected in five of the seven maternal urine samples, with fractional fetal DNA concentrations ranging from 1.92% to 4.73%. Fetal DNA became undetectable in maternal urine after delivery. The total urinary cell-free DNA molecules were less intact than plasma DNA. Urinary fetal DNA fragments were very short, and the most dominant fetal sequences were between 29 bp and 45 bp in length. Conclusions Using massively parallel sequencing, we have confirmed the existence of transrenal fetal DNA in maternal urine and have shown that urinary fetal DNA is heavily degraded. PMID:23118982
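The two quantities described above can be sketched in Python. Fragment size follows from the outermost aligned coordinates of a read pair; the fractional fetal DNA concentration is computed with f = 2p/(p + q), a formula commonly used in maternal plasma studies at SNPs where the mother is homozygous and the fetus heterozygous (p = reads with the fetal-specific allele, q = reads with the shared allele). The abstract does not state the exact formula used here, and the coordinates and counts are hypothetical.

```python
def fragment_size(fwd_start, rev_start, rev_aligned_len):
    """Size of a DNA fragment from a proper read pair (1-based, inclusive coordinates).

    fwd_start: leftmost reference base of the forward-strand read.
    rev_start: leftmost reference base of the reverse-strand read.
    Assumes no soft clipping, so the fragment ends where the reverse read ends.
    """
    return rev_start + rev_aligned_len - fwd_start


def fetal_fraction(fetal_specific_reads, shared_allele_reads):
    """f = 2p / (p + q); the factor 2 reflects that the fetus carries one fetal-specific allele."""
    total = fetal_specific_reads + shared_allele_reads
    return 2 * fetal_specific_reads / total


print(fragment_size(1000, 1010, 30))   # 40: the fragment spans bases 1000-1039
print(f"{fetal_fraction(2, 98):.2%}")  # 4.00%
```

Because urinary fragments as short as 29 bp were observed, the two reads of a pair often overlap almost completely; the size formula still holds as long as both reads are aligned without clipping.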
Radiographic parameters improve lower extremity prosthetic alignment.
Mooney, Ryan; Carry, Patrick; Wylie, Erin; Schultz, Abby; McNair, Bryan; Page, Carol; Biffl, Susan; Heare, Travis
2013-12-01
The goal of prosthetic fitting is to provide comfort and functionality to the patient. It is thought that incorporating standing anterior-posterior long leg radiographs (LLR) into the fitting of lower extremity prostheses will provide an objective guide when making adjustments and a better assessment of alignment. This study compares prosthetic alignment before and after radiography-guided adjustments. This retrospective study was performed at a multidisciplinary amputee clinic on patients with congenital and/or acquired limb deficiencies. Their prosthetic alignment was evaluated by LLR and adjusted as needed. Satisfactory alignment was defined as a mechanical axis angular deviation of ≤1° and a leg length discrepancy of ≤10 mm. A total of 45 unique prostheses from 24 subjects (10 female and 14 male) were included. Post-adjustment radiographs were obtained for 29 prostheses. After the initial prosthetic fitting, the probability of a satisfactory fit was 20.0 % (95 % CI 10.9-34.9 %). Following the baseline adjustment, the probability of a satisfactory fit improved to 53.3 % (95 % CI 37.5-70.9 %). After adjustment number 4, the probability of a satisfactory fit further improved to 76.7 % (95 % CI 41.9-98.0 %). There were also significant improvements in distal offset distance (p = 0.0040) and leg length discrepancy (p = 0.0206). The distal offset distance decreased by an average of 10.7 mm (95 % CI 3.6-17.8), and leg length discrepancy decreased by an average of 3.0 mm (95 % CI 0.48-5.5). The addition of LLRs to existing fitting methods significantly improves prosthetic alignment and length.
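The satisfactory-alignment criterion is simple enough to express directly. A minimal sketch with hypothetical names and measurements: the 1° and 10 mm thresholds come from the abstract, but the study's reported probabilities and confidence intervals came from its own statistical analysis, not from the raw proportion computed here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AlignmentMeasurement:
    axis_deviation_deg: float         # mechanical axis angular deviation on the LLR
    leg_length_discrepancy_mm: float  # leg length discrepancy on the LLR


def is_satisfactory(m: AlignmentMeasurement,
                    max_deviation_deg: float = 1.0,
                    max_discrepancy_mm: float = 10.0) -> bool:
    """Satisfactory alignment: deviation <= 1 degree and leg length discrepancy <= 10 mm."""
    return (abs(m.axis_deviation_deg) <= max_deviation_deg
            and abs(m.leg_length_discrepancy_mm) <= max_discrepancy_mm)


def proportion_satisfactory(measurements: Iterable[AlignmentMeasurement]) -> float:
    """Raw fraction of fittings meeting both thresholds."""
    ms = list(measurements)
    return sum(is_satisfactory(m) for m in ms) / len(ms)


# Hypothetical fittings: only the first meets both thresholds.
fits = [AlignmentMeasurement(0.5, 5.0), AlignmentMeasurement(1.5, 5.0),
        AlignmentMeasurement(0.5, 12.0), AlignmentMeasurement(2.0, 20.0),
        AlignmentMeasurement(-1.2, 3.0)]
print(proportion_satisfactory(fits))  # 0.2
```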
TU-H-CAMPUS-TeP2-04: Measurement of Stereotactic Output Factors with DNA Double-Strand Breaks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cline, K; Obeidat, M; Stathakis, S
Purpose: Radiotherapy treatment is specified by radiation dose prescriptions, but biological DNA damage actually controls treatment effectiveness. It is impractical to directly measure dose in the clinic, so we measure quantities, such as collected charge, and calculate the relationship to dose. At small fields, such as those in stereotactic radiosurgery (SRS), charged-particle equilibrium (CPE) breaks down and the accuracy of the measurement for delivered dose decreases. By measuring DNA double-strand breaks (DSB) directly, we believe treatment accuracy could improve by providing a more meaningful measurement. Methods: A DNA dosimeter, consisting of magnetic streptavidin beads attached to 4 kilobase pair DNA strands labeled with biotin and fluorescein amidite (FAM) on opposing ends, was suspended in phosphate-buffered saline (PBS). Twenty µL samples were placed in plastic micro-capillary tubes inside a water tank setup and irradiated with 10 cm, 3 cm, 1.25 cm, 0.75 cm, and 0.5 cm radiation field sizes, where the three smallest sizes were cones. After irradiation, the dosimeters were mechanically separated into beads (intact DNA) and supernatant (broken DNA/FAM) using a magnet. The fluorescence was read and the probability of DSB was calculated. This was used to calculate the output factor for an SRS beam and compared to that measured using a diode detector. Results: The output factors relative to a 10 cm field were 0.89±0.07, 0.76±0.08, 0.59±0.04, and 0.78±0.12 for the field sizes of 3 cm, 1.25 cm, 0.75 cm, and 0.5 cm, respectively. Some of the diode measurements do not fall within these uncertainties. Conclusion: This was the first attempt to measure output factors in a water tank with the DNA dosimeter. Although differences compared to the diode were observed, the uncertainty analysis ignored systematic errors.
For future work, we will repeat this experiment to quantify and correct systematic errors, such as those caused by positional alignment and sample contamination. This work was funded in part by CPRIT (RP140105).
COACH: profile-profile alignment of protein families using hidden Markov models.
Edgar, Robert C; Sjölander, Kimmen
2004-05-22
Alignments of two multiple-sequence alignments, or statistical models of such alignments (profiles), have important applications in computational biology. The increased amount of information in a profile versus a single sequence can lead to more accurate alignments and more sensitive homolog detection in database searches. Several profile-profile alignment methods have been proposed and have been shown to improve sensitivity and alignment quality compared with sequence-sequence methods (such as BLAST) and profile-sequence methods (e.g. PSI-BLAST). Here we present a new approach to profile-profile alignment we call Comparison of Alignments by Constructing Hidden Markov Models (HMMs) (COACH). COACH aligns two multiple sequence alignments by constructing a profile HMM from one alignment and aligning the other to that HMM. We compare the alignment accuracy of COACH with two recently published methods: Yona and Levitt's prof_sim and Sadreyev and Grishin's COMPASS. On two sets of reference alignments selected from the FSSP database, we find that COACH is able, on average, to produce alignments giving the best coverage or the fewest errors, depending on the chosen parameter settings. COACH is freely available from www.drive5.com/lobster.
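The central idea of profile-profile alignment, scoring column against column instead of residue against residue, can be sketched as follows. This is a minimal sketch and not COACH: instead of building a profile HMM, it assumes a simpler scheme in which each alignment column is a residue-frequency vector, two columns are scored by the dot product of their frequency vectors, and the two profiles are aligned globally (Needleman-Wunsch) with a hypothetical linear gap penalty. All function names are illustrative.

```python
from typing import Dict, List, Tuple

Column = Dict[str, float]   # residue -> frequency in one alignment column
Profile = List[Column]


def profile_from_msa(msa: List[str]) -> Profile:
    """Residue frequencies per column of a multiple sequence alignment (gaps ignored)."""
    profile: Profile = []
    for j in range(len(msa[0])):
        residues = [seq[j] for seq in msa if seq[j] != "-"]
        n = len(residues)
        profile.append({r: residues.count(r) / n for r in set(residues)} if n else {})
    return profile


def column_score(a: Column, b: Column) -> float:
    """Dot product of two columns' frequency vectors."""
    return sum(freq * b.get(res, 0.0) for res, freq in a.items())


def align_profiles(p: Profile, q: Profile,
                   gap: float = -0.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment of two profiles; returns the score and matched column pairs."""
    n, m = len(p), len(q)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(p[i - 1], q[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell, recording aligned column pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + column_score(p[i - 1], q[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return F[n][m], pairs[::-1]


score, pairs = align_profiles(profile_from_msa(["ACGT", "ACGA"]), profile_from_msa(["ACGT"]))
print(score, pairs)  # 3.5 [(0, 0), (1, 1), (2, 2), (3, 3)]
```

The last column of the first profile is half T and half A, so it contributes only 0.5 against a pure T column; this is how a profile carries more information than a single sequence.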
Bergman, C M; Kreitman, M
2001-08-01
Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.
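How ungapped conserved blocks can be extracted from a pairwise alignment is sketched below. The abstract does not describe the procedure, so this is a minimal sketch under simplifying assumptions: a conserved block is taken to be a maximal run of identical, ungapped aligned columns of at least min_len bases, the conserved fraction is measured relative to the ungapped length of the first sequence, and the function names and example alignment are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def conserved_blocks(aln_a: str, aln_b: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """(start column, length) of maximal runs of identical, ungapped aligned columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    blocks: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (x, y) in enumerate(zip(aln_a, aln_b)):
        identical = x == y and x != "-"
        if identical and start is None:
            start = i                      # a new block begins
        elif not identical and start is not None:
            if i - start >= min_len:
                blocks.append((start, i - start))
            start = None                   # the current block ends
    if start is not None and len(aln_a) - start >= min_len:
        blocks.append((start, len(aln_a) - start))
    return blocks


def conservation_summary(aln_a: str, aln_b: str, min_len: int = 1) -> Tuple[float, float]:
    """Fraction of sequence A's bases lying in conserved blocks, and the median block length."""
    blocks = conserved_blocks(aln_a, aln_b, min_len)
    lengths = [length for _, length in blocks]
    bases_a = sum(1 for x in aln_a if x != "-")
    return sum(lengths) / bases_a, (float(median(lengths)) if lengths else 0.0)


print(conserved_blocks("ACGTTA-GGC", "ACGATACGGC"))      # [(0, 3), (4, 2), (7, 3)]
print(conservation_summary("ACGTTA-GGC", "ACGATACGGC"))  # fraction 8/9, median length 3.0
```

A substitution (column 3) and an indel (column 6) both end a block here; distinguishing their rates, as the study does, would require counting the two event types separately.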
Modular structural elements in the replication origin region of Tetrahymena rDNA.
Du, C; Sanzgiri, R P; Shaiu, W L; Choi, J K; Hou, Z; Benbow, R M; Dobbs, D L
1995-01-01
Computer analyses of the DNA replication origin region in the amplified rRNA genes of Tetrahymena thermophila identified a potential initiation zone in the 5'NTS [Dobbs, Shaiu and Benbow (1994), Nucleic Acids Res. 22, 2479-2489]. This region consists of a putative DNA unwinding element (DUE) aligned with predicted bent DNA segments, nuclear matrix or scaffold associated region (MAR/SAR) consensus sequences, and other common modular sequence elements previously shown to be clustered in eukaryotic chromosomal origin regions. In this study, two mung bean nuclease-hypersensitive sites in supercoiled plasmid DNA were localized within the major DUE-like element predicted by thermodynamic analyses. Three restriction fragments of the 5'NTS region predicted to contain bent DNA segments exhibited anomalous migration characteristic of bent DNA during electrophoresis on polyacrylamide gels. Restriction fragments containing the 5'NTS region bound Tetrahymena nuclear matrices in an in vitro binding assay, consistent with an association of the replication origin region with the nuclear matrix in vivo. The direct demonstration in a protozoan origin region of elements previously identified in Drosophila, chick and mammalian origin regions suggests that clusters of modular structural elements may be a conserved feature of eukaryotic chromosomal origins of replication. PMID:7784181
Seligmann, Hervé
2016-07-01
Swinger DNAs are sequences whose homology with known sequences is detected only by assuming systematic exchanges between nucleotides. Nine symmetric (X<->Y, e.g. A<->C) and fourteen asymmetric (X->Y->Z, e.g. A->C->G) exchanges exist. All swinger DNAs previously detected in GenBank follow the A<->T+C<->G exchange, whereas mitochondrial swinger RNAs are distributed among different swinger types. Here, different alignment criteria detect 87 additional swinger mitochondrial DNAs (86 from insects), including the first swinger gene embedded within a complete genome: the mitochondrial 16S rDNA of the stonefly Kamimuria wangi. Other regions of the Kamimuria mt genome are "regular", which highlights unanswered questions about (a) the regulation of swinger polymerization; (b) the functions of swinger 16S rDNA; and (c) the specificity of swinging to rDNA, in particular 16S rDNA. Sharp switches between regular and swinger replication, together with previous observations on swinger transcription, suggest that swinger replication might be due to a switch in the polymerization mode of regular polymerases, and point to the possibility of swinger-encoded information, as predicted for primordial genes such as rDNA.
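The counts of 9 symmetric and 14 asymmetric exchanges follow from treating each swinger exchange as a non-identity permutation of the four nucleotides (4! - 1 = 23 in total). A minimal sketch, assuming a symmetric exchange is one that restores the original sequence when applied twice (single swaps such as A<->C plus double swaps such as A<->T+C<->G), enumerates them and applies one to a sequence; the function names are illustrative.

```python
from itertools import permutations
from typing import Dict, List, Tuple

BASES = "ACGT"


def swinger_maps() -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (symmetric, asymmetric) lists of the 23 non-identity nucleotide exchanges."""
    symmetric, asymmetric = [], []
    for perm in permutations(BASES):
        mapping = dict(zip(BASES, perm))
        if all(k == v for k, v in mapping.items()):
            continue  # identity: a regular, non-swinger sequence
        # Symmetric: applying the exchange twice restores every base (an involution).
        if all(mapping[mapping[b]] == b for b in BASES):
            symmetric.append(mapping)
        else:
            asymmetric.append(mapping)  # 3-cycles such as A->C->G, and 4-cycles
    return symmetric, asymmetric


def swing(seq: str, mapping: Dict[str, str]) -> str:
    """Apply a swinger exchange to every nucleotide of a sequence."""
    return "".join(mapping[b] for b in seq)


symmetric, asymmetric = swinger_maps()
print(len(symmetric), len(asymmetric))                           # 9 14
print(swing("AACG", {"A": "T", "T": "A", "C": "G", "G": "C"}))   # TTGC
```

The 9 symmetric exchanges are the 6 single swaps plus the 3 double swaps; the 14 asymmetric ones are the 8 three-base cycles plus the 6 four-base cycles.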
A periodic pattern of SNPs in the human genome
Madsen, Bo Eskerod; Villesen, Palle; Wiuf, Carsten
2007-01-01
By surveying a filtered, high-quality set of SNPs in the human genome, we found that SNPs positioned 1, 2, 4, 6, or 8 bp apart are more frequent than SNPs positioned 3, 5, 7, or 9 bp apart. The observed pattern is not restricted to genomic regions known to cause sequencing or alignment errors, such as transposable elements (SINE, LINE, and LTR), tandem repeats, and large duplicated regions. However, we found that the pattern is almost entirely confined to what we define as “periodic DNA”: genomic regions with a high degree of periodicity in nucleotide usage. Periodic DNA turned out to consist mainly of short regions (average length 16.9 bp) widely distributed across the genome. Furthermore, periodic DNA has a 1.8 times higher SNP density than the rest of the genome, and SNPs inside periodic DNA have a significantly higher genotyping error rate than SNPs outside it. Our results suggest that not all SNPs in the human genome arise from independent single nucleotide mutations, and that care should be taken when analyzing SNPs from periodic DNA. This may have important consequences for SNP and association studies. PMID:17673700
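A minimal sketch of the kind of tally behind this observation: given SNP coordinates on one chromosome, count how many SNP pairs lie exactly d bp apart. The abstract does not say whether only adjacent SNPs were paired, so this sketch assumes all pairs within the window are counted; the reported pattern would then appear as larger counts at d = 1, 2, 4, 6, 8 than at d = 3, 5, 7, 9. The function name and positions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def snp_spacing_counts(positions: Iterable[int], max_distance: int = 9) -> Dict[int, int]:
    """Count pairs of SNPs (on one chromosome) lying exactly d bp apart, for d = 1..max_distance."""
    pos = sorted(set(positions))
    counts: Counter = Counter()
    for i, p in enumerate(pos):
        j = i + 1
        # Positions are sorted, so stop as soon as the distance exceeds the window.
        while j < len(pos) and pos[j] - p <= max_distance:
            counts[pos[j] - p] += 1
            j += 1
    return {d: counts[d] for d in range(1, max_distance + 1)}


print(snp_spacing_counts([100, 101, 102, 104, 110]))
# {1: 2, 2: 2, 3: 1, 4: 1, 5: 0, 6: 1, 7: 0, 8: 1, 9: 1}
```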
NASA Technical Reports Server (NTRS)
Li, Jun; Koehne, Jessica; Chen, Hua; Cassell, Alan; Ng, Hou Tee; Ye, Qi; Han, Jie; Meyyappan, M.
2004-01-01
There is a strong need for faster, cheaper, and simpler methods for nucleic acid analysis in today's clinical tests. Nanotechnologies can potentially meet these requirements by integrating nanomaterials with biofunctionalities. Dramatic improvements in sensitivity and multiplexing can be achieved through a high degree of miniaturization. Here, we present our work on developing an ultrasensitive, label-free electronic chip for DNA/RNA analysis based on carbon nanotube nanoelectrode arrays. A reliable nanoelectrode array based on vertically aligned multi-walled carbon nanotubes (MWNTs) embedded in a SiO2 matrix is fabricated using a bottom-up approach. Characteristic nanoelectrode behavior is observed with a low-density MWNT nanoelectrode array when measuring both bulk and surface-immobilized redox species. The open ends of the MWNTs are found to have properties similar to those of graphite edge-plane electrodes, with a wide potential window, flexible chemical functionalities, and good biocompatibility. An 18-base BRCA1-related oligonucleotide probe is covalently attached to the open ends of the MWNTs and specifically hybridized with an oligonucleotide target as well as a PCR amplicon. The guanine bases in the target molecules serve as the signal moieties for the electrochemical measurements, and a Ru(bpy)3(2+) mediator is used to further amplify the guanine oxidation signal. This technique has been employed for direct electrochemical detection of a label-free PCR amplicon through specific hybridization with the BRCA1 probe. The detection limit is estimated to be below approximately 1000 DNA molecules, approaching the sensitivity limit of laser-based fluorescence techniques in DNA microarrays. This system provides a general electronic platform for rapid molecular diagnostics in applications requiring ultrahigh sensitivity, a high degree of miniaturization, simple sample preparation, and low-cost operation.
Shaddel, Minoo; Ebrahimi, Mansour; Tabandeh, Mohammad Reza
2018-06-01
Toxoplasma gondii is a causative agent of morbidity and mortality in immunocompromised and congenitally infected individuals. Attempts to construct DNA vaccines against T. gondii using surface proteins are increasing. The dense granule antigens are highly expressed in the acute and chronic phases of T. gondii infection and are considered suitable DNA vaccine candidates for controlling toxoplasmosis. In the present study, bioinformatics tools and online software were used to predict, analyze and compare the structural, physical and chemical characteristics and the immunogenicity of the GRA-1, GRA-4, GRA-6 and GRA-7 proteins. Sequence alignment showed that the GRA-1, GRA-4, GRA-6 and GRA-7 proteins share low similarity. Secondary structure prediction showed that, among the four proteins, GRA-1 and GRA-6 had similar secondary structures apart from minor discrepancies. Hydrophilicity/hydrophobicity analysis showed multiple hydrophilic regions and some classical highly hydrophilic domains in each protein sequence. Immunogenic epitope prediction showed that the GRA-1 and GRA-4 epitopes were stable and that GRA-4 had the highest degree of antigenicity. Although the GRA-7 epitope had the highest immunogenicity score, it was unstable and had the lowest degree of antigenicity and the shortest half-life in eukaryotic cells. Among the multi-hybrid epitopes, GRA4-GRA7 had the highest degree of antigenicity and GRA6-GRA7 the highest immunogenicity. Overall, single epitopes showed a higher degree of antigenicity than multi-hybrid epitopes. Given these results, it can be concluded that GRA-4 and GRA-7 can be powerful DNA vaccine candidates against T. gondii.
Improving transmission efficiency of large sequence alignment/map (SAM) files.
Sakib, Muhammad Nazmus; Tang, Jijun; Zheng, W Jim; Huang, Chin-Tser
2011-01-01
Research in bioinformatics primarily involves collecting and analyzing large volumes of genomic data, which demands efficient storage and transfer of these data. In recent years, research has sought efficient compression algorithms to reduce the size of various types of sequencing data. One way to reduce the transmission time of large files is to apply maximal lossless compression to them. In this paper, we present SAMZIP, a specialized encoding scheme for sequence alignment data in SAM (Sequence Alignment/Map) format, which improves on the compression ratios of existing compression tools. To achieve this, we exploit prior knowledge of the file format and its specification. Our experimental results show that our encoding scheme improves the compression ratio, thereby significantly reducing overall transmission time.
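The core idea of exploiting knowledge of the SAM layout can be illustrated by compressing each field as a separate stream, since values within one column (for example FLAG or MAPQ) tend to resemble one another. This is a minimal sketch using only Python's zlib, not the SAMZIP encoding: header lines are skipped, the function names are hypothetical, and whether column-wise compression wins depends on the input (it is not guaranteed for tiny files).

```python
import zlib
from typing import List, Tuple

MANDATORY_FIELDS = 11  # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL


def split_sam_columns(sam_text: str) -> List[bytes]:
    """Split SAM alignment records into one byte stream per mandatory field,
    plus a final stream holding any optional TAG:TYPE:VALUE fields.
    Header lines (starting with '@') are skipped in this sketch."""
    streams: List[List[str]] = [[] for _ in range(MANDATORY_FIELDS + 1)]
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        for k in range(MANDATORY_FIELDS):
            streams[k].append(fields[k])
        streams[MANDATORY_FIELDS].append("\t".join(fields[MANDATORY_FIELDS:]))
    return ["\n".join(s).encode() for s in streams]


def compressed_sizes(sam_text: str) -> Tuple[int, int]:
    """Return (bytes for zlib on the whole text, bytes for zlib on each column separately)."""
    whole = len(zlib.compress(sam_text.encode(), 9))
    per_column = sum(len(zlib.compress(s, 9)) for s in split_sam_columns(sam_text))
    return whole, per_column
```

Each stream keeps one value per record in the original order, so the records could be rebuilt by zipping the decompressed streams back together.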
Improving a scissor-action couch for conformal arc radiotherapy and radiosurgery.
Li, Kaile; Yu, Cedric X; Ma, Lijun
2004-01-01
We have developed a method to improve the setup accuracy of a Varian Clinac 6/100 couch for delivering conformal arc therapy with a tertiary micro multileaf collimator (MLC) system. Several immobilization devices have been developed to improve the mechanical stability and isocenter alignment of the couch: turn-knob harnesses, double-track alignment plates, and a drop-in rod that attaches the couch to the concrete floor. These add-on components minimize intercomponent motion in the couch's scissor elevator, allowing consistent treatment setup. With these devices, the isocenter alignment of the couch is accurate to within 1 mm. The couch has been used for more than 15 patients and more than 50 modulated conformal arc treatment deliveries at our institution.
Lyu, Weiwei
2017-01-01
Transfer alignment is a key technology in strapdown inertial navigation systems (SINS) because of its rapidity and accuracy. In this paper, a transfer alignment model comprising the SINS error model and the measurement model is established. The time delay in the transfer alignment process is analyzed, and an H∞ filtering method with delay compensation is presented. The H∞ filtering theory and the robustness mechanism of the H∞ filter are then derived and analyzed in detail. To improve transfer alignment accuracy in SINS with time delay, an adaptive H∞ filtering method with delay compensation is proposed. Because the robustness factor plays an important role in the filtering process and affects filtering accuracy, the adaptive H∞ filter with delay compensation adjusts the robustness factor adaptively according to the dynamic external environment. A vehicle transfer alignment experiment indicates that the adaptive H∞ filtering method with delay compensation can dramatically improve both transfer alignment accuracy and pure inertial navigation accuracy, demonstrating the superiority of the proposed filtering method. PMID:29182592
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tummeltshammer, Clemens; Taylor, Alaric; Kenyon, Anthony J.
2014-11-07
We investigate homeotropically aligned fluorophores and Förster resonance energy transfer (FRET) for luminescent solar concentrators using Monte Carlo ray tracing. Homeotropic alignment strongly improves the trapping efficiency, while FRET circumvents the low absorption of homeotropically aligned fluorophores by separating the absorption and emission processes. We predict that this design, doped with two organic dye molecules, can yield an 82.9% improvement in optical efficiency compared with a single, arbitrarily oriented dye molecule. We also show that quantum dots are prime candidates for absorption/donor fluorophores because of their wide absorption band. The potentially strong re-absorption and low quantum yield of quantum dots are not a hindrance for this design.
2009-01-01
Background Identical blood samples tested using different kits can give markedly different hepatitis B virus (HBV) DNA levels, which can cause difficulty in the interpretation of viral load. A universal double-stranded DNA control or standard that can be used in all commercial HBV DNA nucleic acid amplification assay kits is urgently needed. By aligning all HBV genotypes (A-H), we found that the surface antigen gene and precore-core gene regions of HBV are the most conserved regions among the different HBV genotypes. We constructed a chimeric fragment by overlapping extension polymerase chain reaction and obtained a 1,349-bp HBVC+S fragment. We then packaged the fragment into lambda phages using a traditional lambda phage cloning procedure. Results The obtained armored DNA was resistant to DNase I digestion and was stable, noninfectious to humans, and could be easily extracted using commercial kits. More importantly, the armored DNA may be used with all HBV DNA nucleic acid amplification assay kits. Conclusions The lambda phage packaging system can be used as an excellent expression platform for armored DNA. The obtained armored DNA possessed all characteristics of an excellent positive control or standard. In addition, this armored DNA is likely to be appropriate for all commercial HBV DNA nucleic acid amplification detection kits. Thus, the constructed armored DNA can probably be used as a universal positive control or standard in HBV DNA assays. PMID:20025781
Yu, Guihua; Kushwaha, Amit; Lee, Jungkyu K; Shaqfeh, Eric S G; Bao, Zhenan
2011-01-25
DNA has been recently explored as a powerful tool for developing molecular scaffolds for making reproducible and reliable metal contacts to single organic semiconducting molecules. A critical step in the process of exploiting DNA-organic molecule-DNA (DOD) array structures is the controlled tethering and stretching of DNA molecules. Here we report the development of reproducible surface chemistry for tethering DNA molecules at tunable density and demonstrate shear flow processing as a rationally controlled approach for stretching/aligning DNA molecules of various lengths. Through enzymatic cleavage of λ-phage DNA to yield a series of DNA chains of various lengths from 17.3 μm down to 4.2 μm, we have investigated the flow/extension behavior of these tethered DNA molecules under different flow strengths in the flow-gradient plane. We compared Brownian dynamic simulations for the flow dynamics of tethered λ-DNA in shear, and found our flow-gradient plane experimental results matched well with our bead-spring simulations. The shear flow processing demonstrated in our studies represents a controllable approach for tethering and stretching DNA molecules of various lengths. Together with further metallization of DNA chains within DOD structures, this bottom-up approach can potentially enable efficient and reliable fabrication of large-scale nanoelectronic devices based on single organic molecules, therefore opening opportunities in both fundamental understanding of charge transport at the single molecular level and many exciting applications for ever-shrinking molecular circuits.
A phylogeny of robber flies (Diptera: Asilidae) at the subfamilial level: molecular evidence.
Bybee, Seth M; Taylor, Sean D; Riley Nelson, C; Whiting, Michael F
2004-03-01
We present the first formal analysis of phylogenetic relationships among the Asilidae, based on four genes: 16S rDNA, 18S rDNA, 28S rDNA, and cytochrome oxidase II. Twenty-six ingroup taxa representing 11 of the 12 described subfamilies were selected to produce a phylogenetic estimate of asilid subfamilial relationships via optimization alignment, parsimony, and maximum likelihood techniques. Phylogenetic analyses support the monophyly of Asilidae with Leptogastrinae as the most basal robber fly lineage. Apocleinae+(Asilinae+Ommatiinae) is supported as monophyletic. The laphriinae-group (Laphriinae+Laphystiinae) and the dasypogoninae-group (Dasypogoninae+Stenopogoninae+Stichopogoninae+Trigonomiminae) are paraphyletic. These results suggest that current subfamilial classification only partially reflects robber fly phylogeny, indicating the need for further phylogenetic investigation of this group.
Cruz, V P; Oliveira, C; Foresti, F
2015-01-01
5S rDNA genes of the stingray Potamotrygon motoro were PCR-amplified, purified, cloned and sequenced. Two distinct classes of segments of different sizes were obtained: the smaller, with 342 bp units, was designated class I, and the larger, with 1900 bp units, class II. Alignment with the consensus sequences for both classes showed changes in a few bases in the 5S rDNA genes. TATA-like sequences were detected in the nontranscribed spacer (NTS) regions of class I, and a (GCT)10 microsatellite was detected in the NTS region of class II. These results can help clarify the molecular organization of ribosomal genes and the mechanism of gene dispersion.
Mitochondrial gene rearrangements confirm the parallel evolution of the crab-like form.
Morrison, C L; Harvey, A W; Lavery, S; Tieu, K; Huang, Y; Cunningham, C W
2002-01-01
The repeated appearance of strikingly similar crab-like forms in independent decapod crustacean lineages represents a remarkable case of parallel evolution. Uncertainty surrounding the phylogenetic relationships among crab-like lineages has hampered evolutionary studies. As is often the case, aligned DNA sequences by themselves were unable to fully resolve these relationships. Four nested mitochondrial gene rearrangements--including one of the few reported movements of an arthropod protein-coding gene--are congruent with the DNA phylogeny and help to resolve a crucial node. A phylogenetic analysis of DNA sequences, and gene rearrangements, supported five independent origins of the crab-like form, and suggests that the evolution of the crab-like form may be irreversible. This result supports the utility of mitochondrial gene rearrangements in phylogenetic reconstruction. PMID:11886621
PolyaPeak: Detecting Transcription Factor Binding Sites from ChIP-seq Using Peak Shape Information
Wu, Hao; Ji, Hongkai
2014-01-01
ChIP-seq is a powerful technology for detecting genomic regions where a protein of interest interacts with DNA. ChIP-seq data for mapping transcription factor binding sites (TFBSs) have a characteristic pattern: around each binding site, sequence reads aligned to the forward and reverse strands of the reference genome form two separate peaks shifted away from each other, and the true binding site lies between these two peaks. Although it has been shown that modeling this pattern can improve the accuracy and resolution of binding site detection, efficient methods that fully exploit this information in TFBS detection have been lacking. We present PolyaPeak, a new method that improves TFBS detection by incorporating peak shape information. PolyaPeak describes peak shapes using a flexible Pólya model. The shapes are learned automatically from the data using a Minorization-Maximization (MM) algorithm and then integrated with read count information via a hierarchical model to distinguish true binding sites from background noise. Extensive real data analyses show that PolyaPeak robustly improves TFBS detection compared with existing methods. An R package is freely available. PMID:24608116
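The strand-shift pattern described above can be illustrated with a simple cross-correlation estimate. This is not PolyaPeak's Pólya model; it is a minimal sketch, assuming per-base read coverage for each strand over one candidate region, that finds the shift at which forward-strand coverage best matches reverse-strand coverage and places the binding site halfway between the two peaks. Function names and the synthetic coverage are hypothetical.

```python
from typing import List


def estimate_strand_shift(forward: List[float], reverse: List[float], max_shift: int) -> int:
    """Shift s (bp) maximizing sum_x forward[x] * reverse[x + s], for 0 <= s <= max_shift."""
    best_s, best_score = 0, float("-inf")
    n = len(forward)
    for s in range(max_shift + 1):
        score = sum(forward[x] * reverse[x + s] for x in range(n - s))
        if score > best_score:
            best_s, best_score = s, score
    return best_s


def binding_site_estimate(forward: List[float], reverse: List[float], max_shift: int) -> float:
    """Forward-strand peak position plus half the estimated strand shift."""
    s = estimate_strand_shift(forward, reverse, max_shift)
    fwd_peak = max(range(len(forward)), key=lambda x: forward[x])
    return fwd_peak + s / 2


# Synthetic region: forward-strand reads pile up at 40, reverse-strand reads at 60.
forward = [max(0, 5 - abs(x - 40)) for x in range(100)]
reverse = [max(0, 5 - abs(x - 60)) for x in range(100)]
print(estimate_strand_shift(forward, reverse, 30))  # 20
print(binding_site_estimate(forward, reverse, 30))  # 50.0
```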
Seamless, axially aligned, fiber tubes, meshes, microbundles and gradient biomaterial constructs
Elia, Roberto; Firpo, Matthew A.; Kaplan, David L.; Peattie, Robert A.
2012-01-01
A new electrospinning apparatus was developed to generate nanofibrous materials with improved organizational control. The system functions by oscillating the deposition signal (ODS) of multiple collectors, allowing significantly improved nanofiber control by manipulating the electric field which drives the electrospinning process. Other electrospinning techniques designed to impart deposited fiber organizational control, such as rotating mandrels or parallel collector systems, do not generate seamless constructs with high quality alignment in sizes large enough for medical devices. In contrast, the ODS collection system produces deposited fiber networks with highly pure alignment in a variety of forms and sizes, including flat (8 × 8 cm²), tubular (1.3 cm diameter), or rope-like microbundle (45 μm diameter) samples. Additionally, the mechanism of our technique allows for scale-up beyond these dimensions. The ODS collection system produced 81.6 % of fibers aligned within 5° of the axial direction, nearly a four-fold improvement over the rotating mandrel technique. The meshes produced from the 9 % (w/v) fibroin/PEO blend demonstrated significant mechanical anisotropy due to the fiber alignment. In 37 °C PBS, aligned samples produced an ultimate tensile strength of 16.47 ± 1.18 MPa, a Young's modulus of 37.33 MPa, and a yield strength of 7.79 ± 1.13 MPa. The material was 300 % stiffer when extended in the direction of fiber alignment and required 20 times the amount of force to be deformed, compared to aligned meshes extended perpendicular to the fiber direction. The ODS technique could be applied to any electrospinnable polymer to overcome the more limited uniformity and induced mechanical strain of rotating mandrel techniques, and greatly surpasses the limited length of standard parallel collector techniques. PMID:22890517
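The headline figure (81.6 % of fibers within 5° of the axial direction) is a simple statistic once fiber orientations have been measured. A minimal sketch, assuming orientations in degrees are already available (for example from image analysis, which is not shown) and treating fibers as undirected, so that 0° and 180° describe the same direction; the function name and angles are hypothetical.

```python
from typing import Iterable


def fraction_aligned(angles_deg: Iterable[float], axis_deg: float = 0.0,
                     tol_deg: float = 5.0) -> float:
    """Fraction of fibers whose orientation lies within tol_deg of the axis.

    Fiber orientations are axial (undirected), so angles are compared modulo 180 degrees.
    """
    angles = list(angles_deg)

    def deviation(a: float) -> float:
        d = (a - axis_deg) % 180.0
        return min(d, 180.0 - d)

    return sum(deviation(a) <= tol_deg for a in angles) / len(angles)


print(fraction_aligned([0, 3, -4, 178, 90, 10]))  # 4 of 6 fibers, about 0.667
```

Without the modulo-180 step, a fiber at 178° would wrongly count as 178° off axis instead of 2°.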
Evaluation of mathematical algorithms for automatic patient alignment in radiosurgery.
Williams, Kenneth M; Schulte, Reinhard W; Schubert, Keith E; Wroe, Andrew J
2015-06-01
Image registration techniques based on anatomical features can serve to automate patient alignment for intracranial radiosurgery procedures in an effort to improve the accuracy and efficiency of the alignment process as well as potentially eliminate the need for implanted fiducial markers. To explore this option, four two-dimensional (2D) image registration algorithms were analyzed: the phase correlation technique, mutual information (MI) maximization, enhanced correlation coefficient (ECC) maximization, and the iterative closest point (ICP) algorithm. Digitally reconstructed radiographs from the treatment planning computed tomography scan of a human skull were used as the reference images, while orthogonal digital x-ray images taken in the treatment room were used as the captured images to be aligned. The accuracy of aligning the skull with each algorithm was compared to the alignment of the currently practiced procedure, which is based on a manual process of selecting common landmarks, including implanted fiducials and anatomical skull features. Of the four algorithms, three (phase correlation, MI maximization, and ECC maximization) demonstrated clinically adequate (ie, comparable to the standard alignment technique) translational accuracy and improvements in speed compared to the interactive, user-guided technique; however, the ICP algorithm failed to give clinically acceptable results. The results of this work suggest that a combination of different algorithms may provide the best registration results. This research serves as the initial groundwork for the translation of automated, anatomy-based 2D algorithms into a real-world system for 2D-to-2D image registration and alignment for intracranial radiosurgery. This may obviate the need for invasive implantation of fiducial markers into the skull and may improve treatment room efficiency and accuracy. © The Author(s) 2014.
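Of the four algorithms, phase correlation is the simplest to illustrate. The sketch below is a 1D simplification under stated assumptions: it recovers an integer, circular shift between two equal-length intensity profiles using a plain O(n²) DFT from the standard library, rather than the 2D registration of radiographs evaluated in the study. A practical system would use an FFT library (for example numpy.fft) and handle 2D images, sub-pixel shifts and non-circular boundaries. Function names and data are illustrative.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Plain O(n^2) discrete Fourier transform (the inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def phase_correlation_shift(reference: Sequence[float], moved: Sequence[float]) -> int:
    """Estimate the circular shift s with moved[i] == reference[(i - s) % n]."""
    A, B = dft(reference), dft(moved)
    cross = []
    for a, b in zip(A, B):
        c = a.conjugate() * b                          # cross-power spectrum
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)   # keep the phase only
    corr = dft(cross, inverse=True)                    # ideally a delta at index s
    return max(range(len(corr)), key=lambda i: corr[i].real)


reference = [0.0, 1.0, 3.0, 7.0, 2.0, 0.0, 5.0, 1.0]
moved = [reference[(i - 3) % len(reference)] for i in range(len(reference))]
print(phase_correlation_shift(reference, moved))  # 3
```

Normalizing the cross-power spectrum discards amplitude and keeps only phase, which is why the method tolerates global intensity differences between the reference and captured images.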
Design pattern mining using distributed learning automata and DNA sequence alignment.
Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina
2014-01-01
Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object-oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, together with a related tool, DLA-DNA, covering all implemented patterns and all projects used for evaluation. Based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, DLA-DNA achieves acceptable precision and recall compared with the other evaluated tools. The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them, enabling analysts and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model detects design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF. The results demonstrate that, whether or not the source code is built according to standard design patterns, the results of the proposed method are close to those of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source code bases and compared with other related models and available tools; on average, its precision and recall are 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively. The method is organized in two steps: first, elemental design patterns are identified; second, these are composed to recognize actual design patterns.
mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud.
Weissensteiner, Hansi; Forer, Lukas; Fuchsberger, Christian; Schöpf, Bernd; Kloss-Brandstätter, Anita; Specht, Günther; Kronenberg, Florian; Schönherr, Sebastian
2016-07-08
Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpreting final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, artefact or contamination identification, variant annotation as well as several quality control metrics, often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed graphically with Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation data shows that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches regarding sensitivity. mtDNA-Server is currently able to analyze the 1000G Phase 3 data (n = 2,504) in less than 5 h and is freely accessible at https://mtdna-server.uibk.ac.at. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
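To make "heteroplasmy down to the 1% level" concrete, the sketch below flags positions where the second most frequent base reaches a minimum fraction of the reads. This is a deliberately simplified threshold rule, not mtDNA-Server's detection model (which the abstract says also handles artefacts and contamination); the function name, the input layout and the `min_depth` value are assumptions.

```python
def call_heteroplasmies(counts_by_pos, min_level=0.01, min_depth=100):
    """Flag positions whose minor base reaches min_level of the read depth.

    counts_by_pos: dict mapping position -> dict of base -> read count.
    Returns dict position -> (major_base, minor_base, minor_fraction).
    """
    calls = {}
    for pos, counts in counts_by_pos.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to estimate a low level reliably
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # only one base observed: homoplasmic at this position
        (major, _), (minor, n_minor) = ranked[0], ranked[1]
        level = n_minor / depth
        if level >= min_level:
            calls[pos] = (major, minor, round(level, 4))
    return calls


example = {16519: {"T": 970, "C": 30}, 73: {"A": 999, "G": 1}}
print(call_heteroplasmies(example))  # {16519: ('T', 'C', 0.03)}
```

A real pipeline would also model sequencing error and strand bias before accepting a low-level call; the fixed threshold here only shows where the 1% figure enters.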
Charge transport and ac response under light illumination in gate-modulated DNA molecular junctions.
Zhang, Yan; Zhu, Wen-Huan; Ding, Guo-Hui; Dong, Bing; Wang, Xue-Feng
2015-05-22
Using a two-strand tight-binding model within the nonequilibrium Green's function approach, we study charge transport through the DNA sequences (GC)_{N_GC} and (GC)_1(TA)_{N_TA}(GC)_3 sandwiched between two Pt electrodes. We show that at low temperature the DNA sequence (GC)_{N_GC} exhibits coherent charge-carrier transport at very small bias, since the highest occupied molecular orbital of the GC base pair can be aligned with the Fermi energy of the metallic electrodes by a gate voltage. A weak distance-dependent conductance is found in the DNA sequence (GC)_1(TA)_{N_TA}(GC)_3 with large N_TA. In contrast to the thermally induced charge-hopping mechanism proposed on the basis of previous experiments, we find that this phenomenon is dominated by quantum tunnelling through discrete quantum-well states in the TA base pairs. In addition, the ac response of this DNA junction under light illumination is investigated. The suppression of the ac conductances of the left and right leads at some particular frequencies is attributed either to the excitation of electrons in the DNA to the lead Fermi surface by the ac potential, or to the excitation of electrons in deep DNA energy levels to partially occupied energy levels in the transport window. Therefore, measuring the ac response of DNA junctions can reveal a wealth of information about the intrinsic dynamics of DNA molecules.
The Effectiveness of Aligned Developmental Feedback on the Overhand Throw in Third-Grade Students
ERIC Educational Resources Information Center
Cohen, Rona; Goodway, Jacqueline D.; Lidor, Ronnie
2012-01-01
Background: To improve student performance, teachers need to evaluate the developmental level of the child and to deliver feedback statements that correspond with the student's ability to process the information delivered. Therefore, feedback aligned with the developmental level of the child (aligned developmental feedback--ADF) is sometimes…
Corrected High-Frame Rate Anchored Ultrasound with Software Alignment
ERIC Educational Resources Information Center
Miller, Amanda L.; Finch, Kenneth B.
2011-01-01
Purpose: To improve lingual ultrasound imaging with the Corrected High Frame Rate Anchored Ultrasound with Software Alignment (CHAUSA; Miller, 2008) method. Method: A production study of the IsiXhosa alveolar click is presented. Articulatory-to-acoustic alignment is demonstrated using a Tri-Modal 3-ms pulse generator. Images from 2 simultaneous…
Drumheller, Douglas S.
1998-01-01
An improved well-pump for geothermal wells, an alignment system for a well-pump, and a method for aligning a rotor and stator within a well-pump, wherein the well-pump has a whistle assembly formed at its bottom portion, such that variations in the frequency of the whistle, which indicate misalignment, may be monitored during pumping.
Aligning Assessment and Instruction with State Standards for Children with Significant Disabilities
ERIC Educational Resources Information Center
Parrish, Polly R.; Stodden, Robert A.
2009-01-01
This article presents a classroom teacher's perspective on one of the important requirements of the No Child Left Behind Act of 2001 (NCLB) legislation and aligned language found in the Individuals With Disabilities Education Improvement Act (IDEA 2004)--that of aligning assessment and instructional practices with state academic content standard…
Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.
2014-01-01
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449
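The core idea of an individualized diploid reference can be illustrated by applying each parental haplotype's variants to a shared reference. The sketch below handles single-nucleotide variants only; Seqnature's actual handling of indels and annotation coordinates is not shown, and the function names and input format are hypothetical.

```python
def individualize(reference, snvs):
    """Apply single-nucleotide variants to a reference sequence.

    snvs maps a 0-based position -> (ref_base, alt_base); the reference base is
    checked so that a variant file built on a different reference is caught.
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snvs.items():
        if seq[pos] != ref_base:
            raise ValueError(f"reference mismatch at {pos}: {seq[pos]} != {ref_base}")
        seq[pos] = alt_base
    return "".join(seq)


def diploid(reference, maternal_snvs, paternal_snvs):
    """Return the two individualized haplotypes (maternal, paternal)."""
    return individualize(reference, maternal_snvs), individualize(reference, paternal_snvs)


print(diploid("ACGTACGT", {1: ("C", "T")}, {6: ("G", "A")}))
# ('ATGTACGT', 'ACGTACAT')
```

Reads would then be aligned to both haplotypes rather than to the shared reference, which is what allows allele-specific counts to be read off directly.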
Iterative pass optimization of sequence data
NASA Technical Reports Server (NTRS)
Wheeler, Ward C.
2003-01-01
The problem of determining the minimum-cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete. This "tree alignment" problem has motivated the considerable effort placed in multiple sequence alignment procedures. Wheeler in 1996 proposed a heuristic method, direct optimization, to calculate cladogram costs without the intervention of multiple sequence alignment. This method, though more efficient in time and more effective in cladogram length than many alignment-based procedures, greedily optimizes nodes based on descendant information only. In their proposal of an exact multiple alignment solution, Sankoff et al. in 1976 described a heuristic procedure, the iterative improvement method, to create alignments at internal nodes by solving a series of median problems. The combination of a three-sequence direct optimization with iterative improvement and a branch-length-based cladogram cost procedure provides an algorithm that frequently results in superior (i.e., lower) cladogram costs. This iterative pass optimization is both computationally and memory intensive, but economies can be made to reduce this burden. An example in arthropod systematics is discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.
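Direct optimization and iterative pass work on unaligned sequences and solve median problems, which is beyond a short example. To make the notion of a cladogram cost concrete in the simplest case, where the sequences are already aligned, the sketch below swaps in Fitch's small-parsimony algorithm. It is not Wheeler's method; the tree encoding (nested 2-tuples of leaf names) and the function names are assumptions.

```python
def fitch(tree, states):
    """Fitch small parsimony for one aligned site on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    states: dict leaf name -> character state at this site.
    Returns (candidate ancestral states at the root, number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost  # children agree: no extra change
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def tree_length(tree, aligned):
    """Sum the Fitch cost over every column of equal-length aligned sequences."""
    length = len(next(iter(aligned.values())))
    total = 0
    for col in range(length):
        site = {name: seq[col] for name, seq in aligned.items()}
        total += fitch(tree, site)[1]
    return total


tree = (("a", "b"), ("c", "d"))
aligned = {"a": "AC", "b": "AC", "c": "GC", "d": "GT"}
print(tree_length(tree, aligned))  # prints 2
```

The sets returned at each node are the candidate ancestral states; the heuristics described in the abstract generalize this idea to whole unaligned sequences, where each internal node's "state" is itself a sequence.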
Alignment and Integration of Lightweight Mirror Segments
NASA Technical Reports Server (NTRS)
Evans, Tyler; Biskach, Michael; Mazzarella, Jim; McClelland, Ryan; Saha, Timo; Zhang, Will; Chan, Kai-Wing
2011-01-01
The optics for the International X-Ray Observatory (IXO) require alignment and integration of about fourteen thousand thin mirror segments to achieve the mission goal of 3.0 square meters of effective area at 1.25 keV with an angular resolution of five arc-seconds. These mirror segments are 0.4 mm thick and 200 to 400 mm in size, which makes it difficult to avoid imparting distortion at the sub-arc-second level. This paper outlines the precise alignment, permanent bonding, and verification testing techniques developed at NASA's Goddard Space Flight Center (GSFC). Improvements in alignment include new hardware and automation software. Improvements in bonding include two new module simulators into which mirrors are bonded, a glass housing for proving single-pair bonding, and a Kovar module for bonding multiple pairs of mirrors. Three separate bonding trials were x-ray tested, producing results that meet the requirement of sub-ten-arc-second alignment. This paper will highlight these recent advances in alignment, testing, and bonding techniques and the exciting developments in thin x-ray optic technology development.
X-ray verification of an optically aligned off-plane grating module
NASA Astrophysics Data System (ADS)
Donovan, Benjamin D.; McEntaffer, Randall L.; Tutt, James H.; DeRoo, Casey T.; Allured, Ryan; Gaskin, Jessica A.; Kolodziejczak, Jeffery J.
2018-01-01
Off-plane x-ray reflection gratings are theoretically capable of achieving high resolution and high diffraction efficiencies over the soft x-ray bandpass, making them an ideal technology to implement on upcoming x-ray spectroscopy missions. To achieve high effective area, these gratings must be aligned into grating modules. X-ray testing was performed on an aligned grating module to assess the current optical alignment methods. Results indicate that the grating module achieved the desired alignment for an upcoming x-ray spectroscopy suborbital rocket payload with modest effective area and resolving power. These tests have also outlined a pathway towards achieving the stricter alignment tolerances of future x-ray spectrometer payloads, which require improvements in alignment metrology, grating fabrication, and testing techniques.
Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang
2012-05-01
This study is the first attempt to amplify and analyze the complete 18S ribosomal DNA (rDNA) sequence of three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) from gDNA extracted from individual mites. The mites were treated with DNA Release Additive and Hot Start II DNA Polymerase to promote mite disruption and increase PCR specificity. Measurement of D. folliculorum gDNA showed that the gDNA yield was highest with one mite and tended to decrease as the number of mites increased. The individual-mite gDNA was successfully used to amplify an 18S rDNA fragment (about 900 bp). Alignments of the complete 18S rDNA sequences from individual mite samples and from pooled mite samples (≥1000 mites/sample) showed over 97% identity for each species, indicating that gDNA extracted from a single mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion ratio and phylogenetic trees could not effectively distinguish the three Demodex species, largely because of differentiation among the D. canis isolates. It can be concluded that gDNA from an individual Demodex mite is sufficient for molecular studies of Demodex. The complete 18S rDNA sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the types of Demodex mites parasitizing dogs are ascertained. Copyright © 2012 Elsevier Inc. All rights reserved.
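The pairwise measures named in the abstract (identity, divergence, genetic distance, transition/transversion ratio) can be computed from two aligned sequences as sketched below. The abstract does not state which genetic distance the authors used; the Jukes-Cantor correction here is a swapped-in, commonly used choice, and skipping gap or ambiguous columns is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pairwise_stats(seq1, seq2):
    """Compare two aligned DNA sequences; gap or ambiguous columns are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # ignore gaps ('-') and ambiguity codes such as 'N'
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = (transitions + transversions) / sites  # uncorrected p-distance
    # Jukes-Cantor corrected distance; undefined (infinite) when p >= 0.75
    jc = -0.75 * math.log(1 - 4 * p / 3) if p < 0.75 else math.inf
    ts_tv = transitions / transversions if transversions else math.inf
    return {"identity": 1 - p, "p_distance": p, "jc_distance": jc, "ts_tv": ts_tv}


print(pairwise_stats("ACGTACGTAC", "GCGTACGTAA")["ts_tv"])  # prints 1.0
```

With closely related isolates the p-distance and the corrected distance are nearly equal, which is one reason such summary measures can fail to separate species whose sequences are over 97% identical.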
2012-01-01
Background The Nymphaeales (water lily and relatives) lineage diverged as the second branch of the basal angiosperms and comprises two families: Cabombaceae and Nymphaeaceae. The classification of Nymphaeales and its phylogenetic position within the flowering plants are intriguing, as several systems (the Thorne, Dahlgren, Cronquist and Takhtajan systems and the APG III, or Angiosperm Phylogeny Group III, system) have attempted to redefine Nymphaeales taxonomy. There are also fossil records, consisting especially of seeds, pollen, stems, leaves and flowers, from as early as the Lower Cretaceous. Here we present an in silico study of the order Nymphaeales using maturase K (matK) and internal transcribed spacer 2 (ITS2) as biomarkers for phylogeny reconstruction (using character-based methods and a Bayesian approach) and for the identification of motifs for DNA barcoding. Results The Maximum Likelihood (ML) and Bayesian approaches yielded congruent, fully resolved and well-supported trees using a concatenated (ITS2 + matK) supermatrix aligned dataset. The taxon sampling corroborates the monophyly of Cabombaceae. Nuphar emerges as a monophyletic clade in the family Nymphaeaceae, while there are slight discrepancies in the monophyly of the genus Nymphaea owing to Victoria-Euryale and Ondinea grouping in the same node of Nymphaeaceae. Alignment of ITS2 secondary structures corroborates the primary sequence analysis. Hydatellaceae emerged as a sister clade to Nymphaeaceae and as a basal lineage among the water lily clades. Species from Cycas and Ginkgo were taken as outgroups and were used to root the overall tree topology obtained with the various methods. Conclusions MatK genes are fast-evolving, highly variable regions of plant chloroplast DNA that can serve as potential biomarkers for DNA barcoding and for generating primers for angiosperms, with identification of unique motif regions.
We have reported unique genus-specific motif regions in the order Nymphaeales from the matK dataset, which can be further validated for barcoding and for designing PCR primers. Our analysis, using a novel approach of sequence-structure alignment and phylogenetic reconstruction with molecular morphometrics, is congruent with the current placement of Hydatellaceae within the early-divergent angiosperm order Nymphaeales. The results underscore that more diverse genera, if not fully resolved as monophyletic, should be represented by all major lineages. PMID:23282079
Identification of a Herbal Powder by Deoxyribonucleic Acid Barcoding and Structural Analyses.
Sheth, Bhavisha P; Thaker, Vrinda S
2015-10-01
Authentic identification of plants is essential for exploiting their medicinal properties and for stopping adulteration and malpractice in their trade. The objective was to identify a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, using deoxyribonucleic acid (DNA) barcoding and molecular tools. DNA was extracted from the herbal powder and from selected Cassia species, followed by polymerase chain reaction (PCR) amplification and sequencing of the rbcL barcode locus. The sequences were then subjected to National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) analysis, followed by determination of the three-dimensional structure of the rbcL protein from the herbal powder and from Cassia species, namely Cassia fistula, Cassia tora and Cassia javanica (sequences obtained in the present study), and Cassia roxburghii and Cassia abbreviata (sequences retrieved from GenBank). Multiple and pairwise structural alignments were then carried out to identify the herbal powder. The nucleotide sequences obtained from the selected Cassia species were submitted to GenBank (Accession Nos. JX141397, JX141405, JX141420). NCBI BLAST analysis of the rbcL protein from the herbal powder showed equal sequence similarity (with reference to parameters such as E value, maximum identity, total score and query coverage) to C. javanica and C. roxburghii. To resolve the ambiguity in the BLAST result, a protein structural approach was implemented. The protein homology models obtained in the present study were submitted to the protein model database (PM0079748-PM0079753). Pairwise structural alignment of the herbal powder (as template) with C. javanica and C. roxburghii (individually, as targets) revealed a close similarity of the herbal powder to C. javanica.
A strategy such as the one used here, integrating DNA barcoding with protein structural analyses, could be adopted as a novel, rapid and economical procedure, especially when protein-coding loci are considered. In summary, a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, was analyzed with an integrated approach of DNA barcoding and structural analyses, and was identified as Cassia javanica L.
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The explosive growth of next-generation sequencing data has led to a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing is a crucial technique for accelerating ultra-large (e.g. files larger than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implemented HAlign-II, a highly cost- and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files larger than 1 GB, showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.
eShadow: A tool for comparing closely related sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.
2004-01-15
Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/
NASA Astrophysics Data System (ADS)
Demming, Anna
2012-06-01
'Negative space' may be as important in the development of nanomaterials as it is in creating works of art. The term refers to the space around and between objects, an important aspect in artistic composition. In nanotechnology, while nanoposts and nanowires have been assiduously studied and exploited for enhancing the performance of solar cells [1], real-time chemical sensors [2], UV emitters [3] and many other applications, nanopore structures have also yielded important advances in a wide range of fields. In this issue Melnikov, Leburton and Gracheva report on the electrostatic properties of nanopores in a layered semiconductor, and show how they allow a more accurate characterization of DNA than pores in other membranes [4]. Nanoporous materials have been applied to a diverse range of technological challenges. In recognition of its potential in high-efficiency solar cells, Prakasam and colleagues in the US reported the first ever synthesis of self-aligned nanoporous haematite [5]. Haematite is abundant, stable, non-toxic and has a band gap in the visible region and, as their work demonstrates, the photoresponse of nanoporous haematite is very promising for energy harvesting applications. Nanoporous aluminum oxide has also proved to be a particularly valuable material in applications ranging from liquid crystal display panels to biosensor microchips. A collaboration of researchers in Taiwan demonstrated that porous aluminum oxide on an indium tin oxide surface could act as an alignment layer in liquid crystal display panels that have a transmittance of 60-80%, and switch from black to bright with a response time of 62.5 ms [6]. In Korea, Chung, Son and Min investigated the effect of nanostructural parameters of porous aluminum oxide on cell adhesion and proliferation for cell-based microchips [7].
While aluminum oxide without any modifications is not favourable for adherent cell culture, the proliferation of cells dramatically increased in porous aluminum oxide, particularly when the aspect ratio of the nanopore was near unity. In the mid 1990s a collaboration of researchers in the US demonstrated that nanopores could be used to characterize DNA [8]. They showed that as an electric field drove DNA molecules through a pore in a lipid bilayer membrane, the decrease in ionic current due to the partially blocked channel allowed measurement of the polynucleotide length. 'With further improvements, the method could in principle provide direct, high-speed detection of the sequence of bases in single molecules of DNA or RNA', suggested the authors. The idea inspired a catalogue of further research. Gracheva and colleagues in Illinois described a modified approach to detecting DNA using a nanopore in a membrane fabricated from a metal-oxide-semiconductor (MOS) capacitor [9]. The use of semiconductor materials allows the direct integration of high-sensitivity nanoscale MOS amplifiers on the nanopore layer structure to improve the signal. Researchers in the Netherlands have investigated ways of refining geometrical control of nanopores in SiN membranes for more accurate molecular characterization [10]. They reported the fabrication of nanopores using transmission electron microscope beams of different sizes. They found that the stability of small nanopores is related to their geometry, which could be controlled by the size of the beam used in fabrication. One challenge for accurate DNA characterization has been the speed of translocation through the pore. In a collaboration between the University of Illinois and the University of Notre Dame in Indiana, researchers used a time-varying electric field to slow down the molecule's passage in the pore [11].
In this issue, researchers in Illinois have studied the translocation and stretching of DNA in a pore in a semiconductor membrane consisting of doped p- and n-layers of Si forming a p-n junction [4]. Control over the stretching of the DNA is important, as is controlling the speed of translocation. The secondary structure of the probed molecule can interfere with the recorded signal, muddling measurements of the molecule's charge. According to Gracheva and her colleagues, 'the results indicate that the tunable local electric field inside the membrane can effectively control dynamics of a DNA in the channel to either momentarily trap, slow down or allow the biomolecule to translocate at will'. The fertility of research based on nanopores complements well the interest in nanowires and nanoposts. This may largely be a symptom of the comprehensive rigour in scientific enquiry which encourages the investigation of all approaches to a solution. It was once suggested that 'art and science have their meeting point in method' [12]. One might also argue that the creative and inspired ingenuity that is evident in the application of nanopores to such wide-ranging technological challenges demonstrates how the development of scientific methods is in itself a fine art.
References
[1] Qiu J et al 2010 Solution-derived 40 μm vertically aligned ZnO nanowire arrays as photoelectrodes in dye-sensitized solar cells Nanotechnology 21 195602
[2] Park I, Li Z, Pisano A P and Williams R S 2010 Top-down fabricated silicon nanowire sensors for real-time chemical detection Nanotechnology 21 015501
[3] Gao J et al 2011 UV light emitting transparent conducting tin-doped indium oxide (ITO) nanowires Nanotechnology 22 195706
[4] Melnikov D V, Leburton J-P and Gracheva M E 2012 Slowing down and stretching DNA with an electrically tunable nanopore in a p-n semiconductor membrane Nanotechnology 23 255501
[5] Prakasam H E, Varghese O K, Paulose M, Mor G K and Grimes C A 2006 Synthesis and photoelectrochemical properties of nanoporous iron (III) oxide by potentiostatic anodization Nanotechnology 17 4285-91
[6] Hong C, Tang T-T, Hung C-Y, Pan R-P and Fang W 2010 Liquid crystal alignment in nanoporous anodic aluminum oxide layer for LCD panel applications Nanotechnology 21 285201
[7] Chung S H, Son S J and Min J 2010 The nanostructure effect on the adhesion and growth rates of epithelial cells with well-defined nanoporous alumina substrates Nanotechnology 21 125104
[8] Kasianowicz J J, Brandin E, Branton D and Deamer D W 1996 Characterization of individual polynucleotide molecules using a membrane channel Proc. Natl Acad. Sci. 93 13770-3
[9] Gracheva M E, Xiong A, Aksimentiev A, Schulten K, Timp G and Leburton J-P 2006 Simulation of the electric response of DNA translocation through a semiconductor nanopore-capacitor Nanotechnology 17 622-33
[10] Van Den Hout M, Hall A R, Wu M Y, Zandbergen H W, Dekker C and Dekker N H 2010 Controlling nanopore size, shape and stability Nanotechnology 21 115304
[11] Mirsaidov U, Comer J, Dimitrov V, Aksimentiev A and Timp G 2010 Slowing the translocation of double-stranded DNA using a nanopore smaller than the double helix Nanotechnology 21 395501
[12] Baron E and Bulwer L L 1864 Caxtoniana vol 2 (Leipzig: Bernard Tauchnits) p 122
Suwannasai, Nuttika; Martín, María P; Phosri, Cherdchai; Sihanonth, Prakitsin; Whalley, Anthony J S; Spouge, John L
2013-01-01
Thailand, a part of the Indo-Burma biodiversity hotspot, has many endemic animals and plants. Some of its fungal species are difficult to recognize and separate, complicating assessments of biodiversity. We assessed species diversity within the fungal genera Annulohypoxylon and Hypoxylon, which produce biologically active and potentially therapeutic compounds, by applying classical taxonomic methods to 552 teleomorphs collected from across Thailand. Using probability of correct identification (PCI), we also assessed the efficacy of automated species identification with a fungal barcode marker, ITS, in the model system of Annulohypoxylon and Hypoxylon. The 552 teleomorphs yielded 137 ITS sequences; in addition, we examined 128 GenBank ITS sequences, to assess biases in evaluating a DNA barcode with GenBank data. The use of multiple sequence alignment in a barcode database like BOLD raises some concerns about non-protein barcode markers like ITS, so we also compared species identification using different alignment methods. Our results suggest the following. (1) Multiple sequence alignment of ITS sequences is competitive with pairwise alignment when identifying species, so BOLD should be able to preserve its present bioinformatics workflow for species identification with ITS, and possibly with at least some other non-protein barcode markers. (2) Automated species identification is insensitive to a specific choice of evolutionary distance, contributing to resolution of a current debate in DNA barcoding. (3) Statistical methods are available to address, at least partially, the possibility of expert misidentification of species. Phylogenetic trees revealed a cryptic species and strongly supported monophyletic clades for many Annulohypoxylon and Hypoxylon species, suggesting that ITS can contribute usefully to a barcode for these fungi.
The PCIs here, derived solely from ITS, suggest that a fungal barcode will require secondary markers in Annulohypoxylon and Hypoxylon, however. The URL http://tinyurl.com/spouge-barcode contains computer programs and other supplementary material relevant to this article.
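The idea of scoring automated identification can be illustrated with a leave-one-out nearest-neighbour rule over aligned barcode sequences. The sketch below is a simplified success rate offered for intuition only; it is not the paper's PCI definition, and the function names, the p-distance choice and the strict tie rule are assumptions. Any other evolutionary distance could be substituted for `p_distance`.

```python
def p_distance(a, b):
    """Proportion of differing sites among columns where both bases are A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def nn_success_rate(records):
    """Leave-one-out nearest-neighbour identification over (species, sequence) pairs.

    A query counts as correct only if every nearest neighbour belongs to its own
    species, so ties with another species count as failures.
    """
    correct = 0
    for i, (species, seq) in enumerate(records):
        dists = [(p_distance(seq, other), sp) for j, (sp, other) in enumerate(records) if j != i]
        nearest = min(d for d, _ in dists)
        tied_species = {sp for d, sp in dists if d == nearest}
        if tied_species == {species}:
            correct += 1
    return correct / len(records)


records = [("X", "AAAA"), ("X", "AAAT"), ("Y", "GGGG"), ("Y", "GGGC")]
print(nn_success_rate(records))  # prints 1.0
```

Requiring the nearest neighbours to be unambiguous is a conservative choice; it keeps identical sequences shared by two species from inflating the success rate.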
Holt, Emily A.; Young, Craig; Keetch, Jared; Larsen, Skylar; Mollner, Brayden
2015-01-01
Critical thinking is often considered an essential learning outcome of institutions in higher education. Previous work has proposed three pedagogical strategies to address this goal: more active, student-centered in-class instruction, assessments which contain higher-order cognitive questions, and greater alignment within a classroom (i.e., high agreement of the cognitive level of learning objectives, assessments, and in-class instruction). Our goals were to determine which of these factors, individually or in interaction, contributed most to improvements in university students’ critical thinking. We assessed students’ higher-order cognitive skills in introductory non-majors biology courses in the first and last weeks of instruction. For each of the fifteen sections observed, we also measured the cognitive level of assessments and learning objectives, evaluated the learner-centeredness of each classroom, and calculated an alignment score for each class. The best model to explain improvements in students’ higher-order cognitive skills contained the measure of learner-centeredness of the class and pre-quiz scores as a covariate. Neither the cognitive level of assessments and learning objectives nor alignment explained improvements in students’ critical thinking. In accordance with much of the current literature, our findings indicate that more student-centered classes had greater improvements in student learning. However, more research is needed to clarify the role of assessment and alignment in student learning. PMID:26340659
AMAS: a fast tool for alignment manipulation and computing of summary statistics.
Borowiec, Marek L
2016-01-01
The amount of data used in phylogenetics has grown explosively in recent years, and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python's core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.
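As a rough illustration of the per-alignment statistics listed above, here is a minimal Python sketch. It is not AMAS's code or API: the function name `summarize`, the choice of `-`, `?` and `N` as undetermined characters, and counting only A, C, G and T as character states are assumptions made for this example.

```python
from collections import Counter

# Characters counted as undetermined in this sketch (an assumption; AMAS's own
# list of undetermined characters may differ).
MISSING = set("-?N")
STATES = set("ACGT")


def summarize(alignment: dict) -> dict:
    """Basic summary statistics for a DNA alignment {taxon: aligned sequence}."""
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        raise ValueError("empty alignment")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    cells = len(seqs) * length
    missing = sum(ch in MISSING for s in seqs for ch in s)

    # GC content is computed over determined nucleotides only.
    counts = Counter(ch for s in seqs for ch in s if ch in STATES)
    determined = sum(counts.values())
    gc = (counts["G"] + counts["C"]) / determined if determined else 0.0

    variable = informative = 0
    for i in range(length):
        column = Counter(s[i] for s in seqs if s[i] in STATES)
        if len(column) > 1:
            variable += 1
        # Parsimony informative: at least two states, each present in >= 2 taxa.
        if sum(1 for n in column.values() if n >= 2) >= 2:
            informative += 1

    return {
        "taxa": len(seqs),
        "length": length,
        "cells": cells,
        "percent_missing": 100.0 * missing / cells,
        "gc_content": gc,
        "variable_sites": variable,
        "parsimony_informative": informative,
    }
```

A site counts as parsimony informative here under the usual definition: at least two different states, each occurring in at least two taxa.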
NASA Astrophysics Data System (ADS)
Yahiro, Takehisa; Sawamura, Junpei; Dosho, Tomonori; Shiba, Yuji; Ando, Satoshi; Ishikawa, Jun; Morita, Masahiro; Shibazaki, Yuichi
2018-03-01
One of the main components of an On-Product Overlay (OPO) error budget is process-induced wafer error, which necessitates wafer-to-wafer correction to optimize overlay accuracy. This paper introduces the Litho Booster (LB), a standalone alignment station, as a solution for improving OPO. LB can execute high-speed alignment measurements without throughput (THP) loss. LB can be installed in any lithography process control loop as a metrology tool and can then provide feed-forward (FF) corrections to the scanners. In this paper, the detailed LB design is described, and basic LB performance and OPO improvement are demonstrated. The extendibility and applicability of Litho Booster as a solution to next-generation manufacturing accuracy and productivity challenges are also outlined.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Yang; Lee, Ju Hwan; Seo, Dae-Shik
2016-09-05
Thin ion-beam (IB)-spurted dimethyl sulfate/poly(3,4-ethylenedioxythiophene) polystyrene sulfonate (DMS/PEDOT:PSS) layers with improved electro-optic performance are presented for aligning liquid crystals. IB spurting is effective for enhancing the conductivity of such layers, as well as the anchoring energy of the liquid crystals sandwiched between them. Compared with a commercial twisted-nematic cell assembled with polyimide alignment layers, the same cell assembled with 3.0-keV IB-spurted DMS/PEDOT:PSS alignment layers shows 38% faster switching and a 93% lower residual direct current. The improved electro-optic performance here is likely due to the enhanced electric field effect and the charge-releasing ability of thin IB-spurted DMS/PEDOT:PSS layers.
EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity
Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D
2006-01-01
Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; ) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150
Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong
2013-11-01
Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
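Genome equivalents are conventionally computed as clone number times mean insert size divided by genome size, and the Clarke-Carbon formula then estimates the probability that a given locus is present in at least one clone. The sketch below applies these standard formulas; the function names and the example clone counts and genome sizes are illustrative assumptions, not values from this study. Under these idealized assumptions, the 4.7- to 8.4-fold libraries reported above would be expected to contain any given locus with probability of roughly 99.1% to over 99.9%.

```python
import math


def genome_equivalents(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Library coverage: total cloned DNA divided by genome size."""
    return n_clones * mean_insert_bp / genome_bp


def clarke_carbon_probability(coverage: float) -> float:
    """Approximate probability that a locus is in at least one clone.

    Assumes clones are randomly and independently distributed over the genome
    and ignores the size of the locus relative to the insert size.
    """
    return 1.0 - math.exp(-coverage)


def clones_needed(prob: float, mean_insert_bp: float, genome_bp: float) -> int:
    """Clarke-Carbon: clones required to reach the requested recovery probability."""
    fraction = mean_insert_bp / genome_bp
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - fraction))
```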
Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user-specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
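Step (3), finding sequences that contain microsatellites meeting user-specified parameters, can be illustrated with a regular-expression scan. This is a hypothetical minimal sketch, not SSR_pipeline's code: `find_ssrs` and its parameters `min_motif`, `max_motif` and `min_repeats` are assumptions standing in for the user-specified settings, and only perfect (uninterrupted) repeats are reported.

```python
import re
from typing import NamedTuple


class SSR(NamedTuple):
    start: int    # 0-based start position in the sequence
    motif: str    # repeat unit, e.g. "AC"
    repeats: int  # number of consecutive copies of the motif


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6, min_repeats: int = 4) -> list:
    """Report perfect tandem repeats of motifs of length min_motif..max_motif."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ACAC").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```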
Nakasone, Cass K; Abdeen, Ayesha; Khachatourians, Armond G; Sugimori, Tanzo; Vince, Kelly G
2008-12-01
We performed a retrospective study of the radiographic position of femoral and tibial components in a series of revision total knee arthroplasties using diaphyseal-engaging, press fit, modular stems. Fifty-two consecutive revision cases were performed. Femoral and tibial component alignment was measured preoperatively and postoperatively. The canal-filling ratio was measured and correlated with anatomic alignment. There was a trend toward improved alignment with increasing canal fill, suggesting that uncemented diaphyseal engaging press-fit modular stems facilitate accurate alignment for both femoral and tibial components in revision surgery.
Drumheller, D.S.
1998-10-20
An improved well-pump for geothermal wells, an alignment system for a well-pump, and a method for aligning a rotor and stator within a well-pump are disclosed. The well-pump has a whistle assembly formed at its bottom portion, so that variations in the frequency of the whistle, which indicate misalignment, may be monitored during pumping. 6 figs.
Synthesis and orientation of barium hexaferrite ceramics by magnetic alignment
NASA Astrophysics Data System (ADS)
Autissier, Denis
1990-01-01
Particles of Ba₂MnₓZn₂₋ₓFe₁₂O₂₂ with a planar structure were prepared by chemical precipitation. They were processed by slip casting in the presence of a magnetic field. The degree of alignment was improved by a special sintering treatment. By this procedure, an alignment as high as 99.9% is obtained.
Aligning Educational Outcomes and Practices. Occasional Paper #26
ERIC Educational Resources Information Center
Hutchings, Pat
2016-01-01
The notion of alignment has become increasingly prominent in efforts to improve student learning today. The term, as used in this paper, refers to the linking of intended student learning outcomes with the processes and practices needed to foster those outcomes. Alignment is not a new idea, but it has become more salient as increasing numbers of…
Simulation and display of macromolecular complexes
NASA Technical Reports Server (NTRS)
Nir, S.; Garduno, R.; Rein, R.; Macelroy, R. D.
1977-01-01
In association with an investigation of the interaction of proteins with DNA and RNA, an interactive computer program for building, manipulating, and displaying macromolecular complexes has been designed. The system provides perspective, planar, and stereoscopic views on the computer terminal display, as well as views for standard and nonstandard observer locations. The molecule or its parts may be rotated and/or translated in any direction; bond connections may be added or removed by the viewer. Molecular fragments may be juxtaposed in such a way that given bonds are aligned, and given planes and points coincide. Another subroutine provides for the duplication of a given unit such as a DNA or amino-acid base.
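Juxtaposing fragments so that given bonds are aligned amounts to finding a rotation that carries one bond direction onto another. The minimal sketch below uses Rodrigues' rotation formula in plain Python; it illustrates the geometry only and is not the 1977 program's method. The helper names (`rotate_onto` and the vector utilities) are hypothetical, and the antiparallel case, where the rotation axis is not unique, is simply rejected.

```python
import math

Vec = tuple  # a 3-tuple of floats


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Vec) -> Vec:
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def rotate_onto(a: Vec, b: Vec):
    """Return a function that rotates points so that direction a maps onto direction b.

    Rodrigues' formula: v' = v cos t + (k x v) sin t + k (k . v)(1 - cos t),
    with unit axis k = (a x b) / |a x b| and t the angle between a and b.
    """
    a, b = normalize(a), normalize(b)
    axis = cross(a, b)
    sin_t = math.sqrt(dot(axis, axis))
    cos_t = dot(a, b)
    if sin_t < 1e-12:
        if cos_t > 0:
            return lambda v: v  # already aligned: identity
        raise ValueError("antiparallel vectors: rotation axis is not unique")
    k = (axis[0] / sin_t, axis[1] / sin_t, axis[2] / sin_t)

    def rotate(v: Vec) -> Vec:
        kxv = cross(k, v)
        kdv = dot(k, v)
        return tuple(v[i] * cos_t + kxv[i] * sin_t + k[i] * kdv * (1 - cos_t) for i in range(3))

    return rotate
```

To superimpose a bond, one would translate the fragment so the bond's first atom sits at the origin, apply the rotation to every atom, and translate onto the target atom.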
On causal roles and selected effects: our genome is mostly junk.
Doolittle, W Ford; Brunet, Tyler D P
2017-12-05
The idea that much of our genome is irrelevant to fitness (that is, is not the product of positive natural selection at the organismal level) remains viable. Claims to the contrary, and specifically claims that the notion of "junk DNA" should be abandoned, are based on conflating meanings of the word "function". Recent estimates suggest that perhaps 90% of our DNA, though biochemically active, does not contribute to fitness in any sequence-dependent way, and possibly in no way at all. Comparisons with vertebrates that have much larger and much smaller genomes (the lungfish and the pufferfish) strongly align with such a conclusion, as they have for the last half-century.
Cloning of cDNA of major antigen of foot and mouth disease virus and expression in E. coli
NASA Astrophysics Data System (ADS)
Küpper, Hans; Keller, Walter; Kurz, Christina; Forss, Sonja; Schaller, Heinz
1981-02-01
Double-stranded DNA copies of the single-stranded genomic RNA of foot and mouth disease virus have been cloned into the Escherichia coli plasmid pBR322. A restriction map of the viral genome was established and aligned with the biochemical map of foot and mouth disease virus. The coding sequence for structural protein VP1, the major antigen of the virus, was identified and inserted into a plasmid vector where the expression of this sequence is under control of the phage λ PL promoter. In an appropriate host the synthesis of antigenic polypeptide can be demonstrated by radioimmunoassay.
Application of environmental DNA to detect an endangered marine skate species in the wild.
Weltz, Kay; Lyle, Jeremy M; Ovenden, Jennifer; Morgan, Jessica A T; Moreno, David A; Semmens, Jayson M
2017-01-01
Environmental DNA (eDNA) techniques have only recently been applied in the marine environment to detect the presence of marine species. Species-specific primers and probes were designed to detect the eDNA of the endangered Maugean skate (Zearaja maugeana) from as little as 1 L of water collected at depth (10-15 m) in Macquarie Harbour (MH), Tasmania. The identity of the eDNA was confirmed as Z. maugeana by sequencing the qPCR products and aligning them with the target sequence, which gave a 100% match. This result validates the use of this eDNA technique for detecting a rare species, Z. maugeana, in the wild. Being able to investigate the presence, and possibly the abundance, of Z. maugeana in MH and Bathurst Harbour (BH) would address a conservation imperative for this endangered species. For future field application of this technique, the rate of decay of Z. maugeana eDNA was determined under ambient dissolved oxygen (DO) levels (55% saturation) and lower DO levels (20% saturation), revealing that the eDNA can be detected for 4 and 16 hours, respectively, after which eDNA concentration drops below the detection threshold of the assay. Because the rate of decay is influenced by starting eDNA concentrations, it is recommended that samples be filtered as soon as possible after collection to minimize further loss of eDNA before and during sample processing. PMID:28591215
Torati, Sri Ramulu; Reddy, Venu; Yoon, Seok Soo; Kim, CheolGi
2016-04-15
The template-assisted electrochemical deposition technique was used for the synthesis of a gold nanotube array (AuNTsA). Scanning electron microscopy of the synthesized AuNTsA showed that the individual nanotubes are around 1.5 μm in length with a diameter of 200 nm. The nanotubes are vertically aligned on the thick Au film that forms during nanotube synthesis. Compared with a bare Au electrode, the AuNTsA showed a better electron-transfer surface, owing to its high surface area. Hence, the AuNTsA was used as an electrode for the fabrication of a DNA hybridization biosensor for the detection of Mycobacterium tuberculosis DNA. The DNA hybridization biosensor constructed with the AuNTsA electrode was characterized by cyclic voltammetry with Fe(CN)₆³⁻/⁴⁻ as an electrochemical redox indicator. The selectivity of the fabricated biosensor was illustrated by hybridizing complementary and non-complementary DNA with the probe-DNA-immobilized AuNTsA electrode, using methylene blue as a hybridization indicator. The developed electrochemical DNA biosensor shows a good linear range of complementary DNA concentration, from 0.01 ng/μL to 100 ng/μL, with a high detection limit. Copyright © 2015 Elsevier B.V. All rights reserved.
Recombinant transfer in the basic genome of E. coli
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dixit, Purushottam; Studier, F. William; Pang, Tin Yau; ...
2015-07-07
An approximation to the ~4-Mbp basic genome shared by 32 strains of E. coli representing six evolutionary groups has been derived and analyzed computationally. A multiple-alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ~90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single bp mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome-pairs have one or two recombinant transfers of length ~40–115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome pairs (0.4–1% SNPs) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kbp. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. As a result, most recombinant transfers seem likely to be due to generalized transduction by co-evolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.
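Two pieces of arithmetic in this abstract can be made concrete: 32 genomes give 32 × 31 / 2 = 496 unordered pairs, and the divergence of a pair is the fraction of aligned positions at which the two genomes differ. The hypothetical helpers below (`snp_fraction`, `pairwise_divergence`, `snp_windows`) are a minimal sketch under simplifying assumptions: gaps and non-ACGT characters are skipped, and no mobile-element or reliability filtering is done. Windowed SNP counts are one simple way to see the contrast between clonal stretches (few SNPs) and recombinant segments (SNP-dense); they are not the study's actual method.

```python
from itertools import combinations
from math import comb

print(comb(32, 2))  # 496 genome pairs from 32 strains


def snp_fraction(x: str, y: str) -> float:
    """Fraction of compared aligned positions at which two sequences differ."""
    compared = diffs = 0
    for a, b in zip(x.upper(), y.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguous bases
            compared += 1
            diffs += a != b
    return diffs / compared if compared else 0.0


def pairwise_divergence(aligned: dict) -> dict:
    """SNP fraction for every unordered pair of genomes {name: aligned sequence}."""
    return {(p, q): snp_fraction(aligned[p], aligned[q])
            for p, q in combinations(sorted(aligned), 2)}


def snp_windows(x: str, y: str, window: int) -> list:
    """SNP counts in consecutive non-overlapping windows along an aligned pair."""
    counts = []
    for start in range(0, min(len(x), len(y)), window):
        seg_x = x[start:start + window].upper()
        seg_y = y[start:start + window].upper()
        counts.append(sum(a != b and a in "ACGT" and b in "ACGT" for a, b in zip(seg_x, seg_y)))
    return counts
```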
Xu, Li; Ding, Zhi-Shan; Zhou, Yun-Kai; Tao, Xue-Fen
2009-06-01
This study aimed to obtain the full-length cDNA sequence of the secoisolariciresinol dehydrogenase gene from Dysosma versipellis by RACE PCR and to characterize the gene. The full-length cDNA sequence of the secoisolariciresinol dehydrogenase gene was obtained from Dysosma versipellis by 3'-RACE and 5'-RACE. We report, for the first time, the full-length cDNA sequence of secoisolariciresinol dehydrogenase in Dysosma versipellis. The acquired gene is 991 bp in full length, including a 42-bp 5' untranslated region and a 112-bp 3' untranslated region with a poly(A) tail. The open reading frame (ORF) encodes 278 amino acids, with a predicted molecular weight of 29,253.3 Da and an isoelectric point of 6.328. The nucleotide sequence was deposited in GenBank under accession number EU573789. Semi-quantitative RT-PCR analysis revealed that the secoisolariciresinol dehydrogenase gene was highly expressed in the stem. Alignment of the amino acid sequence of secoisolariciresinol dehydrogenase indicated that there may be significant amino acid sequence differences among species. In conclusion, the full-length cDNA sequence of the secoisolariciresinol dehydrogenase gene was obtained from Dysosma versipellis.
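As a consistency check on the reported lengths (assuming, as the text implies, that the 112-bp 3' UTR figure includes the poly(A) tail and that the ORF includes its stop codon), the coding region left after removing both UTRs accounts exactly for the stated protein length:

```latex
\begin{aligned}
L_{\mathrm{ORF}} &= 991 - 42 - 112 = 837~\text{bp} = 279 \times 3,\\
n_{\mathrm{aa}} &= 279 - 1~(\text{stop codon}) = 278.
\end{aligned}
```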
Portable and Error-Free DNA-Based Data Storage.
Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica
2017-07-10
DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.
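Constrained encoding is one general way to make synthesized strings easier to write and read. The sketch below uses a simple rotating ternary code, similar in spirit to codes used in some earlier DNA storage work and explicitly not the encoder described in this paper: each byte becomes six base-3 digits, and each digit selects one of the three nucleotides that differ from the previous base, so the output never contains a homopolymer run. The names `encode`, `decode` and the starting context base `"A"` are assumptions; addressing and the paper's deletion-correcting codes are not modeled.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256 byte values


def byte_to_trits(b: int) -> list:
    """Six base-3 digits of a byte, most significant first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        b, r = divmod(b, 3)
        trits.append(r)
    return trits[::-1]


def encode(data: bytes, prev: str = "A") -> str:
    """Map bytes to DNA in which no two adjacent bases are identical."""
    out = []
    for b in data:
        for t in byte_to_trits(b):
            choices = [x for x in BASES if x != prev]  # always three options
            prev = choices[t]
            out.append(prev)
    return "".join(out)


def decode(dna: str, prev: str = "A") -> bytes:
    """Invert encode(), assuming the same starting context base."""
    trits = []
    for base in dna:
        choices = [x for x in BASES if x != prev]
        trits.append(choices.index(base))  # ValueError if a homopolymer appears
        prev = base
    data = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        data.append(value)
    return bytes(data)
```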
A multiple-alignment based primer design algorithm for genetically highly variable DNA targets
2013-01-01
Background Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to population diversity and primer localization in relatively conserved regions, in addition to recognized constraints typically considered in primer design. Results Design constraints include degenerate sites to maximize population coverage, matching of melting temperatures, optimizing de novo sequence length, finding optimal bio-barcodes to allow efficient downstream analyses, and minimizing risk of dimerization. To facilitate primer design addressing these and other constraints, we created a novel computer program (PrimerDesign) that automates this complex procedure. We show its powers and limitations and give examples of successful designs for the analysis of HIV-1 populations. Conclusions PrimerDesign is useful for researchers who want to design DNA primers and probes for analyzing highly variable DNA populations. It can be used to design primers for PCR, RT-PCR, Sanger sequencing, next-generation sequencing, and other experimental protocols targeting highly variable DNA samples. PMID:23965160
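Degenerate sites are usually written with IUPAC ambiguity codes. The minimal sketch below, which assumes a gap-free block of aligned target sequences, builds a degenerate consensus and reports the primer's degeneracy (the number of distinct plain oligonucleotides it represents), which grows multiplicatively with each degenerate site. The function names are hypothetical, and this is not PrimerDesign's algorithm, which also balances melting temperature, barcodes and dimerization risk.

```python
from math import prod

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("GC"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_consensus(block: list) -> str:
    """IUPAC consensus of equal-length, ungapped aligned sequences."""
    if len({len(s) for s in block}) != 1:
        raise ValueError("sequences must have equal length")
    columns = zip(*(s.upper() for s in block))
    return "".join(IUPAC[frozenset(col)] for col in columns)


def degeneracy(primer: str) -> int:
    """Number of distinct A/C/G/T sequences encoded by a degenerate primer."""
    sizes = {code: len(bases) for bases, code in IUPAC.items()}
    return prod(sizes[c] for c in primer.upper())
```

Keeping degeneracy low while still covering the observed variants is the trade-off that favors placing primers in relatively conserved regions.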
Photoconductivity in DNA-Porphyrin Complexes
NASA Astrophysics Data System (ADS)
Myint, Peco; Oxford, Emma; Nyazenga, Collence; Smith, Walter; Qi, Zhengqing; Johnson, A. T.
2015-03-01
We have measured the photoconductivity of λ-DNA that is modified by intercalating a porphyrin compound, meso-tetrakis(N-methyl-4-pyridiniumyl)porphyrin (TMPyP), into its base stacks. Intercalation was verified by a red shift and hypochromism of the Soret absorption peak. The DNA/porphyrin strands were then deposited onto oxidized silicon substrates that had been patterned with interdigitated electrodes, and blown dry. Electrical measurements were carried out under nitrogen, using illumination from a 445 nm laser; this wavelength falls within the absorption peak of the DNA/porphyrin complexes. When initially measured under dry nitrogen, the complexes show no photoconductivity or dark conductivity. However, at relative humidities of 30% and above, we do observe dark conductivity, as well as photoconductivity that grows with time. Photoconductivity increases at higher relative humidity. Remarkably, when the humidity is lowered again, some photoconductivity is now observed, indicating a change that persists for more than 24 hours. It may be that the humidity alters the structure of the DNA, perhaps allowing for better alignment of the bases. This work was supported by NSF Grant BMAT-1306170.
Determination of a Screening Metric for High Diversity DNA Libraries.
Guido, Nicholas J; Handerson, Steven; Joseph, Elaine M; Leake, Devin; Kung, Li A
2016-01-01
The fields of antibody engineering, enzyme optimization and pathway construction rely increasingly on screening complex variant DNA libraries. These highly diverse libraries allow researchers to sample a maximized sequence space and therefore to identify proteins with significantly improved activity more rapidly. The current state of the art in synthetic biology allows for libraries with billions of variants, pushing the limits of researchers' ability to qualify libraries for screening by measuring the traditional quality metrics of fidelity and diversity of variants. Instead, when screening variant libraries, researchers typically use a generic, and often insufficient, oversampling rate based on a common rule of thumb. We have developed methods to calculate a library-specific oversampling metric, based on fidelity, diversity, and representation of variants, which informs researchers, prior to screening the library, of the amount of oversampling required to ensure that the desired fraction of variant molecules will be sampled. To derive this oversampling metric, we developed a novel alignment tool to efficiently measure frequency counts of individual nucleotide variant positions using next-generation sequencing data. Next, we applied a method based on "coupon collector" probability theory to construct a curve of upper-bound estimates of the sampling size required for any desired variant coverage. The calculated oversampling metric will guide researchers to maximize their efficiency in using highly variant libraries.
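The coupon-collector argument can be sketched for the idealized case of n equally represented variants: the expected number of draws needed to observe k distinct variants is n(H_n - H_{n-k}), where H_m is the m-th harmonic number. The code below turns that into an oversampling factor relative to library size. It is a simplification under the stated uniformity assumption, not the authors' metric (which also accounts for fidelity and uneven representation), and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(m: int) -> float:
    """m-th harmonic number H_m (asymptotic expansion for large m)."""
    if m < 10_000:
        return math.fsum(1.0 / i for i in range(1, m + 1))
    return math.log(m) + EULER_GAMMA + 1.0 / (2 * m) - 1.0 / (12 * m * m)


def expected_draws(n_variants: int, fraction: float) -> float:
    """Expected clones screened to see `fraction` of n equally likely variants."""
    k = math.ceil(fraction * n_variants)
    return n_variants * (harmonic(n_variants) - harmonic(n_variants - k))


def oversampling(n_variants: int, fraction: float) -> float:
    """Expected draws expressed as a multiple of the library size."""
    return expected_draws(n_variants, fraction) / n_variants
```

For large libraries the factor approaches ln(1/(1 - f)); for f = 0.95 that is about 3.0, and uneven representation of variants pushes the real requirement higher.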
Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.
Ferreira, Miguel; Roma, Nuno; Russo, Luis M S
2014-05-30
HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar's striped processing pattern with the Intel SSE2 instruction set extension. A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of cache locality. This optimization, together with improved loading of the emission scores, achieves a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder on DNA and protein datasets, proving to be a competitive alternative implementation. Always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and a processing speedup of up to 2x, depending on the model's size.
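For reference, the recurrence that such vectorized decoders accelerate is the standard Viterbi dynamic program. The following is a textbook log-space Viterbi for a small, fully connected HMM, written for clarity at O(T·S²) cost; it does not model HMMER's profile-HMM architecture (match/insert/delete states), the striped or inter-task SIMD layouts, or the paper's cache-oblivious partitioning. The two-state AT-rich/GC-rich model in the example is hypothetical.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable state path for `obs` under an HMM given in natural-log space.

    Returns (best log-probability, best path as a list of states).
    """
    # score[s]: best log-probability of any path that ends in state s at this step
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[i][t]: best predecessor of state t at step i + 1
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for t in states:
            best = max(states, key=lambda s: prev[s] + log_trans[s][t])
            score[t] = prev[best] + log_trans[best][t] + log_emit[t][o]
            pointers[t] = best
        back.append(pointers)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):  # trace back from the final state
        path.append(pointers[path[-1]])
    return score[last], path[::-1]


# Hypothetical two-state model: AT-rich vs GC-rich regions.
L = math.log
states = ["AT", "GC"]
log_start = {"AT": L(0.5), "GC": L(0.5)}
log_trans = {"AT": {"AT": L(0.9), "GC": L(0.1)}, "GC": {"AT": L(0.1), "GC": L(0.9)}}
log_emit = {
    "AT": {"A": L(0.4), "T": L(0.4), "C": L(0.1), "G": L(0.1)},
    "GC": {"A": L(0.1), "T": L(0.1), "C": L(0.4), "G": L(0.4)},
}
logp, path = viterbi("AATTGCGC", states, log_start, log_trans, log_emit)
print(path)  # four AT-rich states followed by four GC-rich states
```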
Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.
Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C
2018-01-01
This study presents a machine learning method that increases the number of identified bases in Sanger sequencing. The system post-processes a KB-basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A, C, G, T). An N-label correction is defined given an additional read of the same sequence and a human-finished sequence. Corrections are added to the dataset when an alignment determines that the additional read and the human-finished sequence agree on the identity of the N-label. KB must also rate the replacement with quality value of in the additional read. Corrections are only available during system training. To develop the system, nearly 850,000 N-labels were obtained from Barcode of Life Datasystems, the premier database of genetic markers called DNA barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of percent. Our system applies corrections only when it estimates a low error rate. Tested on these data, our automation selects and recovers 79 percent of N-labels from COI (animal barcode), 80 percent from matK and rbcL (plant barcodes), and 58 percent from non-protein-coding sequences (across eukaryotes).
Gong, Mingbo; Tang, Chaoxi; Zhu, Changxiong
2014-11-01
A primary cDNA library of Penicillium oxalicum I1 was constructed using the switching mechanism at the 5' end of the RNA transcript (SMART) technique. A total of 106 clones showed halos in tricalcium phosphate (TCP) medium, and clone I-40 showed clear halos. The full-length cDNA of clone I-40 was 1355 bp with a complete open reading frame (ORF) of 1032 bp, encoding a protein of 343 amino acids. Multiple alignment analysis revealed a high degree of homology between the ORF of clone I-40 and delta-1-pyrroline-5-carboxylate dehydrogenase (P5CDH) of other fungi. The ORF expression vector was constructed and transformed into Escherichia coli DH5α. The transformant (ORF-1) with the P5CDH gene secreted organic acid in medium with TCP as the sole source of phosphate. Acetic acid and α-ketoglutarate were secreted at 4 and 24 h, respectively. ORF-1 decreased the pH of the medium from 6.62 to 3.45 and released soluble phosphate at 0.172 mg·mL⁻¹ in 28 h. Expression of the P. oxalicum I1 p5cdh gene in E. coli could enhance organic acid secretion and phosphate-solubilizing ability.
Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment
Nagar, Anurag; Hahsler, Michael
2013-01-01
Background Next Generation Sequencing techniques are producing enormous amounts of biological sequence data, and their analysis has become a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so-called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. Results In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Conclusion Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. 
Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200
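The core idea of the quasi-alignment study, comparing segments by p-mer counts instead of aligning them, can be sketched in a few lines of Python. This is an assumption-laden illustration, not the authors' implementation: it uses the L1 distance between p-mer count profiles. Because a single substitution, insertion or deletion changes at most 2p of the counts, that distance divided by 2p is a lower bound on edit distance (Ukkonen's q-gram bound). The paper's data stream clustering is replaced here by a simple one-pass leader clustering with a hypothetical `threshold`.

```python
from collections import Counter


def pmer_profile(segment, p):
    """Count every overlapping p-mer (substring of length p) in a segment."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def pmer_distance(profile_a, profile_b):
    """L1 distance between two p-mer count profiles (missing p-mers count as 0)."""
    return sum(abs(profile_a[k] - profile_b[k]) for k in set(profile_a) | set(profile_b))


def edit_distance_lower_bound(seg_a, seg_b, p):
    """Each edit changes at most 2p p-mer counts, so d / (2p) bounds edit distance from below."""
    return pmer_distance(pmer_profile(seg_a, p), pmer_profile(seg_b, p)) / (2 * p)


def leader_cluster(segments, p, threshold):
    """One-pass clustering: join the first cluster whose leader is within threshold, else start one.

    A simplified stand-in for data stream clustering; `threshold` is hypothetical.
    """
    clusters = []  # list of (leader_profile, member_segments)
    for seg in segments:
        profile = pmer_profile(seg, p)
        for leader_profile, members in clusters:
            if pmer_distance(profile, leader_profile) <= threshold:
                members.append(seg)
                break
        else:
            clusters.append((profile, [seg]))
    return [members for _, members in clusters]
```

Each segment is summarized once as a count vector, so clustering cost grows linearly with the number of segments, which is the property the abstract relies on; the price is that p-mer profiles only approximate, and never exceed, the information in a true alignment.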
The Function of Neuroendocrine Cells in Prostate Cancer
2013-04-01
integration site. We then performed deep sequencing and aligned reads to the genome. Our analysis revealed that both histological phenotypes are derived from ... lentiviral integration site analysis. (B) Laser capture microdissection was performed on individual glands containing both squamous and ... lentiviral integration site analysis. LTR: long terminal repeat (viral DNA), PCR: polymerase chain reaction. (D) Venn diagrams depict shared lentiviral
Bacterial Genome Engineering and Synthetic Biology: Combating Pathogens
2016-11-04
engineering and SB methods such as recombineering, clustered regularly interspaced short palindromic repeats (CRISPR), and bacterial cell-cell ... Cholera# Yersinia pseudotuberculosis# Staphylococcus aureus* Phage Engineering CRISPR/Cas9 Delivery of CRISPR genes and RNA guides for sequence ... bear very close sequence alignment to the harmless strains via the use of the CRISPR/Cas9 system. The CRISPR system specifically targets a DNA sequence
A phase and frequency alignment protocol for 1H MRSI data of the prostate.
Wright, Alan J; Buydens, Lutgarde M C; Heerschap, Arend
2012-05-01
¹H MRSI of the prostate reveals relative metabolite levels that vary according to the presence or absence of tumour, providing a sensitive method for the identification of patients with cancer. Current interpretations of prostate data rely on quantification algorithms that fit model metabolite resonances to individual voxel spectra and calculate relative levels of metabolites, such as choline, creatine, citrate and polyamines. Statistical pattern recognition techniques can potentially improve the detection of prostate cancer, but these analyses are hampered by artefacts and sources of noise in the data, such as variations in phase and frequency of resonances. Phase and frequency variations may arise as a result of spatial field gradients or local physiological conditions affecting the frequency of resonances, in particular those of citrate. Thus, there are unique challenges in developing a peak alignment algorithm for these data. We have developed a frequency and phase correction algorithm for automatic alignment of the resonances in prostate MRSI spectra. We demonstrate, with a simulated dataset, that alignment can be achieved to a phase standard deviation of 0.095 rad and a frequency standard deviation of 0.68 Hz for the citrate resonances. Three parameters were used to assess the improvement in peak alignment in the MRSI data of five patients: the percentage of variance in all MRSI spectra explained by their first principal component; the signal-to-noise ratio of a spectrum formed by taking the median value of the entire set at each spectral point; and the mean cross-correlation between all pairs of spectra. These parameters showed a greater similarity between spectra in all five datasets and the simulated data, demonstrating improved alignment for phase and frequency in these spectra. This peak alignment program is expected to improve pattern recognition significantly, enabling accurate detection and localisation of prostate cancer with MRSI. 
Copyright © 2011 John Wiley & Sons, Ltd.
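Two ideas from the prostate MRSI study, correcting frequency offsets by aligning each spectrum to a reference and scoring alignment by the mean cross-correlation between all pairs of spectra, can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' algorithm: it assumes real-valued spectra on a common grid, corrects only integer-point frequency shifts (circularly, by maximizing the dot product with a hypothetical reference spectrum), ignores phase entirely, and uses zero-lag Pearson correlation as the cross-correlation score.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Zero-lag Pearson correlation between two equal-length spectra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circular_shift(spectrum, k):
    """Shift a spectrum by k points (positive k moves peaks to higher indices), wrapping around."""
    n = len(spectrum)
    return [spectrum[(i - k) % n] for i in range(n)]


def best_shift(reference, spectrum, max_shift):
    """Integer shift in [-max_shift, max_shift] maximizing the dot product with the reference."""
    return max(
        range(-max_shift, max_shift + 1),
        key=lambda k: sum(r * s for r, s in zip(reference, circular_shift(spectrum, k))),
    )


def align_to_reference(spectra, reference, max_shift):
    """Apply each spectrum's best integer shift relative to the reference."""
    return [circular_shift(s, best_shift(reference, s, max_shift)) for s in spectra]


def mean_pairwise_correlation(spectra):
    """Mean correlation over all pairs of spectra; higher means better mutual alignment."""
    pairs = list(combinations(spectra, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

Comparing `mean_pairwise_correlation` before and after `align_to_reference` mirrors, in miniature, how the third assessment parameter in the abstract detects improved alignment.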
Short-read, high-throughput sequencing technology for STR genotyping
Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.
2013-01-01
DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoresis-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315
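Once reads are aligned to an STR locus, calling a genotype largely reduces to counting repeat units. The Python sketch below is hypothetical and much simpler than the published pipeline: it counts the longest uninterrupted run of a given motif in each read and calls the two most frequent repeat counts as alleles. `min_ratio` is an assumed threshold for accepting a second allele, not a value from the paper.

```python
from collections import Counter


def longest_repeat_run(read, motif):
    """Longest run of back-to-back copies of motif in read, trying every offset."""
    k = len(motif)
    best = 0
    for offset in range(k):
        run = 0
        for i in range(offset, len(read) - k + 1, k):
            if read[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def call_genotype(repeat_counts, min_ratio=0.2):
    """Call a diploid STR genotype from per-read repeat counts.

    The most frequent count is the first allele; the second most frequent is
    accepted only if it has at least min_ratio (hypothetical) of the first
    allele's read support, otherwise the call is homozygous.
    """
    tally = Counter(c for c in repeat_counts if c > 0)
    if not tally:
        raise ValueError("no reads contain the motif")
    top = tally.most_common(2)
    first, first_reads = top[0]
    if len(top) == 2 and top[1][1] >= min_ratio * first_reads:
        return tuple(sorted((first, top[1][0])))
    return (first, first)
```

Real pipelines must also handle stutter reads, interrupted repeats and flanking-sequence SNPs, which the abstract notes sequencing can reveal; this sketch ignores all three.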
Phylogenetic Analyses of Meloidogyne Small Subunit rDNA.
De Ley, Irma Tandingan; De Ley, Paul; Vierstraete, Andy; Karssen, Gerrit; Moens, Maurice; Vanfleteren, Jacques
2002-12-01
Phylogenies were inferred from nearly complete small subunit (SSU) 18S rDNA sequences of 12 species of Meloidogyne and 4 outgroup taxa (Globodera pallida, Nacobbus aberrans, Subanguina radicicola, and Zygotylenchus guevarai). Alignments were generated manually from a secondary structure model, and computationally using ClustalX and Treealign. Trees were constructed using distance, parsimony, and likelihood algorithms in PAUP* 4.0b4a. Obtained tree topologies were stable across algorithms and alignments, supporting 3 clades: clade I = [M. incognita (M. javanica, M. arenaria)]; clade II = M. duytsi and M. maritima in an unresolved trichotomy with (M. hapla, M. microtyla); and clade III = (M. exigua (M. graminicola, M. chitwoodi)). Monophyly of [(clade I, clade II) clade III] was given maximal bootstrap support (mbs). M. artiellia was always a sister taxon to this joint clade, while M. ichinohei was consistently placed with mbs as a basal taxon within the genus. Affinities with the outgroup taxa remain unclear, although G. pallida and S. radicicola were never placed as closest relatives of Meloidogyne. Our results show that SSU sequence data are useful in addressing deeper phylogeny within Meloidogyne, and that both M. ichinohei and M. artiellia are credible outgroups for phylogenetic analysis of speciations among the major species. PMID:19265950
D'Haese, Cyrille A
2002-01-01
Emergence from an aquatic environment to the land is one of the major evolutionary transitions within the arthropods. It is often considered that the first hexapods, and in particular the first springtails, were semi-aquatic, and this assumption drives evolutionary models towards particular conclusions. To address the question of the ecological origin of the springtails, phylogenetic analyses by optimization alignment were performed on D1 and D2 regions of the 28S rDNA for 55 collembolan exemplars and eight outgroups. Relationships among the orders Symphypleona, Entomobryomorpha and Poduromorpha are inferred. More specifically, a robust hypothesis is provided for the subfamilial relationships within the order Poduromorpha. Contrary to previous statements, the semi-aquatic species Podura aquatica is not basal or 'primitive', but well nested in the Poduromorpha. The analyses performed for the 24 different weighting schemes yielded the same conclusion: semi-aquatic ecology is not ancestral for the springtails. It is a derived condition that evolved independently several times. The adaptation for semi-aquatic life is better interpreted as a step towards independence from land, rather than as an indication of an aquatic origin. PMID:12061958
Sun, Xiaoqin; Wei, Yanglian; Qin, Minjian; Guo, Qiaosheng; Guo, Jianlin; Zhou, Yifeng; Hang, Yueyu
2012-03-01
The rDNA ITS regions of 18 samples of Changium smyrnioides from 7 areas and of 2 samples of Chuanminshen violaceum were sequenced and analyzed. The amplified ITS region of the samples, including a partial sequence of ITS1 and complete sequences of 5.8S and ITS2, had a total length of 555 bp. After complete alignment, there were 49 variable sites, of which 45 were informative, when gaps were treated as missing data. Samples of C. smyrnioides from different locations could be identified exactly based on the variable sites. The maximum parsimony (MP) and neighbor-joining (NJ) trees constructed from the ITS sequences based on Kimura's two-parameter model showed that the genetic distances of the C. smyrnioides samples from different locations were not always related to their geographical distances. A specific primer set for allele-specific PCR authentication of C. violaceum from Jurong, Jiangsu, was designed based on an SNP in the ITS sequence alignment. C. violaceum from the major genuine producing area in Jurong, Jiangsu, could be identified accurately and quickly by allele-specific PCR.
Gregg, Jacob; Thompson, Rachel L.; Purcell, Maureen; Friedman, Carolyn S.; Hershberger, Paul
2016-01-01
Despite their widespread, global impact in both wild and cultured fishes, little is known of the diversity, transmission patterns, and phylogeography of parasites generally identified as Ichthyophonus. This study constructed a phylogeny based on the structural alignment of internal transcribed spacer (ITS) rDNA sequences to compare Ichthyophonus isolates from fish hosts in the Atlantic and Pacific oceans, and several rivers and aquaculture sites in North America, Europe, and Japan. Structure of the Ichthyophonus ITS1–5.8S–ITS2 transcript exhibited several homologies with other eukaryotes, and 6 distinct clades were identified within Ichthyophonus. A single clade contained a majority (71 of 98) of parasite isolations. This ubiquitous Ichthyophonus type occurred in 13 marine and anadromous hosts and was associated with epizootics in Atlantic herring, Chinook salmon, and American shad. A second clade contained all isolates from aquaculture, despite great geographic separation of the freshwater hosts. Each of the 4 remaining clades contained isolates from single host species. This study is the first to evaluate the genetic relationships among Ichthyophonus species across a significant portion of their host and geographic range. Additionally, parasite infection prevalence is reported in 16 fish species.
Quantitative modeling and optimization of magnetic tweezers.
Lipfert, Jan; Hao, Xiaomin; Dekker, Nynke H
2009-06-17
Magnetic tweezers are a powerful tool to manipulate single DNA or RNA molecules and to study nucleic acid-protein interactions in real time. Here, we have modeled the magnetic fields of permanent magnets in magnetic tweezers and computed the forces exerted on superparamagnetic beads from first principles. For simple, symmetric geometries the magnetic fields can be calculated semianalytically using the Biot-Savart law. For complicated geometries and in the presence of an iron yoke, we employ a finite-element three-dimensional PDE solver to numerically solve the magnetostatic problem. The theoretical predictions are in quantitative agreement with direct Hall-probe measurements of the magnetic field and with measurements of the force exerted on DNA-tethered beads. Using these predictive theories, we systematically explore the effects of magnet alignment, magnet spacing, magnet size, and of adding an iron yoke to the magnets on the forces that can be exerted on tethered particles. We find that the optimal configuration for maximal stretching forces is a vertically aligned pair of magnets, with a minimal gap between the magnets and minimal flow cell thickness. Following these principles, we present a configuration that allows one to apply ≥40 pN stretching forces on ≈1-μm tethered beads. PMID:19527664
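The simplest semianalytical case in the magnetic tweezers study, the field on the symmetry axis of a single cylindrical permanent magnet, has a closed form, and the force on a bead follows from the field gradient. The Python sketch below illustrates this under stated assumptions and is not the authors' model. It treats one uniformly magnetized cylinder (remanence `br` in tesla, radius and length in metres) rather than the paper's magnet pairs or iron yoke, evaluates B only on the axis, and assumes the bead's moment `moment` is saturated and constant, so that F_z = m dB_z/dz. An induced, unsaturated moment would need a field-dependent m instead.

```python
import math


def on_axis_field(z, br, radius, length):
    """B_z in tesla on the axis of a uniformly magnetized cylinder, z metres from its pole face.

    Closed-form result for a cylinder with remanence br (T); radius and length in metres.
    """
    def f(x):
        return x / math.hypot(radius, x)
    return 0.5 * br * (f(length + z) - f(z))


def force_on_bead(z, br, radius, length, moment, dz=1e-7):
    """F_z in newtons = m * dB_z/dz, assuming a saturated, constant bead moment m (A m^2).

    The gradient is a central finite difference; negative values mean the bead
    is pulled toward the magnet (decreasing z).
    """
    grad = (on_axis_field(z + dz, br, radius, length)
            - on_axis_field(z - dz, br, radius, length)) / (2 * dz)
    return moment * grad


if __name__ == "__main__":
    # Hypothetical parameters: 5 mm diameter x 5 mm magnet, 1.3 T remanence, bead moment 1e-14 A m^2.
    for z_mm in (0.5, 1.0, 2.0):
        f_pn = force_on_bead(z_mm * 1e-3, 1.3, 2.5e-3, 5e-3, 1e-14) * 1e12
        print(f"z = {z_mm} mm: F = {f_pn:.2f} pN")
```

The force depends on the field gradient rather than the field itself, which is consistent with the abstract's finding that a minimal gap between magnets and a thin flow cell, both of which bring the bead into the steepest part of the field, maximize stretching force.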
NASA Astrophysics Data System (ADS)
Lo, Yi-Chuan; Lee, Chih-Hsiung; Lin, Hsun-Peng; Peng, Chiou-Shian
1998-06-01
Several split experiments on wafer alignment target topography were run to improve alignment after epitaxial film deposition. The splits covered former-layer pad oxide thickness (250 to 500 Å), drive oxide thickness (6000 to 10,000 Å), nitride film thickness (600 to 1500 Å), and initial oxide etch (fully wet etch, fully dry etch, and dry plus wet etch). Various epitaxy deposition recipes, varying the epitaxy source (SiH2Cl2 or SiHCl3) and the growth rate (1.3 to 2.0 µm/min), were also used to optimize the process window for alignment. The reflectance signals and cross-section photographs of the alignment targets during the NIKON stepper alignment process were examined. Experimental results show that the epitaxy recipe plays an important role in wafer alignment: a low growth rate with good conformity helps the alignment target avoid washout, pattern shift, and distortion. All the results (signal monitoring and film characteristics), combined with NIKON's standard stepper laser scanning alignment system, are discussed in this paper.
High yield growth of patterned vertically aligned carbon nanotubes using inkjet-printed catalyst.
Beard, James D; Stringer, Jonathan; Ghita, Oana R; Smith, Patrick J
2013-10-09
This study reports on the fabrication of vertically aligned carbon nanotubes localized at specific sites on a growth substrate by deposition of a nanoparticle suspension using inkjet printing. Carbon nanotubes were grown with high yield as vertically aligned forests to a length of approximately 400 μm. The use of inkjet printing for catalyst fabrication considerably improves the production rate of vertically aligned patterned nanotube forests compared with conventional patterning techniques, for example, electron beam lithography or photolithography.
de Bortoli, Caroline P; André, Marcos R; Braga, Maria do Socorro C; Machado, Rosangela Zacarias
2011-10-01
Few molecular studies have characterized Hepatozoon species among domestic and wild felids. The present work aimed to molecularly characterize Hepatozoon sp. DNA in cat blood samples from São Luís Island, Maranhão state, Northeastern Brazil. EDTA-whole blood samples were collected from 200 domestic cats with access to outdoor and wooded areas in São Luís, Maranhão, Brazil. Each extracted DNA sample was used as a template in PCR reactions to amplify a partial sequence of the 18S rRNA gene of Hepatozoon spp. We also performed sequence alignment of the 18S rRNA-based DNA sequences to establish the identity of the parasite species infecting these animals. Of the 200 sampled cats, Hepatozoon DNA was found in only one animal (0.5%). This Hepatozoon DNA showed 97% identity with Hepatozoon felis isolates 1 and 2 from Spain, and in the phylogenetic tree it fell in the same clade as the H. felis isolates. Our findings suggest that more than one species of Hepatozoon could infect felids in Brazil.
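Identity figures such as the 97% reported in the Hepatozoon study depend on how alignment columns are counted. A minimal Python sketch of one common convention is given below: identical non-gap columns divided by all alignment columns, skipping columns that are gaps in both rows. Other tools normalize differently (for example, by the length of the shorter sequence), so this is an assumption rather than the authors' procedure.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with the same (non-gap) residue in both rows.

    Columns that are gaps in both rows are ignored; a gap opposite a base counts
    as a mismatch. Other normalizations give other numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)
```

Because a single gap opposite a base lowers the score under this convention, the same pair of sequences can report noticeably different identities under a convention that ignores gapped columns.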
Genomic sequencing of Pleistocene cave bears
DOE Office of Scientific and Technical Information (OSTI.GOV)
Noonan, James P.; Hofreiter, Michael; Smith, Doug
2005-04-01
Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.
Condensin promotes the juxtaposition of DNA flanking its loading site in Bacillus subtilis
Wang, Xindan; Le, Tung B.K.; Lajoie, Bryan R.; Dekker, Job; Laub, Michael T.; Rudner, David Z.
2015-01-01
SMC condensin complexes play a central role in compacting and resolving replicated chromosomes in virtually all organisms, yet how they accomplish this remains elusive. In Bacillus subtilis, condensin is loaded at centromeric parS sites, where it encircles DNA and individualizes newly replicated origins. Using chromosome conformation capture and cytological assays, we show that condensin recruitment to origin-proximal parS sites is required for the juxtaposition of the two chromosome arms. Recruitment to ectopic parS sites promotes alignment of large tracts of DNA flanking these sites. Importantly, insertion of parS sites on opposing arms indicates that these “zip-up” interactions only occur between adjacent DNA segments. Collectively, our data suggest that condensin resolves replicated origins by promoting the juxtaposition of DNA flanking parS sites, drawing sister origins in on themselves and away from each other. These results are consistent with a model in which condensin encircles the DNA flanking its loading site and then slides down, tethering the two arms together. Lengthwise condensation via loop extrusion could provide a generalizable mechanism by which condensin complexes act dynamically to individualize origins in B. subtilis and, when loaded along eukaryotic chromosomes, resolve them during mitosis. PMID:26253537
Beccari, T; Hoade, J; Orlacchio, A; Stirling, J L
1992-01-01
cDNAs encoding the mouse beta-N-acetylhexosaminidase alpha-subunit were isolated from a mouse testis library. The longest of these (1.7 kb) was sequenced and showed 83% similarity with the human alpha-subunit cDNA sequence. The 5' end of the coding sequence was obtained from a genomic DNA clone. Alignment of the human and mouse sequences showed that all three putative N-glycosylation sites are conserved, but that the mouse alpha-subunit has an additional site towards the C-terminus. All eight cysteines in the human sequence are conserved in the mouse. There are two additional cysteines in the mouse alpha-subunit signal peptide. All amino acids affected in Tay-Sachs disease mutations are conserved in the mouse. PMID:1379046
Studies on in situ magnetic alignment of bonded anisotropic Nd-Fe-B alloy powders
Nlebedim, I. C.; Ucar, Huseyin; Hatter, Christine B.; ...
2016-08-30
We present some considerations for achieving a high degree of alignment in polymer-bonded permanent magnets, based on the results of a study on in situ magnetic alignment of anisotropic Nd-Fe-B magnet powders. The contributions of the alignment temperature, the alignment magnetic field, and the properties of the polymer to the hard magnetic properties of the bonded magnet were considered. Moreover, the thermo-rheological properties of the polymer and the response of the magnet powders to the applied magnetic field indicate that hard magnetic properties were optimized at an alignment temperature just above the melting temperature of the EVA co-polymer. This agrees with an observed correlation between the change in magnetization due to improved magnetic alignment of the anisotropic powders and the change in viscosity of the binder. Finally, manufacturing cost can be minimized by identifying optimum alignment temperatures and magnetic field strengths.
Projected power iteration for network alignment
NASA Astrophysics Data System (ADS)
Onaran, Efe; Villar, Soledad
2017-08-01
The network alignment problem asks for the best correspondence between two given graphs, such that the largest possible number of edges is matched. This problem arises in many scientific settings (such as the study of protein-protein interactions), and it is closely related to the quadratic assignment problem, which has graph isomorphism, the traveling salesman problem, and minimum bisection as particular cases. The graph matching problem is NP-hard in general. However, under some restrictive models for the graphs, algorithms can approximate the alignment efficiently. In that spirit, recent work by Feizi and collaborators introduced EigenAlign, a fast spectral method with convergence guarantees for Erdős–Rényi graphs. In this work we propose the algorithm Projected Power Alignment, a projected power iteration version of EigenAlign. We show numerically that it improves the recovery rates of EigenAlign, and we describe the theory that may be used to provide performance guarantees for Projected Power Alignment.
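The projected power idea can be illustrated on the standard graph matching objective: maximize the sum over i, j of A[i][j] * B[p(i)][p(j)] over permutations p, for adjacency matrices A and B. The Python sketch below is a generic projected power iteration, not Onaran and Villar's Projected Power Alignment or EigenAlign. Each step multiplies the current matching by the adjacency matrices (Y = A X B, the gradient direction of the objective up to a factor of 2) and projects Y back onto the permutations by maximizing the linear assignment score. The brute-force projection over all permutations is an assumption that only works for toy graphs; practical code would use the Hungarian algorithm.

```python
from itertools import permutations


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on nodes 0..n-1."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = 1
    return m


def matmul(x, y):
    """Plain-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def project_to_permutation(y):
    """Permutation p maximizing sum_i y[i][p[i]] (brute force: toy sizes only)."""
    n = len(y)
    return max(permutations(range(n)), key=lambda p: sum(y[i][p[i]] for i in range(n)))


def perm_matrix(p):
    """0/1 matrix with X[i][p[i]] = 1, i.e. node i of A is matched to node p[i] of B."""
    n = len(p)
    return [[1 if p[i] == k else 0 for k in range(n)] for i in range(n)]


def matched_edges(a, b, p):
    """Number of edges (i, j) of A whose image (p[i], p[j]) is an edge of B."""
    n = len(a)
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(i + 1, n))


def projected_power_matching(a, b, max_iter=20):
    """Iterate X <- Proj(A X B) from an all-ones start until the matching stops changing."""
    n = len(a)
    x = [[1] * n for _ in range(n)]  # uninformative starting point
    p = None
    for _ in range(max_iter):
        new_p = project_to_permutation(matmul(matmul(a, x), b))
        if new_p == p:
            break
        p = new_p
        x = perm_matrix(p)
    return list(p)
```

On a four-node path graph and a relabeled copy of it, the first step matches nodes by degree only, and the second step already recovers a matching that aligns all three edges.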
Studies on in situ magnetic alignment of bonded anisotropic Nd-Fe-B alloy powders
NASA Astrophysics Data System (ADS)
Nlebedim, I. C.; Ucar, Huseyin; Hatter, Christine B.; McCallum, R. W.; McCall, Scott K.; Kramer, M. J.; Paranthaman, M. Parans
2017-01-01
Considerations for achieving a high degree of alignment in polymer-bonded permanent magnets are presented via the results of a study on in situ magnetic alignment of anisotropic Nd-Fe-B magnet powders. The contributions of the alignment temperature, the alignment magnetic field, and the properties of the polymer to the hard magnetic properties of the bonded magnet were considered. The thermo-rheological properties of the polymer and the response of the magnet powders to the applied magnetic field indicate that hard magnetic properties were optimized at an alignment temperature just above the melting temperature of the EVA co-polymer. This agrees with an observed correlation between the change in magnetization due to improved magnetic alignment of the anisotropic powders and the change in viscosity of the binder. Manufacturing cost can be minimized by identifying optimum alignment temperatures and magnetic field strengths.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poliakov, Alexander; Couronne, Olivier
2002-11-04
Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous DNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program to find anchors on the base genome to identify regions of possible homology for a query sequence. These regions are postprocessed to find the best candidates, which are then globally aligned using the AVID global alignment program. In the last step, conserved non-coding segments are identified using VISTA. Our methods are fast, and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs built on a MySQL database platform. The scheduler gets control data from the database, builds a queue of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interface package by providing auto-reconnect functionality and improved error handling.
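The GenomeVISTA step in which BLAT anchors are post-processed to find the best candidate regions can be illustrated with classic colinear chaining. The Python sketch below is hypothetical and is not GenomeVISTA's code. Each anchor is reduced to a (query_start, base_start, score) triple, and dynamic programming picks the highest-scoring chain whose query and base coordinates both increase. That chain is the kind of region one would then pass to a global aligner such as AVID.

```python
def best_colinear_chain(anchors):
    """Highest-scoring chain of (query_start, base_start, score) anchors.

    An anchor may follow another only if both its query and base coordinates are
    larger, i.e. the chain is colinear on both sequences. O(n^2) dynamic programming.
    """
    if not anchors:
        return 0, []
    items = sorted(anchors)
    best = [score for _, _, score in items]  # best chain score ending at each anchor
    prev = [-1] * len(items)                  # back-pointer to the previous anchor
    for i, (qi, bi, si) in enumerate(items):
        for j in range(i):
            qj, bj, _ = items[j]
            if qj < qi and bj < bi and best[j] + si > best[i]:
                best[i] = best[j] + si
                prev[i] = j
    end = max(range(len(items)), key=best.__getitem__)
    total, chain = best[end], []
    while end != -1:
        chain.append(items[end])
        end = prev[end]
    return total, chain[::-1]
```

Requiring both coordinates to increase discards anchors that land in repeats or segmental duplications out of order, which is one simple way to address the orthology problem the abstract describes.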
Vertically aligned carbon nanofiber electrode arrays for nucleic acid detection
NASA Astrophysics Data System (ADS)
Arumugam, Prabhu U.; Yu, Edmond; Riviere, Roger; Meyyappan, M.
2010-10-01
We present electrochemical detection of DNA targets that correspond to the Escherichia coli O157:H7 16S rRNA gene using a nanoelectrode array consisting of vertically aligned carbon nanofiber (VACNF) electrodes. Parylene C is used as a gap-filling 'matrix' material to avoid high-temperature processing in electrode construction. This easy-to-deposit film, several microns thick, provides a conformal coating between the high-aspect-ratio VACNFs with negligible pinholes. The low background currents show the potential of this approach for ultra-sensitive detection. Consistent and reproducible electrochemical signals are achieved using a simple electrode preparation. This simple, reliable and low-cost approach is a step forward in developing practical sensors for applications such as pathogen detection, early cancer diagnosis and environmental monitoring.
Galpert, Deborah; Fernández, Alberto; Herrera, Francisco; Antunes, Agostinho; Molina-Ruiz, Reinaldo; Agüero-Chapin, Guillermin
2018-05-03
The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in that supervised big data approach, they were all, to some extent, alignment-based features, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built either with alignment-based similarity measures alone or combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) be achieved within the twilight zone for a yeast proteome pair that underwent a whole genome duplication.
The feature selection study showed that alignment-based features were top-ranked for the best classifiers, while the runners-up were alignment-free features related to amino acid composition. The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes relative to the classification quality achieved with alignment-based similarity measures alone. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
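The classifiers above manage the low ratio of ortholog to non-ortholog pairs with oversampling and undersampling. As a minimal illustration of the oversampling side, the sketch below duplicates randomly chosen minority-class examples until every class matches the largest one. It is plain Python under the assumption of integer class labels, not the Spark implementation used in the study.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(samples: Sequence[T], labels: Sequence[int],
                      seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    by_class: Dict[int, List[T]] = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x: List[T] = []
    out_y: List[int] = []
    for y, xs in by_class.items():
        # draw with replacement from the existing examples of this class
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Undersampling is the mirror image: instead of duplicating minority examples, majority-class examples are randomly discarded down to the minority count. Oversampling keeps all information but can encourage overfitting to repeated positives, which is one reason to compare both.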
Combing Chromosomal DNA Mediated by the SMC Complex: Structure and Mechanisms.
Kamada, Katsuhiko; Barillà, Daniela
2018-02-01
Genome maintenance requires various nucleoid-associated factors in prokaryotes. Among them, the SMC (Structural Maintenance of Chromosomes) protein has been thought to play a static role in the organization and segregation of the chromosome during cell division. However, recent studies have shown that the bacterial SMC is required to align the left and right arms of the emerging chromosome and that the protein dynamically travels from the origin to the Ter region. A rod form of the SMC complex mediates DNA bridging and has been recognized as a machinery responsible for DNA loop extrusion, like the eukaryotic condensin or cohesin complexes, which act as chromosome organizers. Attention is now turning to how the prototype of the complex is loaded at the entry site and translocated along chromosomal DNA, and to explaining its overall conformational changes at the atomic level. Here, we review and highlight recent findings concerning the prokaryotic SMC complex and discuss possible mechanisms from the viewpoint of protein architecture. © 2017 The Authors. BioEssays Published by WILEY Periodicals, Inc.
Controlled enzymatic cutting of DNA molecules adsorbed on surfaces using soft lithography
NASA Astrophysics Data System (ADS)
Auerbach, Alyssa; Budassi, Julia; Shea, Emily; Zhu, Ke; Sokolov, Jonathan
2013-03-01
The enzyme DNase I was applied to adsorbed and aligned DNA molecules (Lambda, 48.5 kilobase pairs (kbp), and T4, 165.6 kbp), stretched linearly on a surface, by stamping with a polydimethylsiloxane (PDMS) grating. The DNA molecules were cut by the enzyme into separate, micron-sized segments along their length at positions determined by the grating dimensions (3-20 microns). Ozone-treated PDMS stamps were coated with DNase I solutions and placed in contact with surface-adsorbed DNA molecules deposited on a 750 polymethylmethacrylate (PMMA) film spun-cast onto a silicon substrate. The stamps were applied under pressure for up to 15 minutes at 37 °C. The cutting was observed by fluorescence microscopy imaging of DNA labeled with YOYO dye. Cutting was found to be efficient despite the steric hindrance due to surface attachment of the molecules. Methods for detaching and separating the cut segments for sequencing applications will be discussed. Supported by NSF-DMR program.
Athanasiou, Thanos
2016-01-01
Despite taking advantage of established learning from other industries, quality improvement initiatives in healthcare may struggle to outperform secular trends. The reasons for this are rarely explored in detail, and are often attributed merely to difficulties in engaging clinicians in quality improvement work. In a narrative review of the literature, we argue that this focus on clinicians, at the relative expense of managerial staff, has proven counterproductive. Clinical engagement is not a universal challenge; moreover, there is evidence that managers—particularly middle managers—also have a role to play in quality improvement. Yet managerial participation in quality improvement interventions is often assumed, rather than proven. We identify specific factors that influence the coordination of front-line staff and managers in quality improvement, and integrate these factors into a novel model: the model of alignment. We use this model to explore the implementation of an interdisciplinary intervention in a recent trial, describing different participation incentives and barriers for different staff groups. The extent to which clinical and managerial interests align may be an important determinant of the ultimate success of quality improvement interventions. PMID:26647411
Wang, Wei; Chen, Xiyuan
2018-02-23
Because the accuracy of the third-degree Cubature Kalman Filter (CKF) used for initial alignment is insufficient under large misalignment angles, an improved fifth-degree CKF algorithm is proposed in this paper. To make full use of the innovation in filtering, the innovation covariance matrix is calculated recursively from the innovation sequence with an exponential fading factor. A new adaptive error covariance matrix scaling algorithm is then proposed. The Singular Value Decomposition (SVD) method is used to improve the numerical stability of the fifth-degree CKF. To avoid the overshoot caused by excessive scaling of the error covariance matrix during the convergence stage, the scaling scheme is terminated when the gradient of the azimuth reaches its maximum. The experimental results show that the improved algorithm achieves better alignment accuracy with large misalignment angles than the traditional algorithm.
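The abstract does not give the exact recursion for the fading innovation covariance. A common form, assumed here, is the Sage-Husa style weight d_k = (1 - b) / (1 - b^(k+1)) with forgetting factor 0 < b < 1, giving C_k = (1 - d_k) C_{k-1} + d_k v_k v_k^T, so older innovations decay geometrically. The sketch below implements only that recursion in plain Python. It omits the fifth-degree cubature rule, the SVD step and the scaling termination, and should not be read as the paper's algorithm.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fading_innovation_covariance(innovations: Sequence[Sequence[float]],
                                 b: float = 0.96) -> Matrix:
    """Recursively estimate the innovation covariance C_k from innovations v_k
    with the (assumed) Sage-Husa fading weight d_k = (1 - b) / (1 - b**(k + 1)).
    Recent innovations get more weight; b in (0, 1) is the forgetting factor."""
    if not 0.0 < b < 1.0:
        raise ValueError("fading factor b must lie in (0, 1)")
    if not innovations:
        raise ValueError("need at least one innovation vector")
    m = len(innovations[0])
    cov: Matrix = [[0.0] * m for _ in range(m)]
    for k, v in enumerate(innovations):
        d = (1.0 - b) / (1.0 - b ** (k + 1))  # d_0 = 1, so C_0 = v_0 v_0^T
        # C_k = (1 - d_k) * C_{k-1} + d_k * v_k v_k^T
        cov = [[(1.0 - d) * cov[i][j] + d * v[i] * v[j] for j in range(m)]
               for i in range(m)]
    return cov
```

For a constant innovation the estimate stays at v v^T, and with b = 0.5 the innovations [1] and [3] give (1/3)·1 + (2/3)·9 = 19/3: the newer innovation carries twice the weight of the older one.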
Carbone, Ignazio; White, James B; Miadlikowska, Jolanta; Arnold, A Elizabeth; Miller, Mark A; Kauff, Frank; U'Ren, Jana M; May, Georgiana; Lutzoni, François
2017-04-15
High-quality phylogenetic placement of sequence data has the potential to greatly accelerate studies of the diversity, systematics, ecology and functional biology of diverse groups. We developed the Tree-Based Alignment Selector (T-BAS) toolkit to allow evolutionary placement and visualization of diverse DNA sequences representing unknown taxa within a robust phylogenetic context, and to permit the downloading of highly curated, single- and multi-locus alignments for specific clades. In its initial form, T-BAS v1.0 uses a core phylogeny of 979 taxa (including 23 outgroup taxa, as well as 61 orders, 175 families and 496 genera) representing all 13 classes of the largest subphylum of Fungi, Pezizomycotina (Ascomycota), based on sequence alignments for six loci (nr5.8S, nrLSU, nrSSU, mtSSU, RPB1, RPB2). T-BAS v1.0 has three main uses: (i) Users may download alignments and voucher tables for members of the Pezizomycotina directly from the reference tree, facilitating systematics studies of focal clades. (ii) Users may upload sequence files with reads representing unknown taxa and place these on the phylogeny using either BLAST or phylogeny-based approaches, and then use the displayed tree to select reference taxa to include when downloading alignments. The placement of unknowns can be performed for large numbers of Sanger sequences obtained from fungal cultures and for alignable, short reads of environmental amplicons. (iii) User-customizable metadata can be visualized on the tree. T-BAS Version 1.0 is available online at http://tbas.hpc.ncsu.edu . Registration is required to access the CIPRES Science Gateway and NSF XSEDE's large computational resources. icarbon@ncsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity
Lee, Hui Sun; Im, Wonpil
2013-01-01
Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite the promising results of template-based BS prediction methods using global structure alignment (GSA), there is room to improve their performance by properly incorporating local structure alignment (LSA), because BS are local structures and are often similar among proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. Validation on a large benchmark set shows that G-LoSA predicts the positions of drug-like ligands in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and to a non-template geometry-based method, fpocket, we developed a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), which shows improved prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286
Historian: accurate reconstruction of ancestral sequences and evolutionary rates.
Holmes, Ian H
2017-04-15
Reconstruction of ancestral sequence histories, and estimation of parameters like indel rates, are improved by using explicit evolutionary models and summing over uncertain alignments. The previous best tool for this purpose (according to simulation benchmarks) was ProtPal, but this tool was too slow for practical use. Historian combines an efficient reimplementation of the ProtPal algorithm with performance-improving heuristics from other alignment tools. Simulation results on fidelity of rate estimation via ancestral reconstruction, along with evaluations on the structurally informed alignment dataset BAliBase 3.0, recommend Historian over other alignment tools for evolutionary applications. Historian is available at https://github.com/evoldoers/historian under the Creative Commons Attribution 3.0 US license. ihholmes+historian@gmail.com. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
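The key idea of summing over uncertain alignments can be seen in a toy model: if an alignment's probability is the product of per-column weights, the same dynamic program yields the single best alignment when cells take a max (Viterbi) and the total over all alignments when cells take a sum (the forward algorithm). The sketch below, in log space, uses arbitrary assumed column weights and is only an illustration. Historian and ProtPal use explicit evolutionary models, not this toy scheme.

```python
import math
from typing import Callable, List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def align_dp(x: str, y: str, combine: Callable[[List[float]], float],
             p_same: float = 0.2, p_diff: float = 0.01,
             p_gap: float = 0.05) -> float:
    """Log-space DP over all global alignments of x and y in a toy model where an
    alignment's probability is the product of per-column weights.
    combine=max scores the single best (Viterbi) alignment;
    combine=logsumexp sums over every alignment (forward algorithm)."""
    lm_same, lm_diff, lg = math.log(p_same), math.log(p_diff), math.log(p_gap)
    n, m = len(x), len(y)
    f = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0 and j > 0:  # aligned column x[i-1] / y[j-1]
                terms.append(f[i - 1][j - 1] + (lm_same if x[i - 1] == y[j - 1] else lm_diff))
            if i > 0:  # x[i-1] against a gap
                terms.append(f[i - 1][j] + lg)
            if j > 0:  # gap against y[j-1]
                terms.append(f[i][j - 1] + lg)
            f[i][j] = combine(terms)
    return f[n][m]
```

For x = y = "A", the best alignment has probability 0.2, while the sum adds the two gap-only paths (0.05² each) for a total of 0.205. Estimates built on the summed quantity account for alignment uncertainty instead of trusting one path.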
Vertically aligned carbon nanotubes as anode and air-cathode in single chamber microbial fuel cells
NASA Astrophysics Data System (ADS)
Amade, R.; Moreno, H. A.; Hussain, S.; Vila-Costa, M.; Bertran, E.
2016-10-01
Electrode optimization in microbial fuel cells is a key issue for improving power output and cell performance. Vertically aligned carbon nanotubes (VACNTs) grown on low-cost stainless-steel mesh offer an attractive approach to increasing cell performance while avoiding the use of expensive Pt-based materials. In comparison with non-aligned carbon nanotubes (NACNTs), VACNTs enhance the oxygen reduction reaction taking place at the cathode by a factor of two. In addition, vertical alignment increases the power density by up to 2.5 times with respect to NACNTs. VACNTs grown at the anode can further improve cell performance by increasing the electrode surface area and thus the electron transfer between bacteria and the electrode. The maximum power density obtained using VACNTs was 14 mW/m2, with a 160 mV output voltage.
Directionally Antagonistic Graphene Oxide-Polyurethane Hybrid Aerogel as a Sound Absorber.
Oh, Jung-Hwan; Kim, Jieun; Lee, Hyeongrae; Kang, Yeonjune; Oh, Il-Kwon
2018-06-21
Innovative sound absorbers based on carbon nanotubes and graphene derivatives could yield more efficient sound-absorbing materials because of the excellent intrinsic mechanical and chemical properties of these nanomaterials. However, controlling the structure of low-dimensional carbon nanomaterials, including restacking, alignment, and dispersion, has been a challenging problem when developing sound-absorbing forms. Herein, we present a directionally antagonistic graphene oxide-polyurethane hybrid aerogel developed as a sound absorber, whose physical properties differ according to the alignment of the microscopic graphene oxide sheets. This porous graphene sound absorber has a microporous hierarchical cellular structure with adjustable stiffness and improved sound absorption performance, thereby overcoming both geometric and function-oriented restrictions. Furthermore, by controlling the inner cell size and the aligned structure of the graphene oxide layers, we achieved a remarkable improvement in sound absorption performance at low frequency. This improvement is attributed to multiple scattering of incident and reflected waves on the aligned porous surfaces, and to air-viscous resistance damping inside the interconnected structures between the urethane foam and the graphene oxide network. Two anisotropic sound absorbers based on the directionally antagonistic graphene oxide-polyurethane hybrid aerogels were fabricated. They show remarkable differences owing to the opposite alignment of the graphene oxide layers inside the polyurethane foam and are expected to be suitable for the engineering design of sound absorbers that take the wave direction into account.
Feature Based Retention Time Alignment for Improved HDX MS Analysis
NASA Astrophysics Data System (ADS)
Venable, John D.; Scuba, William; Brock, Ansgar
2013-04-01
An algorithm for retention time alignment of mass-shifted hydrogen-deuterium exchange (HDX) data based on an iterative distance minimization procedure is described. The algorithm iteratively performs pairwise comparisons between a list of features from a reference file and a file to be time-aligned in order to calculate a retention time mapping function. Features are characterized by their charge, retention time and the mass of the monoisotopic peak. The algorithm is able to align datasets with mass-shifted features, which is a prerequisite for aligning hydrogen-deuterium exchange mass spectrometry datasets. Confidence assignments from the fully automated processing of a commercial HDX software package are shown to benefit significantly from retention time alignment prior to extraction of deuterium incorporation values.
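The procedure described above alternates between pairing features and refitting a retention time mapping. The sketch below is one possible instance, built on several assumptions not stated in the abstract: a linear mapping rt_ref ≈ a·rt + b, pairing by equal charge, a query mass up to a hypothetical `max_mass_shift` heavier than the reference (deuterium uptake), nearest mapped retention time within a hypothetical `rt_window`, and least-squares refitting. It is not the published algorithm's actual distance function.

```python
from typing import List, Tuple

# Hypothetical feature record: (charge, retention_time_min, monoisotopic_mass_da)
Feature = Tuple[int, float, float]


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx if sxx > 0 else 1.0
    return a, my - a * mx


def align_retention_times(ref: List[Feature], query: List[Feature],
                          max_mass_shift: float = 10.0, rt_window: float = 2.0,
                          iterations: int = 5) -> Tuple[float, float]:
    """Iteratively estimate a linear mapping rt_ref ~ a*rt_query + b.
    Each round pairs every query feature with the same-charge reference feature
    that is closest in mapped retention time, allowing the query mass to be up to
    max_mass_shift Da heavier, and then refits the line to those pairs."""
    a, b = 1.0, 0.0  # start from the identity mapping
    for _ in range(iterations):
        pairs = []
        for z, rt, mass in query:
            mapped = a * rt + b
            candidates = [r for r in ref
                          if r[0] == z and 0.0 <= mass - r[2] <= max_mass_shift
                          and abs(r[1] - mapped) <= rt_window]
            if candidates:
                best = min(candidates, key=lambda r: abs(r[1] - mapped))
                pairs.append((rt, best[1]))
        if len(pairs) < 2:
            break  # not enough matches to fit a line; keep the last mapping
        a, b = fit_line(pairs)
    return a, b
```

For a query run that elutes 0.5 min earlier than the reference, with deuterated masses a few Da heavier, the fit converges to a = 1.0, b = 0.5. Allowing a mass-shift window rather than requiring equal masses is what lets deuterated and undeuterated features be paired at all.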
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.
2016-02-10
Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. When using only a single thread, parasail was 1.7 times faster than Rognes's SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
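The three alignment classes in the library differ mainly in boundary conditions and in where the final score is read from the dynamic-programming matrix. The scalar sketch below shows that difference. It assumes simple match/mismatch scores and a linear gap penalty, whereas parasail supports substitution matrices, affine gaps and SIMD vectorization. The semi-global mode shown, with free end gaps in both sequences, is only one common variant. The helper `gcups` restates the benchmark unit: cells computed divided by seconds divided by 10⁹.

```python
from typing import List


def align_score(a: str, b: str, mode: str = "global", match: int = 2,
                mismatch: int = -1, gap: int = -2) -> int:
    """Score-only pairwise alignment with a linear gap penalty.
    mode="global": Needleman-Wunsch, end gaps penalised, score at H[n][m].
    mode="semi":   end gaps free in both sequences (leading gaps via a zero first
                   row/column, trailing gaps via the max over last row/column).
    mode="local":  Smith-Waterman, cells floored at 0, best score anywhere."""
    if mode not in ("global", "semi", "local"):
        raise ValueError(f"unknown mode: {mode}")
    n, m = len(a), len(b)
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "global":  # leading gaps cost gap per position
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if mode == "local":
                h = max(h, 0)  # a local alignment may restart anywhere
                best = max(best, h)
            H[i][j] = h
    if mode == "global":
        return H[n][m]
    if mode == "semi":
        return max(max(H[n]), max(H[i][m] for i in range(n + 1)))
    return best


def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Giga cell updates per second: DP cells computed / time / 1e9."""
    return query_len * db_residues / seconds / 1e9
```

Aligning "ACGT" against "TTACGTTT" illustrates the difference: the global score is 0 (four matches offset by four forced gaps), while the semi-global and local scores are both 8 because the overhanging T's are not penalised.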
Li, Chen-Yu; Hemmig, Elisa A.; Kong, Jinglin; Yoo, Jejoong; Hernández-Ainsa, Silvia
2015-01-01
The DNA origami technique can enable functionalization of inorganic structures for single-molecule electric current recordings. Experiments have shown that a DNA origami plate, made of several layers of DNA molecules, placed on top of a solid-state nanopore is permeable to ions. Here, we report a comprehensive characterization of the ionic conductivity of DNA origami plates by means of all-atom molecular dynamics (MD) simulations and nanocapillary electric current recordings. Using the MD method, we characterize the ionic conductivity of several origami constructs, revealing the local distribution of ions, the distribution of the electrostatic potential and the contribution of different molecular species to the current. The simulations determine the dependence of the ionic conductivity on the applied voltage, the number of DNA layers, the nucleotide content and the lattice type of the plates. We demonstrate that increasing the concentration of Mg2+ ions makes the origami plates more compact, reducing their conductivity. The conductance of a DNA origami plate on top of a solid-state nanopore is determined by two competing effects: bending of the DNA origami plate, which reduces the current, and separation of the DNA origami layers, which increases the current. The latter is produced by the electro-osmotic flow and is reversible on a time scale of a hundred nanoseconds. The conductance of a DNA origami object is found to depend on its orientation, reaching a maximum when the electric field aligns with the direction of the DNA helices. Our work demonstrates the feasibility of programming the electrical properties of a self-assembled nanoscale object using DNA. PMID:25623807
Li, Chen-Yu; Hemmig, Elisa A; Kong, Jinglin; Yoo, Jejoong; Hernández-Ainsa, Silvia; Keyser, Ulrich F; Aksimentiev, Aleksei
2015-02-24
The DNA origami technique can enable functionalization of inorganic structures for single-molecule electric current recordings. Experiments have shown that a DNA origami plate, made of several layers of DNA molecules, placed on top of a solid-state nanopore is permeable to ions. Here, we report a comprehensive characterization of the ionic conductivity of DNA origami plates by means of all-atom molecular dynamics (MD) simulations and nanocapillary electric current recordings. Using the MD method, we characterize the ionic conductivity of several origami constructs, revealing the local distribution of ions, the distribution of the electrostatic potential and the contribution of different molecular species to the current. The simulations determine the dependence of the ionic conductivity on the applied voltage, the number of DNA layers, the nucleotide content and the lattice type of the plates. We demonstrate that increasing the concentration of Mg(2+) ions makes the origami plates more compact, reducing their conductivity. The conductance of a DNA origami plate on top of a solid-state nanopore is determined by two competing effects: bending of the DNA origami plate, which reduces the current, and separation of the DNA origami layers, which increases the current. The latter is produced by the electro-osmotic flow and is reversible on a time scale of a hundred nanoseconds. The conductance of a DNA origami object is found to depend on its orientation, reaching a maximum when the electric field aligns with the direction of the DNA helices. Our work demonstrates the feasibility of programming the electrical properties of a self-assembled nanoscale object using DNA.
NASA Astrophysics Data System (ADS)
Gnapareddy, Bramaramba; Dugasani, Sreekantha Reddy; Son, Junyoung; Park, Sung Ha
2018-02-01
DNA is considered a useful building biomaterial, and it serves as an efficient template for aligning functionalized nanomaterials. Riboflavin (RF)-doped synthetic double-crossover DNA (DX-DNA) lattices and natural salmon DNA (SDNA) thin films were constructed using substrate-assisted growth and drop-casting methods, respectively, and their topological, chemical and electro-optical characteristics were evaluated. The critical doping concentrations of RF ([RF]C, approx. 5 mM) at given concentrations of DX-DNA and SDNA were obtained by observing the phase transition (from crystalline to amorphous structures) of DX-DNA and the precipitation of SDNA in solution above [RF]C. The [RF]C values were verified by analysing atomic force microscopy images for DX-DNA and current, absorbance and photoluminescence (PL) for SDNA. We studied the physical characteristics of RF-embedded SDNA thin films, using Fourier transform infrared spectra to understand the interaction between RF and DNA molecules, current to evaluate the conductance, absorption to understand RF binding to DNA and PL to analyse the energy transfer between RF and DNA. The current and UV absorption band of SDNA thin films decrease up to [RF]C and increase above [RF]C. By contrast, the PL intensity shows the reverse trend as a function of [RF]. Owing to the intense PL of RF, DNA lattices and thin films with RF might offer immense potential for developing efficient biosensors and useful bio-photonic devices.
Gnapareddy, Bramaramba; Son, Junyoung
2018-01-01
DNA is considered a useful building biomaterial, and it serves as an efficient template for aligning functionalized nanomaterials. Riboflavin (RF)-doped synthetic double-crossover DNA (DX-DNA) lattices and natural salmon DNA (SDNA) thin films were constructed using substrate-assisted growth and drop-casting methods, respectively, and their topological, chemical and electro-optical characteristics were evaluated. The critical doping concentrations of RF ([RF]C, approx. 5 mM) at given concentrations of DX-DNA and SDNA were obtained by observing the phase transition (from crystalline to amorphous structures) of DX-DNA and the precipitation of SDNA in solution above [RF]C. The [RF]C values were verified by analysing atomic force microscopy images for DX-DNA and current, absorbance and photoluminescence (PL) for SDNA. We studied the physical characteristics of RF-embedded SDNA thin films, using Fourier transform infrared spectra to understand the interaction between RF and DNA molecules, current to evaluate the conductance, absorption to understand RF binding to DNA and PL to analyse the energy transfer between RF and DNA. The current and UV absorption band of SDNA thin films decrease up to [RF]C and increase above [RF]C. By contrast, the PL intensity shows the reverse trend as a function of [RF]. Owing to the intense PL of RF, DNA lattices and thin films with RF might offer immense potential for developing efficient biosensors and useful bio-photonic devices. PMID:29515837
Nair, Pradeep S; John, Eugene B
2007-01-01
Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers on sequence alignment benchmarks. In this paper, we analyse the performance of a personal computer running the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively, suggesting that performance could be improved with a larger L1 cache.
Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan
2013-11-01
Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and the emergence of new strains, and improve accuracy when designing preventive vaccines. This study investigated the use of deep sequencing on the Illumina MiSeq next-generation sequencing (NGS) platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from six acute respiratory clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed and sequenced simultaneously. A total of 2.6 Gbases was obtained at a cluster density of 455±14 K/mm2, with 95.76% (8,571,655/8,950,724) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 were high-quality sequences at or above Q30, a standard base-calling QC score. Alignment analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. Nearly the entire genomes of all six virus isolates were covered at a sequence depth of 600-fold or greater. The MiSeq platform efficiently identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1 and influenza B in the DNA library mixtures. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
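The Q30 threshold refers to the Phred scale, Q = -10·log10(p_error), so Q30 corresponds to an estimated error probability of 1 in 1,000 per base call. The sketch below computes the fraction of base calls at or above a threshold from FASTQ quality strings. It assumes the Phred+33 encoding used by current Illumina output and counts per base, which is how such instrument metrics are usually reported. Whether the study's 93.7% and 83.5% figures are per base or per read is not stated in the abstract.

```python
from typing import Iterable


def phred_error_probability(q: int) -> float:
    """Convert a Phred quality score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_least(quality_strings: Iterable[str], threshold: int = 30,
                      offset: int = 33) -> float:
    """Fraction of base calls with Phred quality >= threshold, assuming the
    Phred+33 ASCII encoding (quality = ord(char) - 33)."""
    total = passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0
```

In Phred+33, 'I' encodes Q40, '?' encodes Q30 and '5' encodes Q20, so the quality strings "II55" and "??" give 4 of 6 bases at Q30 or above. The pass-filter figure in the abstract is a plain ratio: 100 × 8,571,655 / 8,950,724 ≈ 95.76%.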
Iehisa, Julio Cesar Masaru; Ohno, Ryoko; Kimura, Tatsuro; Enoki, Hiroyuki; Nishimura, Satoru; Okamoto, Yuki; Nasuda, Shuhei; Takumi, Shigeo
2014-10-01
The large genome and allohexaploidy of common wheat have complicated the construction of a high-density genetic map. Although improvements in the throughput of next-generation sequencing (NGS) technologies have made it possible to obtain a large amount of genotyping data for an entire mapping population by direct sequencing, including in hexaploid wheat, a significant number of missing data points are often apparent due to low sequencing coverage. In the present study, a microarray-based polymorphism detection system was developed using NGS data obtained from complexity-reduced genomic DNA of two common wheat cultivars, Chinese Spring (CS) and Mironovskaya 808. After design and selection of polymorphic probes, 13,056 new markers were added to the linkage map of a recombinant inbred mapping population between CS and Mironovskaya 808. On average, 2.49 missing data points per marker were observed in the 201 recombinant inbred lines, with a maximum of 42. Around 40% of the new markers were derived from genic regions and 11% from repetitive regions. The low number of retroelements indicated that the new polymorphic markers were mainly derived from the less repetitive regions of the wheat genome. Around 25% of the mapped sequences were useful for alignment with the physical map of barley. Quantitative trait locus (QTL) analyses of 14 agronomically important traits related to flowering, spikes, and seeds demonstrated that the new high-density map showed improved QTL detection, resolution, and accuracy over the original simple sequence repeat map. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Myamoto, D T; Pidde-Queiroz, G; Pedroso, A; Gonçalves-de-Andrade, R M; van den Berg, C W; Tambourgi, D V
2016-09-01
A transcriptome analysis of the venom glands of the spider Loxosceles laeta, performed by our group in a previous study (Fernandes-Pedrosa et al., 2008), revealed a transcript with a sequence similar to that of human complement component C3. Here we present the analysis of this transcript. cDNA fragments encoding the C3 homologue (Lox-C3) were amplified by RACE-PCR from total RNA isolated from the venom glands of L. laeta. Lox-C3 is a 5178 bp cDNA sequence encoding a 190 kDa protein with a domain configuration similar to that of human C3. Multiple alignments of C3-like proteins revealed two processing sites, suggesting that Lox-C3 is composed of three chains. Furthermore, the amino acid consensus sequence for the thioester was found, in addition to putative sequences responsible for FB binding. The phylogenetic analysis showed that Lox-C3 belongs to the same group as two C3 isoforms from the spider Hasarius adansoni (family Salticidae), showing 53% homology with these. This is the first characterization of a Loxosceles cDNA sequence encoding a human C3 homologue, and this finding, together with our previous finding of the expression of an FB-like molecule, suggests that this spider species also has a complement system. This work will help to improve our understanding of the innate immune system in these spiders and of the ancestral structure of C3. Copyright © 2016 Elsevier GmbH. All rights reserved.
Li, Hong-Mei; Guo, Kang; Yu, Zhuang; Feng, Rui; Xu, Ping
2015-07-01
Traditional diagnostic technology based on tumor biomarkers is inefficient and expensive and requires a large number of serum samples. The purpose of this study was to construct human lung cancer protein chips with new lung cancer biomarkers screened from a T7-phage display library, and thereby improve the early diagnosis rate of lung cancer. A T7-phage cDNA display library was constructed from fresh samples from 30 lung cancer patients. Through biopanning and high-throughput screening, we obtained immunogenic phage clones from the cDNA library. The inserts of the selected phages were searched with BLAST against GenBank to find identical or the most similar known genes. Protein chips were then constructed and used to assay expression levels in serum from 217 lung cancer patients, 80 patients with benign lung disease, and 220 healthy controls. After four rounds of biopanning and two rounds of enzyme-linked immunosorbent assay, 12 monoclonal phage clones were selected from 2880. After the BLAST search against GenBank, six similar genes were used to construct diagnostic protein chips. The protein chips were then used to assay expression levels in lung cancer serum. The expression levels of the six genes in the lung cancer group were significantly higher than those in the other two groups (P < 0.05). In this study, we successfully constructed diagnostic protein chips with biomarkers selected from the lung cancer T7-phage cDNA library, which can be used for the early screening of lung cancer patients.
SFESA: a web server for pairwise alignment refinement by secondary structure shifts.
Tong, Jing; Pei, Jimin; Grishin, Nick V
2015-09-03
Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Whereas the previous version of SFESA required a set of 3D coordinates for a protein, the new server searches a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used to evaluate alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
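The block-shift step described above (try local shifts of one secondary-structure block, keep the best-scoring variant) can be illustrated with a minimal sketch. It assumes gapless blocks given as half-open ranges of template positions and uses a hypothetical identity score in place of SFESA's combined profile-profile and contact-energy score; none of these names are SFESA's own API.

```python
from typing import Callable, Optional, Tuple


def identity_score(a: str, b: str) -> int:
    """Hypothetical stand-in for SFESA's sequence + structure scoring."""
    return 1 if a == b else 0


def best_block_shift(
    template: str,
    query: str,
    block: Tuple[int, int],
    offset: int,
    max_shift: int = 3,
    score: Callable[[str, str], int] = identity_score,
) -> Tuple[int, Optional[int]]:
    """Return (shift, score) for the best gapless shift of one block.

    Template positions block[0]..block[1]-1 are aligned to query positions
    i + offset + shift; positions falling outside the query contribute nothing.
    """
    start, end = block
    best_shift, best_total = 0, None
    # Visit shifts in order of increasing |shift| so ties keep the smaller move.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        total = 0
        for i in range(start, end):
            j = i + offset + shift
            if 0 <= j < len(query):
                total += score(template[i], query[j])
        if best_total is None or total > best_total:
            best_shift, best_total = shift, total
    return best_shift, best_total
```

For example, if the block "ABCDE" in the template is currently aligned one residue off in the query, the sketch recovers the one-position shift back into register. Preferring small shifts on ties is a conservative choice for refinement, since it leaves an alignment unchanged unless a move actually improves the score.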
DNA-based watermarks using the DNA-Crypt algorithm.
Heider, Dominik; Barnekow, Angelika
2007-05-29
The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. DNA-Crypt can be combined with binary encryption algorithms such as AES, RSA or Blowfish. It is able to correct mutations in the target DNA with several mutation correction codes, such as the Hamming code or the WDH code. Mutations, which may occur infrequently, can destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate, and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 gene of Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt, similar to all steganographic algorithms, produces few mismatches between the sequences.
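DNA-Crypt can protect its watermark with error-correcting codes such as the Hamming code. As a minimal illustration of the Hamming idea only (not DNA-Crypt's implementation, its base-to-bit mapping, or the WDH code), the classic Hamming(7,4) code below corrects any single flipped bit in a 7-bit codeword. If bits were packed two per base, one base substitution could flip two bits, which this code alone cannot correct.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword (parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

The syndrome directly names the corrupted position, which is why a single mutation can be repaired without knowing in advance where it occurred.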
Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S
2011-11-30
Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/
DNA-based watermarks using the DNA-Crypt algorithm
Heider, Dominik; Barnekow, Angelika
2007-01-01
Background: The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results: The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. DNA-Crypt can be combined with binary encryption algorithms such as AES, RSA or Blowfish. It is able to correct mutations in the target DNA with several mutation correction codes, such as the Hamming code or the WDH code. Mutations, which may occur infrequently, can destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate, and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 gene of Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion: The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt, similar to all steganographic algorithms, produces few mismatches between the sequences. PMID:17535434
Development of Accurate Structure for Mounting and Aligning Thin-Foil X-Ray Mirrors
NASA Technical Reports Server (NTRS)
Heilmann, Ralf K.
2001-01-01
The goal of this work was to improve the assembly accuracy of foil x-ray optics as produced by the high-energy astrophysics group at the NASA Goddard Space Flight Center. Two main design choices led to an alignment concept that was shown to improve accuracy to well within the requirements currently pursued for the Constellation-X Spectroscopy X-Ray Telescope (SXT).
Rapid and highly fieldable viral diagnostic
McKnight, Timothy E.
2016-12-20
The present invention relates to a rapid, highly fieldable, nearly reagentless diagnostic to identify active RNA viral replication in live, infected cells, and more particularly in leukocytes and tissue samples (including biopsies and nasal swabs), using an array of a plurality of vertically aligned nanostructures that impale the cells and introduce a DNA reporter construct that is expressed and amplified in the presence of active viral replication.
Text-image alignment for historical handwritten documents
NASA Astrophysics Data System (ADS)
Zinger, S.; Nerbonne, J.; Schomaker, L.
2009-01-01
We describe our work on text-image alignment in the context of building a historical document retrieval system. We aim to align images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines to detect words in an image of handwritten text, we need a training set: images of words with their transcriptions. We present our results on aligning words from the images of handwritten lines with their corresponding text transcriptions. Alignment based on the longest spaces between portions of handwriting serves as a baseline. We then show that relative lengths, i.e. the proportions of words in their lines, can be used to improve the alignment results considerably. To take relative word length into account, we define expressions for the cost function that must be minimized to align text words with their images. We apply right-to-left alignment as well as alignment based on exhaustive search. Quality assessment of these alignments shows that 69% of the words from 100 lines are aligned correctly, or 90% when partially correct and correct alignments are combined.
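The relative-length idea combined with exhaustive search can be sketched as follows. The cost function here is an assumption for illustration (the abstract does not give the authors' expressions): for every way of choosing word boundaries among candidate gaps in the line image, it sums the squared difference between each word's share of the transcription's characters and its segment's share of the line width, and keeps the cheapest choice. The function and parameter names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def align_words(
    words: Sequence[str], gaps: Sequence[float], line_width: float
) -> Tuple[List[float], float]:
    """Pick len(words) - 1 candidate gaps as word boundaries by exhaustive search.

    Hypothetical cost: sum over words of (segment width share - character share)^2.
    Returns the chosen boundary positions and their cost.
    """
    total_chars = sum(len(w) for w in words)
    expected = [len(w) / total_chars for w in words]  # relative word lengths
    best_cuts: List[float] = []
    best_cost = float("inf")
    # combinations() of the sorted gaps yields boundaries in left-to-right order.
    for cuts in combinations(sorted(gaps), len(words) - 1):
        edges = [0.0, *cuts, line_width]
        widths = [(b - a) / line_width for a, b in zip(edges, edges[1:])]
        cost = sum((w - e) ** 2 for w, e in zip(widths, expected))
        if cost < best_cost:
            best_cuts, best_cost = list(cuts), cost
    return best_cuts, best_cost
```

For the words "the", "quick", "fox" on a 110-pixel line with candidate gaps at 12, 30, 55, 80 and 95, the sketch picks the boundaries 30 and 80, whose segment widths match the 3:5:3 character proportions. Exhaustive search grows combinatorially with the number of gaps, which is one reason a cheaper sequential strategy such as the right-to-left alignment mentioned above is also worth having.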
Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors
NASA Astrophysics Data System (ADS)
Khajeh-Saeed, Ali; Poole, Stephen; Blair Perot, J.
2010-06-01
Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching with different starting locations and some gaps and errors between the two data sequences. Perhaps the most well known application of sequence matching is the testing of DNA or protein sequences against genome databases. The Smith-Waterman algorithm is a method for precisely characterizing how well two sequences can be aligned and for determining the optimal alignment of those two sequences. Like many applications in computational science, the Smith-Waterman algorithm is constrained by the memory access speed and can be accelerated significantly by using graphics processors (GPUs) as the compute engine. In this work we show that effective use of the GPU requires a novel reformulation of the Smith-Waterman algorithm. The performance of this new version of the algorithm is demonstrated using the SSCA#1 (Bioinformatics) benchmark running on one GPU and on up to four GPUs executing in parallel. The results indicate that for large problems a single GPU is up to 45 times faster than a CPU for this application, and the parallel implementation shows linear speed up on up to 4 GPUs.
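Why GPU acceleration requires restructuring Smith-Waterman can be seen from its data dependencies: each cell depends on its upper, left, and upper-left neighbors. The sketch below shows the common anti-diagonal (wavefront) ordering, in which all cells on one anti-diagonal are independent and could be computed in parallel. This is a widely used parallelization strategy, not the specific reformulation of Khajeh-Saeed et al.; the linear gap penalty and scoring values are assumptions.

```python
from typing import List


def _cell(H: List[List[int]], a: str, b: str, i: int, j: int,
          match: int, mismatch: int, gap: int) -> int:
    """Smith-Waterman recurrence for cell (i, j) with a linear gap penalty."""
    s = match if a[i - 1] == b[j - 1] else mismatch
    return max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)


def sw_rowwise(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Reference version: fill the matrix row by row and return the best local score."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best


def sw_wavefront(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Same score, filling one anti-diagonal (i + j = d) at a time."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        # Cells on diagonal d depend only on diagonals d-1 and d-2, so this
        # inner loop has no internal dependencies (one GPU thread per cell).
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            H[i][j] = _cell(H, a, b, i, j, match, mismatch, gap)
            best = max(best, H[i][j])
    return best
```

Both functions compute the same matrix; only the visiting order differs. The drawback of the wavefront order is that diagonals near the corners are short, leaving parallel hardware underused at the start and end of each comparison.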
Lovell, Peter V; Huizinga, Nicole A; Getachew, Abel; Mees, Brianna; Friedrich, Samantha R; Wirthlin, Morgan; Mello, Claudio V
2018-05-18
Zebra finches are a major model organism for investigating the mechanisms of vocal learning, a trait that enables spoken language in humans. The development of cDNA collections with expressed sequence tags (ESTs) and microarrays has allowed extensive molecular characterization of the circuitry underlying vocal learning and production. However, poor database curation can lead to errors in transcriptome and bioinformatics analyses, limiting the impact of these resources. Here we used genomic alignments and synteny analysis for orthology verification to curate and reannotate ~35% of the oligonucleotides and corresponding ESTs/cDNAs that make up Agilent microarrays for gene expression analysis in finches. We found that: (1) 5475 of 43,084 oligos (a) failed to align to the zebra finch genome, (b) aligned to multiple loci, or (c) aligned to Chr_un only, and thus need to be flagged until a better genome assembly is available, or (d) reflect cloning artifacts; (2) of 9635 valid oligos examined further, 3120 were incorrectly named, including 1533 with no known orthologs; and (3) 2635 oligos required a name update. The resulting curated dataset provides a reference for correcting gene identification errors in previous finch microarray studies and for avoiding such errors in future studies.
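The first curation step above, flagging oligos by where they align, can be sketched as a small classifier over genomic alignment hits. The input format (one chromosome name per hit), the category labels, and the order of the checks are all assumptions for illustration; cloning artifacts and the orthology and naming checks are not modeled here.

```python
from collections import Counter
from typing import Dict, List


def classify_oligo(hit_chromosomes: List[str], unplaced: str = "Chr_un") -> str:
    """Assign one oligo to a hypothetical curation category from its alignment hits."""
    if not hit_chromosomes:
        return "unaligned"        # (a) failed to align to the genome
    if all(c == unplaced for c in hit_chromosomes):
        return "Chr_un only"      # (c) hits only on unplaced sequence
    if len(hit_chromosomes) > 1:
        return "multiple loci"    # (b) ambiguous genomic placement
    return "unique"               # candidate for orthology/synteny checks


def summarize(hits_by_oligo: Dict[str, List[str]]) -> Counter:
    """Count how many oligos fall into each category."""
    return Counter(classify_oligo(hits) for hits in hits_by_oligo.values())
```

Only oligos labeled "unique" would then proceed to synteny-based orthology verification, mirroring the split between flagged and valid oligos in the abstract.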
Voleti, Pramod B; Hamula, Mathew J; Baldwin, Keith D; Lee, Gwo-Chin
2014-09-01
The purpose of this systematic review and meta-analysis is to compare patient-specific instrumentation (PSI) versus standard instrumentation for total knee arthroplasty (TKA) with regard to coronal and sagittal alignment, operative time, intraoperative blood loss, and cost. A systematic literature search for relevant studies was performed, and the data published in these studies were extracted and aggregated. Regarding coronal alignment, PSI demonstrated improved accuracy in femorotibial angle (FTA) (P=0.0003), while standard instrumentation demonstrated improved accuracy in hip-knee-ankle angle (HKA) (P=0.02). Importantly, there were no differences between treatment groups in the percentages of FTA or HKA outliers (>3 degrees from target alignment) (P=0.7). Sagittal alignment, operative time, intraoperative blood loss, and cost were also similar between groups (P>0.1 for all comparisons). Copyright © 2014 Elsevier Inc. All rights reserved.
Improved docking alignment system
NASA Technical Reports Server (NTRS)
Monford, Leo G. (Inventor)
1988-01-01
Improved techniques are provided for the alignment of two objects. The present invention is particularly suited for 3-D translational and 3-D rotational alignment of objects in outer space. A camera is affixed to one object, such as a remote manipulator arm of a spacecraft, while a planar reflective surface is affixed to the other object, such as a grapple fixture. A monitor displays real-time images from the camera, such that it shows both the reflected image of the camera and visible markings on the planar reflective surface when the objects are in proper alignment. The operator can thus view the monitor and manipulate the arm so that the reflective surface is perpendicular to the optical axis of the camera, the roll of the reflective surface is at a selected angle with respect to the camera, and the camera is spaced a preselected distance from the reflective surface.
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.
Daily, Jeff
2016-02-10
Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available, at 1.2 to at most 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
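The three alignment types that Parasail offers differ mainly in how the dynamic-programming matrix is initialized and where the final score is read. The scalar sketch below makes that difference explicit; it is not Parasail's API or algorithm (Parasail is SIMD-vectorized and takes affine gap open/extend penalties), and the linear gap penalty and scoring values are assumptions. "Semi-global" here means end gaps in both sequences are free.

```python
def align_score(a: str, b: str, mode: str = "global",
                match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal pairwise alignment score for mode 'global', 'semi-global' or 'local'."""
    if mode not in ("global", "semi-global", "local"):
        raise ValueError(f"unknown mode: {mode}")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "global":
        # Only global alignment charges for leading gaps.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if mode == "local":
                v = max(0, v)  # a local alignment may restart anywhere
                best = max(best, v)
            H[i][j] = v
    if mode == "global":
        return H[n][m]  # must end at the bottom-right corner
    if mode == "semi-global":
        # Trailing gaps are free: the alignment may end anywhere on the last row or column.
        return max(max(H[n]), max(row[m] for row in H))
    return best
```

For "GATTACA" against "TTAC", global alignment scores 5 (four matches minus three end gaps), while semi-global and local alignment both score 8 because the overhanging ends cost nothing.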
In-flight alignment using H∞ filter for strapdown INS on aircraft.
Pei, Fu-Jun; Liu, Xuan; Zhu, Li
2014-01-01
In-flight alignment is an effective way to improve the accuracy and speed of initial alignment for a strapdown inertial navigation system (SINS). During aircraft flight, SINS alignment is disturbed by the linear and angular movements of the aircraft. To deal with these disturbances in dynamic initial alignment, a novel alignment method for SINS is investigated in this paper. In this method, an initial alignment error model of SINS in the inertial frame is established. The observability of the system is analyzed using piece-wise constant system (PWCS) theory, and the degree of observability is computed using singular value decomposition (SVD). It is demonstrated that the system is completely observable, so all system state parameters can be estimated by an optimal filter. An H∞ filter is then designed to handle the uncertainty of the measurement noise. The simulation results demonstrate that the proposed algorithm achieves better accuracy under dynamic disturbance conditions.
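The observability check described above can be sketched numerically. For each constant segment with dynamics matrix A and measurement matrix C, the observability matrix stacks C, CA, ..., CA^(n-1); PWCS analysis stacks these across segments, and the rank and singular values of the result indicate which states are observable and how strongly. The sketch stacks the per-segment matrices directly (the "stripped" observability matrix). Roughly speaking, its rank equals that of the full PWCS observability matrix only under an extra condition on each segment's unobservable states, which the sketch does not check. The toy double-integrator model is an assumption and not the paper's SINS error model.

```python
from typing import List, Tuple

import numpy as np


def observability_matrix(A: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Stack C, CA, ..., CA^(n-1) for one constant segment."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)


def stacked_observability(segments: List[Tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Stack the per-segment observability matrices of a piecewise-constant system."""
    return np.vstack([observability_matrix(A, C) for A, C in segments])


def observability_report(segments: List[Tuple[np.ndarray, np.ndarray]]):
    """Return (rank, singular values in descending order) of the stacked matrix."""
    Q = stacked_observability(segments)
    return int(np.linalg.matrix_rank(Q)), np.linalg.svd(Q, compute_uv=False)
```

With a double integrator (state = position, velocity), measuring velocity alone leaves position unobservable (rank 1), but adding a segment that measures position makes the stacked matrix full rank. Small singular values would flag weakly observable directions even when the rank is full.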
A Method for the Alignment of Heterogeneous Macromolecules from Electron Microscopy
Shatsky, Maxim; Hall, Richard J.; Brenner, Steven E.; Glaeser, Robert M.
2009-01-01
We propose a feature-based image alignment method for single-particle electron microscopy that is able to accommodate various similarity scoring functions while efficiently sampling the two-dimensional transformational space. We use this image alignment method to evaluate the performance of a scoring function based on the Mutual Information (MI) of two images rather than one based on the cross-correlation function. We show that alignment using MI as the scoring function has far less model-dependent bias than cross-correlation-based alignment. We also demonstrate that MI improves the alignment of some types of heterogeneous data, provided that the signal-to-noise ratio is relatively high. These results therefore indicate that MI is well suited as the scoring function for aligning class averages computed from single-particle images. Our method is tested on data from three model structures and one real dataset. PMID:19166941
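Mutual information as an alignment score can be sketched with a joint intensity histogram: MI is high when one image's pixel values predict the other's, regardless of any linear relationship between them. The sketch below is a histogram-based estimate combined with an exhaustive search over cyclic integer translations. The bin count, the use of `np.roll`, and the restriction to translations are assumptions; the authors' method is feature-based and also samples rotations.

```python
import numpy as np


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Histogram estimate of I(A;B) in nats for two equally sized images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = pxy > 0                         # terms with p = 0 contribute nothing
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def best_shift(ref: np.ndarray, img: np.ndarray, max_shift: int = 3, bins: int = 16):
    """Return the (dy, dx) roll of img that maximizes MI with ref."""
    best, best_mi = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = np.roll(img, (dy, dx), axis=(0, 1))
            mi = mutual_information(ref, moved, bins)
            if mi > best_mi:
                best, best_mi = (dy, dx), mi
    return best
```

If `img` is `ref` rolled by (2, 1), the search returns (-2, -1), the roll that undoes it. Exhaustive search is simple but scales poorly with the size of the transformation space, which is why efficient sampling of that space matters in practice.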
Multiple nodes transfer alignment for airborne missiles based on inertial sensor network
NASA Astrophysics Data System (ADS)
Si, Fan; Zhao, Yan
2017-09-01
Transfer alignment is an important initialization method for airborne missiles because the alignment accuracy largely determines the performance of the missile. However, traditional alignment methods are limited by the complicated and unknown flexure angle and cannot meet practical requirements when wing flexure deformation occurs. To address this problem, we propose a new method that uses the relative navigation parameters between the weapons and the fighter to achieve transfer alignment. First, in the relative inertial navigation algorithm, the relative attitudes and positions are continuously computed under wing flexure deformation. Second, the alignment results of each weapon are processed using a data fusion algorithm to improve the overall performance. Finally, the feasibility and performance of the proposed method were evaluated under two typical types of deformation, and the simulation results demonstrated that the new transfer alignment method is practical and highly precise.
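The abstract does not say which data fusion algorithm combines the alignment results of the individual weapons. As a generic example only, inverse-variance weighting is a standard way to merge independent estimates of the same quantity (say, one misalignment angle) so that more certain estimates count more. The sketch assumes unbiased, uncorrelated scalar estimates with known variances; it is not the paper's method.

```python
from typing import Sequence, Tuple


def fuse_estimates(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted fusion of independent scalar estimates.

    Returns the fused estimate and its variance.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, estimates)) / total
    return fused, 1.0 / total
```

Fusing two equally reliable estimates of 1.0 and 3.0 (variance 1.0 each) yields 2.0 with variance 0.5. The fused variance is always smaller than the smallest input variance, which is the basic reason that combining several stations can improve overall alignment performance.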
Effect of mat pilates exercise on postural alignment and body composition of middle-aged women.
Lee, Hyo Taek; Oh, Hyun Ok; Han, Hui Seung; Jin, Kwang Youn; Roh, Hyo Lyun
2016-06-01
[Purpose] This study attempted to examine whether Pilates is an effective exercise for improving the postural alignment and health of middle-aged women. [Subjects and Methods] The participants in this study were 36 middle-aged women (20 in the experimental group, 16 in the control group). The experimental group participated in Pilates exercise sessions three times a week for 12 weeks. Body alignment and composition measurements before and after applying the Pilates exercise program were performed with a body composition analyzer and a three-dimensional scanner. [Results] Postural alignment in the sagittal and horizontal planes was enhanced in the Pilates exercise group. Trunk alignment showed correlations with body fat and muscle mass. [Conclusion] The Pilates exercises are performed symmetrically and strengthen the deep muscles. Moreover, the results showed that muscle mass was correlated with trunk postural alignment and that the proper amount of muscle is critical in maintaining trunk postural alignment.
[Identification of original species of Mantidis Oötheca (Sangpiaoxiao) based on DNA barcoding].
Wang, Xi; Hou, Fei-xia; Wang, Yi-xuan; Wang, Yu-xian; Li, Jun-de; Yuan, Yuan; Peng, Cheng; Guo, Jin-lin
2015-10-01
Both market research and literature reports found that the oothecae of various mantises are all used as medicine. However, the Chinese Pharmacopoeia records the oothecae of only three mantis species, and the clinical use of oothecae not recorded in the Chinese Pharmacopoeia poses potential risks to drug safety. It is therefore urgent to identify the source species of Mantidis Oötheca. Current studies of the source animals of Mantidis Oötheca are all based on morphology. DNA barcoding, which is widely used in animal classification studies, fills gaps left by traditional morphological identification. This study is the first to use DNA barcoding to analyze genetic distances among different types of Mantidis Oötheca, align COI sequences between mantises and Mantidis Oötheca, and construct a phylogenetic tree. The results confirmed that Tenodera sinensis and Hierodula patellifera are the source insects of Tuanpiaoxiao and Heipiaoxiao, respectively, and that Statilia maculata and Mantis religiosa are the source insects of Changpiaoxiao.
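The genetic distances mentioned above are often computed in DNA barcoding studies with the Kimura two-parameter (K2P) model, which treats transitions (A/G, C/T) and transversions separately: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences. The abstract does not state which distance model was used, so this is an assumed, illustrative choice. The sketch takes two already aligned COI sequences and skips gaps and ambiguous bases.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in VALID or y not in VALID:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent for the model.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

One transition in ten compared sites gives -1/2 ln(0.8), about 0.112, slightly more than the raw 10% difference because the model corrects for unobserved multiple substitutions at the same site.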
Analysis of methylated patterns and quality-related genes in tobacco (Nicotiana tabacum) cultivars.
Jiao, Junna; Jia, Yanlong; Lv, Zhuangwei; Sun, Chuanfei; Gao, Lijie; Yan, Xiaoxiao; Cui, Liusu; Tang, Zongxiang; Yan, Benju
2014-08-01
Methylation-sensitive amplified polymorphism was used in this study to investigate the epigenetic information of four tobacco cultivars: Yunyan 85, NC89, K326, and Yunyan 87. DNA fragments carrying methylation information were cloned by re-amplification PCR and sequenced. BLAST alignments showed that the sequences carrying methylation information included chitinase, nitrate reductase, chloroplast DNA, mitochondrial DNA, ornithine decarboxylase, ribulose carboxylase, and promoter sequences. Homologous comparison of three cloned gene sequences (nitrate reductase, ornithine decarboxylase, and ribulose carboxylase) indicated that geographic factors had a significant influence on genome-wide methylation. Introns also contained different information in different tobacco cultivars. These findings suggest that the mechanisms synthesizing tobacco aromatic components could be affected by different environmental factors, leading to variation in noncoding regions of the genome and ultimately to different fragrance and taste among tobacco cultivars.
Use of rDNA polymorphism for identification of Heterophyidae infecting freshwater fishes.
Dzikowski, R; Levy, M G; Poore, M F; Flowers, J R; Paperna, I
2004-04-21
Infections by trematodes are among the most common fish-borne zoonoses. Metacercariae of the family Heterophyidae in marine and freshwater fishes are nonfastidious in their choice of definitive hosts and therefore cause infections in humans and domestic animals. In the present study, species-specific polymerase chain reaction (PCR) assays were developed for identifying and differentiating the various species examined. Sequencing and aligning the 18S (SSU) rDNA revealed interspecific variation, for which species-specific DNA oligonucleotides were designed and used to identify 6 heterophyid species recovered from piscivorous birds. The oligonucleotides were further used to evaluate the various stages of the digeneans (cercariae recovered from snails, metacercariae recovered from fish, and adult trematodes). By applying this method, we elucidated for the first time the life cycle of Pygidiopsis genata. The phylogenetic interrelationships among the newly sequenced species of Heterophyidae are outlined.
Scalable lithography from Natural DNA Patterns via polyacrylamide gel
NASA Astrophysics Data System (ADS)
Qu, Jiehao; Hou, Xianliang; Fan, Wanchao; Xi, Guanghui; Diao, Hongyan; Liu, Xiangdon
2015-12-01
A facile strategy for fabricating scalable stamps has been developed using cross-linked polyacrylamide gel (PAMG), which controllably and precisely shrinks and swells with its water content. Aligned patterns of natural DNA molecules were prepared by evaporative self-assembly on a PMMA substrate and were transferred to unsaturated polyester resin (UPR) to form a negative replica. The negative was used to pattern the linear structures onto the surface of water-swollen PAMG, and the pattern sizes on the PAMG stamp were customized by adjusting the water content of the PAMG. As a result, DNA patterns could be consistently reproduced with feature sizes controllable over the range of 40%-200% of the original pattern dimensions. This novel methodology may open a new avenue for manufacturing stamp-based functional nanostructures simply and cost-effectively on a large scale.